Distributed Machine Learning Research Engineer

United Kingdom - Remote

Gensyn

The network for machine intelligence

Determine the most cost-efficient and highest-performance way to distribute ML model training

  • 🎯 Supercluster: Gensyn is building a permissionless distributed network that unites all of the world's compute into a global machine learning supercluster. It will be accessible to everyone and deliver lower cost and higher scale than cloud solutions like AWS
  • 🛠 Written in Rust and Python: a trustless protocol that rolls up work from a machine learning execution framework into a Substrate blockchain for decentralised consensus
  • 🧭 Autonomous environment: fully remote, flat hierarchy, low/no rules: pure focus on delivering the compute protocol that will push the frontiers of artificial intelligence
  • 💰 Backed by leading crypto infrastructure and deep learning investors, including: Eden Block, Galaxy Digital, Maven 11, CoinFund, Hypersphere, Zee Prime, PEER, Entrepreneur First, Counterview Capital, 7percent, and id4; as well as angels from DeepMind, Livepeer, Pocket, The University of Cambridge, Twitter, Google, Parity Technologies, and more


Responsibilities

  • Research novel ML distribution methods - theorise, design, test, build, and iterate on new approaches to distributed machine learning (e.g. Distributed-SGD and Decentralised Mixture of Experts (DMoE)); a minimal data-parallel sketch follows this list
  • Overcome bandwidth, latency, and data constraints - deeply understand typical distributed training bottlenecks in both hardware and software and work around them in novel ways
  • Monitor and evaluate distributed training performance - design and perform representative experiments for distributed model training over heterogeneous infrastructure
  • Build the offchain runtime - implement novel distributed ML methods in production code for use by ML researchers and engineers globally
  • Write - contribute to technical reports and papers describing the system and discuss them with the community
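
As a rough illustration of the data-parallel flavour of Distributed-SGD referenced above, the sketch below uses PyTorch's DistributedDataParallel to average gradients across workers at every step; the model, synthetic data, hyperparameters, and gloo backend are placeholder assumptions rather than Gensyn's actual stack.

```python
# Minimal data-parallel Distributed-SGD sketch (illustrative; placeholder model and data).
# Launch with, e.g.: torchrun --nproc_per_node=4 train_ddp.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each worker process.
    dist.init_process_group(backend="gloo")   # use "nccl" when each worker has a GPU
    rank = dist.get_rank()

    torch.manual_seed(0)                      # identical initial weights on every rank
    model = torch.nn.Linear(32, 1)
    ddp_model = DDP(model)                    # all-reduces gradients during backward()
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for step in range(100):
        x = torch.randn(64, 32)               # each rank trains on its own (synthetic) shard
        y = torch.randn(64, 1)
        opt.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        loss.backward()                       # gradient all-reduce happens here
        opt.step()                            # every rank applies the same averaged update
        if rank == 0 and step % 20 == 0:
            print(f"step {step} loss {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

The challenge in this role is performing the equivalent of that gradient all-reduce over heterogeneous, bandwidth- and latency-constrained hardware rather than a single homogeneous cluster.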

Requirements

Minimum ✅ / Nice to have 🔥

  • ✅ Experience with highly distributed model training - have previously built training pipelines using data and model parallelism over distributed (ideally highly distributed) hardware
  • ✅ Experience with huge model training - have previously been a core engineering member of a team training an LLM (e.g. BERT, GPT-X, PaLM, BLOOM, etc.) from scratch
  • ✅ Passion for decentralisation - an understanding of web3 technologies and decentralised principles
  • 🔥 Rust experience
  • 🔥 Publications in distributed ML/DL
  • 🔥 Experience with Byzantine-tolerant distributed optimisation
  • 🔥 Some knowledge of protocol design

Benefits

  • 💰 Competitive salary + share of equity and token pool
  • 🌐 Fully remote work
  • 🛫 All-expenses-paid company meet-ups around the world (Mexico is next)
  • ⭐ 28 paid holiday days per year
  • 💻 Whatever equipment you need
  • ❀️ Paid sick leave
