Distributed Machine Learning Research Engineer

United Kingdom - Remote

Gensyn

The network for machine intelligence

Determine the most cost-efficient and highest-performance way to distribute ML model training

  • 🎯 Supercluster: Gensyn is building a permissionless distributed network that unites all of the world's compute into a global machine learning supercluster. It will be accessible to everyone and deliver lower cost and higher scale than cloud solutions like AWS
  • 🛠 Written in Rust and Python: a trustless protocol that rolls up work from a machine learning execution framework into a Substrate blockchain for decentralised consensus
  • 🧭 Autonomous environment: fully remote, flat hierarchy, low/no rules: pure focus on delivering the compute protocol that will push the frontiers of artificial intelligence
  • 💰 Backed by leading crypto infrastructure and deep learning investors, including: Eden Block, Galaxy Digital, Maven 11, CoinFund, Hypersphere, Zee Prime, PEER, Entrepreneur First, Counterview Capital, 7percent, and id4; as well as angels from DeepMind, Livepeer, Pocket, The University of Cambridge, Twitter, Google, Parity Technologies, and more


Responsibilities

  • Research novel ML distribution methods - theorise, design, test, build, and iterate on new approaches to distributed machine learning (e.g. Distributed-SGD and Decentralised Mixture of Experts (DMoE)); a minimal data-parallel sketch follows this list
  • Overcome bandwidth, latency, and data constraints - deeply understand typical distributed training bottlenecks in both hardware and software and work around them in novel ways
  • Monitor and evaluate distributed training performance - design and perform representative experiments for distributed model training over heterogeneous infrastructure
  • Build the offchain runtime - implement novel distributed ML methods in production code for use by ML researchers and engineers globally
  • Write - contribute to technical reports and papers describing the system and discuss them with the community
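
As a rough illustration of the data-parallel flavour of Distributed-SGD referenced above, the sketch below uses PyTorch's DistributedDataParallel to average gradients across workers at every step; the model, synthetic data, hyperparameters, and gloo backend are placeholder assumptions rather than Gensyn's actual stack.

```python
# Minimal data-parallel Distributed-SGD sketch (illustrative; placeholder model and data).
# Launch with, e.g.: torchrun --nproc_per_node=4 train_ddp.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each worker process.
    dist.init_process_group(backend="gloo")   # use "nccl" when each worker has a GPU
    rank = dist.get_rank()

    torch.manual_seed(0)                      # identical initial weights on every rank
    model = torch.nn.Linear(32, 1)
    ddp_model = DDP(model)                    # all-reduces gradients during backward()
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for step in range(100):
        x = torch.randn(64, 32)               # each rank trains on its own (synthetic) shard
        y = torch.randn(64, 1)
        opt.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        loss.backward()                       # gradient all-reduce happens here
        opt.step()                            # every rank applies the same averaged update
        if rank == 0 and step % 20 == 0:
            print(f"step {step} loss {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

The challenge in this role is performing the equivalent of that gradient all-reduce over heterogeneous, bandwidth- and latency-constrained hardware rather than a single homogeneous cluster.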

Requirements

Minimum ✅ / Nice to have 🔥

  • ✅ Experience with highly distributed model training - have previously built training pipelines using data and model parallelism over distributed (ideally highly distributed) hardware
  • ✅ Experience with huge model training - have previously been a core engineering member of a team training an LLM (e.g. BERT, GPT-X, PaLM, BLOOM, etc.) from scratch
  • ✅ Passion for decentralisation - an understanding of web3 technologies and decentralised principles
  • 🔥 Rust experience
  • 🔥 Publications in distributed ML/DL
  • 🔥 Experience with Byzantine-tolerant distributed optimisation
  • 🔥 Some knowledge of protocol design

Benefits

  • 💰 Competitive salary + share of equity and token pool
  • 🌐 Fully remote work
  • 🛫 All-expenses-paid company meet-ups around the world (Mexico is next)
  • ⭐ 28 paid holiday days per year
  • 💻 Whatever equipment you need
  • ❀️ Paid sick leave
