Distributed Machine Learning Research Engineer
United Kingdom - Remote
Applications have closed
Determine the most cost-efficient, highest-performance way to distribute ML model training
- Supercluster: Gensyn is building a permissionless distributed network that unites all of the world's compute into a global machine learning supercluster. It will be accessible to everyone and offer lower cost and greater scale than cloud solutions like AWS
- Written in Rust and Python: a trustless protocol that rolls up work from a machine learning execution framework into a Substrate blockchain for decentralised consensus
- Autonomous environment: fully remote, flat hierarchy, low/no rules, with a pure focus on delivering the compute protocol that will push the frontiers of artificial intelligence
- Backed by leading crypto infrastructure and deep learning investors, including Eden Block, Galaxy Digital, Maven 11, CoinFund, Hypersphere, Zee Prime, PEER, Entrepreneur First, Counterview Capital, 7percent, and id4, as well as angels from DeepMind, Livepeer, Pocket, The University of Cambridge, Twitter, Google, Parity Technologies, and more
Responsibilities
- Research novel ML distribution methods - theorise, design, test, build, and iterate on novel distributed machine learning methods (e.g. Distributed-SGD and Decentralised Mixture of Experts (DMoE))
- Overcome bandwidth, latency, and data constraints - deeply understand typical distributed training bottlenecks in both hardware and software and work around them in novel ways
- Monitor and evaluate distributed training performance - design and perform representative experiments for distributed model training over heterogeneous infrastructure
- Build the offchain runtime - implement novel distributed ML methods in production code for use by ML researchers and engineers globally
- Write - contribute to technical reports / papers describing the system and discuss with the community
Requirements
Each requirement is marked Minimum or Nice to have
- Minimum: Experience with highly distributed model training - have previously built training pipelines using data and model parallelism over distributed (ideally highly distributed) hardware
- Minimum: Experience with huge model training - have previously been a core engineering member of a team training an LLM (e.g. BERT, GPT-X, PaLM, BLOOM, etc.) from scratch
- Minimum: Passion for decentralisation - an understanding of web3 technologies and decentralised principles
- Nice to have: Rust experience
- Nice to have: Publications in distributed ML/DL
- Nice to have: Experience with Byzantine-tolerant distributed optimisation
- Nice to have: Some knowledge of protocol design
Benefits
- Competitive salary + share of equity and token pool
- Fully remote work
- All-expenses-paid company meet-ups around the world (Mexico is next)
- 28 days of paid holiday per year
- Whatever equipment you need
- Paid sick leave