Machine Learning Operations Engineer
Cambridge, MA
Robotics and AI Institute
Welcome. At the Robotics and AI Institute (RAI Institute), our mission is to solve the most important and fundamental challenges in robotics and AI.
Our Mission
Our mission is to solve the most important and fundamental challenges in AI and Robotics to enable future generations of intelligent machines that will help us all live better lives.
Machine Learning Operations (ML-Ops) Engineers build infrastructure that supports the entire lifecycle of Machine Learning (ML) projects, from development through scaling to deployment. If you have a passion for building the foundation that enables robotics research and engineering, you will want to join us!
What You Will Do
- Design, develop, and maintain company-wide platforms and tooling that utilize Kubernetes infrastructure to enable machine learning and data processing applications
- Enable self-service access to ML compute on our on-prem and cloud compute clusters, including support for job scheduling, workload scalability, and workload fault tolerance
- Enhance observability across ML applications through integrations with tools and services such as Fluentd, Prometheus, Grafana, and Datadog
- Integrate ML applications with experiment tracking and management services like Weights & Biases
- Elevate code quality and champion best practices in our engineering processes
- Collaborate with Machine Learning Engineers, Data Engineers, DevOps Engineers, and researchers to build scalable solutions that improve engineering and research velocity
What You Will Bring
- BS or MS in Computer Science, Engineering, or equivalent
- 3+ years of experience in an MLOps, DevOps, ML Engineering, or software engineering role
- Strong hands-on experience deploying and managing applications running on Kubernetes
- Experience developing MLOps platforms to manage the lifecycle of ML experiments, including one or more of data and artifact management, reproducibility, fault tolerance, experiment tracking, and model serving
- Experience with Docker and Python environment management tools such as pip, poetry, uv or similar
- Proficiency in software practices such as version control (Git), CI/CD (GitHub Actions, Argo CD), and Infrastructure as Code (Terraform)
Extra Skills We Value
- Experience with Kueue, or similar job scheduling mechanisms
- Experience with workflow orchestration tools such as Airflow, Metaflow, Argo Workflows or similar
- Hands-on experience deploying and managing cloud infrastructure on platforms like GCP and AWS
- Experience with hybrid-cloud compute and data environments
- Experience with Ray, PyTorch Lightning, or similar scalable AI/ML platforms
- Experience with application and system logging using tools and services such as Fluentd, Prometheus, Grafana, Datadog, or similar
- Experience with the Bazel build tool or similar
- Experience with ML model serving frameworks such as TorchServe, ONNX Runtime, or similar
- Experience working with research teams in an academic or industrial environment
Categories:
Engineering Jobs
Machine Learning Jobs
MLOps Jobs
Tags: Airflow AWS Bazel CI/CD Computer Science DevOps Docker Engineering GCP Git GitHub Grafana Industrial Kubernetes Machine Learning MLOps ONNX Python PyTorch Research Robotics Terraform
Perks/benefits: Career development
Region:
North America
Country:
United States