1.61 ML Infrastructure Engineer — ML Platform, Tooling & Systems
Bay Area
Full Time Senior-level / Expert USD 70K - 300K
Field AI
One Autonomy for All Robots. Field-proven embodied AI software that is finally unlocking the full potential of mobile robots in the real world.
Field AI is transforming how robots interact with the real world. We are building risk-aware, reliable, and field-ready AI systems that solve the hardest challenges in autonomy, deploying globally today to unlock the full potential of embodied intelligence. Our solutions go beyond conventional data-driven ML or purely transformer-based models. We’re building real-world AI that learns from experience and delivers tangible, continuous improvements in the field.
Are you excited by the challenge of supporting ML teams with robust, scalable infrastructure? Do you want to help accelerate real-time robotics through better developer workflows and reliable systems?
Field AI is hiring an ML Infrastructure Engineer to own the software platform and tooling that enables fast, reliable AI development and deployment across our ML and robotics stacks.
What You Will Get To Do
- Build ML Infrastructure & Developer Tooling
  - Design and implement internal tools, libraries, and CLI utilities that streamline experimentation, model training, and evaluation.
  - Improve local and cloud development environments using Docker, internal package registries, and monorepos.
  - Build reusable templates and interfaces for training, evaluation, and inference pipelines.
- Support the ML Lifecycle (Data → Models → Deployment)
  - Develop pipelines for dataset ingestion, transformation, versioning, and validation.
  - Automate model training, evaluation, packaging, and deployment to cloud and edge environments.
  - Ensure integrity and traceability across data, code, and model artifacts.
- Improve Build Systems and Developer Experience
  - Maintain and evolve a shared monorepo across ML, robotics, and software teams.
  - Leverage Bazel or similar systems to enable fast, reproducible builds and tests.
  - Enhance developer workflows to support consistent environments and reduce friction.
- Own CI/CD and Automation for ML Systems
  - Build and maintain CI/CD pipelines (e.g., GitHub Actions, AWS Step Functions) for ML experimentation and deployment.
  - Automate regression testing and benchmarking of models.
  - Develop observability tools: dashboards, telemetry systems, and model health monitoring.
- Collaborate Across Engineering & Research Teams
  - Work closely with ML scientists, software engineers, and roboticists to translate high-level platform needs into robust engineering solutions.
  - Participate in code and design reviews, documentation, and cross-team planning.
What You Have
- 3+ years of industry experience in software engineering, infrastructure, MLOps, or DevOps roles.
- Deep familiarity with the ML lifecycle, including data preparation, model training, packaging, and deployment.
- Strong software engineering foundations: proficiency with Git, Python, and system design.
- Experience building and managing containerized environments (e.g., Docker) and working with orchestration tools (e.g., Kubernetes).
- Hands-on experience with CI/CD workflows and infrastructure-as-code (e.g., Terraform, AWS CDK).
- Experience with cloud ML platforms (AWS, GCP, or Azure).
- A strong product mindset — building internal tools with empathy for researchers and engineers.
What Will Set You Apart
- Experience with distributed training frameworks (e.g., PyTorch DDP, FSDP, DeepSpeed, Megatron).
- Familiarity with orchestrating large-scale training jobs using Kubernetes-based platforms (e.g., Ray, SageMaker, EKS, Karpenter).
- Background in hybrid edge-cloud ML deployments or infrastructure supporting robotic systems.
- Prior work in environments requiring real-time ML performance, safety validation, or regulatory traceability.
Why Join Field AI?
We are solving one of the world’s most complex challenges: deploying robots in unstructured, previously unknown environments. Our Field Foundational Models™ set a new standard in perception, planning, localization, and manipulation, ensuring our approach is explainable and safe for deployment.
You will have the opportunity to work with a world-class team that thrives on creativity, resilience, and bold thinking. With a decade-long track record of deploying solutions in the field, winning DARPA challenge segments, and bringing expertise from organizations like DeepMind, NASA JPL, Boston Dynamics, NVIDIA, Amazon, Tesla Autopilot, Cruise Self-Driving, Zoox, Toyota Research Institute, and SpaceX, we are set to achieve our ambitious goals.
Be Part of the Next Robotics Revolution
To tackle such ambitious challenges, we need a team as unique as our vision: innovators who go beyond conventional methods and are eager to tackle tough, uncharted questions. We’re seeking individuals who challenge the status quo, dive into uncharted territory, and bring interdisciplinary expertise. Our team requires not only top AI talent but also exceptional software developers, engineers, product designers, field deployment experts, and communicators.
We are headquartered in always-sunny Mission Viejo (Irvine-adjacent), Southern California, and have US-based and global teammates.
Join us, shape the future, and be part of a fun, close-knit team on an exciting journey!
We celebrate diversity and are committed to creating an inclusive environment for all employees. Candidates and employees are always evaluated based on merit, qualifications, and performance. We will never discriminate on the basis of race, color, gender, national origin, ethnicity, veteran status, disability status, age, sexual orientation, gender identity, marital status, mental or physical disability, or any other legally protected status.
Tags: Architecture AWS Azure Bazel CI/CD DDP DevOps Docker Engineering FSDP GCP Git GitHub Kubernetes Machine Learning ML infrastructure MLOps Model training Pipelines Python PyTorch Research Robotics SageMaker Step Functions Terraform Testing
Perks/benefits: Gear