1.61 ML Infrastructure Engineer — ML Platform, Tooling & Systems

Bay Area

Field AI

One Autonomy for All Robots. Field-proven embodied AI software that is finally unlocking the full potential of mobile robots in the real world.

Field AI is transforming how robots interact with the real world. We are building risk-aware, reliable, and field-ready AI systems that address the most complex challenges in robotics, unlocking the full potential of embodied intelligence. We go beyond typical data-driven approaches and pure transformer-based architectures to chart a new course: our solutions are already deployed globally, delivering real-world results and rapidly improving our models through field applications.
Field AI is transforming how robots interact with the real world. We are building risk-aware, reliable, and field-ready AI systems that solve the hardest challenges in autonomy, deploying globally today to unlock the full potential of embodied intelligence. Our solutions go beyond conventional data-driven ML or purely transformer-based models. We’re building real-world AI that learns from experience and delivers tangible, continuous improvements in the field.
Are you excited by the challenge of supporting ML teams with robust, scalable infrastructure? Do you want to help accelerate real-time robotics through better developer workflows and reliable systems?
Field AI is hiring an ML Infrastructure Engineer to own the software platform and tooling that enables fast, reliable AI development and deployment across our ML and robotics stacks.

What You Will Get To Do

  • Build ML Infrastructure & Developer Tooling
      • Design and implement internal tools, libraries, and CLI utilities that streamline experimentation, model training, and evaluation.
      • Improve local and cloud development environments using Docker, internal package registries, and monorepos.
      • Build reusable templates and interfaces for training, evaluation, and inference pipelines.
  • Support the ML Lifecycle (Data → Models → Deployment)
      • Develop pipelines for dataset ingestion, transformation, versioning, and validation.
      • Automate model training, evaluation, packaging, and deployment to cloud and edge environments.
      • Ensure integrity and traceability across data, code, and model artifacts.
  • Improve Build Systems and Developer Experience
      • Maintain and evolve a shared monorepo across ML, robotics, and software teams.
      • Leverage Bazel or similar systems to enable fast, reproducible builds and tests.
      • Enhance developer workflows to support consistent environments and reduce friction.
  • Own CI/CD and Automation for ML Systems
      • Build and maintain CI/CD pipelines (e.g., GitHub Actions, AWS Step Functions) for ML experimentation and deployment.
      • Automate regression testing and benchmarking of models.
      • Develop observability tools: dashboards, telemetry systems, and model health monitoring.
  • Collaborate Across Engineering & Research Teams
      • Work closely with ML scientists, software engineers, and roboticists to translate high-level platform needs into robust engineering solutions.
      • Participate in code and design reviews, documentation, and cross-team planning.

What You Have

  • 3+ years of industry experience in software engineering, infrastructure, MLOps, or DevOps roles.
  • Deep familiarity with the ML lifecycle, including data preparation, model training, packaging, and deployment.
  • Strong software engineering foundations: proficiency with Git, Python, and system design.
  • Experience building and managing containerized environments (e.g., Docker) and working with orchestration tools (e.g., Kubernetes).
  • Hands-on experience with CI/CD workflows and infrastructure-as-code (e.g., Terraform, AWS CDK).
  • Experience with cloud ML platforms (AWS, GCP, or Azure).
  • A strong product mindset — building internal tools with empathy for researchers and engineers.

What Will Set You Apart

  • Experience with distributed training frameworks (e.g., PyTorch DDP, FSDP, DeepSpeed, Megatron).
  • Familiarity with orchestrating large-scale training jobs using Kubernetes-based platforms (e.g., Ray, SageMaker, EKS, Karpenter).
  • Background in hybrid edge-cloud ML deployments or infrastructure supporting robotic systems.
  • Prior work in environments requiring real-time ML performance, safety validation, or regulatory traceability.

Compensation and Benefits

Our salary range is generous ($70,000 - $300,000 annually), but we take an individual's background and experience into consideration in determining final salary; base pay offered may vary considerably depending on geographic location, job-related knowledge, skills, and experience. Also, while we enjoy being together on-site, we are open to exploring a hybrid or remote option.

Why Join Field AI?

We are solving one of the world’s most complex challenges: deploying robots in unstructured, previously unknown environments. Our Field Foundational Models™ set a new standard in perception, planning, localization, and manipulation, ensuring our approach is explainable and safe for deployment.
You will have the opportunity to work with a world-class team that thrives on creativity, resilience, and bold thinking. With a decade-long track record of deploying solutions in the field, winning DARPA challenge segments, and bringing expertise from organizations like DeepMind, NASA JPL, Boston Dynamics, NVIDIA, Amazon, Tesla Autopilot, Cruise Self-Driving, Zoox, Toyota Research Institute, and SpaceX, we are set to achieve our ambitious goals.

Be Part of the Next Robotics Revolution

To meet such ambitious challenges, we need a team as unique as our vision: innovators who go beyond conventional methods and are eager to tackle tough, uncharted questions. We’re seeking individuals who challenge the status quo, explore new territory, and bring interdisciplinary expertise. Our team requires not only top AI talent but also exceptional software developers, engineers, product designers, field deployment experts, and communicators.
We are headquartered in always-sunny Mission Viejo (Irvine-adjacent), Southern California, and have US-based and global teammates.
Join us, shape the future, and be part of a fun, close-knit team on an exciting journey!


We celebrate diversity and are committed to creating an inclusive environment for all employees. Candidates and employees are always evaluated based on merit, qualifications, and performance. We will never discriminate on the basis of race, color, gender, national origin, ethnicity, veteran status, disability status, age, sexual orientation, gender identity, marital status, mental or physical disability, or any other legally protected status.

Tags: Architecture AWS Azure Bazel CI/CD DDP DevOps Docker Engineering FSDP GCP Git GitHub Kubernetes Machine Learning ML infrastructure MLOps Model training Pipelines Python PyTorch Research Robotics SageMaker Step Functions Terraform Testing

Perks/benefits: Gear

Region: North America
Country: United States
