ML Platform & Infrastructure Engineer
Tasks
- Automate training runs, data ingestion, job orchestration, checkpointing, and artifact management
- Build dashboards and alerting to monitor system reliability
- Build evaluation harnesses to benchmark models on every merge
- Design CI/CD pipelines for machine learning workflows
- Develop internal SDKs, CLIs, and lightweight UIs for research workflows
- Implement observability for model latency, throughput, and error rates
- Monitor GPU utilization and cluster health
- Optimize latency and resource usage for experimentation
- Track inference cost and unit economics
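One of the duties above is observability for model latency, throughput, and error rates. As a hedged illustration only (not this team's actual tooling), a minimal Python sketch that summarizes one time window of request records could look like the following. The `Request` record, the `summarize` helper, and the nearest-rank percentile method are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical per-request record: end-to-end latency and success flag.
    latency_ms: float
    ok: bool


def percentile(sorted_values, q):
    """Nearest-rank percentile of an ascending list; q in (0, 100]."""
    if not sorted_values:
        raise ValueError("percentile of empty data")
    # Integer-first arithmetic (q * n / 100) avoids float drift such as 0.95 * 10.
    rank = math.ceil(q * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(requests, window_s):
    """Latency percentiles, throughput, and error rate for one window (hypothetical helper)."""
    if not requests:
        raise ValueError("no requests in window")
    latencies = sorted(r.latency_ms for r in requests)
    errors = sum(1 for r in requests if not r.ok)
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "throughput_rps": len(requests) / window_s,
        "error_rate": errors / len(requests),
    }


if __name__ == "__main__":
    # Ten requests at 10, 20, ..., 100 ms over a 5 s window; the slowest one failed.
    reqs = [Request(float(ms), ok=(ms != 100)) for ms in range(10, 101, 10)]
    print(summarize(reqs, window_s=5.0))
```

In a production stack these figures would more likely come from a metrics system's histograms than from raw request lists; the sketch only shows what the three signals measure.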
Skills/Tech-stack
AWS | CI/CD | Distributed Systems | Docker | Experiment tracking | GPU clusters | Google Cloud | Kubernetes | MLOps | Machine Learning | Model Evaluation | Python | Retool | Streamlit