ML Infra Engineer
Tasks
- Allocate and manage cloud compute resources
- Build experiment launching, monitoring, and debugging tooling
- Design training infrastructure
- Evolve JAX training code for new architectures and evaluation metrics
- Implement training scheduling and job management
- Manage checkpointing and metrics logging
- Optimize performance, memory usage, and device utilization
- Scale distributed training on GPU and TPU
- Translate research needs into infrastructure capabilities
Perks/Benefits
- N/A
Skills/Tech-stack
AWS | Checkpointing | Data loading | Device Utilization | Distributed Training | Evaluation Pipelines | GCP | GKE | GPU | JAX | Job orchestration | Kubernetes | Machine Learning | Memory Optimization | Metrics logging | Profiling | PyTorch | Slurm | TPU | Throughput Optimization
Education
N/A