Sr. Software Development Engineer, MLOps
Tasks
- Architect large-scale data pipelines for robotics datasets
- Build CI/CD pipelines for ML models
- Design scalable ML training infrastructure on Kubernetes
- Develop experiment tracking tooling
- Develop hyperparameter optimization tooling
- Ensure reproducibility for ML workflows
- Establish monitoring, alerting, and observability
- Implement fault-tolerant distributed training
- Manage GPU fleet and optimize cost
- Operationalize ML models into production
Perks/Benefits
- N/A
Skills/Tech-stack
Alerting | Amazon EKS | CI/CD | Checkpointing | Data Ingestion | Data Pipelines | Distributed Systems | Experiment tracking | Fault-tolerant systems | GPU scheduling | Hyperparameter Optimization | Kubernetes | MLOps | Machine Learning | Model Deployment | Monitoring | Observability | Reproducibility
Education
N/A