Tech Lead Machine Learning Ops Engineer, Global SRE
San Jose, California, United States
USD 187K-359K (estimate) | Senior-level | Full Time
Tasks
- Ensure stability of AIGC machine learning tasks
- Improve resource efficiency
- Improve training task success rate
- Maintain stability of offline machine learning training tasks
- Maintain stability of online machine learning serving systems
- Manage and plan machine learning resources
- Optimize cost and budget
- Roll out GPU model training in non-China regions
- Set SLOs for online machine learning serving systems
Perks/Benefits
- N/A
Skills/Tech-stack
Cost Optimization | GPU Computing | Machine Learning | Machine Learning Operations | Model Deployment | Model Serving | Model Training | Resource Management | SLO | SRE
Education
- N/A