Tech Lead Machine Learning Ops Engineer, Global SRE
San Jose, California, United States
USD 187K-359K (estimate) | Senior-level | Full Time
Tasks
- Ensure stability of AIGC machine learning tasks
- Improve resource efficiency
- Improve training task success rate
- Maintain stability of offline machine learning training tasks
- Maintain stability of online machine learning serving systems
- Manage and plan machine learning resources
- Optimize cost and budget
- Roll out GPU model training in non-China regions
- Set SLOs for online machine learning serving systems
Perks/Benefits
- N/A
Skills/Tech-stack
Cost Optimization | GPU Computing | Machine Learning | Machine Learning Operations | Model Deployment | Model Serving | Model Training | Resource Management | SLO | SRE
Education
- N/A