Site Reliability Engineer, ML Compute SRE
Durham, NC, USA; Raleigh, NC, USA
USD 141K-202K Mid-level Full Time Found 11d ago
Tasks
- Collaborate with infrastructure teams to reduce launch risks
- Define and enhance metrics and SLOs
- Design new ML features
- Improve operational experience
- Participate in on-call support
- Support ML operations
Skills/Tech-stack
Algorithms | Automation | Coding | Complexity analysis | Data Structures | Debugging | Distributed Systems | Large Scale Distributed Systems | Machine Learning | Machine Learning Infrastructure | Scale distributed systems | Software development | System design | Troubleshooting
Related jobs
- Anomaly Detection | Automation frameworks | Clustering | Data Analysis | Distributed Systems; Entry-level Full Time; 6314 Remote/Teleworker US, United States R3d ago
- Member of Technical Staff, Site Reliability Engineer (HPC) - MAI SuperIntelligence Team; USD 119K-304K; AWS | Azure | Bash | CI/CD | Capacity Planning; Benefits | Competitive compensation | Equity options; Senior-level Full Time; Mountain View, CA, US; 11d ago
- Site Reliability Engineer; USD 175K-225K; AI Agent | AI Agent architecture | AI infrastructure | Agent architecture | Caches; Benefits | Equity; Mid-level Full Time; New York, NY; 13d ago
- Automation | Documentation | Infrastructure tuning | Mentoring | Monitoring; Senior-level Full Time; San Jose, California, United States; 14d ago
- Automation | Cloud Management | Cloud infrastructure | DevOps | Distributed Systems; Global team participation | Growth opportunities | Innovation environment; Entry-level Full Time; San Jose, California, United States; 14d ago
- Senior Site Reliability Engineer, ML System; USD 136K-359K; Coding | Distributed Systems | Machine Learning | Monitoring Tools | Performance Analysis; Senior-level Full Time; San Jose, California, United States; 14d ago
- Automation | Cloud infrastructure | Flink | Kubernetes | Monitoring Frameworks; Entry-level Full Time; Seattle, Washington, United States; 14d ago
- Site Reliability Engineer - Data (Seattle); USD 177K-341K; Automation | Cloud infrastructure | Distributed Systems | Flink | Kubernetes; Mid-level Full Time; Seattle, Washington, United States; 14d ago
- Site Reliability Engineer - Data; USD 136K-359K; Automation | Cloud infrastructure | Distributed Systems | Flink | Kubernetes; Mid-level Full Time; San Jose, California, United States; 14d ago
- Site Reliability Engineer, AI Applications; USD 136K-359K; AI | Audio Processing | Automation | Capacity Planning | Deep learning; Mid-level Full Time; San Jose, California, United States; 14d ago
- Senior Site Reliability Engineer - Data Infrastructure; USD 177K-341K; Automation | Cloud infrastructure | Cost efficiency | Cross-Functional Collaboration | Cross-functional; Senior-level Full Time; Seattle, Washington, United States; 14d ago