Staff Machine Learning Engineer, ML Infrastructure
Tasks
- Build feedback loops between cloud inference, edge devices, and the data flywheel
- Define SLOs and observability standards for ML services
- Define model lifecycle best practices: registry, deployment, monitoring, rollback, and drift
- Design and evolve cloud inference systems for real-time video and events
- Drive architecture decisions for the Kubernetes-based ML platform
- Identify and remove bottlenecks in ML deployment infrastructure
- Improve throughput, latency, and cost for production CV models
- Lead deep technical reviews for system design, capacity planning, and reliability
- Lead incident response and postmortems for critical ML systems
- Set technical direction for ML infrastructure
- Shape LLM serving in production
Skills/Tech-stack
AWS EKS | Amazon IAM | Amazon S3 | Autoscaling | Batching | C++ | CI/CD | Docker | Drift Detection | GPU scheduling | Go | Infrastructure as Code | KServe | KV cache | Kafka | Kubernetes | LLM serving | ML inference | MLflow | Model Monitoring | Model Registry | Multi-tenancy | NVIDIA Triton | Python | Quantization | Ray | Rust | Speculative decoding | vLLM | Weights & Biases