Principal Engineer, AI Inference Reliability
Remote, California, United States; Sunnyvale, CA or Toronto, Canada
USD 175K-260K (estimate) | Senior-level | Full Time
Tasks
- Build dashboards and alerts
- Build reliability tooling
- Conduct postmortems
- Define SLOs and ensure alignment
- Define and drive reliability strategy
- Design and implement fault detection
- Design for debuggability
- Design for durability
- Design for redundancy
- Develop reliability best practices
- Implement failover
- Implement graceful degradation
- Implement recovery
- Implement throttling
- Inject distributed faults
- Lead incident management
- Measure reliability metrics
- Mentor engineers on reliability engineering
- Monitor service health metrics
- Perform root cause analysis
- Prevent repeat reliability incidents
- Run chaos testing
- Run load simulation
Perks/Benefits
- N/A
Skills/Tech-stack
Alerting | C++ | Chaos Testing | Distributed Fault Injection | Distributed Systems | Distributed Debugging | Failover | Fault Detection | Fault Injection | Go | Graceful Degradation | Incident Response | Load Testing | Monitoring | Observability | Postmortem | Python | Recovery | Rust | SLA | SLI | SLO | Throttling