Engineering Manager, Inference Routing and Performance
San Francisco, CA | New York City, NY
USD 405K-485K | Mid-level | Full Time
Tasks
- Build a quantitative performance modeling culture
- Coach engineers through shifting roadmap priorities
- Create technical strategy for routing across heterogeneous accelerators
- Develop and retain distributed systems engineers
- Drive deploy safety and on-call incident response
- Hire and evaluate senior technical candidates
- Investigate tail latency regressions
- Own the technical roadmap for cluster-level inference efficiency
- Partner to identify fleet-level throughput and latency wins
- Run post-incident reviews and process changes
- Sequence competing infrastructure priorities
Perks/Benefits
- Flexible hybrid office policy
- Flexible working hours
- Generous vacation
- Optional equity donation matching
- Parental leave
- Visa sponsorship
Skills/Tech-stack
Cache Management | Deploy Safety | Distributed State | Distributed Systems | High Performance | High-performance networking | Incident Response | Latency optimization | Load Balancing | On-Call | On-call operations | Performance Modeling | Scheduling | Throughput Optimization