Engineering Manager, Inference Routing and Performance
San Francisco, CA | New York City, NY
USD 405K-485K | Mid-level | Full Time
Tasks
- Build a quantitative performance-modeling culture
- Coach engineers through shifting roadmap priorities
- Create technical strategy for routing across heterogeneous accelerators
- Develop and retain distributed systems engineers
- Drive deploy safety and on-call incident response
- Hire and evaluate senior technical candidates
- Investigate tail latency regressions
- Own the technical roadmap for cluster-level inference efficiency
- Partner to identify fleet-level throughput and latency wins
- Run post-incident reviews and process changes
- Sequence competing infrastructure priorities
Perks/Benefits
- Flexible hybrid office policy
- Flexible working hours
- Generous vacation
- Optional equity donation matching
- Parental leave
- Visa sponsorship
Skills/Tech-stack
Cache Management | Deploy Safety | Distributed State | Distributed Systems | High Performance | High-performance networking | Incident Response | Latency optimization | Load Balancing | On-Call | On-call operations | Performance Modeling | Scheduling | Throughput Optimization