Senior Engineering Manager, AI Runtime
USD 228K-297K | Senior-level | Full Time
Tasks
- Architect managed GPU training systems
- Build observability and reliability practices
- Define product and technical roadmap
- Develop operational runbooks
- Drive end-to-end delivery
- Implement checkpointing and failure recovery
- Lead and mentor engineering team
- Partner with recruiting to hire talent
Perks/Benefits
- N/A
Skills/Tech-stack
Checkpointing | Cluster Lifecycle Management | Cluster lifecycle | DeepSpeed | Distributed Training | Elastic Training | FSDP | Fault Tolerance | GPU Performance | GPU Performance Optimization | Lifecycle Management | Megatron-LM | NCCL | Observability | Performance optimization | Pipeline parallelism | PyTorch | Tensor Comprehension | Tensor Parallelism
Roles
AI | AI Engineering | AI Engineering Manager | Engineering | Engineering Manager | Manager