Senior Engineering Manager, AI Runtime
USD 228K-297K | Senior-level | Full Time
Tasks
- Architect managed GPU training systems
- Build observability and reliability practices
- Define product and technical roadmap
- Develop operational runbooks
- Drive end-to-end delivery
- Implement checkpointing and failure recovery
- Lead and mentor engineering team
- Partner with recruiting to hire talent
Perks/Benefits
- N/A
Skills/Tech-stack
Checkpointing | Cluster Lifecycle Management | Cluster Lifecycle | DeepSpeed | Distributed Training | Elastic Training | FSDP | Fault Tolerance | GPU Performance | GPU Performance Optimization | Lifecycle Management | Megatron-LM | NCCL | Observability | Performance Optimization | Pipeline Parallelism | PyTorch | Tensor Comprehension | Tensor Parallelism
Roles
AI | AI Engineering | AI Engineering Manager | Engineering | Engineering Manager | Manager