Software Engineering Manager, LLM Training
USD 170K-277K | Entry-level | Full Time
Tasks
- Architect post-training infrastructure
- Collaborate with responsible AI teams on compliance and safety
- Define engineering performance goals and metrics
- Develop agents for research and performance optimization
- Develop post-training platform components
- Drive operational excellence culture
- Enable distributed training parallelism
- Implement observability and profiling for training runs
- Lead and coach engineering team
- Lead containerized training image lifecycle management
- Optimize LLM training performance
Perks/Benefits
- N/A
Skills/Tech-stack
CUDA | Containerization | Context parallelism | Data parallelism | Distributed systems | Expert parallelism | Fine-tuning | FlashAttention | GRPO | High-performance data I/O | Hugging Face Accelerate | Hugging Face Transformers | Liger Kernel | Low-precision training | Megatron | Model pruning | Model quantization | Multi-teacher distillation | NCCL | Observability | Pipeline parallelism | Profiling | PyTorch | Ray | Reinforcement learning | Reinforcement learning from human feedback (RLHF) | SGLang | Speculative decoding | Supervised fine-tuning | Telemetry | Tensor parallelism | vLLM | VeRL