Software Engineering Manager, LLM Training
USD 170K-277K Entry-level Full Time
Tasks
- Architect high-throughput post-training infrastructure for LLMs
- Collaborate with responsible AI teams on compliance and safety
- Contribute to post-training stack frameworks and integrations
- Create an inclusive team environment
- Define performance goals, metrics, and operational excellence
- Develop multimodal post-training strategy
- Enable distributed training parallelism strategies
- Implement observability and profiling for training runs
- Lead agentic research and autonomous performance optimization
- Lead and coach engineers
- Manage containerized training environments with golden images
- Optimize customer workloads and platform performance
Skills/Tech-stack
CUDA | Containerization | Data parallelism | Distributed Systems | Docker | Expert parallelism | Fine Tuning | FlashAttention | Hugging Face | Hugging Face Accelerate | Hugging Face Transformers | Human Feedback | Knowledge Distillation | Language Models | Large Language Models | Learning from Human Feedback | Low-precision training | Megatron-LM | Multi-Modal | Multi-modal Models | NCCL | Observability | Pipeline parallelism | Profiling | Pruning | PyTorch | Quantization | Ray | Ray Tune | Reinforcement Learning | Reinforcement Learning from Human Feedback | SGLang | Speculative decoding | Supervised Fine Tuning | Telemetry | Tensor Parallelism | vLLM | VeRL
Education
Bachelor of Engineering | Bachelor of Science | Master of Science | PhD