Software Engineering Manager, LLM Training
USD 170K-277K Entry-level Full Time
Tasks
- Architect high-throughput post-training infrastructure for LLMs
- Collaborate with responsible AI teams on compliance and safety
- Contribute to post-training stack frameworks and integrations
- Create an inclusive team environment
- Define performance goals, metrics, and operational excellence standards
- Develop a multimodal post-training strategy
- Enable distributed training parallelism strategies
- Implement observability and profiling for training runs
- Lead agentic research and autonomous performance optimization
- Lead and coach engineers
- Manage containerized training environments with golden images
- Optimize customer workloads and platform performance
Skills/Tech-stack
CUDA | Containerization | Data parallelism | Distributed Systems | Docker | Expert parallelism | Fine Tuning | FlashAttention | Hugging Face | Hugging Face Accelerate | Hugging Face Transformers | Human Feedback | Knowledge Distillation | Language Models | Large Language Models | Learning from Human Feedback | Low-precision training | Megatron-LM | Multi-Modal | Multi-modal Models | NCCL | Observability | Pipeline parallelism | Profiling | Pruning | PyTorch | Quantization | Ray | Ray Tune | Reinforcement Learning | Reinforcement Learning from Human Feedback | SGLang | Speculative decoding | Supervised Fine Tuning | Telemetry | Tensor Parallelism | vLLM | VeRL
Education
Bachelor of Engineering | Bachelor of Science | Master of Science | PhD