Software Engineering Manager, LLM Training
USD 170K-277K | Entry-level | Full Time
Tasks
- Architect post-training infrastructure
- Collaborate with responsible AI teams on compliance and safety
- Define engineering performance goals and metrics
- Develop agents for research and performance optimization
- Develop post-training platform components
- Drive operational excellence culture
- Enable distributed training parallelism
- Implement observability and profiling for training runs
- Lead and coach engineering team
- Lead containerized training image lifecycle management
- Optimize LLM training performance
Perks/Benefits
- N/A
Skills/Tech-stack
CUDA | Containerization | Context Parallelism | Data Parallelism | Distributed Systems | Expert Parallelism | Fine-Tuning | FlashAttention | GRPO | High-Performance Data I/O | Hugging Face | Hugging Face Accelerate | Hugging Face Transformers | Liger Kernels | Low-Precision Training | Megatron | Model Pruning | Model Quantization | Multi-Teacher Distillation | NCCL | Observability | Pipeline Parallelism | Profiling | PyTorch | Ray | Reinforcement Learning | Reinforcement Learning from Human Feedback | SGLang | Speculative Decoding | Supervised Fine-Tuning | Telemetry | Tensor Parallelism | vLLM | VeRL