Principal High-Performance LLM Training Engineer
Santa Clara, CA, United States
USD 272K-431K | Senior-level | Full Time
Tasks
- Build performance models, workload characterizations, and simulation methodologies
- Develop production-quality software tools and benchmarks
- Drive workloads to speed-of-light performance by removing bottlenecks
- Lead end-to-end performance analysis and optimization of LLM training workloads
- Mentor engineers and establish best practices for performance analysis
- Serve as technical authority for AI training performance
- Translate workload insights into hardware and software recommendations
Skills/Tech-stack
Activation checkpointing | Benchmarking | CUDA | Communication and Computation Overlap | Compilers | Data parallelism | Distributed Training | GPU Architecture | High Performance | High-Performance Computing | JAX | Mixed Precision | Mixed-precision training | NeMo | Performance Computing | Performance Modeling | Pipeline parallelism | Profiling | PyTorch | Runtimes | Tensor Parallelism | Transformer Models