Principal High-Performance LLM Training Engineer
Santa Clara, CA, United States
USD 272K-431K Senior-level Full Time
Tasks
- Build performance models, workload characterizations, and simulation methodologies
- Develop production-quality software tools and benchmarks
- Drive workloads to speed-of-light performance by removing bottlenecks
- Lead end-to-end performance analysis and optimization of LLM training workloads
- Mentor engineers and establish best practices for performance analysis
- Serve as technical authority for AI training performance
- Translate workload insights into hardware and software recommendations
Skills/Tech-stack
Activation checkpointing | Benchmarking | CUDA | Communication and Computation Overlap | Compilers | Data parallelism | Distributed Training | GPU Architecture | High-Performance Computing | JAX | Mixed-precision training | NEMO | Performance Modeling | Pipeline parallelism | Profiling | PyTorch | Runtimes | Tensor Parallelism | Transformer Models