Research Scientist - Distributed Machine Learning
Tasks
- Analyze numerical stability and accuracy-versus-speed trade-offs
- Apply kernel fusion, communication tuning, and memory optimization
- Build and scale distributed pretraining frameworks
- Build logging, metrics, and experiment-tracking tools
- Convert prototypes into CUDA and Triton kernels
- Create launch scripts, resilient checkpoints, and job monitoring
- Design ablation studies and statistical tests
- Implement custom gradients and tests
- Implement distributed debugging for gradient synchronization and collective operations
- Lead mixed-precision training (bf16 and fp8)
- Mentor interns and junior engineers through code reviews and design docs
- Productionize the PyTorch and JAX pretraining stack
- Prototype optimizers and attention methods
- Set up DeepSpeed, FSDP, and Megatron-LM across multi-node GPU clusters
Perks/Benefits
- 401k
- Dental insurance
- Disability insurance
- Employee assistance program
- Health insurance
- Life insurance
- Paid holidays
- Paid parental leave
- Paid sick leave
- Paid time off
- Visa sponsorship
- Vision insurance
Skills/Tech-stack
BF16 | CUDA | CUDA kernels | DeepSpeed | Distributed Training | Experiment tracking | FP8 | FSDP | Flax | GLOO | JAX | Kernel Fusion | Kubernetes | Logging | Megatron | Megatron-LM | Metrics | Mixed Precision | Mixed-precision training | NCCL | NumPy | Numerical Stability | PyTorch | Ray | Slurm | Triton
Education
Bachelor of Engineering | Bachelor of Science | Master of Science | PhD