Find jobs in AI/ML, Data Science and Big Data
5 results for Gradient Sharding (Skill/Tech stack)
- Skills: Auto-regressive models | Custom Kernels | Data Engineering | DeepSpeed | Distributed Training
  Senior-level | Full Time | London, UK | 1d ago
- Skills: Accelerators | Autoregressive Transformers | Custom Kernels | Data Engineering | DeepSpeed
  Senior-level | Full Time | Mountain View, CA, USA: San Francisco, … | 1d ago
- Skills: Activation checkpointing | Attention Mechanisms | CUDA | Collective operations | Data parallelism
  Senior-level | Full Time | Mountain View, California; San Francisco, California | 19d ago
- Skills: Data Engineering | Distributed Training | Evaluation | Gradient Sharding | JAX
  Benefits: Company benefits | Discretionary annual bonus | Equity incentive plan
  Senior-level | Full Time | Mountain View, CA, USA | 1mo ago
- Skills: Distributed Training | Gradient Sharding | JAX | Machine Learning | Model Optimization
  Benefits: Bonus program | Equity incentive plan | Health and wellness benefits | Retirement benefits
  Senior-level | Full Time | Mountain View, CA, USA | 1mo ago