Find jobs in AI/ML, Data Science and Big Data
3 results for Gradient Sharding (Skill/Tech stack)
- Auto-regressive models | Custom Kernels | Data Engineering | DeepSpeed | Distributed Training
  Senior-level | Full Time | London, UK | 21d ago
- Accelerators | Autoregressive Transformers | Custom Kernels | Data Engineering | DeepSpeed
  Senior-level | Full Time | Mountain View, CA, USA: San Francisco, … | 21d ago
- Activation checkpointing | Attention Mechanisms | CUDA | Collective operations | Data parallelism
  Senior-level | Full Time | Mountain View, California; San Francisco, California | 1mo ago