EDB-IPP Project: Advancing GPU Optimization for Large Language Models
Tasks
- Build high-throughput, low-latency inference for state space models
- Create model-serving strategies for production LLM applications
- Design token-aware, load-balanced scheduling algorithms
- Develop and train PyTorch models
- Develop hardware-aware compiler, kernel, and data layout optimizations
- Implement memory-aware training and serving techniques
- Optimize GPU clusters for LLM training and serving
- Research scalable hybrid parallelism for LLMs
Perks/Benefits
- Access to computational resources
- Full sponsorship
- Employment with Rakuten Asia upon completion
- Research exchanges
Skills/Tech-stack
Continuous batching | Data parallelism | Deep learning | Distributed training | Dynamic memory management | Expert parallelism | GPU optimization | Machine learning | Memory-efficient checkpointing | Memory management | Mixture of experts | Model offloading | Model parallelism | Natural language processing | Pipeline parallelism | PyTorch | Quantization | Speculative decoding | Token scheduling