aijobs.net

EDB-IPP Project: Advancing GPU Optimization for Large Language Models

Crimson House Singapore

SGD 60K-120K (estimate) | Mid-level | Full Time

Found 24d ago

Tasks

Build high-throughput, low-latency inference for state space models
Create model serving strategies for production LLM applications
Design token-aware, load-balanced scheduling algorithms
Develop and train PyTorch models
Develop hardware-aware compiler, kernel, and data layout optimizations
Implement memory-aware training and serving techniques
Optimize GPU clusters for LLM training and serving
Research scalable hybrid parallelism for LLMs

Skills/Tech-stack

Continuous Batching | Data Parallelism | Deep Learning | Distributed Training | Dynamic Memory | Dynamic Memory Management | Expert Parallelism | GPU Optimization | Language Processing | Machine Learning | Memory-Efficient Checkpointing | Memory Management | Mixture of Experts | Model Offloading | Model Parallelism | Natural Language | Natural Language Processing | Pipeline Parallelism | PyTorch | Quantization | Speculative Decoding | Token Scheduling

Roles

PhD Student | Student
