Inference Engineer - Acceleration
Tasks
- Analyze token cost
- Apply quality regression gates
- Drive low-precision MoE serving
- Implement prefill/decode split
- Instrument inference stack
- Manage KV cache hierarchy
- Optimize throughput, latency, and uptime
- Tune scheduling and admission control
Perks/Benefits
- Commuting subsidy
- Learning and development budget
- Offsites and team events
- Pension plan
- Vacation days
Skills/Tech-stack
Admission control | CUDA | CUTLASS | FlashAttention | KV cache | Long context | Long context attention | MoE | Nsight | Quantization | RDMA | SGLang | Scheduling | TensorRT-LLM | Triton | vLLM
Education
N/A
Related jobs
- AWS | Azure | Batching | CUDA | Deep learning
  Career development opportunities | Flexible remote work | International work environment | Technical ownership and autonomy | Work-life balance
  Mid-level | Full Time | Switzerland | 6d ago
- Embedded Software Engineer (m/f/d) | CHF 65K-88K
  C++ | CI/CD | Embedded Systems | Hardware-in-the-loop | Machine architecture
  Mid-level | Full Time | Sargans, Switzerland | 14d ago
- Robotics Platform Jetson Integration Engineer | CHF 128K-192K
  C++ | CUDA | Continuous Deployment | Continuous integration | DeepStream
  In-person work requirement
  Senior-level | Full Time | Zürich | 19d ago
- ML Infra Engineer | CHF 92K-130K
  AWS | Ansible | CI | CI/CD | CUDA
  Bias for action | Career growth | Collaborative team | On-site role
  Mid-level | Full Time | Zürich, Zurich, Switzerland | 30d ago
- Senior ML Engineer (Evaluation) | CHF 128K-192K
  Alerting | Artifact versioning | CI Code Review | CI/CD | CUDA
  Autonomy | Commuting subsidy | Learning and development budget | Offsites and team events | Pension plan
  Senior-level | Full Time | Zürich, Switzerland | 1mo ago