Find jobs in AI/ML, Data Science and Big Data
15 results for Paged Attention (Skill/Tech stack)
- AI Performance Optimization Engineer
  Salary: USD 100K-150K
  Skills: Benchmarking | C++ | Continuous batching | Cutlass | DeepSpeed
  Benefits: Career growth potential | Full-time benefits | H1B transfer support for qualified candidates | Long-term engagement | Remote work
  Mid-level | Full Time | United States - Remote | R1d ago
- AI Performance Optimization Engineer
  Salary: USD 100K-150K
  Skills: Benchmarking | C++ | Compiler optimization | Continuous batching | Deep learning
  Mid-level | Full Time | United States - Remote | R1d ago
- AI Performance Optimization Engineer
  Salary: USD 100K-150K
  Skills: C++ | Communication Primitives | Continuous batching | Distributed Training | FSDP
  Benefits: Career growth | Health benefits | Mentorship | Remote work
  Mid-level | Full Time | United States - Remote | R1d ago
- AI Performance Optimization Engineer
  Salary: USD 100K-150K
  Skills: C++ | Continuous batching | Custom Kernel | Custom kernel development | Cutlass
  Benefits: 100 percent remote | Benefits package | Full-time employment
  Mid-level | Full Time | United States - Remote | R4d ago
- AI Performance Optimization Engineer
  Salary: USD 100K-150K
  Skills: Benchmarking | C++ | Continuous batching | Data loading | Data loading optimization
  Mid-level | Full Time | United States - Remote | R4d ago
- AI Performance Optimization Engineer
  Salary: USD 100K-150K
  Skills: C++ | CUDA | Continuous batching | Cutlass | DeepSpeed
  Mid-level | Full Time | United States - Remote | R4d ago
- AI Performance Optimization Engineer
  Salary: USD 100K-150K
  Skills: Benchmarking | C++ | CUDA | Continuous batching | Cutlass
  Benefits: Benefits package | Remote work
  Mid-level | Full Time | United States - Remote | R5d ago
- AI Performance Optimization Engineer
  Salary: USD 100K-150K
  Skills: Benchmarking | C++ | Compiler optimization | Continuous batching | Cutlass
  Benefits: Career growth | Health benefits | Mentorship | Remote work
  Mid-level | Full Time | United States - Remote | R5d ago
- Senior ML Engineer - Kimchi (LLM Inference Optimization)
  Salary: GBP 110K-141K
  Skills: Activations quantization | Amazon Web Services | ArgoCD | CUDA | CUDA-adjacent tooling
  Benefits: Equipment budget | Equity options | Extra days off | Hackathon | Learning budget
  Senior-level | Full Time | United Kingdom | R8d ago
- Senior ML Engineer - Kimchi (LLM Inference Optimization)
  Salary: PLN 292K-400K
  Skills: AWS | ArgoCD | Azure | CUDA | Chunked prefill
  Benefits: Annual hackathon | Conference access | Equipment budget | Equity options | Extra days off
  Senior-level | Full Time | Poland | R8d ago
- Skills: AWS | Argo CD | ArgoCD | Azure | CUDA
  Benefits: Conference access | Equipment budget | Equity options | Extra days off | Flexible work hours
  Senior-level | Full Time | France | R8d ago
- AI Software Engineer Intern
  Salary: CNY 38K-50K
  Skills: CUDA | Distributed Systems | FP8 | FasterTransformer | Flash Attention
  Benefits: On-site work
  Entry-level | Full Time Internship | CHN - Minhang, China | 1mo ago
- AI Software Engineer Intern
  Salary: CNY 38K-50K
  Skills: CUDA | Compiler optimization | Continuous batching | Distributed Systems | Dynamic batching
  Benefits: On-site work
  Entry-level | Full Time Internship | CHN - Minhang, China | 1mo ago
- Member of technical staff (Inference) - Paris
  Salary: EUR 80K-120K
  Skills: C++ | CUDA | Caching | Continuous batching | Distributed Computing
  Benefits: Career development | Continuous learning | Hybrid work | Professional growth
  Senior-level | Full Time | Paris | 1mo ago
- Member of technical staff (Inference) - London
  Salary: GBP 230K-325K
  Skills: C++ | CUDA | CUDA kernel | CUDA kernel programming | Caching
  Benefits: Continuous learning | Hybrid work | Professional development
  Senior-level | Full Time | London | 1mo ago