Find jobs in AI/ML, Data Science and Big Data
23 results
for Flash Attention
(Skill/Tech stack)
-
AI Performance Optimization Engineer USD 100K-150KC++ | Communication Primitives | Continuous batching | Distributed Training | FSDPCareer growth | Health benefits | Mentorship | Remote workMid-level Full TimeUnited States - Remote R1d ago
-
AI Performance Optimization Engineer USD 100K-150KBenchmarking | C++ | Compiler optimization | Continuous batching | CutlassCareer growth | Health benefits | Mentorship | Remote workMid-level Full TimeUnited States - Remote R5d ago
-
Sr. Machine Learning Engineer, ML Models CAD 100K-500KDebugging | Deep learning | Flash Attention | Kernel Fusion | Machine LearningHybrid work | Remote optionSenior-level Full TimeToronto, Ontario, Canada7d ago
-
Senior-level Full TimeMilpitas, CA, United States12d ago
-
AI/ML ASIC Architect USD 163K-249KARM | ASIC architecture | AXI interconnect | Area Optimization | Attention MechanismsSenior-level Full TimeMilpitas, CA, United States12d ago
-
Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism100 percent remoteSenior-level Full TimeRemote job R14d ago
-
AI Research Engineer (Kernel & Inference Optimization) USD 201K-332KComputer Vision | Diffusion Models | Edge Computing | Expert parallelism | Flash AttentionRemote workSenior-level Full TimeRemote job R14d ago
-
AI Research Engineer (Kernel & Inference Optimization) USD 201K-332KCompute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelismEnglish communication support | Remote workSenior-level Full TimeRemote job R14d ago
-
AI Research Engineer (Kernel & Inference Optimization) USD 201K-332KDiffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing | Expert parallelismRemote workSenior-level Full TimeRemote job R14d ago
-
Diffusion Models | Distributed Inference Systems | Distributed inference | Expert parallelism | Flash Attention100 percent remote | Worldwide remoteSenior-level Full TimeRemote job R14d ago
-
AI Research Engineer (Kernel & Inference Optimization) USD 200K-332KCompute Shaders | Diffusion Models | Distributed Inference Systems | Distributed inference | Edge ComputingRemote workSenior-level Full TimeRemote job R16d ago
-
AI Research Engineer (Kernel & Inference Optimization) USD 200K-332KComputer Vision | Deep learning | Diffusion Models | Distributed inference | Edge ComputingRemote workSenior-level Full TimeRemote job R16d ago
-
AI Research Engineer (Kernel & Inference Optimization) USD 200K-332KCompute Shaders | Diffusion Models | Distributed Inference Systems | Distributed inference | Edge ComputingRemote workSenior-level Full TimeRemote job R16d ago
-
AI Research Engineer (Kernel & Inference Optimization) USD 200K-332KDiffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing | Expert parallelismCareer growth | Collaborative research environment | English communication support | Remote work opportunitySenior-level Full TimeRemote job R16d ago
-
AI Research Engineer (Kernel & Inference Optimization) USD 200K-332KCompute Shaders | Custom Compute Shaders | Data Pipelines | Diffusion Models | Distributed Inference SystemsRemote workSenior-level Full TimeRemote job R16d ago
-
Machine Learning Engineer 5 INR 2500K-4500K3D Reconstruction | Adapters | CLIP | Computer Vision | ControlNetSenior-level Full TimeBangalore, India R21d ago
-
Machine Learning Software Engineer II USD 131K-177KAWS CloudFormation | AWS ECS | AWS Lambda | CD pipelines | CI/CDMid-level Full TimeRemote, United States R28d ago
-
AI Software Engineer Intern CNY 38K-50KCUDA | Distributed Systems | FP8 | FasterTransformer | Flash AttentionOn-site workEntry-level Full Time InternshipCHN - Minhang, China1mo ago
-
AI Software Engineer Intern CNY 38K-50KCUDA | Compiler optimization | Continuous batching | Distributed Systems | Dynamic batchingOn-site workEntry-level Full Time InternshipCHN - Minhang, China1mo ago
-
Applied AI Frameworks Engineer INR 3125K-5000KAttention Mechanisms | C++ | CUDA | Co-design | Computer ArchitectureCross geo collaboration | On-site work | Open source collaborationSenior-level Full TimeIND - Bangalore, India1mo ago
-
Member of technical staff (Inference) - Paris EUR 80K-120KC++ | CUDA | Caching | Continuous batching | Distributed ComputingCareer development | Continuous learning | Hybrid work | Professional growthSenior-level Full TimeParis1mo ago
-
Senior Performance Analyst, Inference USD 175K-260KAttention Mechanism | CUDA | Flash Attention | GPU kernel optimization | KV cacheSenior-level Full TimeSunnyvale, CA1mo ago
-
Member of technical staff (Inference) - London GBP 230K-325KC++ | CUDA | CUDA kernel | CUDA kernel programming | CachingContinuous learning | Hybrid work | Professional developmentSenior-level Full TimeLondon1mo ago