Find jobs in AI/ML, Data Science and Big Data
13 results for Flash Attention (Skill/Tech stack)
-
Deep learning | Distributed Training | Flash Attention | Inference Optimization | Kernel Fusion · Hybrid work · Senior-level Full Time · Toronto, Ontario, Canada · 14d ago
-
Applied Scientist 5 · INR 2475K-4500K · 3D Reconstruction | Adapters | CLIP | Computer Vision | ControlNet · Senior-level Full Time · Bangalore, India R · 23d ago
-
Applied Scientist 5.5 · INR 2475K-4500K · 3D Reconstruction | Adapters | CLIP | Computer Vision | ControlNet · Senior-level Full Time · Bangalore, India R · 23d ago
-
Senior-level Full Time · Milpitas, CA, United States · 1mo ago
-
AI/ML ASIC Architect · USD 163K-249K · ARM | ASIC architecture | AXI interconnect | Area Optimization | Attention Mechanisms · Senior-level Full Time · Milpitas, CA, United States · 1mo ago
-
Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism · 100 percent remote · Senior-level Full Time · Remote job R · 1mo ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Computer Vision | Diffusion Models | Edge Computing | Expert parallelism | Flash Attention · Remote work · Senior-level Full Time · Remote job R · 1mo ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism · English communication support | Remote work · Senior-level Full Time · Remote job R · 1mo ago
-
Diffusion Models | Distributed Inference Systems | Distributed inference | Expert parallelism | Flash Attention · 100 percent remote | Worldwide remote · Senior-level Full Time · Remote job R · 1mo ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Compute Shaders | Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing · Remote work · Senior-level Full Time · Remote job R · 1mo ago
-
Machine Learning Software Engineer II · USD 131K-177K · AWS CloudFormation | AWS ECS | AWS Lambda | CD pipelines | CI/CD · Mid-level Full Time · Remote, United States R · 1mo ago
-
AI Software Engineer Intern · CNY 38K-50K · CUDA | Distributed Systems | FP8 | FasterTransformer | Flash Attention · On-site work · Entry-level Full Time Internship · CHN - Minhang, China · 1mo ago
-
AI Software Engineer Intern · CNY 38K-50K · CUDA | Compiler optimization | Continuous batching | Distributed Systems | Dynamic batching · On-site work · Entry-level Full Time Internship · CHN - Minhang, China · 1mo ago