Find jobs in AI/ML, Data Science and Big Data
14 results for Flash Attention (Skill/Tech stack)
-
Machine Learning Software Engineer II
USD 131K-177K
AWS CloudFormation | AWS ECS | AWS Lambda | CD pipelines | CI/CD
Mid-level Full Time
Remote, United States
8d ago
-
Applied AI (Frameworks) Engineer
INR 2040K-4725K
C++ | CUDA | Compiler optimization | Computer Architecture | Convolution
On-site work
Senior-level Full Time
IND - Bangalore, India
9d ago
-
AI Software Engineer Intern
CNY 38K-50K
CUDA | Distributed Systems | FP8 | FasterTransformer | Flash Attention
On-site work
Entry-level Full Time Internship
CHN - Minhang, China
14d ago
-
AI Software Engineer Intern
CNY 38K-50K
CUDA | Compiler optimization | Continuous batching | Distributed Systems | Dynamic batching
On-site work
Entry-level Full Time Internship
CHN - Minhang, China
14d ago
-
Applied AI Frameworks Engineer
INR 3125K-5000K
Attention Mechanisms | C++ | CUDA | Co-design | Computer Architecture
Cross geo collaboration | On-site work | Open source collaboration
Senior-level Full Time
IND - Bangalore, India
16d ago
-
Applied AI Frameworks Engineer
INR 3125K-5000K
C++ | CUDA | Compiler optimization | Computer Architecture | Convolution
On-site work
Senior-level Full Time
IND - Bangalore, India
16d ago
-
LLM Inference Performance & Evals Engineer
CAD 142K-195K
Attention Mechanisms | C# | C++ | Compiler optimization | Debugging
Job stability | Open source collaboration | Research publications
Mid-level Full Time
Toronto, Ontario, Canada
27d ago
-
Member of technical staff (Inference) - Paris
EUR 80K-120K
C++ | CUDA | Caching | Continuous batching | Distributed Computing
Career development | Continuous learning | Hybrid work | Professional growth
Senior-level Full Time
Paris
28d ago
-
Senior Performance Analyst, Inference
USD 175K-260K
Attention Mechanism | CUDA | Flash Attention | GPU kernel optimization | KV cache
Senior-level Full Time
Sunnyvale, CA
29d ago
-
Member of technical staff (Inference) - London
GBP 230K-325K
C++ | CUDA | CUDA kernel | CUDA kernel programming | Caching
Continuous learning | Hybrid work | Professional development
Senior-level Full Time
London
1mo ago
-
Principal Machine Learning Engineer
USD 32K-32K
CI/CD | Cloud Platforms | Containerization | Distributed Training | Docker
Birthday celebrations | Company lunches | Dental insurance | Flexible working hours | Generous holiday allowance
Senior-level Full Time
London, England, United Kingdom
1mo ago
-
Tech Lead Manager - MLRE, ML Systems
USD 264K-331K
CUDA | Distributed Systems | Flash Attention | GRPO | Human Feedback
Commuter stipend | Generous PTO | Health, dental and vision coverage | Learning and development stipend | Retirement benefits
Senior-level Full Time
San Francisco, CA; New York, NY
1mo ago
-
Senior Machine Learning Engineer
USD 32K-32K
Distributed Training | Dynamic batching | Flash Attention | Inference Optimization | Machine Learning
401k matching | Adoption Assistance | Birthday celebrations | Company lunches | Dental coverage
Senior-level Full Time
London, England, United Kingdom
1mo ago
-
Performance Engineer, GPU
USD 280K-850K
Bandwidth Optimization | CUDA | Cluster Orchestration | Collective communication | Custom Operators
Flexible working hours | Generous vacation | Hybrid work 25 percent | Optional equity donation matching | Parental leave
Senior-level Full Time
San Francisco, CA | New York …
1mo ago