Find jobs in AI/ML, Data Science and Big Data
18 results for FlashAttention (Skill/Tech stack)
-
Apache TVM | C++ | CUDA | CuTile | FlashAttention · Employee benefits | Equity · Senior-level Full Time · US, CA, Santa Clara, United States · 2d ago
-
Mid-level Full Time · Beijing · R · 2d ago
-
Inference Engineer - Acceleration · CHF 110K-160K · Admission control | CUDA | Cutlass | FlashAttention | KV cache · Commuting subsidy | Learning and development budget | Offsites and team events | Pension plan | Vacation days · Mid-level Full Time · Zürich, Switzerland · 2d ago
-
AI Performance Optimization Engineer · USD 136K-258K · C++ | Continuous batching | Deep learning | Distributed Systems | FSDP · Mid-level Full Time · United States - Remote · R · 3d ago
-
AI Performance Optimization Engineer · USD 136K-258K · C++ | Cache optimization | Continuous batching | Cutlass | Deep learning · Mid-level Full Time · United States - Remote · R · 4d ago
-
AI Performance Optimization Engineer · USD 136K-258K · Access patterns | Benchmarking | C++ | Cache optimization | Compiler optimization · Full-time W2 employment | Health benefits | Remote work · Mid-level Full Time · United States - Remote · R · 4d ago
-
AI Performance Optimization Engineer · USD 159K-264K · C++ | Continuous batching | Cutlass | Deep learning | DeepSpeed · Remote work · Mid-level Full Time · United States - Remote · R · 4d ago
-
AI Performance Optimization Engineer · USD 136K-258K · Benchmarking | C++ | Compiler optimization | Continuous batching | Debugging · Mid-level Full Time · United States - Remote · R · 4d ago
-
AI Performance Optimization Engineer · USD 136K-258K · Access Optimization | Attention Optimization | Benchmarking | C++ | Compiler optimization · Mid-level Full Time · United States - Remote · R · 5d ago
-
Research Engineer, ML Systems (All Industry Levels) · USD 225K-400K · CUDA | CUDA kernels | Cloud | Cutlass | DeepSpeed · Mid-level Full Time · Redwood City, CA · 5d ago
-
AI Platform Engineer · INR 1500K-2500K · Automated Evaluation | CI/CD | CUDA | Continuous Checkpointing | Continuous batching · Mid-level Full Time · Bangalore, India · 8d ago
-
Software Engineering Manager, LLM Training · USD 170K-277K · CUDA | Containerization | Context Parallelism | Data I/O | Data parallelism · Entry-level Full Time · Mountain View, CA, United States · 8d ago
-
Senior-level Full Time · China, Shanghai · 1mo ago
-
AI Inference Engineer - Model Optimization & Deployment · USD 205K-303K · Accuracy evaluation | BF16 | C++ | CUDA | CUDA kernels · Senior-level Full Time · Foster City, CA · 1mo ago
-
Senior-level Full Time · Beijing Yizhuang, China · 1mo ago
-
A/B | A/B Testing | AUC | AWQ | AWS SageMaker · Senior-level Full Time · Tel-Aviv, Israel · 1mo ago
-
Senior Software Engineer, LLM Performance · USD 180K-339K · C++ | CUDA | Cutlass | FlashAttention | FlashInfer · Senior-level Full Time · SF Bay Area (Hybrid) · R · 1mo ago
-
Machine Learning Engineer, AI Models · EUR 72K-96K · C++ | CUDA | FlashAttention | Kernel Fusion | Memory hierarchy · Senior-level Full Time · Cyprus · 1mo ago