Find jobs in AI/ML, Data Science and Big Data
10 results for FlashAttention (Skill/Tech stack)
- Mid-level | Full Time | Singapore | 4d ago
- Mid-level | Full Time | Beijing | 6d ago
- AI Performance Optimization Engineer
  USD 100K-150K
  Benchmarking | C++ | CUDA | Continuous batching | Cutlass
  Remote work
  Mid-level | Full Time | United States - Remote | 8d ago
- AI Performance Optimization Engineer
  USD 100K-150K
  Access Optimization | Attention Mechanisms | Benchmarking | C++ | CPU
  Mid-level | Full Time | United States - Remote | 8d ago
- Bayesian optimization | Data Generation | Debugging | DeepSpeed | Distributed Systems
  Additional time off for learning and development | Annual leave | Cycle to work scheme | Employee assistance program | Group personal pension
  Entry-level | Contract | London, United Kingdom | 12d ago
- Research Engineer (LLM Training and Performance)
  GBP 80K-120K
  AOTAutograd | CUDA | CuTe | Cutlass | Data loaders
  Senior-level | Full Time | Amsterdam, Netherlands; Belgrade, Serbia; Berlin, Germany; … | 14d ago
- Engineering Manager, Model Inference
  USD 220K-270K
  APIs | Attention Mechanism | Batching | Distributed Systems | Docker
  401k matching | Commuter benefits | Flexible PTO | Flexible spending accounts | Generous time off
  Mid-level | Full Time | SF Office | 1mo ago
- Inference Engineer - Acceleration
  CHF 110K-160K
  Admission control | CUDA | Cutlass | FlashAttention | KV cache
  Commuting subsidy | Learning and development budget | Offsites and team events | Pension plan | Vacation days
  Mid-level | Full Time | Zürich, Switzerland | 1mo ago
- AI Platform Engineer
  INR 1500K-2500K
  Automated Evaluation | CI/CD | CUDA | Continuous Checkpointing | Continuous batching
  Mid-level | Full Time | Bangalore, India | 1mo ago
- Software Engineering Manager, LLM Training
  USD 170K-277K
  CUDA | Containerization | Context Parallelism | Data I/O | Data parallelism
  Entry-level | Full Time | Mountain View, CA, United States | 1mo ago