Find jobs in AI/ML, Data Science and Big Data
73 results for KV cache (Skill/Tech stack)
-
AI Performance Optimization Engineer · USD 100K-150K · Benchmarking | C++ | Continuous batching | Cutlass | DeepSpeed · Career growth potential | Full-time benefits | H1B transfer support for qualified candidates | Long-term engagement | Remote work · Mid-level · Full Time · United States - Remote · R · 1d ago
-
AI Performance Optimization Engineer · USD 100K-150K · Benchmarking | C++ | Compiler optimization | Continuous batching | Deep learning · Mid-level · Full Time · United States - Remote · R · 1d ago
-
AI Performance Optimization Engineer · USD 100K-150K · C++ | Communication Primitives | Continuous batching | Distributed Training | FSDP · Career growth | Health benefits | Mentorship | Remote work · Mid-level · Full Time · United States - Remote · R · 1d ago
-
AI Performance Optimization Engineer · USD 100K-150K · C++ | Continuous batching | Custom Kernel | Custom kernel development | Cutlass · 100 percent remote | Benefits package | Full-time employment · Mid-level · Full Time · United States - Remote · R · 4d ago
-
AI Performance Optimization Engineer · USD 100K-150K · Benchmarking | C++ | Continuous batching | Data loading | Data loading optimization · Mid-level · Full Time · United States - Remote · R · 4d ago
-
AI Performance Optimization Engineer · USD 100K-150K · C++ | CUDA | Continuous batching | Cutlass | DeepSpeed · Mid-level · Full Time · United States - Remote · R · 4d ago
-
Engineering Manager, Inference Benchmarking — AI Perf · USD 224K-356K · DCGM | Distributed Systems | GPU Telemetry | GPU observability | Helm · Senior-level · Full Time · US, CA, Santa Clara, United States · 5d ago
-
AI Performance Optimization Engineer · USD 100K-150K · Benchmarking | C++ | CUDA | Continuous batching | Cutlass · Benefits package | Remote work · Mid-level · Full Time · United States - Remote · R · 5d ago
-
AI Performance Optimization Engineer · USD 100K-150K · Benchmarking | C++ | Compiler optimization | Continuous batching | Cutlass · Career growth | Health benefits | Mentorship | Remote work · Mid-level · Full Time · United States - Remote · R · 5d ago
-
Product Manager - AI Inference & Model Serving · USD 165K-275K · AI Inference | Artificial Intelligence | Autoscaling | Cache Management | Continuous batching · Conference attendance | Professional development | Stock options | Training | Workstation provided · Mid-level · Full Time · Austin, TX, United States · 6d ago
-
AI Software Engineer · USD 151K-332K · C++ | CUDA | Continuous batching | FP16 | FP8 · Hybrid work | Remote work · Mid-level · Full Time · Seattle (WA), United States · 6d ago
-
Data Curation | Deep learning | DeepSpeed | Direct Preference Optimization | Evaluation · Senior-level · Full Time · Singapore, Singapore · 8d ago
-
Senior ML Engineer - Kimchi (LLM Inference Optimization) · GBP 110K-141K · Activations quantization | Amazon Web Services | ArgoCD | CUDA | CUDA-adjacent tooling · Equipment budget | Equity options | Extra days off | Hackathon | Learning budget · Senior-level · Full Time · United Kingdom · R · 8d ago
-
Senior ML Engineer - Kimchi (LLM Inference Optimization) · PLN 292K-400K · AWS | ArgoCD | Azure | CUDA | Chunked prefill · Annual hackathon | Conference access | Equipment budget | Equity options | Extra days off · Senior-level · Full Time · Poland · R · 8d ago
-
AWS | Argo CD | ArgoCD | Azure | CUDA · Conference access | Equipment budget | Equity options | Extra days off | Flexible work hours · Senior-level · Full Time · France · R · 8d ago
-
Senior MLOps Engineer - LLMs · EUR 56K-76K · A/B | A/B Testing | Argo | Async API | Authentication · Autonomy | Hybrid work model | Professional growth and learning · Senior-level · Full Time · Netherlands - Amsterdam · 11d ago
-
Staff Machine Learning Engineer, ML Infrastructure · USD 183K-269K · AWS EKS | Amazon IAM | Amazon S3 | Autoscaling | Batching · Employee resource groups | Free home security system | Free professional monitoring | Hybrid work model · Senior-level · Full Time · Boston, MA · 12d ago
-
Senior-level · Full Time · Milpitas, CA, United States · 12d ago
-
AI/ML ASIC Architect · USD 163K-249K · ARM | ASIC architecture | AXI interconnect | Area Optimization | Attention Mechanisms · Senior-level · Full Time · Milpitas, CA, United States · 12d ago
-
Machine Learning Engineer, Distributed vLLM · USD 136K-287K · API Gateway | Cilium | Distributed Systems | Envoy | GPU Profiling · Paid parental leave | Paid time off | Retirement 401k match | Tuition reimbursement · Mid-level · Full Time · Boston, United States · R · 12d ago
-
Product Manager - AI Inference & Model Serving · USD 160K-275K · AI Inference | Autoscaling | Cache Management | Cold Start | Cold Start Optimization · Conference attendance | Professional development and training | Stock options | Workstation provided · Mid-level · Full Time · Austin, TX, United States · 13d ago
-
AWQ | AWS | Accelerate | Azure | Batching · Mid-level · Full Time · Shenzhen, Guangdong, China · R · 13d ago
-
Sr GenAI Infra Specialist SA, AWS WWSO Startup · USD 153K-228K · AWS | Amazon EC2 | Amazon EKS | Amazon S3 | Cache optimization · Inclusive team culture | Mentorship and career growth | Work-life balance · Senior-level · Full Time · New York, New York, USA · 13d ago
-
Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism · 100 percent remote · Senior-level · Full Time · Remote job · R · 14d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Computer Vision | Diffusion Models | Edge Computing | Expert parallelism | Flash Attention · Remote work · Senior-level · Full Time · Remote job · R · 14d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism · English communication support | Remote work · Senior-level · Full Time · Remote job · R · 14d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing | Expert parallelism · Remote work · Senior-level · Full Time · Remote job · R · 14d ago
-
Diffusion Models | Distributed Inference Systems | Distributed inference | Expert parallelism | Flash Attention · 100 percent remote | Worldwide remote · Senior-level · Full Time · Remote job · R · 14d ago
-
Staff Engineer · USD 191K-239K · AMD GPU | Apache Yunikorn | Autoscaling | Bin packing | CRIU · Conference reimbursement | Education reimbursement | Employee assistance program | Employee stock purchase program | Equity compensation · Senior-level · Full Time · Seattle · 15d ago
-
Senior Applied ML Engineer (Speech & Audio) · USD 140K-200K · Activity Detection | Audio codecs | Audio preprocessing | Automatic Speech Recognition | Conformer · Accommodation allowance | Career Development Programs | Career growth opportunities | Coffee | Daily Drinks · Senior-level · Full Time · Egypt - Remote · R · 15d ago
-
Senior Applied ML Engineer (Speech & Audio) · USD 140K-200K · Activity Detection | Audio Processing | Audio codecs | Automatic Speech Recognition | CUDA · Accommodation allowance | Career growth opportunities | Coffee and hot drinks | Company events and parties | Daily breakfast · Senior-level · Full Time · Cairo, Cairo Governorate, Egypt · 15d ago
-
Inference Engineer - Acceleration · CHF 110K-160K · Admission control | CUDA | Cutlass | FlashAttention | KV cache · Commuting subsidy | Learning and development budget | Offsites and team events | Pension plan | Vacation days · Mid-level · Full Time · Zürich, Switzerland · 15d ago
-
AI SW Stack Deployment Architect · INR 2500K-4500K · API Design | Cloud Computing | Distributed Systems | Edge Computing | Inference Server · Senior-level · Full Time · Bengaluru, KA, India · 15d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Compute Shaders | Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing · Remote work · Senior-level · Full Time · Remote job · R · 16d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Computer Vision | Deep learning | Diffusion Models | Distributed inference | Edge Computing · Remote work · Senior-level · Full Time · Remote job · R · 16d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Compute Shaders | Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing · Remote work · Senior-level · Full Time · Remote job · R · 16d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing | Expert parallelism · Career growth | Collaborative research environment | English communication support | Remote work opportunity · Senior-level · Full Time · Remote job · R · 16d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Compute Shaders | Custom Compute Shaders | Data Pipelines | Diffusion Models | Distributed Inference Systems · Remote work · Senior-level · Full Time · Remote job · R · 16d ago
-
Engineer, Staff-AI and Machine Learning with embedded/C++ · INR 2000K-3500K · C# | C++ | CUDA | Cross-Attention | Design Patterns · Senior-level · Full Time · Hyderabad, Telangana, India · 16d ago
-
Senior-level · Full Time · San Francisco - Remote, CA, United … · R · 18d ago
-
API Integration | Agent systems | Benchmarking | Computer Vision | Data Drift · Bare Metal GPU Cluster Access | Company bike leasing | Company events | Employer Subsidy | Flexible work hours · Senior-level · Full Time · Heidelberg · 20d ago
-
AI Platform Engineer · INR 1500K-2500K · Automated Evaluation | CI/CD | CUDA | Continuous Checkpointing | Continuous batching · Mid-level · Full Time · Bangalore, India · 20d ago
-
Director Engineering - AI software stack · INR 2400K-5199K · AI/ML | Agentic AI | Batching | CPU GPU | CPU GPU DSP NPU · Executive-level · Full Time · Bangalore, Karnataka, India · 20d ago
-
Senior Staff/Staff/Lead/Senior Engineer/Engineer - AI and Embedded, C++, AI, Machine Learning · INR 1800K-3200K · Algorithm Optimization | C# | C++ | CPU | Cross-Attention · Senior-level · Full Time · Hyderabad, Telangana, India · 20d ago
-
Engineer, Staff-Machine Learning-Embedded and C++ · INR 2000K-3500K · C# | C++ | CUDA | Cross-Attention | Design Patterns · Senior-level · Full Time · Hyderabad, Telangana, India · 20d ago
-
AI Platform Engineer · INR 1500K-2500K · Alerting | CUDA | Cause analysis | Continuous batching | GPU Profiling · Mid-level · Full Time · Bangalore, India · 21d ago
-
Engineer, Senior-Machine Learning-Embedded, C++ · INR 2000K-3500K · C# | C++ | CUDA | Cross-Attention | Design Patterns · Senior-level · Full Time · Hyderabad, Telangana, India · 21d ago
-
Engineer, Senior-AI and Machine Learning with embedded/C++ · INR 2000K-3500K · C# | C++ | CUDA | Cross-Attention | Design Patterns · Senior-level · Full Time · Hyderabad, Telangana, India · 21d ago
-
Tech Lead — ASR / TTS / Speech LLM (IC + Mentor) · INR 2500K-4800K · Autoscaling | Batching | CTC | Caching | DER · Senior-level · Full Time · Bengaluru · 23d ago
-
AWQ | Audio codecs | Audio streaming | Autoscaling | Chunked prefill · 401k matching | Annual offsites | Dental coverage | Employer-paid training | Healthcare benefits · Mid-level · Full Time · San Francisco, CA · 25d ago