Find jobs in AI/ML, Data Science and Big Data
49 results for Speculative decoding (Skill/Tech stack)
- [SX/EIT-MM] AI/Agent Engineer · USD 147K-167K · Agent coordination | Agentic Systems | Async Programming | Audio Processing | CI/CD · Senior-level Full Time · Thành phố Hồ Chí Minh, Hồ … · 1d ago
- Senior Forward Deployed Engineer II (AI/ML) · INR 1800K-3500K · Agents SDK | CUDA | Cache optimization | Continuous batching | CrewAI · Mid-level Full Time · Bengaluru · 2d ago
- Senior Forward Deployed Engineer I (AI/ML) · INR 3000K-4800K · Agents SDK | CUDA | Continuous batching | CrewAI | Data Compression · Hybrid work | Travel up to 30% · Senior-level Full Time · Bengaluru · 3d ago
- Engineering Manager, ML Performance · USD 207K-301K · Auto sharding | Benchmarking | CUDA | CUDA Performance | Compiler optimization · Senior-level Full Time · Sunnyvale, CA, USA; Kirkland, WA, USA · 4d ago
- AI Software Engineer · USD 151K-332K · C++ | CUDA | CUDA kernels | CUDA profiling | Cache Management · Community involvement | Health benefits | Hybrid work options | In-person work options | Remote work options · Mid-level Full Time · Seattle (WA), United States · 5d ago
- ML Platform Engineer · USD 100K-150K · API Gateway | Abuse detection | Automated rollback | Autoscaling | Batching · Remote work · Senior-level Full Time · United States - Remote R · 5d ago
- Applied AI Scientist - On Site · EUR 54K-86K · C++ | CUDA | Computer Vision | Deep learning | Distributed Training · On-site work · Senior-level Full Time · München, BY, DE · 5d ago
- C++ | Computer Vision | Deep learning | Distributed Training | Efficient Inference · Core research and development team | On-site work · Senior-level Full Time · Tel Aviv-Yafo, Tel Aviv District, IL · 5d ago
- Senior-level Full Time · 上海、北京 · 7d ago
- AI Performance Optimization Engineer · USD 100K-150K · Benchmarking | C++ | CUDA | Continuous batching | Cutlass · Remote work · Mid-level Full Time · United States - Remote R · 8d ago
- AI Performance Optimization Engineer · USD 100K-150K · Access Optimization | Attention Mechanisms | Benchmarking | C plus plus | CPU · Mid-level Full Time · United States - Remote R · 8d ago
- Continuous batching | Data parallelism | Deep learning | Distributed Training | Dynamic Memory · Computational resources access | Full sponsorship | Hired by Rakuten Asia after completion | Research exchanges · Mid-level Full Time · Crimson House Singapore · 11d ago
- Application Software Engineer, Inference · USD 135K-185K · Agent Orchestration | Agent SDK | Auto Scaling | Batch scheduling | C++ · 401k plan | Employee stock purchase plan | Long-term incentives | Medical, dental & vision coverage | Onsite Palo Alto · Entry-level Full Time · Palo Alto, CA · 11d ago
- Sr GenAI Infra Specialist SA, AWS WWSO Startup · USD 153K-228K · AWS Inferentia | AWS Trainium | Amazon Web Services | Batching | CUDA · Senior-level Full Time · New York, New York, USA · 12d ago
- AI Engineer · EUR 60K-80K · AWQ | AWS | Agent SDK | CI/CD | CUDA · Career growth opportunities | Permanent employment | Remote work option · Mid-level Full Time · Remote - Paris, France R · 13d ago
- Senior Machine Learning Engineer · USD 188K-282K · Adversarial Training | Calibration monitoring | Continuous batching | DPO | Deep learning · Senior-level Full Time · Palo Alto, CA · 14d ago
- Sr. Software Engineer, Inference · PLN 321K-470K · Autoscaling | BF16 | C++ | CI/CD | CUDA · Critical illness cover | Employee assistance programme | Family dental insurance | Family medical insurance | Life assurance · Senior-level Full Time · Warsaw, Poland · 14d ago
- LLM Inference Frameworks and Optimization Engineer · USD 160K-230K · C++ | CUDA | CUDA graph | Cluster scheduling | Compiler · Equity | Health insurance · Mid-level Full Time · San Francisco, Singapore, Amsterdam · 14d ago
- Deep learning | Distributed Training | Flash Attention | Inference Optimization | Kernel Fusion · Hybrid work · Senior-level Full Time · Toronto, Ontario, Canada · 17d ago
- Research Intern, Inference (Fall 2026) · USD 116K-126K · CUDA | Deep learning | Distributed Systems | JAX | Machine Learning · Housing stipend | Open source contribution opportunities · Entry-level Internship · San Francisco · 17d ago
- Staff Software Engineer, Inference · PLN 369K-542K · Autoscaling | BF16 | Benchmarking | C++ | CUDA · Critical illness cover | Employee assistance programme | Family dental insurance | Family medical insurance | Generous pension contribution · Senior-level Full Time · Warsaw, Poland · 18d ago
- [SX/EIT-MM] Senior AI/Agent Engineer · USD 150K-197K · A/B | A/B Testing | Async Programming | Attention Mechanism | Audio Processing · Senior-level Full Time · Thành phố Hồ Chí Minh, Hồ … · 19d ago
- AI Engineer · USD 100K-135K · AWQ | AWS | AWS EC2 | Agent Frameworks | CI/CD · 401k match | Health insurance | Learning and development stipend | Paid parental leave | Paid time off · Mid-level Full Time · Remote USA - In Tandem R · 19d ago
- Attention Mechanisms | C++ | Decoder Only | Decoder-only Transformer | GPU parallelism · Comprehensive benefits · Senior-level Full Time · New York, New York, United States … · 22d ago
- Senior Quantization Engineer - Edge AI Model Optimization · INR 3000K-5000K · C++ | CNN | Deep learning | Embedded Systems | Generative AI · Senior-level Full Time · Hyderabad, India · 23d ago
- Senior Staff Software Engineer, TPU Performance · USD 262K-365K · Auto sharding | Benchmarking | C++ | Compiler optimization | Computer Engineering · Benefits | Bonus | Equity · Senior-level Full Time · Sunnyvale, CA, USA · 25d ago
- Attention | Batching | C++ | CUDA | CUDA kernels · Benefits | Equity · Senior-level Full Time · US, CA, Santa Clara, United States · 27d ago
- CUDA | CUDA kernels | Compiler optimization | Graph Fusion | High Performance · Mid-level Full Time · San Jose, California, United States · 30d ago
- Senior Solutions Architect - Generative AI · INR 2475K-4500K · Argo | CI/CD | CUDA | Evaluation | FedRAMP · Senior-level Full Time · India, Pune · 1mo ago
- Engineering Manager, Inference Benchmarking - AI Perf · USD 224K-356K · DCGM | Distributed Systems | GPU Telemetry | GPU observability | Helm · Senior-level Full Time · US, CA, Santa Clara, United States · 1mo ago
- Data Curation | Deep learning | DeepSpeed | Direct Preference Optimization | Evaluation · Senior-level Full Time · Singapore, Singapore · 1mo ago
- Staff Machine Learning Engineer, ML Infrastructure · USD 183K-269K · AWS EKS | Amazon IAM | Amazon S3 | Autoscaling | Batching · Employee resource groups | Free home security system | Free professional monitoring | Hybrid work model · Senior-level Full Time · Boston, MA · 1mo ago
- Senior-level Full Time · Milpitas, CA, United States · 1mo ago
- AI/ML ASIC Architect · USD 163K-249K · ARM | ASIC architecture | AXI interconnect | Area Optimization | Attention Mechanisms · Senior-level Full Time · Milpitas, CA, United States · 1mo ago
- Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism · 100 percent remote · Senior-level Full Time · Remote job R · 1mo ago
- AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Computer Vision | Diffusion Models | Edge Computing | Expert parallelism | Flash Attention · Remote work · Senior-level Full Time · Remote job R · 1mo ago
- AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism · English communication support | Remote work · Senior-level Full Time · Remote job R · 1mo ago
- Diffusion Models | Distributed Inference Systems | Distributed inference | Expert parallelism | Flash Attention · 100 percent remote | Worldwide remote · Senior-level Full Time · Remote job R · 1mo ago
- Staff Software Engineer, Machine Learning, Google Chat · USD 207K-300K · Agentic Workflows | Caching | Cloud Spanner | Continuous Delivery | Continuous integration · Senior-level Full Time · Sunnyvale, CA, USA · 1mo ago
- AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Compute Shaders | Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing · Remote work · Senior-level Full Time · Remote job R · 1mo ago
- Deep learning | Evaluation Pipelines | GPU Cluster | High Performance | High-Performance Computing · Senior-level Full Time · Israel, Tel Aviv · 1mo ago
- AI Platform Engineer · INR 1500K-2500K · Automated Evaluation | CI/CD | CUDA | Continuous Checkpointing | Continuous batching · Mid-level Full Time · Bangalore, India · 1mo ago
- Software Engineering Manager, LLM Training · USD 170K-277K · CUDA | Containerization | Context Parallelism | Data I/O | Data parallelism · Entry-level Full Time · Mountain View, CA, United States · 1mo ago
- AI Platform Engineer · INR 1500K-2500K · Alerting | CUDA | Cause analysis | Continuous batching | GPU Profiling · Mid-level Full Time · Bangalore, India · 1mo ago
- Senior-level Full Time · Palo Alto · 1mo ago
- AWQ | Audio codecs | Audio streaming | Autoscaling | Chunked prefill · 401k matching | Annual offsites | Dental coverage | Employer-paid training | Healthcare benefits · Mid-level Full Time · San Francisco, CA · 1mo ago
- Senior-level Full Time · Dublin, Ireland · 1mo ago
- Principal Model Optimization Engineer · USD 295K-345K · CUDA | Continuous batching | GPU | LLM Inference | Machine Learning · Senior-level Full Time · San Mateo, CA, United States R · 1mo ago
- Applied Scientist, GenAI · USD 152K-189K · A/B | A/B Testing | AWS | Agent Orchestration | Agent systems · Senior-level Full Time · US - MA - Wilmington · 1mo ago