55 jobs for Speculative decoding

Senior Inference Runtime Engineer TWD 1900K-2500K

CUDA | CUDA profiling | Continuous batching | Distributed inference | GPU Memory Optimization

Flexible work culture | Inclusive environment | Training and mentoring

Senior-level Full Time

Singapore, SG / Penang, MY / …

1d ago

Intern - GenAI Benchmarking (MLE) / On-Device Model Deployment (SWE) KRW 45000K-55000K

AIMet | Automated testing | Cache Management | Code review | Context modeling

Entry-level Internship

Seoul, Korea, Republic of

1d ago

Model Optimization Engineer USD 100K-150K

C++ | CUDA | Continuous batching | Deep learning | DeepSpeed

Senior-level Full Time

United States - Remote R

1d ago

ML Performance Engineer USD 100K-150K

Benchmarking | C++ | Continuous batching | Cutlass | Deep learning

Career growth | Direct W2 employment | Remote work

Senior-level Full Time

Tempe, AZ R

1d ago

AI Inference Engineer GBP 80K-106K

Autoscaling | Batching | CUDA | Cache Management | Capacity Planning

Biannual bonus | Breakfast allowance | Dinner allowance | Equity sign-on bonus | Expensed technology

Mid-level Full Time

London, England, United Kingdom - Remote R

1d ago

AI Optimization Engineer USD 100K-150K

Benchmarking | C++ | Cache optimization | Compiler optimization | Continuous batching

Career growth

Senior-level Full Time

United States - Remote R

4d ago

Senior Director, Inference Products and Optimizations USD 274K-343K

AI workload | AI workload orchestration | AMD GPU | CUDA | Container Runtime

Conference reimbursement | Education reimbursement | Employee assistance program | Employee stock purchase program | Equity compensation

Senior-level Full Time

Seattle

4d ago

Senior Director, Inference Products and Optimizations USD 274K-343K

AI workload | AI workload orchestration | AMD | Benchmarking | Call Management

Employee assistance program | Flexible time off | LinkedIn Learning

Senior-level Full Time

San Francisco R

4d ago

Model Optimization Engineer USD 100K-150K

Benchmarking | C++ | CUDA | Continuous batching | Cutlass

Senior-level Full Time

United States - Remote R

5d ago

Senior Software Engineer - AI Inference USD 152K-287K

Attention | Batching | C++ | CUDA | Concurrency

Senior-level Full Time

US, CA, Santa Clara R

5d ago

Agentic AI Engineer (LLM) USD 106K-181K

A/B | A/B Testing | API Development | Agent Orchestration | Autogen

Annual health check-ups | Opportunity to collaborate and learn from industry professionals | Performance bonuses | Preferential pricing for services | Premium healthcare package

Mid-level Full Time

Hanoi, Vietnam

5d ago

Staff Software Engineer, Beyond Live, DeepMind USD 207K-301K

2D Games | 3D Games | Audio Tokenization | Audio video pacing | Audio/Video

Senior-level Full Time

New York, NY, USA; Mountain View, …

6d ago

Senior Staff Engineer, GDC AI Inference Platform USD 262K-365K

Agent coordination | C++ | Cloud infrastructure | Containerization | Data Structures

Bonus | Equity | Health insurance | Paid time off | Retirement plans

Senior-level Full Time

Sunnyvale, CA, USA; Kirkland, WA, USA

6d ago

ML Framework (MetalLM) Engineer USD 175K-312K

C# | C++ | CUDA | Compiler optimization | Compression

Senior-level Full Time

Cupertino

7d ago

Software Engineer, AI Specialist - Wearables AI (Technical Leadership) USD 147K-208K

C++ | CI/CD | Cloud Computing | Computer Vision | Deep learning

Senior-level Full Time

Burlingame, CA

10d ago

AI Engineer (Managed Services) SGD 85K-138K

ARES | AWQ | Agent Orchestration | Agent systems | Attention Mechanisms

Mid-level Full Time

Singapore

11d ago

AI Engineer – Enterprise, Data & AI USD 160K-246K

AWS Bedrock | Agentic Workflows | Alerting | Autogen | Autonomous Agents

Mid-level Full Time

Foster City, CA

11d ago

Machine Learning Engineer - Model Inference USD 166K-230K

C++ | Cloud infrastructure | Containers | Continuous batching | Distributed Systems

Mid-level Full Time

Cupertino

11d ago

Large Model Inference and Deployment Optimization Engineer CNY 240K-480K

Ascend 910B | Ascend CANN | Autoscaling | CUDA | Continuous batching

Entry-level Full Time

Beijing

12d ago

AI Engineer (Managed Services) SGD 85K-138K

A/B | A/B Testing | AWQ | AutoAWQ | B testing

Mid-level Full Time

Singapore

12d ago

Tech Lead Manager, Inference USD 207K-300K

Autoscaling | Cache Management | Caching | Continuous batching | Deployment Pipelines

Senior-level Full Time

SF Bay Area, CA

13d ago

Software Engineer - Training/Inference (C++) USD 180K-440K

Auto Scaling | C++ | CI/CD | CUDA | Code generation

401k plan | Dental insurance | Disability insurance | Discounts | Health insurance

Senior-level Full Time

Palo Alto, CA

14d ago

Lead Software Engineer – LLM Ops Platform Reliability GBP 80K-110K

AWS | Alertmanager | Amazon EKS | Amazon SageMaker | Autoscaling

Senior-level Full Time

GLASGOW, LANARKSHIRE, United Kingdom

15d ago

LLM Engineer (Optimization)

Benchmarking | C++ | CUDA | Cache Compression | Inference Optimization

Senior-level Full Time

Pangyo (Software Dream Center), South Korea

15d ago

Staff ML Engineer, Fine Tuning - Slack USD 197K-344K

Deep learning | GPU infrastructure | Go | Hybrid Retrieval Generation | Hybrid retrieval

401k | Dental insurance | Employee stock purchasing program | Life and disability insurance | Medical insurance

Senior-level Full Time

Washington - Seattle, United States

15d ago

Member of Technical Staff - Inference Research USD 150K-350K

Autoscaling | Disaggregated Prefill | FP8 | INT4 | KV cache

In-person collaboration

Senior-level Full Time

New York

15d ago

Senior Software Engineer, Inference USD 152K-204K

BF16 | C++ | CI/CD | CUDA | CUDA kernels

401k employer match | Company paid life insurance | Employee stock purchase program | Flexible PTO | Flexible spending account

Senior-level Full Time

Sunnyvale, CA / Bellevue, WA

19d ago

Senior ML Engineer (Token Factory) GBP 80K-130K

CI/CD | Distributed Training | Inference Optimization | JAX | JAX Speculative Decoding

Career growth and learning opportunities | Collaborative culture | Flexibility | International environment | Ownership

Senior-level Full Time

Germany; Israel; Netherlands; Prague, Czech Republic; … R

20d ago

Principal LLM Inference Engineer USD 195K-285K

Batching | C# | C++ | CUDA | CUDA kernel

Equity | Flexible working hours | Health insurance | Paid time off

Senior-level Full Time

Santa Clara

20d ago

[2026] Senior Machine Learning Engineer (Systems), Embodied AI/NPCs, ML Platform - PhD Early Career USD 196K-243K

AWS | Azure | Cloud platform | Continuous batching | Data Pipelines

Equity compensation | Health benefits | Paid time off

Senior-level Full Time

San Mateo, CA, United States R

20d ago

[2026] Senior Machine Learning Engineer (Systems), Embodied AI/NPCs, ML Platform - PhD Early Career USD 196K-243K

AWS | Azure | Cloud platform | Continuous batching | Deep learning

Senior-level Full Time

San Mateo, CA, United States R

20d ago

Senior Forward Deployed Engineer II (AI/ML) INR 1800K-3500K

Agents SDK | CUDA | Cache optimization | Continuous batching | CrewAI

Mid-level Full Time

Bengaluru

24d ago

Senior Forward Deployed Engineer I (AI/ML) INR 3000K-4800K

Agents SDK | CUDA | Continuous batching | CrewAI | Data Compression

Hybrid work | Travel up to 30%

Senior-level Full Time

Bengaluru

24d ago

Engineering Manager, ML Performance USD 207K-301K

Auto sharding | Benchmarking | CUDA | CUDA Performance | Compiler optimization

Senior-level Full Time

Sunnyvale, CA, USA; Kirkland, WA, USA

26d ago

Applied AI Scientist - On Site EUR 54K-86K

C++ | CUDA | Computer Vision | Deep learning | Distributed Training

On-site work

Senior-level Full Time

München, BY, DE

27d ago

Applied AI Scientist - On Site

C++ | Computer Vision | Deep learning | Distributed Training | Efficient Inference

Core research and development team | On-site work

Senior-level Full Time

Tel Aviv-Yafo, Tel Aviv District, IL

27d ago

MaaS Architect CNY 240K-480K

Attention | Batching | C++ | CUDA | Continuous batching

Senior-level Full Time

Shanghai, Beijing

28d ago

EDB-IPP Project: Advancing GPU Optimization for Large Language Models SGD 60K-120K

Continuous batching | Data parallelism | Deep learning | Distributed Training | Dynamic Memory

Computational resources access | Full sponsorship | Hired by Rakuten Asia after completion | Research exchanges

Mid-level Full Time

Crimson House Singapore

1mo ago

Application Software Engineer, Inference USD 135K-185K

Agent Orchestration | Agent SDK | Auto Scaling | Batch scheduling | C++

401k plan | Employee stock purchase plan | Long-term incentives | Medical, dental & vision coverage | Onsite Palo Alto

Entry-level Full Time

Palo Alto, CA

1mo ago

Sr GenAI Infra Specialist SA, AWS WWSO Startup USD 153K-228K

AWS Inferentia | AWS Trainium | Amazon Web Services | Batching | CUDA

Senior-level Full Time

New York, New York, USA

1mo ago

AI Engineer EUR 60K-80K

AWQ | AWS | Agent SDK | CI/CD | CUDA

Career growth opportunities | Permanent employment | Remote work option

Mid-level Full Time

Remote - Paris, France R

1mo ago

Senior Machine Learning Engineer USD 188K-282K

Adversarial Training | Calibration monitoring | Continuous batching | DPO | Deep learning

Senior-level Full Time

Palo Alto, CA

1mo ago

Sr. Software Engineer, Inference PLN 321K-470K

Autoscaling | BF16 | C++ | CI/CD | CUDA

Critical illness cover | Employee assistance programme | Family dental insurance | Family medical insurance | Life assurance

Senior-level Full Time

Warsaw, Poland

1mo ago

LLM Inference Frameworks and Optimization Engineer USD 160K-230K

C++ | CUDA | CUDA graph | Cluster scheduling | Compiler

Equity | Health insurance

Mid-level Full Time

San Francisco, Singapore, Amsterdam

1mo ago

Sr. Staff Machine Learning Researcher - Model Training & Optimization CAD 100K-500K

Deep learning | Distributed Training | Flash Attention | Inference Optimization | Kernel Fusion

Hybrid work

Senior-level Full Time

Toronto, Ontario, Canada

1mo ago

Research Intern, Inference (Fall 2026) USD 116K-126K

CUDA | Deep learning | Distributed Systems | JAX | Machine Learning

Housing stipend | Open source contribution opportunities

Entry-level Internship

San Francisco

1mo ago

Staff Software Engineer, Inference PLN 369K-542K

Autoscaling | BF16 | Benchmarking | C++ | CUDA

Critical illness cover | Employee assistance programme | Family dental insurance | Family medical insurance | Generous pension contribution

Senior-level Full Time

Warsaw, Poland

1mo ago

AI Engineer USD 100K-135K

AWQ | AWS | AWS EC2 | Agent Frameworks | CI/CD

401k match | Health insurance | Learning and development stipend | Paid parental leave | Paid time off

Mid-level Full Time

Remote USA - In Tandem R

1mo ago

ML Research Scientist - Deep Learning & Transformer Architectures USD 150K-200K

Attention Mechanisms | C++ | Decoder Only | Decoder-only Transformer | GPU parallelism

Comprehensive benefits

Senior-level Full Time

New York, New York, United States …

1mo ago

Senior Quantization Engineer - Edge AI Model Optimization INR 3000K-5000K

C++ | CNN | Deep learning | Embedded Systems | Generative AI

Senior-level Full Time

Hyderabad, India

1mo ago
