77 jobs for KV cache

Senior AI Engineer - Services Special Projects USD 150K-260K

Adversarial evaluation | Airflow | CI/CD | Cloud Platforms | Content Moderation

Senior-level Full Time

San Francisco Bay Area

1d ago

ML Software Engineer USD 175K-312K

Asynchronous programming | Batching | C++ | Concurrency | Distributed Systems

Observability tooling | On-call support | Production incident root cause support

Entry-level Full Time

Seattle

2d ago

Applied AI Engineer, Inference USD 188K-275K

Batching | Benchmarking | CUDA | GPU Utilization | Graph Capture

401k employer match | Disability insurance | Employee stock purchase program | Flexible PTO | Flexible spending account

Mid-level Full Time

Bellevue, WA / San Francisco, CA / Sunnyvale, …

2d ago

Machine Learning/ Search Engineer - Services Special Projects USD 190K-285K

AWS | Apache Kafka | Batching | C++ | Cassandra

Senior-level Full Time

Cupertino

3d ago

Multimodal Interaction Algorithm Engineer - X-Lab CNY 240K-480K

Attention | Checkpointing | Hard Negative Mining | KV cache | Knowledge Distillation

Senior-level Full Time

Shanghai, Shenzhen

4d ago

Staff/Principal DevOps Engineer, AI Inference USD 192K-272K

AWS IAM | AWS S3 | Amazon EC2 | Amazon EKS | Amazon VPC

Commuter benefits | Company Subsidized Lunch Program | Company holidays | Dental insurance | Educational assistance

Senior-level Full Time

Cambridge, MA USA

4d ago

Tech Lead, Machine Learning Engineer - Global E-Commerce (Conversational AI) SGD 160K-205K

AI Feedback | Context engineering | Data Curation | Deep learning | DeepSpeed

Senior-level Full Time

Singapore, Singapore

5d ago

Algorithm - Serving System Researcher

Analytical modeling | Attention | Batching | Cost modeling | Data Analysis

Senior-level Full Time

Seoul HQ

6d ago

ML Platform Engineer USD 100K-160K

Autoscaling | C++ | Caching | Data caching | Distributed Systems

Senior-level Full Time

Westford, MA R

6d ago

Staff AI Engineer USD 151K-332K

C++ | CUDA | CUDA kernels | Continuous batching | FP8

Senior-level Full Time

Seattle (WA), United States

6d ago

MLOps Engineer USD 100K-150K

API Gateway | Abuse detection | Autoscaling | C++ | CUDA

Senior-level Full Time

United States - Remote

6d ago

ML Performance Engineer USD 100K-150K

Benchmarking | C++ | Compiler optimization | Continuous batching | Custom Kernel

Remote work

Senior-level Full Time

Scottsdale, AZ R

6d ago

AI Researcher - Efficient AI (Contractor) USD 90K-114K

Agent systems | Cache Compression | DPO | Distillation | Hybrid Attention

401k matching | Fitness Goal Incentives | Health, dental, vision insurance | Hybrid work | Life and disability insurance

Entry-level Contract

Santa Clara, CA R

8d ago

Machine Learning Engineer, AI Labs TWD 1300K-2000K

Accuracy evaluation | Cache optimization | Deep learning | Distributed Systems | Inference Optimization

Senior-level Full Time

Taipei, Taiwan

9d ago

Sr. Inference Optimization Engineer (local / edge runtime) USD 195K-361K

AWQ | Batching | Build systems | C++ | CPU Kernels

Hybrid work model | On site and off site work

Senior-level Full Time

USA - CA - Santa Clara, …

9d ago

Senior Engineer, Inference Data Plane USD 139K-174K

Autoscaling | Continuous batching | Data parallelism | GRPC | Go

Employee assistance program | Flexible time off | Hybrid work model | LinkedIn Learning access | Local Employee Meetups

Senior-level Full Time

Seattle

9d ago

Senior Software Engineer I - AI Inference Data Plane USD 139K-174K

Autoscaling | Continuous batching | Data parallelism | Distributed Systems | GRPC

Conference reimbursement | Education reimbursement | Employee assistance program | Employee stock purchase program | Equity compensation

Senior-level Full Time

Austin R

9d ago

Senior Engineer, Inference Data Plane USD 139K-174K

Continuous batching | Data parallelism | Distributed Systems | GRPC | Go

Employee assistance program | Employee stock purchase program | Equity compensation | Flexible time off | LinkedIn Learning access

Senior-level Full Time

Denver R

9d ago

Senior Engineer, Inference Data Plane USD 139K-174K

Continuous batching | Data parallelism | Distributed Systems | GRPC | Go

Conference reimbursement | Employee assistance program | Employee stock purchase program | Equity compensation | Flexible time off

Senior-level Full Time

Boston R

9d ago

Senior Software Engineer I - AI Inference Data Plane USD 139K-174K

Autoscaling | Continuous batching | Data parallelism | Distributed Systems | GRPC

Conference reimbursement | Employee assistance program | Employee stock purchase program | Equity compensation | Flexible time off

Senior-level Full Time

San Francisco R

9d ago

ASIC Architect, Principal (AI Inference) USD 200K-300K

AI accelerator | AI accelerator design | ASIC architecture | Accelerator design | Attention Mechanisms

401k with company matching | Company paid life and disability coverage | Company paid medical dental and vision insurance | Company provided computer and home office setup | Paid Company Holidays

Senior-level Full Time

Remote (United States); Canada

10d ago

NLP Performance Engineer GBP 100K-140K

Benchmarking | CUDA | KV cache | LLM Inference | Language Models

Annual leave | Barista service | Company pension | Cycle to work scheme | Healthcare

Mid-level Full Time

London, United Kingdom

10d ago

Senior Machine Learning Engineer, LLM Inference Optimization USD 195K-262K

AWQ | Benchmarking | CUDA | CUDA kernels | Chunked prefill

401k match | Career growth | Collaborative culture | Disability insurance | Flexibility

Senior-level Full Time

Palo Alto, California, United States

10d ago

Senior Applied Scientist, Efficient LLM Inference & Model Optimization USD 195K-262K

CUDA | Cache Compression | Co Optimization | Decoding algorithms | Evaluation methodology

401k plan | Career growth and learning opportunities | Collaborative culture | Disability insurance | Flexible work arrangements

Senior-level Full Time

Palo Alto, California, United States

10d ago

Software Technical Leader - GenAI Gateway USD 56K-65K

CUDA | Concurrency | GPU | GRPC | Go

Hybrid work

Senior-level Full Time

Buenos Aires, Argentina

11d ago

Research Scientist / Engineer – Reinforcement Learning Infrastructure USD 200K-300K

Asynchronous training | Containerization | Curriculum learning | Distributed Systems | Distributed Training

Senior-level Full Time

SF Bay Area, CA, Remote, US, … R

12d ago

Software Engineer III - Data Analytics Platform GBP 72K-104K

API Development | Autoscaling | Backward Compatibility | Benchmarking | CI/CD

Senior-level Full Time

London, United Kingdom

12d ago

Machine Learning Engineer — Inference Optimization USD 170K-287K

BF16 | CUDA | Deep learning | Distributed Systems | FP16

Senior-level Full Time

Remote (world)

12d ago

Senior Inference Runtime Engineer TWD 1900K-2500K

CUDA | CUDA profiling | Continuous batching | Distributed inference | GPU Memory Optimization

Flexible work culture | Inclusive environment | Training and mentoring

Senior-level Full Time

Singapore, SG / Penang, MY / …

13d ago

AI Inference Engineer GBP 80K-106K

Autoscaling | Batching | CUDA | Cache Management | Capacity Planning

Biannual bonus | Breakfast allowance | Dinner allowance | Equity sign-on bonus | Expensed technology

Mid-level Full Time

London, England, United Kingdom - Remote

13d ago

Research Engineer, Algorithms USD 300K-400K

Analog computation | Attention Mechanism | Efficient Attention | Inference Optimization | KV cache

Mid-level Full Time

New York City

14d ago

Senior Software Engineer I, Inference USD 139K-204K

Adaptive scheduling | Autoscaling | BF16 | C++ | CI/CD

401k match | Dental insurance | Disability insurance | Employee stock purchase program | Flexible PTO

Senior-level Full Time

Sunnyvale, CA / Bellevue, WA

16d ago

Senior Director, Inference Products and Optimizations USD 274K-343K

AI workload | AI workload orchestration | AMD GPU | CUDA | Container Runtime

Conference reimbursement | Education reimbursement | Employee assistance program | Employee stock purchase program | Equity compensation

Senior-level Full Time

Seattle

16d ago

Senior Director, Inference Products and Optimizations USD 274K-343K

AI workload | AI workload orchestration | AMD | Benchmarking | Call Management

Employee assistance program | Flexible time off | LinkedIn Learning

Senior-level Full Time

San Francisco R

16d ago

Model Inference Systems Engineer CNY 216K-420K

Batching | C++ | CUDA | CUDA graph | FP8

Full Time

Shanghai

17d ago

Large Model Inference Architect CNY 144K-240K

Ascend C | C++ | C# | CUDA | CUDA kernel

Senior-level Full Time

Shanghai

17d ago

Applied AI Research Engineer USD 140K-200K

GPU Programming | KV cache | Model Inference | PyTorch | Quantization

Conference events | Dental insurance | Fitness stipend | Health insurance | Learning budget

Mid-level Full Time

New York, NY, US / San … R

18d ago

Cloud Inference Engineer USD 158K-273K

Autoscaling | Batch Processing | Cache Management | Distributed Systems | Go

401k matching | Flexible paid time off | Health insurance | Remote work option | Team events onsite and meetups

Mid-level Full Time

United States / Canada

18d ago

Senior Staff / Principal Machine Learning Scientist, AI Inference & Optimization USD 182K-260K

Attention | Batching | C++ | CoreML | Fine Tuning

Senior-level Full Time

Santa Clara, California, United States

19d ago

Systems Research Engineer GBP 74K-121K

AI infrastructure | C++ | Data Locality | Distributed Systems | Fault Tolerance

Full Time

Edinburgh, United Kingdom

21d ago

Senior Technical Program Manager (Engineering) - AI Tooling & Systems USD 152K-190K

A/B Testing | AWS SageMaker | Azure Machine Learning

Senior-level Full Time

USA | Remote

24d ago

Large Model Inference and Deployment Optimization Engineer CNY 240K-480K

Ascend 910B | Ascend CANN | Autoscaling | CUDA | Continuous batching

Entry-level Full Time

Beijing

24d ago

AI Engineer (Managed Services) SGD 85K-138K

A/B Testing | AWQ | AutoAWQ

Mid-level Full Time

Singapore

24d ago

Tech Lead Manager, Inference USD 207K-300K

Autoscaling | Cache Management | Caching | Continuous batching | Deployment Pipelines

Senior-level Full Time

SF Bay Area, CA

26d ago

Lead Software Engineer – LLM Ops Platform Reliability GBP 80K-110K

AWS | Alertmanager | Amazon EKS | Amazon SageMaker | Autoscaling

Senior-level Full Time

Glasgow, Lanarkshire, United Kingdom

27d ago

LLM Engineer (Optimization)

Benchmarking | C++ | CUDA | Cache Compression | Inference Optimization

Senior-level Full Time

Pangyo (Software Dream Center), South Korea

27d ago

Senior ML Software Engineer, Data Plane

C# | C++ | CI/CD | CUDA | CUDA kernels

Senior-level Full Time

Tel Aviv-Yafo, Tel Aviv, ISR

27d ago

Member of Technical Staff - Inference Research USD 150K-350K

Autoscaling | Disaggregated Prefill | FP8 | INT4 | KV cache

In-person collaboration

Senior-level Full Time

New York

27d ago

Senior Associate/Manager - Applied AI Engineer, Technology Consulting SGD 162K-220K

ALiBi | AWS | AWS CDK | Attention Mechanisms | Autoscaling

Senior-level Full Time

SG, 048583

30d ago

Software Engineer, Machine Learning Infrastructure - Generative AI USD 137K-299K

APIs | AWQ | AWS | Autoscaling | Backend Development

401k plan | Commuter benefits | Paid parental leave | Paid sick leave | Paid time off

Mid-level Full Time

San Francisco, CA; Sunnyvale, CA; Seattle, …

30d ago
