Find jobs in AI/ML, Data Science and Big Data
44 results for KV cache (Skill/Tech stack)
-
AI Platform Engineer; INR 1500K-2500K; Alerting | CUDA | Cause analysis | Continuous batching | GPU Profiling; Mid-level Full Time; Bangalore, India; 21h ago
-
Tech Lead - ASR / TTS / Speech LLM (IC + Mentor); INR 2500K-4800K; Autoscaling | Batching | CTC | Caching | DER; Senior-level Full Time; Bengaluru; 2d ago
-
AWQ | Audio codecs | Audio streaming | Autoscaling | Chunked prefill; 401k matching | Annual offsites | Dental coverage | Employer-paid training | Healthcare benefits; Mid-level Full Time; San Francisco, CA; 4d ago
-
Forward Deployed Engineer (Inference & Post-Training); USD 270K-300K; DPO | GRPO | KV cache | LoRA | Pipeline parallelism; Equity | Health insurance | Remote work flexibility; Senior-level Full Time; San Francisco; 4d ago
-
AI Architect Lead (Hybrid-within BankUnited's footprint); USD 130K-200K; AI Foundry | AWS | AWS Bedrock | Architecture Review | Architecture Review Board; Senior-level Full Time; Miami Lakes, FL, United States R; 6d ago
-
Senior Data Scientist; USD 200K-225K; AWS | Airflow | ArgoCD | Batching | CUDA; 401k match | Dental insurance | FSA options | Flexible PTO | Free food; Senior-level Full Time; Marina del Rey, CA; 6d ago
-
Senior Staff Engineer - AI Data Path; USD 180K-287K; Automation | Benchmarking | C++ | CPU GPU | CPU GPU data movement; Senior-level Full Time; Remote, United States R; 6d ago
-
Senior-level Full Time; Maslak, Sarıyer, Turkey; 6d ago
-
ADaM | Agent systems | Context modeling | Function Calling | Gradient descent; Mid-level Full Time; Montreal, Quebec, Canada; 7d ago
-
Distinguished Engineer, Machine Learning; USD 147K-299K; Artificial Intelligence | Cache optimization | Inference Optimization | KV cache | KV cache optimization; Senior-level Full Time; Santa Clara, California, United States; 7d ago
-
Principal Software Engineer - AI/ML (Ireland); EUR 58K-86K; C++ | CUDA | Cache Management | Cloud Native | Data-Driven Optimization; Senior-level Full Time; Remote Ireland R; 8d ago
-
AI Engineer - Model Performance; USD 165K-250K; Attention Backend | Audio Processing | Batching | CUDA | CUDA graph; Async communication | Innovation-focused culture | Remote work | Startup environment | Supportive team; Mid-level Full Time; SF Hybrid R; 12d ago
-
Software Engineer, Inference AI/ML; USD 92K-135K; Algorithms | C++ | CI/CD | CUDA | Caching; 401k match | Company paid life insurance | Flexible PTO | Health savings account | Medical, dental, and vision insurance; Entry-level Full Time; Sunnyvale, CA / Bellevue, WA; 13d ago
-
Research Scientist - AI Compute & DPU - Global Frontier Tech Recruitment Program - 2027 Start (PhD); USD 212K-387K; AI Agent | AI agent workflows | Agent workflows | CPU Scheduling | Cause analysis; None Full Time; San Jose, California, United States; 13d ago
-
AI Software Engineer Intern; CNY 38K-50K; CUDA | Distributed Systems | FP8 | FasterTransformer | Flash Attention; On-site work; Entry-level Full Time Internship; CHN - Minhang, China; 13d ago
-
AI Software Engineer Intern; CNY 38K-50K; CUDA | Compiler optimization | Continuous batching | Distributed Systems | Dynamic batching; On-site work; Entry-level Full Time Internship; CHN - Minhang, China; 13d ago
-
AI Software Engineer Intern; CNY 28K-50K; AWQ | Cache optimization | DINOv2 | DeepSpeed | Diffusion Models; Entry-level Full Time Internship; CHN - Minhang, China; 13d ago
-
Entry-level Full Time Internship; CHN - Minhang, China; 13d ago
-
Machine Learning Engineer 5 - Globalization; USD 466K-750K; Batching | Data Pipelines | Distributed Training | GPU Optimization | Inference Optimization; 401k matching | Disability coverage | Flexible spending account | Flexible time off | Health insurance; Senior-level Full Time; USA - Remote, United States R; 13d ago
-
AI Software Engineer; USD 151K-332K; C++ | CUDA | CUDA kernels | Continuous batching | FP8; Hybrid work | In-person work | Remote work | Work-life balance; Mid-level Full Time; Seattle (WA), United States; 14d ago
-
Agent systems | Attention Mechanism | CPU | Continuous Improvement | DPO; Dental insurance | Employee assistance program | Flexible Paid Vacation | Flexible paid sick leave | Flexible spending account; Senior-level Full Time; Palo Alto, CA; 21d ago
-
Senior DL Software Engineer, Model Optimization and Edge Deployment - Autonomous Vehicles; USD 184K-356K; C++ | CUDA | Cutlass | Efficient Attention | GPU Architecture; Senior-level Full Time; US, CA, Santa Clara, United States; 21d ago
-
Sr. AI Inference Systems Engineer; USD 120K-225K; CUDA | Distributed Systems | Hardware Accelerators | Inference Optimization | Instruction set; 401k | Dental insurance | Disability insurance | Health insurance | Life insurance; Senior-level Full Time; US-California-Palo Alto, United States; 22d ago
-
LLM Algorithm Engineer; CNY 38K-50K; API Development | Agent systems | Attention Mechanisms | Autogen | CUDA; English courses | Meal allowance | Online learning access | Onsite gym | Onsite massages; Mid-level Full Time; Chang Sha Shi, China; 23d ago
-
LLM Inference Performance & Evals Engineer; CAD 142K-195K; Attention Mechanisms | C# | C++ | Compiler optimization | Debugging; Job stability | Open source collaboration | Research publications; Mid-level Full Time; Toronto, Ontario, Canada; 26d ago
-
Staff Engineer - AI Development; INR 2400K-3880K; Automation | Benchmarking | C plus plus | CPU GPU data movement | Cache Management; Senior-level Full Time; Pune, MH, India; 27d ago
-
Senior-level Full Time; Remote, United States R; 28d ago
-
Senior Software Engineer - AI Inference; USD 152K-287K; Batching | C++ | CUDA | Concurrency | Distributed Systems; Benefits | Equity; Senior-level Full Time; US, CA, Santa Clara, United States; 28d ago
-
Senior Performance Analyst, Inference; USD 175K-260K; Attention Mechanism | CUDA | Flash Attention | GPU kernel optimization | KV cache; Senior-level Full Time; Sunnyvale, CA; 29d ago
-
AI Inference Engineer - Model Optimization & Deployment; USD 205K-303K; Accuracy evaluation | BF16 | C++ | CUDA | CUDA kernels; Senior-level Full Time; Foster City, CA; 1mo ago
-
Senior Software Engineer, AI Inference; CAD 135K-220K; C++ | Chunked prefill | Continuous batching | Cutlass | Docker; Senior-level Full Time; Canada, Toronto; 1mo ago
-
Senior Product Manager, AI Inference - Dynamo; USD 208K-327K; Agentic AI | Artificial Intelligence | Cache Management | Data-driven | Data-driven project management; Senior-level Full Time; US, CA, Santa Clara, United States; 1mo ago
-
Senior Solutions Architect - KV Cache and AI Storage; CNY 460K-600K; Bluefield | CMX | Caching | Cassandra | Ceph; Senior-level Full Time; China, Beijing; 1mo ago
-
Solutions Architect - Top AI Labs; CNY 435K-500K; Artificial Intelligence | C++ | Computer Systems | Data Structures | Distributed Computing; Senior-level Full Time; China, Beijing; 1mo ago
-
Agentic Inference | CUDA | Distributed Training | Docker | GPU Computing; Senior-level Full Time; China, Beijing; 1mo ago
-
Senior Deep Learning Solution Architect; CNY 367K-490K; C++ | Caching | Computer Architecture | Data Structures | Data transfer; Senior-level Full Time; China, Beijing; 1mo ago
-
API Gateway | C++ | Cilium | Distributed tracing | Envoy; Medical/Dental/Vision insurance | Paid parental leave | Paid time off | Retirement 401k match; Senior-level Full Time; Boston, United States R; 1mo ago
-
Continuous batching | Jupyter | KV cache | Low Latency | Machine Learning; Daily meals | Housing subsidy | Medical, dental & vision coverage | Relocation support; Mid-level Full Time; Cupertino, CA; 1mo ago
-
Entry-level Internship; Shanghai; 1mo ago
-
Inference Software Engineer; USD 150K-275K; C++ | CUDA | Continuous batching | Distributed Systems | KV cache; Daily meals | Housing subsidy | Medical, dental & vision coverage | Relocation support; Senior-level Full Time; Cupertino, CA; 1mo ago
-
Machine Learning Research Engineer; USD 150K-275K; CUDA | Deep learning | Distributed Training | Distributed inference | Inference Optimization; Daily meals | Housing subsidy | Medical, dental & vision coverage | Relocation support; Senior-level Full Time; Cupertino, CA; 1mo ago
-
Senior Software Engineer II, Inference; USD 165K-242K; Autoscaling | BF16 | C++ | CI/CD | CUDA; 401k match | Employee stock purchase program | Flexible PTO | Flexible spending account | Health savings account; Senior-level Full Time; Sunnyvale, CA / Bellevue, WA; 1mo ago
-
Senior Engineer 2: Inference Data Plane; USD 167K-209K; AI | Databases | Distributed Systems | GPU Optimization | GRPC; Benefits support | Educational courses | Equity compensation | Flexible time off | Reimbursement for training; Senior-level Full Time; San Francisco R; 1mo ago
-
Senior Engineer 2: Inference Data Plane; USD 167K-209K; AI | Continuous batching | Data parallelism | Databases | Distributed Systems; Employee assistance program | Equity compensation | Flexible time off | Learning & development resources | Local Employee Meetups; Senior-level Full Time; Austin R; 1mo ago