Find jobs in AI/ML, Data Science and Big Data
53 results for Speculative decoding (Skill/Tech stack)
-
Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism · 100 percent remote · Senior-level · Full Time · Remote job R · 1d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Computer Vision | Diffusion Models | Edge Computing | Expert parallelism | Flash Attention · Remote work · Senior-level · Full Time · Remote job R · 1d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism · English communication support | Remote work · Senior-level · Full Time · Remote job R · 1d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing | Expert parallelism · Remote work · Senior-level · Full Time · Remote job R · 1d ago
-
Diffusion Models | Distributed Inference Systems | Distributed inference | Expert parallelism | Flash Attention · 100 percent remote | Worldwide remote · Senior-level · Full Time · Remote job R · 1d ago
-
Staff Software Engineer, Machine Learning, Google Chat · USD 207K-300K · Agentic Workflows | Caching | Cloud Spanner | Continuous Delivery | Continuous integration · Senior-level · Full Time · Sunnyvale, CA, USA · 2d ago
-
AI Performance Optimization Engineer · USD 136K-258K · C++ | Continuous batching | Deep learning | Distributed Systems | FSDP · Mid-level · Full Time · United States - Remote R · 3d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Diffusion Models | Distributed Inference Systems | Distributed inference | Expert parallelism | Flash Attention · English support | Remote work · Senior-level · Full Time · Remote job R · 3d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Compute Shaders | Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing · Remote work · Senior-level · Full Time · Remote job R · 3d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Computer Vision | Deep learning | Diffusion Models | Distributed inference | Edge Computing · Remote work · Senior-level · Full Time · Remote job R · 3d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Compute Shaders | Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing · Remote work · Senior-level · Full Time · Remote job R · 3d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing | Expert parallelism · Career growth | Collaborative research environment | English communication support | Remote work opportunity · Senior-level · Full Time · Remote job R · 3d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Compute Shaders | Custom Compute Shaders | Data Pipelines | Diffusion Models | Distributed Inference Systems · Remote work · Senior-level · Full Time · Remote job R · 3d ago
-
AI Performance Optimization Engineer · USD 136K-258K · C++ | Cache optimization | Continuous batching | Cutlass | Deep learning · Mid-level · Full Time · United States - Remote R · 4d ago
-
AI Performance Optimization Engineer · USD 136K-258K · Access patterns | Benchmarking | C++ | Cache optimization | Compiler optimization · Full-time W2 employment | Health benefits | Remote work · Mid-level · Full Time · United States - Remote R · 4d ago
-
AI Performance Optimization Engineer · USD 159K-264K · C++ | Continuous batching | Cutlass | Deep learning | DeepSpeed · Remote work · Mid-level · Full Time · United States - Remote R · 4d ago
-
AI Performance Optimization Engineer · USD 136K-258K · Benchmarking | C++ | Compiler optimization | Continuous batching | Debugging · Mid-level · Full Time · United States - Remote R · 4d ago
-
AI Performance Optimization Engineer · USD 136K-258K · Access Optimization | Attention Optimization | Benchmarking | C++ | Compiler optimization · Mid-level · Full Time · United States - Remote R · 5d ago
-
Mid-level · Full Time · Seattle (WA), United States · 6d ago
-
Deep learning | GPU clusters | HPC | High Performance | High Throughput · Senior-level · Full Time · Israel, Tel Aviv · 8d ago
-
Deep learning | Evaluation Pipelines | GPU Cluster | High Performance | High-Performance Computing · Senior-level · Full Time · Israel, Tel Aviv · 8d ago
-
AI Platform Engineer · INR 1500K-2500K · Automated Evaluation | CI/CD | CUDA | Continuous Checkpointing | Continuous batching · Mid-level · Full Time · Bangalore, India · 8d ago
-
Software Engineering Manager, LLM Training · USD 170K-277K · CUDA | Containerization | Context Parallelism | Data I/O | Data parallelism · Entry-level · Full Time · Mountain View, CA, United States · 8d ago
-
AI Platform Engineer · INR 1500K-2500K · Alerting | CUDA | Cause analysis | Continuous batching | GPU Profiling · Mid-level · Full Time · Bangalore, India · 9d ago
-
Senior-level · Full Time · Palo Alto · 9d ago
-
AWQ | Audio codecs | Audio streaming | Autoscaling | Chunked prefill · 401k matching | Annual offsites | Dental coverage | Employer-paid training | Healthcare benefits · Mid-level · Full Time · San Francisco, CA · 12d ago
-
Forward Deployed Engineer (Inference & Post-Training) · USD 270K-300K · DPO | GRPO | KV cache | LoRA | Pipeline parallelism · Equity | Health insurance | Remote work flexibility · Senior-level · Full Time · San Francisco · 13d ago
-
Senior-level · Full Time · Dublin, Ireland · 15d ago
-
Principal Model Optimization Engineer · USD 295K-345K · CUDA | Continuous batching | GPU | LLM Inference | Machine Learning · Senior-level · Full Time · San Mateo, CA, United States R · 15d ago
-
Applied Scientist, GenAI · USD 152K-189K · A/B | A/B Testing | AWS | Agent Orchestration | Agent systems · Senior-level · Full Time · US - MA - Wilmington · 15d ago
-
AI Engineer - Model Performance · USD 165K-250K · Attention Backend | Audio Processing | Batching | CUDA | CUDA graph · Async communication | Innovation-focused culture | Remote work | Startup environment | Supportive team · Mid-level · Full Time · SF Hybrid R · 20d ago
-
AI Software Engineer Intern · CNY 38K-50K · CUDA | Distributed Systems | FP8 | FasterTransformer | Flash Attention · On-site work · Entry-level · Full Time · Internship · CHN - Minhang, China · 22d ago
-
AI Software Engineer Intern · CNY 38K-50K · CUDA | Compiler optimization | Continuous batching | Distributed Systems | Dynamic batching · On-site work · Entry-level · Full Time · Internship · CHN - Minhang, China · 22d ago
-
Senior ML Ops Engineer - Dallas, TX · USD 48K-168K · Apache Spark | Big Data | CI/CD | Containerization | Data analytics · 401k retirement plan | Medical, dental, and vision benefits | Paid Holidays | Paid time off | Variable pay/incentives · Senior-level · Full Time · United States · 25d ago
-
Data-Driven Decision Making | Data-driven | Decision Making | Deep learning | Distributed Training · Senior-level · Full Time · Sunnyvale, CA · 29d ago
-
Senior DL Software Engineer, Model Optimization and Edge Deployment - Autonomous Vehicles · USD 184K-356K · C++ | CUDA | Cutlass | Efficient Attention | GPU Architecture · Senior-level · Full Time · US, CA, Santa Clara, United States · 30d ago
-
CUDA | Compiler optimization | Graph optimization | High concurrency | Low-precision computing · Mid-level · Full Time · San Jose, California, United States · 1mo ago
-
CUDA | CUDA kernel | Compiler optimization | Deployment Pipelines | Graph Fusion · Mid-level · Full Time · Seattle, Washington, United States · 1mo ago
-
Senior-level · Full Time · Dublin, Ireland · 1mo ago
-
LLM Inference Performance & Evals Engineer · CAD 142K-195K · Attention Mechanisms | C# | C++ | Compiler optimization | Debugging · Job stability | Open source collaboration | Research publications · Mid-level · Full Time · Toronto, Ontario, Canada · 1mo ago
-
AI Inference Engineer - Model Optimization & Deployment · USD 205K-303K · Accuracy evaluation | BF16 | C++ | CUDA | CUDA kernels · Senior-level · Full Time · Foster City, CA · 1mo ago
-
ML Research Engineer (Inference) · INR 120K-180K · C++ | Deep learning | Generative AI | Hugging Face | Hugging Face Transformers · Entry-level · Full Time · Bengaluru, Karnataka, India · 1mo ago
-
Senior Software Engineer, LLM Performance · USD 180K-339K · C++ | CUDA | Cutlass | FlashAttention | FlashInfer · Senior-level · Full Time · SF Bay Area (Hybrid) R · 1mo ago
-
Research Engineer - Ads Integrity · USD 136K-205K · AI Safety | AIGC | Deepfake Synthesis | Deepfake detection | Generative AI · Career growth opportunities | Flexible work culture | Opportunities for open-source contributions | Opportunities for publication · Mid-level · Full Time · San Jose, California, United States · 1mo ago
-
Senior Member of Technical Staff: ML Systems and Infrastructure · INR 2500K-4000K · Argo Workflows | ArgoCD | CI/CD | CUDA | GitHub Actions · Senior-level · Full Time · Bangalore, India · 1mo ago
-
Senior-level · Full Time · Doha Municipality, Doha, Qatar · 1mo ago
-
Machine Learning Engineer · CAD 128K-192K · Deep learning | Graph theory | Inference Optimization | LLM Inference | LLM Inference Optimization · Mid-level · Full Time · Toronto - MSO, Canada R · 1mo ago
-
Principal Machine Learning Engineer · USD 32K-32K · CI/CD | Cloud Platforms | Containerization | Distributed Training | Docker · Birthday celebrations | Company lunches | Dental insurance | Flexible working hours | Generous holiday allowance · Senior-level · Full Time · London, England, United Kingdom · 1mo ago
-
Senior Machine Learning Engineer · USD 32K-32K · Distributed Training | Dynamic batching | Flash Attention | Inference Optimization | Machine Learning · 401k matching | Adoption Assistance | Birthday celebrations | Company lunches | Dental coverage · Senior-level · Full Time · London, England, United Kingdom · 1mo ago
-
Continuous batching | Jupyter | KV cache | Low Latency | Machine Learning · Daily meals | Housing subsidy | Medical, dental & vision coverage | Relocation support · Mid-level · Full Time · Cupertino, CA · 1mo ago