Find jobs in AI/ML, Data Science and Big Data
33 results for Speculative decoding (Skill/Tech stack)
-
AI Software Engineer Intern
CNY 38K-50K
CUDA | Distributed Systems | FP8 | FasterTransformer | Flash Attention
On-site work
Entry-level | Full Time | Internship | CHN - Minhang, China | 1d ago
-
AI Software Engineer Intern
CNY 38K-50K
CUDA | Compiler optimization | Continuous batching | Distributed Systems | Dynamic batching
On-site work
Entry-level | Full Time | Internship | CHN - Minhang, China | 1d ago
-
AI Software Engineer Intern
CNY 28K-50K
AWQ | Cache optimization | DINOv2 | DeepSpeed | Diffusion Models
Entry-level | Full Time | Internship | CHN - Minhang, China | 1d ago
-
AI Software Engineer
USD 151K-332K
C++ | CUDA | CUDA kernels | Continuous batching | FP8
Hybrid work | In-person work | Remote work | Work-life balance
Mid-level | Full Time | Seattle (WA), United States | 2d ago
-
Senior ML Ops Engineer - Dallas, TX
USD 48K-168K
Apache Spark | Big Data | CI/CD | Containerization | Data analytics
401k retirement plan | Medical, dental, and vision benefits | Paid Holidays | Paid time off | Variable pay/incentives
Senior-level | Full Time | United States | 5d ago
-
Agentic AI | Edge AI | Executorch | Git | Knowledge Distillation
Entry-level | Full Time | Internship | Eindhoven, Netherlands | 7d ago
-
Data-Driven Decision Making | Data-driven | Decision Making | Deep learning | Distributed Training
Senior-level | Full Time | Sunnyvale, CA | 9d ago
-
Senior DL Software Engineer, Model Optimization and Edge Deployment - Autonomous Vehicles
USD 184K-356K
C++ | CUDA | Cutlass | Efficient Attention | GPU Architecture
Senior-level | Full Time | US, CA, Santa Clara, United States | 9d ago
-
CUDA | Compiler optimization | Graph optimization | High concurrency | Low-precision computing
Mid-level | Full Time | San Jose, California, United States | 12d ago
-
CUDA | CUDA kernel | Compiler optimization | Deployment Pipelines | Graph Fusion
Mid-level | Full Time | Seattle, Washington, United States | 13d ago
-
Senior-level | Full Time | Dublin, Ireland | 14d ago
-
LLM Inference Performance & Evals Engineer
CAD 142K-195K
Attention Mechanisms | C# | C++ | Compiler optimization | Debugging
Job stability | Open source collaboration | Research publications
Mid-level | Full Time | Toronto, Ontario, Canada | 15d ago
-
AI Inference Engineer - Model Optimization & Deployment
USD 205K-303K
Accuracy evaluation | BF16 | C++ | CUDA | CUDA kernels
Senior-level | Full Time | Foster City, CA | 19d ago
-
AI Platform Engineer
INR 1500K-2000K
Alerting | CUDA | Capacity Planning | Continuous batching | Distributed tracing
Mid-level | Full Time | Bangalore, India | 21d ago
-
ML Research Engineer (Inference)
INR 120K-180K
C++ | Deep learning | Generative AI | Hugging Face | Hugging Face Transformers
Entry-level | Full Time | Bengaluru, Karnataka, India | 22d ago
-
Senior Software Engineer, LLM Performance
USD 180K-339K
C++ | CUDA | Cutlass | FlashAttention | FlashInfer
Senior-level | Full Time | SF Bay Area (Hybrid) R | 23d ago
-
Research Engineer - Ads Integrity
USD 136K-205K
AI Safety | AIGC | Deepfake Synthesis | Deepfake detection | Generative AI
Career growth opportunities | Flexible work culture | Opportunities for open-source contributions | Opportunities for publication
Mid-level | Full Time | San Jose, California, United States | 28d ago
-
Senior Member of Technical Staff: ML Systems and Infrastructure
INR 2500K-4000K
Argo Workflows | ArgoCD | CI/CD | CUDA | GitHub Actions
Senior-level | Full Time | Bangalore, India | 28d ago
-
Senior-level | Full Time | Doha Municipality, Doha, Qatar | 29d ago
-
Machine Learning Engineer
CAD 128K-192K
Deep learning | Graph theory | Inference Optimization | LLM Inference | LLM Inference Optimization
Mid-level | Full Time | Toronto - MSO, Canada R | 29d ago
-
Principal Machine Learning Engineer
USD 32K-32K
CI/CD | Cloud Platforms | Containerization | Distributed Training | Docker
Birthday celebrations | Company lunches | Dental insurance | Flexible working hours | Generous holiday allowance
Senior-level | Full Time | London, England, United Kingdom | 30d ago
-
Senior Machine Learning Engineer
USD 32K-32K
Distributed Training | Dynamic batching | Flash Attention | Inference Optimization | Machine Learning
401k matching | Adoption Assistance | Birthday celebrations | Company lunches | Dental coverage
Senior-level | Full Time | London, England, United Kingdom | 1mo ago
-
Continuous batching | Jupyter | KV cache | Low Latency | Machine Learning
Daily meals | Housing subsidy | Medical, dental & vision coverage | Relocation support
Mid-level | Full Time | Cupertino, CA | 1mo ago
-
Miclaw - Large Model Training and Inference Intern
CNY 25K-37K
Attention Mechanism | C++ | CUDA | Compiler optimization | FlashAttention
Entry-level | Internship | Beijing | 1mo ago
-
Entry-level | Internship | Beijing | 1mo ago
-
Inference Software Engineer
USD 150K-275K
C++ | CUDA | Continuous batching | Distributed Systems | KV cache
Daily meals | Housing subsidy | Medical, dental & vision coverage | Relocation support
Senior-level | Full Time | Cupertino, CA | 1mo ago
-
Machine Learning Research Engineer
USD 150K-275K
CUDA | Deep learning | Distributed Training | Distributed inference | Inference Optimization
Daily meals | Housing subsidy | Medical, dental & vision coverage | Relocation support
Senior-level | Full Time | Cupertino, CA | 1mo ago
-
Software Engineering Manager, LLM Training
USD 170K-277K
CUDA | CUDA profiling | Containerization | Context Parallelism | Data I/O
Health and wellness programs | Hybrid work | Time away from work
Entry-level | Full Time | Mountain View, CA, United States | 1mo ago
-
Senior Software Engineer II, Inference
USD 165K-242K
Autoscaling | BF16 | C++ | CI/CD | CUDA
401k match | Employee stock purchase program | Flexible PTO | Flexible spending account | Health savings account
Senior-level | Full Time | Sunnyvale, CA / Bellevue, WA | 1mo ago
-
Software Engineering Manager, LLM Training
USD 170K-277K
CUDA | Containerization | Data parallelism | Distributed Systems | Docker
Flexible-hybrid work | Health and wellness programs | Time off
Entry-level | Full Time | Mountain View, CA, United States | 1mo ago
-
4-bit | C plus plus | C++14 | C++17 | CI/CD
Senior-level | Full Time | Germany, Munich | 1mo ago
-
Member of Technical Staff - Inference
USD 200K-300K
AWS | Ansible | Benchmarks | C++ | CUDA
Competitive compensation | Conference attendance | Equity incentives | Flexible work | Professional development
Senior-level | Full Time | Remote R | 1mo ago
-
Software Engineer, Inference Platform
USD 200K-250K
CUDA | Distributed Systems | Expert parallelism | GPU Compute | GPU Optimization
Dental insurance | Equity | Health insurance | PTO policy | Retirement plan
Mid-level | Full Time | San Francisco, CA | 1mo ago