Find jobs in AI/ML, Data Science and Big Data
53 results for Speculative decoding (Skill/Tech stack)
-
Attention Mechanisms | C++ | Decoder Only | Decoder-only Transformer | GPU parallelism · Comprehensive benefits · Senior-level Full Time · New York, New York, United States … · 2d ago
-
AI Performance Optimization Engineer · USD 100K-150K · Benchmarking | C++ | Communication Primitives | Continuous batching | Distributed Training · Career growth potential | Remote work · Mid-level Full Time · United States - Remote R · 2d ago
-
AI Performance Optimization Engineer · USD 100K-150K · Benchmarking | C++ | CUDA | Continuous batching | Debugging · Mid-level Full Time · United States - Remote R · 2d ago
-
AI Performance Optimization Engineer · USD 100K-150K · C++ | CUDA | Continuous batching | Cutlass | Deep learning · Career growth | Health benefits | Remote work · Mid-level Full Time · United States - Remote R · 2d ago
-
AI Performance Optimization Engineer · USD 100K-150K · C++ | Compiler optimization | Continuous batching | Distributed Training | FSDP · Mid-level Full Time · United States - Remote R · 2d ago
-
Mid-level Full Time · Seattle (WA), United States · 2d ago
-
Senior Quantization Engineer - Edge AI Model Optimization · INR 3000K-5000K · C++ | CNN | Deep learning | Embedded Systems | Generative AI · Senior-level Full Time · Hyderabad, India · 3d ago
-
AI Performance Optimization Engineer · USD 100K-150K · Attention Mechanisms | Benchmarking | C++ | Continuous batching | Data pipeline · Career growth | Remote work · Mid-level Full Time · United States - Remote R · 5d ago
-
AI Performance Optimization Engineer · USD 100K-150K · Benchmarking | C++ | Compiler optimization | Continuous batching | Cutlass · Benefits | Full-time employment | Remote work · Mid-level Full Time · United States - Remote R · 5d ago
-
Senior Staff Software Engineer, TPU Performance · USD 262K-365K · Auto sharding | Benchmarking | C++ | Compiler optimization | Computer Engineering · Benefits | Bonus | Equity · Senior-level Full Time · Sunnyvale, CA, USA · 5d ago
-
AI Performance Optimization Engineer · USD 100K-150K · Benchmarking | C plus plus | CUDA | Continuous batching | Distributed Training · Mid-level Full Time · United States - Remote R · 6d ago
-
AI Performance Optimization Engineer · USD 100K-150K · Access Optimization | Attention Mechanisms | Benchmarking | C++ | Communication Primitives · Mid-level Full Time · United States - Remote R · 6d ago
-
Attention | Batching | C++ | CUDA | CUDA kernels · Benefits | Equity · Senior-level Full Time · US, CA, Santa Clara, United States · 7d ago
-
Principal Machine Learning Engineer · USD 189K-312K · Algorithm Development | Benchmarking | CPU | Deep learning | GPU · 401k employer match | Dental insurance | Employee assistance program | Health insurance | Paid parental leave · Senior-level Full Time · Boston, United States R · 8d ago
-
Senior Machine Learning Engineer · CAD 129K-180K · Deep learning | Graph theory | Language Processing | Linear Algebra | Machine Learning · Senior-level Full Time · Toronto - MSO, Canada R · 8d ago
-
CUDA | CUDA kernels | Compiler optimization | Graph Fusion | High Performance · Mid-level Full Time · San Jose, California, United States · 10d ago
-
Senior Solutions Architect - Generative AI · INR 2475K-4500K · Argo | CI/CD | CUDA | Evaluation | FedRAMP · Senior-level Full Time · India, Pune · 13d ago
-
Engineering Manager, Inference Benchmarking - AI Perf · USD 224K-356K · DCGM | Distributed Systems | GPU Telemetry | GPU observability | Helm · Senior-level Full Time · US, CA, Santa Clara, United States · 13d ago
-
Data Curation | Deep learning | DeepSpeed | Direct Preference Optimization | Evaluation · Senior-level Full Time · Singapore, Singapore · 15d ago
-
Staff Machine Learning Engineer, ML Infrastructure · USD 183K-269K · AWS EKS | Amazon IAM | Amazon S3 | Autoscaling | Batching · Employee resource groups | Free home security system | Free professional monitoring | Hybrid work model · Senior-level Full Time · Boston, MA · 19d ago
-
Senior-level Full Time · Milpitas, CA, United States · 19d ago
-
AI/ML ASIC Architect · USD 163K-249K · ARM | ASIC architecture | AXI interconnect | Area Optimization | Attention Mechanisms · Senior-level Full Time · Milpitas, CA, United States · 19d ago
-
Sr GenAI Infra Specialist SA, AWS WWSO Startup · USD 153K-228K · AWS | Amazon EC2 | Amazon EKS | Amazon S3 | Cache optimization · Inclusive team culture | Mentorship and career growth | Work-life balance · Senior-level Full Time · New York, New York, USA · 21d ago
-
Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism · 100 percent remote · Senior-level Full Time · Remote job R · 21d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Computer Vision | Diffusion Models | Edge Computing | Expert parallelism | Flash Attention · Remote work · Senior-level Full Time · Remote job R · 21d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism · English communication support | Remote work · Senior-level Full Time · Remote job R · 21d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing | Expert parallelism · Remote work · Senior-level Full Time · Remote job R · 21d ago
-
Diffusion Models | Distributed Inference Systems | Distributed inference | Expert parallelism | Flash Attention · 100 percent remote | Worldwide remote · Senior-level Full Time · Remote job R · 21d ago
-
Staff Software Engineer, Machine Learning, Google Chat · USD 207K-300K · Agentic Workflows | Caching | Cloud Spanner | Continuous Delivery | Continuous integration · Senior-level Full Time · Sunnyvale, CA, USA · 22d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Compute Shaders | Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing · Remote work · Senior-level Full Time · Remote job R · 23d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Computer Vision | Deep learning | Diffusion Models | Distributed inference | Edge Computing · Remote work · Senior-level Full Time · Remote job R · 23d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Compute Shaders | Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing · Remote work · Senior-level Full Time · Remote job R · 23d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing | Expert parallelism · Career growth | Collaborative research environment | English communication support | Remote work opportunity · Senior-level Full Time · Remote job R · 23d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Compute Shaders | Custom Compute Shaders | Data Pipelines | Diffusion Models | Distributed Inference Systems · Remote work · Senior-level Full Time · Remote job R · 23d ago
-
Deep learning | GPU clusters | HPC | High Performance | High Throughput · Senior-level Full Time · Israel, Tel Aviv · 28d ago
-
Deep learning | Evaluation Pipelines | GPU Cluster | High Performance | High-Performance Computing · Senior-level Full Time · Israel, Tel Aviv · 28d ago
-
AI Platform Engineer · INR 1500K-2500K · Automated Evaluation | CI/CD | CUDA | Continuous Checkpointing | Continuous batching · Mid-level Full Time · Bangalore, India · 28d ago
-
Software Engineering Manager, LLM Training · USD 170K-277K · CUDA | Containerization | Context Parallelism | Data I/O | Data parallelism · Entry-level Full Time · Mountain View, CA, United States · 28d ago
-
AI Platform Engineer · INR 1500K-2500K · Alerting | CUDA | Cause analysis | Continuous batching | GPU Profiling · Mid-level Full Time · Bangalore, India · 29d ago
-
Senior-level Full Time · Palo Alto · 29d ago
-
AWQ | Audio codecs | Audio streaming | Autoscaling | Chunked prefill · 401k matching | Annual offsites | Dental coverage | Employer-paid training | Healthcare benefits · Mid-level Full Time · San Francisco, CA · 1mo ago
-
Forward Deployed Engineer (Inference & Post-Training) · USD 270K-300K · DPO | GRPO | KV cache | LoRA | Pipeline parallelism · Equity | Health insurance | Remote work flexibility · Senior-level Full Time · San Francisco · 1mo ago
-
Senior-level Full Time · Dublin, Ireland · 1mo ago
-
Principal Model Optimization Engineer · USD 295K-345K · CUDA | Continuous batching | GPU | LLM Inference | Machine Learning · Senior-level Full Time · San Mateo, CA, United States R · 1mo ago
-
Applied Scientist, GenAI · USD 152K-189K · A/B | A/B Testing | AWS | Agent Orchestration | Agent systems · Senior-level Full Time · US - MA - Wilmington · 1mo ago
-
AI Engineer - Model Performance · USD 165K-250K · Attention Backend | Audio Processing | Batching | CUDA | CUDA graph · Async communication | Innovation-focused culture | Remote work | Startup environment | Supportive team · Mid-level Full Time · SF Hybrid R · 1mo ago
-
AI Software Engineer Intern · CNY 38K-50K · CUDA | Distributed Systems | FP8 | FasterTransformer | Flash Attention · On-site work · Entry-level Full Time Internship · CHN - Minhang, China · 1mo ago
-
AI Software Engineer Intern · CNY 38K-50K · CUDA | Compiler optimization | Continuous batching | Distributed Systems | Dynamic batching · On-site work · Entry-level Full Time Internship · CHN - Minhang, China · 1mo ago
-
Senior ML Ops Engineer - Dallas, TX · USD 48K-168K · Apache Spark | Big Data | CI/CD | Containerization | Data analytics · 401k retirement plan | Medical, dental, and vision benefits | Paid Holidays | Paid time off | Variable pay/incentives · Senior-level Full Time · United States · 1mo ago
-
Data-Driven Decision Making | Data-driven | Decision Making | Deep learning | Distributed Training · Senior-level Full Time · Sunnyvale, CA · 1mo ago