Research Engineer - LLM/VLM Inference Optimization (Seed Infra)
San Jose, California, United States
USD 244K-450K Mid-level Full Time
Tasks
- Build compiler-level optimized inference pipelines
- Collaborate with teams to improve model toolchains and ecosystem
- Design high-performance inference systems for large-scale LLMs and VLMs
- Develop CUDA kernels and low-precision inference computation
- Develop and optimize inference engines and serving frameworks
- Optimize inference throughput with streaming and speculative decoding
- Analyze performance and identify bottlenecks
Perks/Benefits
- N/A
Skills/Tech-stack
CUDA | CUDA kernels | Compiler optimization | Graph Fusion | High-Performance Computing | Inference Optimization | Low-precision computing | Parallel Computing | Performance Analysis | Speculative decoding | Streaming inference
Education
N/A