Machine Learning Engineer, Inference & Serving (Speech LLM) - San Francisco
Tasks
- Apply model compression and quantization
- Build inference engines for large language models and speech models
- Collaborate with training and backend teams
- Deploy GPU-accelerated inference pipelines
- Implement continuous batching
- Integrate real-time audio streaming
- Manage KV cache and stateful connections
- Optimize latency and throughput
- Support distributed multi-GPU inference with autoscaling
Perks/Benefits
- 401k matching
- Annual offsites
- Dental coverage
- Employer-paid training
- Healthcare benefits
- Hybrid work
- New parent leave
- Paid holidays
- Unlimited PTO
- Vision coverage
Skills/Tech-stack
AWQ | Audio codecs | Audio streaming | Autoscaling | Chunked prefill | Continuous batching | Distributed Systems | FP8 | GPTQ | GPU Architecture | INT8 | Inference | KV cache | Kubernetes | Large Language Models | Latency optimization | Lookahead Decoding | Machine Learning | NVIDIA CUDA | NVIDIA Triton Inference Server | Neural audio codecs | PagedAttention | Post-training Quantization | SGLang | Speculative decoding | Speech Processing | Tensor Parallelism | TensorRT | TensorRT-LLM | Throughput Optimization | Time To First Audio | Time To First Token | vLLM | WebRTC | WebSockets
Education
N/A
Roles
AI | AI Engineer | Engineer | Machine Learning Engineer