aijobs.net

Machine Learning Engineer, Inference & Serving (Speech LLM) - San Francisco

San Francisco, CA

USD 180K-270K | Mid-level | Full Time

Skills/Tech-stack

AWQ | Audio codecs | Audio streaming | Autoscaling | Chunked prefill | Continuous batching | Distributed Systems | FP8 | GPTQ | GPU Architecture | INT8 | Inference | Inference Server | KV cache | Kubernetes | Language Models | Large Language Models | Latency optimization | Lookahead Decoding | Machine Learning | NVIDIA CUDA | NVIDIA Triton Inference Server | Neural audio codecs | PagedAttention | Post-training Quantization | SGLang | Speculative decoding | Speech Processing | Tensor Parallelism | TensorRT | TensorRT-LLM | Throughput Optimization | Time To First Audio | Time To First Token | vLLM | WebRTC | WebSockets

Education

N/A

Roles

AI Engineer | Machine Learning Engineer

Regions

North America

Countries

United States

States

California, US

Cities

San Francisco, California, US
