aijobs.net

Principal LLM Inference Engineer

Santa Clara

USD 195K-285K | Senior-level | Full Time

Found 23h ago
Skills/Tech-stack

Batching | C# | C++ | CUDA | CUDA kernel | CUDA kernel programming | Continuous batching | Distributed inference | JAX | KV cache | Kernel programming | Mixture of Experts | Model Serving | Multi model serving | Multi-model | ONNX Runtime | Performance Profiling | Pipeline parallelism | Python | Quantization | SGLang | Sparsity | Speculative decoding | Tensor Parallelism | TensorRT-LLM | Triton | vLLM

Education

Bachelor of Science | Master of Science | PhD

Roles

Engineer | Inference Engineer | LLM Inference Engineer | Principal | Principal LLM Inference Engineer

Regions

North America

Countries

United States

States

California, US

Cities

Santa Clara, California, US
