aijobs.net

AI Inference Engineer - Model Optimization & Deployment

Foster City, CA

USD 205K-303K (estimate) | Senior-level | Full Time

Found 12h ago
Skills/Tech-stack

Accuracy evaluation | BF16 | C++ | CUDA | CUDA kernels | Cache optimization | Causal Attention | Compilation Pipeline | Efficient Fine Tuning | FP16 | FP8 | Fine Tuning | FlashAttention | INT4 | INT8 | KV cache | KV cache optimization | Latency benchmarking | Linear-attention | LoRA | Memory Optimization | Mixed Precision | Model Conversion | PagedAttention | Parameter efficient fine-tuning | Post-training | Post-training Quantization | PyTorch | Python | QLoRA | Quantization | Quantization aware training | Speculative decoding | TensorRT | TensorRT Plugins | TensorRT-LLM

Education

N/A

Roles

AI | AI Inference Engineer | Engineer | Inference Engineer | ML Engineer | Model Optimization Engineer | Optimization Engineer

Regions

North America

Countries

United States

States

California, US

Cities

Foster City, California, US

