aijobs.net

Solutions Architect, Inference Deployments

Santa Clara, California, United States

USD 152K-241K | Senior-level | Full Time

Skills / Tech stack

AI inference workloads | Disaggregated inference | GPU orchestration | GPU memory management | Inference acceleration | Kubernetes | Low-latency networking | Model optimization | Multi-Instance GPU | NIM Operator | NVIDIA GPU Operator | NVIDIA Dynamo | Open-source contributions | Quantization | RDMA | SGLang | Speculative decoding | TensorRT-LLM | Transformer neural networks | Triton Inference Server | UCX | vLLM | WideEP

Education

Bachelor's | Computer Science | Engineering

Roles

Architect | Solutions Architect

Regions

North America

Countries

United States

States

California, US

Cities

Santa Clara, California, US
