Senior Data Center Performance Engineer - Benchmarking and Optimization
Tasks
- Analyze performance bottlenecks
- Build automation tools for performance monitoring
- Characterize AI training and inference workloads
- Design performance benchmarking strategies
- Drive performance improvements through system tuning
- Resolve performance issues with cross-functional teams
- Track key performance indicators
Perks/Benefits
- N/A
Skills/Tech-stack
C++ | CUDA | Docker | InfiniBand | JAX | Kubernetes | Linux | Linux perf | MPI | NCCL | NVIDIA Nsight | NVIDIA Nsight Systems | NVLink | PyTorch | Python | RoCE | Slurm | TensorFlow