Manager, Next-Gen AI Cluster Validation
Tasks
- Automate systems
- Build software development platform
- Collaborate on cluster architecture
- Coordinate with partners and customers
- Develop system designs
- Develop tooling and documentation
- Execute at-scale bring-up
- Integrate compute, networking, storage, and software
- Integrate new technologies
- Lead engineering team
- Perform performance engineering
- Support large scale supercomputing systems
- Validate cluster deployments
Perks/Benefits
- N/A
Skills/Tech-stack
Ansible | Cluster architecture | Deep learning | Distributed Systems | Go | Grafana | HPC | InfiniBand | Machine Learning | Multi-GPU | Performance Engineering | Prometheus | Python | RoCE | Supercomputing | System Automation