Member of Technical Staff - GPU Infrastructure
San Francisco
USD 180K-300K (estimate) | Senior-level | Full Time | Found 4d ago
Tasks
- Configure parallel filesystems
- Create runbooks and documentation
- Create technical proposals
- Deploy and configure orchestration systems
- Design GPU cluster architectures
- Develop deployment strategies
- Implement high-performance networking
- Implement monitoring and alerting
- Optimize GPU utilization
- Plan capacity for large clusters
- Present architectural recommendations
- Provide on-call support
- Troubleshoot hardware and software issues
- Tune system performance
- Understand workload requirements
Skills/Tech-stack
Ansible | Bash | CUDA | Container Runtime | Driver stack | GPU clusters | HPC environments | InfiniBand | Kubernetes | Linux Performance Tuning | NVIDIA GPU architecture | NVLink | Network topology design | Power and Cooling Management | Python | RoCE | Slurm | Terraform
Education
N/A