Senior DGX Cloud AI Infrastructure Software Engineer
Tasks
- Build and scale distributed systems
- Co-design and implement APIs
- Debug and triage AI issues
- Define reliability metrics
- Develop infrastructure software and tools
- Enhance infrastructure for AI platforms
- Improve infrastructure efficiency and resiliency
- Root-cause and analyze system failures
Perks/Benefits
- N/A
Skills/Tech-stack
AI Inference | AI Training | APIs | C# | C++ | Data Infrastructure | Distributed Systems | ELK | GPU | InfiniBand | JAX | Logging | Loki | Monitoring | NCCL | Observability | Prometheus | PyTorch | Python | RDMA | Ray | TensorFlow
Education
Bachelor of Engineering | Bachelor of Science | Master of Science