Senior DGX Cloud AI Infrastructure Software Engineer
Tasks
- Build and scale distributed systems
- Co-design and implement APIs
- Debug and triage AI issues
- Define reliability metrics
- Develop infrastructure software and tools
- Enhance infrastructure for AI platforms
- Improve infrastructure efficiency and resiliency
- Root-cause and analyze system failures
Perks/Benefits
- N/A
Skills/Tech-stack
AI Inference | AI Training | APIs | C# | C++ | Data Infrastructure | Distributed Systems | ELK | GPU | Infiniband | JAX | Logging | Loki | Monitoring | NCCL | Observability | Prometheus | PyTorch | Python | RDMA | Ray | TensorFlow
Education
Bachelor of Engineering | Bachelor of Science | Master of Science