Senior DGX Cloud AI Infrastructure Software Engineer
Tasks
- Build and scale distributed systems
- Co-design and implement APIs
- Debug and triage AI issues
- Define reliability metrics
- Develop infrastructure software and tools
- Enhance infrastructure for AI platforms
- Improve infrastructure efficiency and resiliency
- Root-cause and analyze system failures
Perks/Benefits
- N/A
Skills/Tech-stack
AI Inference | AI Training | APIs | C# | C++ | Data Infrastructure | Distributed Systems | ELK | GPU | InfiniBand | JAX | Logging | Loki | Monitoring | NCCL | Observability | Prometheus | PyTorch | Python | RDMA | Ray | TensorFlow
Education
Bachelor of Engineering | Bachelor of Science | Master of Science