Machine Learning Engineer (Training Optimization)
Beijing, Beijing, China
CNY 144K-240K (estimate) | Entry-level | Full Time
Tasks
- Apply best practices for distributed training
- Debug, profile, and fine-tune training workflows
- Design distributed training systems for foundation models
- Develop custom CUDA or Triton kernels
- Integrate systems with research and modeling requirements
- Optimize training performance: GPU utilization, communication overhead, and memory efficiency
Perks/Benefits
- N/A
Skills/Tech-stack
CUDA | DeepSpeed | Diffusion Models | Distributed Training | FSDP | GPU Kernel | Gradient Checkpointing | JAX | Language Models | Large Language Models | Low Precision | Megatron-LM | Multimodal Learning | NVIDIA NeMo | PyTorch | Python | Triton | ZeRO
Education
N/A