Machine Learning Platform R&D Engineer
Tasks
- Build Kubernetes CRDs and controller services
- Build the machine learning platform
- Design and develop a large-scale compute platform
- Develop compute infrastructure
- Develop distributed data processing workflows
- Extend Kubernetes scheduler capabilities
- Improve resource utilization for training and inference
- Improve toolchain and developer experience
- Optimize platform usability and performance
- Support distributed training and inference
Perks/Benefits
- N/A
Skills/Tech-stack
Apache Flink | Apache Spark | CRD | Controller | DDP | DeepSpeed | Docker | FSDP | Go | Kubernetes | MPI | OpenMP | PyTorch | RDMA | Ray