Large Model Training and Tuning Engineer
Tasks
- Analyze training stability, convergence, and resource utilization
- Apply mixed precision and compilation optimizations for training performance
- Build and maintain multimodal model training pipelines
- Collaborate with the algorithm team on model architecture and training strategy
- Deploy multi-GPU distributed training with DDP
- Design training quality evaluation and anomaly monitoring
- Implement parameter-efficient fine-tuning with LoRA and PEFT
- Optimize the training pipeline's task scheduling and data loading
Perks/Benefits
- N/A
Skills/Tech-stack
Adapter | DDP | Data-parallel | DeepSpeed | Distributed Data Parallel | Distributed data | FSDP | Heterogeneous Acceleration | LoRA | Mixed Precision | Model Deployment | Multi-GPU | PEFT | PyTorch | Transformer
Education
Bachelor of Engineering | Bachelor of Science | Master of Science