aijobs.net

LLM Post-Training / Agentic Algorithm Engineer

Shanghai, Beijing

CNY 180K-360K (estimate) | Entry-level | Full Time

Found 1d ago
Skills/Tech-stack

Agentic RL | DAPO | Distributed Training | Evaluation Frameworks | Function Calling | GRPO | Inference Serving | Java | Language Processing | Long Range | Long Range Task Learning | Machine Learning | Memory | Multi-turn Dialogue | Natural Language | Natural Language Processing | On Policy | On-Policy Distillation | OpenRLHF | PPO | PPO RL | Planning | Policy Distillation | Preference Learning | Python | RLHF | RLVR | React | Reflection | Reinforcement Learning | Reward Modeling | Sparse Reward | Sparse Reward Modeling | Tool Integrated Reasoning | Training Frameworks | Trajectory Level Optimization | TypeScript | VeRL

Education

Bachelor of Engineering | Bachelor of Science | Master of Science | PhD

Roles

Agentic Algorithm Engineer | Algorithm Engineer | Engineer | LLM Training Engineer | Training Engineer

Regions

Asia/Pacific

Countries

China

States

Shanghai, CN | Beijing, CN

Cities

Shanghai, Shanghai, CN | Beijing, Beijing, CN

