Find jobs in AI/ML, Data Science and Big Data
2 results
for Group Relative Policy Optimization
(Skill/Tech stack)
-
大模型算法工程师(开放域对话) CNY 180K-300KA/B | A/B Testing | Agentic reinforcement learning | B testing | DeepSpeedMid-level Internship上海、北京3d ago
-
Research Scientist, Safety Post Training USD 216K-270KAdversarial evaluation | Direct Preference Optimization | Generative AI | Group Relative Policy Optimization | Human FeedbackCommuter stipend | Comprehensive health insurance | Dental insurance | Learning and development stipend | Paid time offSenior-level Full TimeSan Francisco, CA; New York, NY1mo ago