Find jobs in AI/ML, Data Science and Big Data
1 result for Generalized Reward Policy Optimization (Skill/Tech stack)
Staff AI Engineer, Model Post-Training and Alignment
USD 196K-268K
Benchmarking | Deep learning | Direct Preference Optimization | Fine Tuning | Generalized Reward Policy Optimization
Company events | Comprehensive healthcare | Education subsidy | Learning and development programs | Meal allowances
Senior-level | Full Time | APAC | 2d ago