Find jobs in AI/ML, Data Science and Big Data

12 results for Proximal Policy Optimization (Skill/Tech stack)

Embodied AI Algorithm Engineer - Models CNY 500K-500K

Actor-critic | Deep learning | Distributed Training | GPU Training | IQL

Mid-level Full Time

Beijing R

4h ago
Staff Software Engineer, Generative AI, Core ML USD 207K-300K

AI Feedback | Computer Vision | Data Processing | Deep learning | Digital Twin

Senior-level Full Time

Mountain View, CA, USA

1d ago
Machine Learning Engineer (Post-Training) EUR 57K-84K

AWS | Data Pipelines | Data-parallel | DeepSpeed | Direct Preference Optimization

Senior-level Full Time

Paris, France

1d ago
Decision Intelligence Engineer - Next Best Action USD 129K-177K

A3C | Backtesting | Bellman Equation | Conservative Q Learning | Constraint Mapping

401k retirement savings plan | Medical, dental, and vision benefits | Occasional travel | Remote work | Time off

Senior-level Full Time

Remote US, United States R

6d ago
Embodied AI - Multimodal Reinforcement Learning Algorithm Expert CNY 240K-480K

Actor-critic | Deep Q-Network | Isaac Sim | LLM | Mujoco

Senior-level Full Time

Beijing, Shanghai

7d ago
LLM Engineer (Reinforcement Learning)

DDP | Deep learning | Direct Preference Optimization | Distributed Training | Docker

Senior-level Full Time

Pangyo (Software Dream Center), South Korea

8d ago
Large Model Application Algorithm Engineer/Expert CNY 240K-480K

C++ | Computer Vision | Deep learning | Direct Preference Optimization | Human Computer Dialogue

Senior-level Full Time

Shanghai, Beijing

9d ago
Tech Lead Manager - MLRE, ML Systems USD 264K-331K

CUDA | Distributed Systems | Flash Attention | GRPO | Human Feedback

Commuter stipend | Generous PTO | Health, dental and vision coverage | Learning and development stipend | Retirement benefits

Senior-level Full Time

San Francisco, CA; New York, NY

9d ago
Agent RL Infra Engineer USD 224K-356K

AI Feedback | Active Learning | Cluster management | Continuous Learning | Data Curation

Senior-level Full Time

US, CA, Santa Clara, United States

11d ago
Applied Reinforcement Learning Engineer USD 150K-160K

Actor-critic | Agent systems | BCQ | Behavioral cloning | CQL

Equal opportunity employer | Hybrid remote work | Research publications opportunity

Mid-level Full Time

Remote Work (USA), United States R

14d ago
Senior ML Engineer – Distributed RL & Post-Training Infrastructure A USD 204K-350K

Automated testing | Cryptography | Direct Preference Optimization | Distributed Systems | Docker

Senior-level Full Time

Remote R

22d ago
Senior AI Research Scientist (6240) USD 170K-270K

Adversarial Learning | Attention Networks | Dash | Data Preprocessing | Data Wrangling

Hybrid work schedule | Professional development programs | Travel for training and team building

Senior-level Full Time

San Jose, CA, US

1mo ago