Find jobs in AI/ML, Data Science and Big Data
23 results for PPO (Skill/Tech stack)
-
Large Model Infra R&D Intern (Agentic RL track) · CNY 25K-37K · Asynchronous programming | Concurrency | Distributed Systems | Docker | Git · Entry-level Internship · Shenzhen · 2d ago
-
Large Model Infra R&D Intern (Agentic RL track) · CNY 25K-37K · Alerting | Asynchronous programming | Concurrency | Data Retrieval | Data Storage · Entry-level Internship · Shenzhen · 2d ago
-
LLM Post-Training / Agentic Algorithm Engineer · CNY 180K-360K · Agentic RL | DAPO | Distributed Training | Function Calling | GRPO · Entry-level Full Time · Shanghai, Beijing · 2d ago
-
Large Model Infra R&D Intern (Agentic RL track) · CNY 25K-37K · Alerting | Asynchronous programming | Concurrency | Data pipeline | Distributed Systems · Entry-level Internship · Shenzhen · 2d ago
-
Large Model Infra R&D Intern (Agentic RL track) · CNY 25K-37K · Asynchronous programming | Concurrency | Distributed Systems | Docker | Git · Flexible work schedule | Internship opportunity | Mentorship · Entry-level Internship · Shenzhen · 2d ago
-
AI Platform Engineer, Training and Inference · USD 150K-225K · ANN indexing | BF16 | DDP | Embeddings | FP8 · Career growth | Learning opportunities · Senior-level Full Time · San Francisco · 2d ago
-
(USA) Staff, Data Scientist · USD 110K-286K · A/B | A/B Testing | B testing | Backtesting | Bayesian Methods · 401k match | Company discounts | Education benefit program | Multiple health plans | PTO · Senior-level Full Time · (USA) Crossman Respect Building CA SUNNYVALE … · 10d ago
-
Mid-level Full Time · Shanghai · 11d ago
-
Large Model Algorithm Engineer (Open-Domain Dialogue) · CNY 180K-300K · DPO | Deep learning | DeepSpeed | Distributed Training | Function Calling · Internship · Mid-level Internship · Shanghai · 11d ago
-
Behavior Cloning | Diffusion Models | Embodied AI | Hardware Integration | Imitation Learning · Equity | Health benefits | Lunches | Snacks | Team activities · Senior-level Full Time · Santa Clara, CA · 12d ago
-
Adversarial Robustness | Agent learning | Audio Processing | Computer Vision | Content Moderation · Career growth | Research mentorship · None Full Time · San Jose, California, United States · 21d ago
-
AIGC Detection | Adversarial Learning | Agentic Systems | Cross-modal alignment | GRPO · None Full Time · Seattle, Washington, United States · 21d ago
-
Adversarial Networks | Adversarial Training | Cross-modal alignment | GRPO | Generative Adversarial Networks · Entry-level Internship · San Jose, California, United States · 21d ago
-
Applied Reinforcement Learning Engineer 2 · USD 150K-300K · ActorCritic | BCQ | BehavioralCloning | CQL | DQN · Mid-level Full Time · Redmond, Washington, United States · 23d ago
-
Senior Software Engineer, RL Post-Training Frameworks · USD 184K-356K · Actor Based Programming | C# | C++ | Consistency models | DPO · Comprehensive benefits | Equity · Senior-level Full Time · US, CA, Santa Clara, United States · 28d ago
-
Tech Lead, Robotic AI Model · USD 150K-180K · Action Chunking | Action Tokenization | Behavior Cloning | DPO | DeepSpeed · Senior-level Full Time · El Segundo, California, United States · 29d ago
-
Entry-level Internship · Shanghai · 29d ago
-
Robotics ML Expert, AI · USD 60K-60K · Agent systems | Control Theory | Dm_control | Domain Randomization | Drake · Async collaboration | Fully remote | Independent contractor 1099 · Senior-level Full Time · Miami R · 29d ago
-
Research Scientist, LLM Evaluation & Post-Training · USD 150K-160K · Benchmarking | Context evaluation | DPO | Data Processing | Error Analysis · Senior-level Full Time · Remote Work (USA), United States R · 30d ago
-
Senior Solutions Architect, Retail · USD 184K-356K · API Integration | Agent systems | Agents SDK | Benchmarking | C++ · Equity | Health benefits | Paid time off · Senior-level Full Time · US, CA, Remote, United States R · 30d ago
-
C++ | Deep learning | GPU clusters | GRPO | High Performance · Equity | Healthcare benefits | Paid time off | Retirement benefits · Senior-level Full Time · US, CA, Santa Clara, United States · 1mo ago
-
Engineer - ML & RL · CAD 93K-116K · Agent systems | Contextual bandit | Deep learning | DeepSpeed | Distributed Training · Mid-level Contract Full Time · Edmonton, Alberta, Canada · 1mo ago
-
AI Research Scientist - Safety Alignment Team · USD 213K-293K · Adversarial prompts | Automation | Computer Vision | DPO | Dataset curation · Senior-level Full Time · Menlo Park, CA · 1mo ago