Find jobs in AI/ML, Data Science and Big Data
20 results
for Proximal Policy Optimization
(Skill/Tech stack)
-
Senior Machine Learning Engineer, Agentic USD 163K-245K
Artificial Intelligence | Direct Preference Optimization | Evaluation | Fine Tuning | Human-in-the-loop
401k matching | Catered meals | Employee events | Employer-paid disability insurance | Employer-paid life insurance
Senior-level Full Time
Bellevue, WA; Menlo Park, CA
1d ago
-
Embodied AI Algorithm Engineer, Models CNY 500K-500K
Actor-critic | Deep learning | Distributed Training | Implicit Q Learning | Inference acceleration
Mid-level Full Time
Beijing R
1d ago
-
AI Scientist GBP 46K-46K
Azure | Azure OpenAI | Azure OpenAI Services | Databricks | Dataset Preparation
Mid-level Full Time
London, United Kingdom
5d ago
-
Senior Machine Learning Engineer, RL / Locomotion USD 220K-336K
Actor-critic | Domain Randomization | GPU Computing | Isaac Lab | Isaac-Gym
Health benefits | Recovery Benefits
Senior-level Full Time
Costa Mesa, California, United States
5d ago
-
Head of World Models (Universal Robots, India) INR 3000K-6000K
AI orchestration | Actor-critic | Agent Frameworks | Autogen | DPO
Executive-level Full Time
Bangalore, IN
7d ago
-
Actor-critic | Exploration Exploitation | Exploration Exploitation Tradeoffs | Group Relative Policy Optimization | NLP
Career growth opportunities | Flexible work culture | Fully remote work | Global collaboration | Innovation-focused environment
Mid-level Full Time
Germany R
8d ago
-
Agentic Systems | Deep learning | Diffusion Models | Fine Tuning | Generative AI
401k eligibility | Annual bonus | Dental insurance | Medical insurance | Paid time off
Senior-level Full Time
Los Altos, CA
16d ago
-
Adversarial Networks | Computer Vision | Cross-modal alignment | GRPO | Generative Adversarial Networks
Entry-level Internship
Seattle, Washington, United States
21d ago
-
Data Analysis | Dataset Processing | Direct Preference Optimization | Evaluation Pipelines | Fine Tuning
Entry-level Internship
San Jose, California, United States
25d ago
-
Actor-critic | Air Traffic Management | Air traffic | Machine Learning | Optimization
Flexible working space | Informal corporate culture | Thesis assignment allowance
Entry-level Internship
Amsterdam, Noord-Holland, Nederland R
25d ago
-
Tech Lead, Robotic AI Model USD 150K-180K
Action Chunking | Action Tokenization | Behavior Cloning | DPO | DeepSpeed
Senior-level Full Time
El Segundo, California, United States
28d ago
-
Senior AI Engineer Specialist INR 2500K-3500K
Agentic AI | Apache Spark | Direct Preference Optimization | Distributed Computing | Embedding architectures
Senior-level Full Time
IND - Bengaluru - Esko-Graphics India …
30d ago
-
Robotics & Reinforcement Learning Engineer EUR 60K-84K
Actor-critic | Actuator modeling | Behavior Cloning | C++ | Control Systems
Annual leave | Early Friday finish | Flexible working hours | Free coffee and tea | Permanent full-time contract
Senior-level Contract Full Time
Barcelona, CT, Spain
1mo ago
-
Staff Software Engineer, Generative AI, Core ML USD 207K-300K
AI Feedback | Computer Vision | Data Processing | Deep learning | Digital Twin
Senior-level Full Time
Mountain View, CA, USA
1mo ago
-
Senior-level Full Time
Beijing, Shanghai
1mo ago
-
DDP | Deep learning | Direct Preference Optimization | Distributed Training | Docker
Senior-level Full Time
Pangyo (Software Dream Center), South Korea
1mo ago
-
Large Model Application Algorithm Engineer/Expert CNY 240K-480K
C++ | Computer Vision | Deep learning | Direct Preference Optimization | Human Computer Dialogue
Senior-level Full Time
Shanghai, Beijing
1mo ago
-
Tech Lead Manager - MLRE, ML Systems USD 264K-331K
CUDA | Distributed Systems | Flash Attention | GRPO | Human Feedback
Commuter stipend | Generous PTO | Health, dental and vision coverage | Learning and development stipend | Retirement benefits
Senior-level Full Time
San Francisco, CA; New York, NY
1mo ago
-
Agent RL Infra Engineer USD 224K-356K
AI Feedback | Active Learning | Cluster management | Continuous Learning | Data Curation
Senior-level Full Time
US, CA, Santa Clara, United States
1mo ago
-
Applied Reinforcement Learning Engineer USD 150K-160K
Actor-critic | Agent systems | BCQ | Behavioral cloning | CQL
Equal opportunity employer | Hybrid remote work | Research publications opportunity
Mid-level Full Time
Remote Work (USA), United States R
1mo ago