Find jobs in AI/ML, Data Science and Big Data
57 results for Policy Optimization (Skill/Tech stack)
-
Audio Processing | Autoregression | Autoregressive models | Computer Vision | Deep learning · Remote work · Senior-level Full Time · Remote job R · 1d ago
-
Senior Machine Learning Engineer, Agentic · USD 163K-245K · Artificial Intelligence | Direct Preference Optimization | Evaluation | Fine Tuning | Human-in-the-loop · 401k matching | Catered meals | Employee events | Employer-paid disability insurance | Employer-paid life insurance · Senior-level Full Time · Bellevue, WA; Menlo Park, CA · 1d ago
-
Research Scientist, Safety Post Training · USD 216K-270K · Adversarial evaluation | Direct Preference Optimization | Generative AI | Group Relative Policy Optimization | Human Feedback · Commuter stipend | Comprehensive health insurance | Dental insurance | Learning and development stipend | Paid time off · Senior-level Full Time · San Francisco, CA; New York, NY · 2d ago
-
Embodied AI Algorithm Engineer - Model · CNY 500K-500K · Actor-critic | Deep learning | Distributed Training | Implicit Q Learning | Inference acceleration · Mid-level Full Time · Beijing R · 2d ago
-
AI Scientist · GBP 46K-46K · Azure | Azure OpenAI | Azure OpenAI Services | Databricks | Dataset Preparation · Mid-level Full Time · London, United Kingdom · 6d ago
-
Principal Machine Learning Engineer, Short-form · USD 233K-350K · Cloud platform | Data Modeling | Feedback Loop Mitigation | Feedback loop | GCP Pipelines · 401k plan | Dental insurance | Disability insurance | Life insurance | Medical insurance · Senior-level Full Time · New York, NY, US, 10036 · 6d ago
-
Senior Machine Learning Engineer, RL / Locomotion · USD 220K-336K · Actor-critic | Domain Randomization | GPU Computing | Isaac Lab | Isaac-Gym · Health benefits | Recovery Benefits · Senior-level Full Time · Costa Mesa, California, United States · 6d ago
-
Research Engineer, Applied AI Engineering · USD 250K-555K · Ads Ranking | Algorithms | Data Pipelines | Data Structures | Deep learning · Mid-level Full Time · San Francisco · 7d ago
-
Head of World Models (Universal Robots, India) · INR 3000K-6000K · AI orchestration | Actor-critic | Agent Frameworks | Autogen | DPO · Executive-level Full Time · Bangalore, IN · 8d ago
-
Head of Simulation (Universal Robots, India) · INR 3000K-6000K · AI orchestration | Actor-Critic methods | Actor-critic | Agent Frameworks | Autogen · Executive-level Full Time · Bangalore, IN · 8d ago
-
Actor-critic | End to End | End-to-end training | Exploration/exploitation | GRPO · Career growth opportunities | Continuous learning opportunities | Flexible work culture | Fully remote | Global collaboration · Mid-level Full Time · Luxembourg R · 8d ago
-
Actor-critic | Benchmarking | Convergence Stability | Deep reinforcement learning | Experiment tracking · Career growth opportunities | Flexible work culture | Fully remote work | Global collaboration opportunities · Mid-level Full Time · Greece R · 8d ago
-
Actor-critic | Computational Efficiency | Exploration/exploitation | Language Processing | Multi-Modal · Career growth opportunities | Flexible work culture | Fully remote work | Global collaboration opportunities · Mid-level Full Time · Chile R · 8d ago
-
Actor-critic | Deep learning | Exploration/exploitation | GRPO | Language Processing · Career growth opportunities | Flexible work culture | Fully remote | Global collaboration · Mid-level Full Time · Sweden R · 8d ago
-
Actor-critic | Artificial Intelligence | Exploration Exploitation | Group Relative Policy Optimization | Language Processing · Career growth opportunities | Flexible work culture | Fully remote · Mid-level Full Time · United Arab Emirates R · 8d ago
-
Actor-critic | Exploration/exploitation | Machine Learning | Multi-Modal | Multi-modal AI · Career growth opportunities | Flexible work culture | Fully remote work | Global collaboration · Mid-level Full Time · Turkey R · 8d ago
-
Actor-critic | Deep learning | Experimentation | Exploration/exploitation | Language Processing · Career growth opportunities | Flexible work culture | Fully remote | Global collaboration opportunities · Mid-level Full Time · Australia R · 8d ago
-
Actor-critic | Experimentation | Exploration/exploitation | GRPO | Large-scale · Career growth | Continuous learning | Flexible work culture | Fully remote | Global collaboration · Mid-level Full Time · South Africa R · 8d ago
-
Actor-critic | Computer Vision | Deep learning | Exploration/exploitation | Language Processing · Career growth | Continuous learning | Flexible work culture | Fully remote | Global collaboration · Mid-level Full Time · Mexico R · 8d ago
-
Actor-critic | Exploration/exploitation | Language Processing | Large-scale | Large-scale experimentation · Career growth | Flexible work culture | Fully remote | Global collaboration · Mid-level Full Time · Italy R · 8d ago
-
Actor-critic | Exploration Exploitation Tradeoff | Exploration/exploitation | Machine Learning | Multi-Modal · Career growth opportunities | Flexible work culture | Fully remote work | Global collaboration · Mid-level Full Time · Netherlands R · 8d ago
-
Actor-critic | Exploration Exploitation Tradeoff | Exploration/exploitation | Language Processing | Learning algorithms · Career growth opportunities | Continuous learning | Flexible work culture | Fully remote work | Global collaboration opportunities · Mid-level Full Time · Ireland R · 8d ago
-
Actor-critic | Artificial Intelligence | Experiment tracking | Exploration/exploitation | GRPO · Career growth | Continuous learning culture | Flexible work schedule | Fully remote | Global collaboration · Mid-level Full Time · Switzerland R · 8d ago
-
Actor-Critic methods | Actor-critic | Computational Efficiency | Exploration/exploitation | Machine Learning · Career growth | Flexible work culture | Fully remote | Global collaboration opportunities · Mid-level Full Time · France R · 8d ago
-
Actor-critic | Exploration Exploitation | Exploration Exploitation Tradeoffs | Group Relative Policy Optimization | NLP · Career growth opportunities | Flexible work culture | Fully remote work | Global collaboration | Innovation-focused environment · Mid-level Full Time · Germany R · 8d ago
-
Actor-Critic methods | Actor-critic | Benchmarking | Convergence | Experimentation · Career growth opportunities | Flexible work culture | Fully remote work | Global collaboration opportunities · Mid-level Full Time · Spain R · 8d ago
-
Actor-critic | Data Pipelines | Exploration/exploitation | Large-scale | Large-scale experimentation · Career growth opportunities | Flexible work culture | Fully remote | Global collaboration | Innovation-focused environment · Mid-level Full Time · Canada R · 8d ago
-
Actor-critic | Deep reinforcement learning | Exploration/exploitation | GRPO | Language Processing · Career growth opportunities | Flexible work culture | Fully remote work | Global collaboration · Mid-level Full Time · India R · 8d ago
-
Analytics Team Lead · USD 109K-230K · A/B | A/B Testing | B testing | Crime analysis | Data Analysis · Bonus eligibility | Remote work · Senior-level Full Time · Home based-Florida, United States R · 10d ago
-
Applied Scientist, Trustworthy Shopping Experience (TSE) · INR 2000K-4000K · Agentic AI | Computer Vision | Cross-modal alignment | Data Warehousing | Deep learning · Senior-level Full Time · Bengaluru, Karnataka, IND · 15d ago
-
Deep learning | GPU Computing | Language Models | Language Processing | Large Language Models · Entry-level Full Time Internship · US, CA, Santa Clara, United States · 15d ago
-
Senior AI Engineer - VLA Foundation Model · CHF 128K-192K · Autonomy | Diffusion Models | Edge Computing | Generative AI | Imitation Learning · In person Work Mode | Mentorship experience · Senior-level Full Time · Zürich · 16d ago
-
Agentic Systems | Deep learning | Diffusion Models | Fine Tuning | Generative AI · 401k eligibility | Annual bonus | Dental insurance | Medical insurance | Paid time off · Senior-level Full Time · Los Altos, CA · 16d ago
-
Applied Scientist, Customer Behavior Analytics · USD 142K-193K · Counterfactual analysis | Deep learning | Econometrics | Generative Models | Language Models · Mid-level Full Time · Seattle, Washington, USA · 17d ago
-
Senior Principal Machine Learning Engineer (Fulfilment) · SGD 182K-240K · Decision Processes | DeepSpeed | Direct Preference Optimization | Distributed Training | Dynamic Models · Birthday leave | Confidential Assistance Programme | FlexWork | Medical insurance | Parental leave · Executive-level Full Time · Singapore, Singapore · 20d ago
-
Adversarial Networks | Computer Vision | Cross-modal alignment | GRPO | Generative Adversarial Networks · Entry-level Internship · Seattle, Washington, United States · 21d ago
-
Asynchronous programming | Concurrency | Deep learning | Distributed Systems | JAX · Company-provided equipment | Flexible hours | Fully remote work | Health insurance allowance | Home-office allowance · Mid-level Full Time · Remote (EMEA/East Coast) R · 23d ago
-
Data Analysis | Dataset Processing | Direct Preference Optimization | Evaluation Pipelines | Fine Tuning · Entry-level Internship · San Jose, California, United States · 25d ago
-
Senior Director, AI Model LifeCycle · USD 301K-355K · Checkpointing | Dataset versioning | Experiment tracking | Failure recovery | Fine Tuning · 401k match | Cell phone stipend | Commuter benefits | Dental insurance | HSA contributions · Senior-level Full Time · San Francisco, CA - US · 26d ago
-
Actor-critic | Air Traffic Management | Air traffic | Machine Learning | Optimization · Flexible working space | Informal corporate culture | Thesis assignment allowance · Entry-level Internship · Amsterdam, Noord-Holland, Nederland R · 26d ago
-
Tech Lead, Robotic AI Model · USD 150K-180K · Action Chunking | Action Tokenization | Behavior Cloning | DPO | DeepSpeed · Senior-level Full Time · El Segundo, California, United States · 29d ago
-
Senior AI Engineer Specialist · INR 2500K-3500K · Agentic AI | Apache Spark | Direct Preference Optimization | Distributed Computing | Embedding architectures · Senior-level Full Time · IND - Bengaluru - Esko-Graphics India … · 1mo ago
-
Applied Scientist, Amazon Customer Service · USD 142K-222K · Agentic AI | Artificial Intelligence | Dataset curation | Direct Preference Optimization | Embedding Models · Mid-level Full Time · Santa Clara, California, USA · 1mo ago
-
Robotics & Reinforcement Learning Engineer · EUR 60K-84K · Actor-critic | Actuator modeling | Behavior Cloning | C++ | Control Systems · Annual leave | Early Friday finish | Flexible working hours | Free coffee and tea | Permanent full-time contract · Senior-level Contract Full Time · Barcelona, CT, Spain · 1mo ago
-
AI Engineer - Imitation Learning (Senior) · CHF 128K-192K · Autonomy | C++ | Diffusion Model | Diffusion Policy | Generative AI · In-person collaboration · Senior-level Full Time · Zürich · 1mo ago
-
Helix AI Engineer, Reinforcement Learning · USD 150K-350K · Credit Assignment | Distributed Training | Experiment Management | Exploration | Model-based reinforcement learning · In-office collaboration · Senior-level Full Time · San Jose, CA · 1mo ago
-
Staff Software Engineer, Generative AI, Core ML · USD 207K-300K · AI Feedback | Computer Vision | Data Processing | Deep learning | Digital Twin · Senior-level Full Time · Mountain View, CA, USA · 1mo ago
-
Senior-level Full Time · Beijing, Shanghai · 1mo ago
-
DDP | Deep learning | Direct Preference Optimization | Distributed Training | Docker · Senior-level Full Time · Pangyo (Software Dream Center), South Korea · 1mo ago
-
Large Model Application Algorithm Engineer/Expert · CNY 240K-480K · C++ | Computer Vision | Deep learning | Direct Preference Optimization | Human Computer Dialogue · Senior-level Full Time · Shanghai, Beijing · 1mo ago