Find jobs in AI/ML, Data Science and Big Data
22 results for Online Reinforcement Learning (Skill/Tech stack)
-
Sr. Staff Software Engineer, Machine Learning | USD 191K-315K
Content Safety | Evaluation Pipelines | Fine Tuning | Incident Response | Incident monitoring
Health and wellness programs | Hybrid work environment | Time away from work
Senior-level Full Time | Mountain View, CA, United States | 5d ago
-
Sr. Staff Software Engineer, Machine Learning | USD 191K-315K
Content Safety | Deep learning | Evaluation Pipelines | Fine Tuning | Harm Taxonomy
Health and wellness programs | Time away from work
Senior-level Full Time | Mountain View, CA, United States | 7d ago
-
Actor-critic | End to End | End-to-end training | Exploration/exploitation | GRPO
Career growth opportunities | Continuous learning opportunities | Flexible work culture | Fully remote | Global collaboration
Mid-level Full Time | Luxembourg R | 8d ago
-
Actor-critic | Benchmarking | Convergence Stability | Deep reinforcement learning | Experiment tracking
Career growth opportunities | Flexible work culture | Fully remote work | Global collaboration opportunities
Mid-level Full Time | Greece R | 8d ago
-
Actor-critic | Computational Efficiency | Exploration/exploitation | Language Processing | Multi-Modal
Career growth opportunities | Flexible work culture | Fully remote work | Global collaboration opportunities
Mid-level Full Time | Chile R | 8d ago
-
Actor-critic | Deep learning | Exploration/exploitation | GRPO | Language Processing
Career growth opportunities | Flexible work culture | Fully remote | Global collaboration
Mid-level Full Time | Sweden R | 8d ago
-
Actor-critic | Exploration Exploitation Tradeoff | Exploration/exploitation | GRPO | Machine Learning
Career growth opportunities | Flexible work culture | Fully remote | International collaboration
Mid-level Full Time | Saudi Arabia R | 8d ago
-
Actor-critic | Artificial Intelligence | Exploration Exploitation | Group Relative Policy Optimization | Language Processing
Career growth opportunities | Flexible work culture | Fully remote
Mid-level Full Time | United Arab Emirates R | 8d ago
-
Actor-critic | Deep learning | Experimentation | Exploration/exploitation | Language Processing
Career growth opportunities | Flexible work culture | Fully remote | Global collaboration opportunities
Mid-level Full Time | Australia R | 8d ago
-
Actor-critic | Experimentation | Exploration/exploitation | GRPO | Large-scale
Career growth | Continuous learning | Flexible work culture | Fully remote | Global collaboration
Mid-level Full Time | South Africa R | 8d ago
-
Actor-critic | Computer Vision | Deep learning | Exploration/exploitation | Language Processing
Career growth | Continuous learning | Flexible work culture | Fully remote | Global collaboration
Mid-level Full Time | Mexico R | 8d ago
-
Actor-critic | Exploration/exploitation | Language Processing | Large-scale | Large-scale experimentation
Career growth | Flexible work culture | Fully remote | Global collaboration
Mid-level Full Time | Italy R | 8d ago
-
Actor-critic | Exploration Exploitation Tradeoff | Exploration/exploitation | Machine Learning | Multi-Modal
Career growth opportunities | Flexible work culture | Fully remote work | Global collaboration
Mid-level Full Time | Netherlands R | 8d ago
-
Actor-critic | Exploration Exploitation Tradeoff | Exploration/exploitation | Language Processing | Learning algorithms
Career growth opportunities | Continuous learning | Flexible work culture | Fully remote work | Global collaboration opportunities
Mid-level Full Time | Ireland R | 8d ago
-
Actor-critic | Artificial Intelligence | Experiment tracking | Exploration/exploitation | GRPO
Career growth | Continuous learning culture | Flexible work schedule | Fully remote | Global collaboration
Mid-level Full Time | Switzerland R | 8d ago
-
Actor-Critic methods | Actor-critic | Computational Efficiency | Exploration/exploitation | Machine Learning
Career growth | Flexible work culture | Fully remote | Global collaboration opportunities
Mid-level Full Time | France R | 8d ago
-
Actor-critic | Exploration Exploitation | Exploration Exploitation Tradeoffs | Group Relative Policy Optimization | NLP
Career growth opportunities | Flexible work culture | Fully remote work | Global collaboration | Innovation-focused environment
Mid-level Full Time | Germany R | 8d ago
-
Actor-critic | Data Pipelines | Exploration/exploitation | Large-scale | Large-scale experimentation
Career growth opportunities | Flexible work culture | Fully remote | Global collaboration | Innovation-focused environment
Mid-level Full Time | Canada R | 8d ago
-
Actor-critic | Deep reinforcement learning | Exploration/exploitation | GRPO | Language Processing
Career growth opportunities | Flexible work culture | Fully remote work | Global collaboration
Mid-level Full Time | India R | 8d ago
-
Data Processing | Deep learning | Distributed Training | Generative Models | Human Feedback
Family leave | Free food and snacks | Health care plan | Life insurance | Long-term disability
Senior-level Full Time | Fremont | 9d ago
-
Helix AI Engineer, Reinforcement Learning | USD 150K-350K
Credit Assignment | Distributed Training | Experiment Management | Exploration | Model-based reinforcement learning
In-office collaboration
Senior-level Full Time | San Jose, CA | 1mo ago
-
Research Intern – Reinforcement Learning (RL) | INR 300K-420K
Agent systems | Fine Tuning | LLM Fine-tuning | Language Processing | Learning environments
Entry-level Internship | Noida | 1mo ago