Find jobs in AI/ML, Data Science and Big Data
11 results for Reward Optimization (Skill/Tech stack)
-
Actor-critic | End to End | End-to-end training | Exploration/exploitation | GRPO
Career growth opportunities | Continuous learning opportunities | Flexible work culture | Fully remote | Global collaboration
Mid-level | Full Time | Luxembourg R | 22h ago
-
Actor-critic | Convergence | Deep reinforcement learning | Experimentation | Exploration/exploitation
Career growth opportunities | Flexible work culture | Fully remote
Mid-level | Full Time | Bulgaria R | 22h ago
-
Actor-critic | Benchmarking | Convergence Stability | Deep reinforcement learning | Experiment tracking
Career growth opportunities | Flexible work culture | Fully remote work | Global collaboration opportunities
Mid-level | Full Time | Greece R | 22h ago
-
Actor-critic | Deep learning | Exploration/exploitation | Group Relative Policy Optimization | Language Processing
Career growth opportunities | Continuous learning opportunities | Flexible work culture | Fully remote work | Global collaboration opportunities
Mid-level | Full Time | Poland R | 22h ago
-
Actor-critic | Deep learning | Exploration/exploitation | Group Relative Policy Optimization | Language Processing
Career growth | Flexible work culture | Fully remote | Global collaboration
Mid-level | Full Time | Belgium R | 22h ago
-
Actor-critic | Experimentation | Exploration/exploitation | GRPO | Large-scale
Career growth | Continuous learning | Flexible work culture | Fully remote | Global collaboration
Mid-level | Full Time | South Africa R | 22h ago
-
Actor-critic | Exploration/exploitation | Language Processing | Large-scale | Large-scale experimentation
Career growth | Flexible work culture | Fully remote | Global collaboration
Mid-level | Full Time | Italy R | 22h ago
-
Actor-critic | Exploration/exploitation | GRPO | Multi-Modal | NLP
Career growth opportunities | Flexible work culture | Fully remote | International collaboration
Mid-level | Full Time | Portugal R | 22h ago
-
Actor-Critic methods | Actor-critic | Computational Efficiency | Exploration/exploitation | Machine Learning
Career growth | Flexible work culture | Fully remote | Global collaboration opportunities
Mid-level | Full Time | France R | 22h ago
-
Actor-Critic methods | Actor-critic | Benchmarking | Convergence | Experimentation
Career growth opportunities | Flexible work culture | Fully remote work | Global collaboration opportunities
Mid-level | Full Time | Spain R | 22h ago
-
Actor-critic | Data Pipelines | Exploration/exploitation | Large-scale | Large-scale experimentation
Career growth opportunities | Flexible work culture | Fully remote | Global collaboration | Innovation-focused environment
Mid-level | Full Time | Canada R | 22h ago