Find jobs in AI/ML, Data Science and Big Data
51 results for Reward Modeling (Skill/Tech stack)
-
Software Engineer, Research Acceleration · USD 350K-475K · Distributed Systems | Evaluation Frameworks | Experiment tracking | JAX | Machine Learning · Dental insurance | Health insurance | Paid parental leave | Relocation support | Unlimited PTO · Senior-level Full Time · San Francisco · 7h ago
-
Applied Scientist / Machine Learning Engineer · USD 311K-370K · Action models | Active Learning | Data Curation | Data Deduplication | Data Processing · Hybrid work policy · Mid-level Full Time · Sunnyvale · 14h ago
-
AI Feedback | Checkpointing | Cost Performance | Cost-performance tradeoffs | Data Decontamination · 401k matching | Country specific visa support | Flexible work arrangements | Medical, dental, and vision options | Parental leave · Senior-level Full Time · Palo Alto, California, United States · 1d ago
-
Data Scientist (Remote) · USD 140K-215K · Context Management | DPO | DeepSpeed | Experiment tracking | Experimental Design · Employee networks | Great Place to Work certification | Paid adoption leave | Paid parental leave | Professional development · Mid-level Full Time · USA VA Remote, United States · R · 2d ago
-
Large Model Algorithm Engineer (Open-Domain Dialogue) · CNY 180K-300K · A/B | A/B Testing | Agentic reinforcement learning | B testing | DeepSpeed · Mid-level Internship · Shanghai, Beijing · 4d ago
-
Large Language Model Post-Training / Agentic Algorithm Engineer · CNY 180K-360K · Distributed Training | Function Calling | GRPO | Human Feedback | JSON · Entry-level Full Time · Shanghai, Beijing · 4d ago
-
Agent Post-Training, Computer Use Research · USD 295K-445K · Data pipeline | Evaluation | Experimentation | Grader Development | Machine Learning · Senior-level Full Time · San Francisco · 4d ago
-
Agent Post-Training, Connectors Research · USD 295K-445K · Data Pipelines | Deep learning | Experimentation | Language Models | Language Processing · Senior-level Full Time · San Francisco · 4d ago
-
Agent Post-Training, Context Research · USD 295K-445K · Data Pipelines | Deep learning | Experimentation | Grading | Language Models · Mid-level Full Time · San Francisco · 4d ago
-
Agent Orchestration | Agent systems | Automated testing | Context modeling | Data Quality · Entry-level Internship · Singapore, Singapore · 6d ago
-
Data Pipelines | Evaluation | Fine Tuning | Human Feedback | LLM Fine-tuning · Senior-level Full Time · Paris, France · 6d ago
-
Staff Software Engineer, AI/ML · USD 216K-271K · AI Feedback | Agentic AI | Data Pipelines | Direct Preference Optimization | Experimentation platforms · Conference reimbursement | Education reimbursement | Employee assistance program | Employee stock purchase program | Equity compensation · Senior-level Full Time · Seattle · 7d ago
-
Senior Applied Scientist · USD 142K-270K · Data Pipelines | Diffusion Models | Direct Preference Optimization | Evaluation metrics | Fine Tuning · Senior-level Full Time · Seattle, United States · R · 7d ago
-
Director, Reinforcement Learning & Agentic Post-Training · EUR 151K-200K · AI Feedback | API Integration | Distributed Training | Environment Design | Evaluation · Executive-level Full Time · Paris, France · 7d ago
-
Research Scientist, Gemini Data, DeepMind · EUR 104K-107K · Fine Tuning | JAX | Language Models | Large Language Models | Machine Learning · Mid-level Full Time · Paris, France · 8d ago
-
Closed Loop | Closed Loop Evaluation | Counterfactual Simulation | Data Generation | Decision Making · Senior-level Full Time · Pangyo (Software Dream Center), South Korea · 8d ago
-
Staff Machine Learning Engineer · GBP 155K-163K · Data Processing | Deep learning | Distributed Training | Generative AI | Human Feedback · Company benefits program | Discretionary annual bonus | Equity incentive plan · Senior-level Full Time · London, UK · 8d ago
-
Senior Software Engineer - Model Training & AI Evals · INR 3500K-5000K · AI Feedback | Ablation Studies | Benchmarking | CI/CD | Data Generation · Senior-level Full Time · Remote (India) · R · 13d ago
-
AI Feedback | Agent Orchestration | Agent systems | Agentic AI | Autonomous Reasoning · Senior-level Full Time · Seoul, South Korea · 14d ago
-
Staff Data Scientist: Semantic Substrate Incubation · USD 206K-271K · AWS CDK | AWS CloudFormation | AWS EC2 | AWS Lambda | AWS Neptune · Conference and publication support | Continuous learning stipend | Dedicated growth time | Flexible time off | Health and dental insurance · Senior-level Full Time · Seattle, Washington, United States · 15d ago
-
Machine Learning Engineer · USD 170K-315K · Data Preprocessing | Deep learning | Evaluation benchmarks | Fine Tuning | GPU Profiling · Health benefits | Hybrid work model | Retirement benefits | Vacation time · Mid-level Full Time · USA - CA - Santa Clara, … · 16d ago
-
Data Scientist (Remote) · USD 120K-180K · Abuse Resistance | Agent safety | Agentic Planning | Data scaling | DeepSpeed · Employee networks | Great Place to Work certified | Office culture | Paid adoption leave | Paid parental leave · Mid-level Full Time · USA VA Remote, United States · R · 19d ago
-
Researcher, Agent Post-Training, Personality · USD 295K-445K · Behavioral Science | Data Pipelines | Evals | Evaluation | Human Feedback · Senior-level Full Time · San Francisco · 20d ago
-
Machine Learning Engineer, PhD Intern (Fall) · USD 100K-125K · Algorithms | Data Management | Distillation | Feedback learning | Generative AI · Remote work flexibility · Entry-level Internship · United States - Remote · R · 20d ago
-
Intern Engineer – RL Post-Training for LLMs · CAD 58K-104K · Data Generation | Deep learning | DeepSpeed | Distributed Training | GRPO · Internship · Entry-level Internship · Vancouver, British Columbia, Canada · 22d ago
-
Sr. Physical AI Research Scientist · CAD 140K-180K · AI alignment | Artificial Intelligence | Computer Vision | Constitutional AI | Continual Learning · Hybrid work schedule · Senior-level Full Time · Toronto, ON, CA · 26d ago
-
Research Engineer - LLM Training & Alignment Systems · CAD 127K-225K · Automation | Benchmarking | C# | C++ | Data Curation · Mid-level Contract Full Time · Kingston, Ontario, Canada · 27d ago
-
Machine Learning Researcher - RL and Agentic Systems · USD 190K-287K · Agentic Systems | Benchmarking | Data Validation | Dataset Quality Evaluation | Dataset quality · Mid-level Full Time · Remote · R · 1mo ago
-
Data Curation | Deep learning | DeepSpeed | Direct Preference Optimization | Evaluation · Senior-level Full Time · Singapore, Singapore · 1mo ago
-
Staff Machine Learning Engineer, AV Core · USD 336K-370K · 3D Scene | 3D Scene Understanding | Action models | Behavior Modeling | C++ · Hybrid work | Work from home · Senior-level Full Time · Sunnyvale · 1mo ago
-
Agent simulation | Behavioral Modeling | DPO | Data Curation | Data Generation · Entry-level Full Time Internship · US, CA, Santa Clara, United States · 1mo ago
-
Data Curation | Data Generation | Deep learning | Distributed Training | Fine Tuning · Internship benefits · Entry-level Full Time Internship · US, CA, Santa Clara, United States · 1mo ago
-
Audio Processing | Autoregression | Autoregressive models | Computer Vision | Deep learning · Remote work · Senior-level Full Time · Remote job · R · 1mo ago
-
Applied Scientist, Wayve Labs · USD 147K-213K · Autoregressive models | Depth Estimation | Diffusion Models | Foundation Models | Language · Daily yoga | Enhanced parental leave | Flexible working hours | Hybrid working | Large Social Budgets · Mid-level Full Time · Sunnyvale · 1mo ago
-
Agent Orchestration | Data Pipelines | Debugging | Evaluation | Language Models · Direct founder collaboration | High technical ownership | Hybrid option | Meaningful architectural influence | Mission-driven healthcare impact · Senior-level Full Time · Remote; Boston, MA; Onsite · R · 1mo ago
-
Applied AI Engineer · USD 175K-275K · Embeddings | Generative AI | LanceDB | Langchain | Language Models · Development opportunities | Hybrid work culture | Mentorship | Professional growth · Senior-level Full Time · San Francisco · 1mo ago
-
Applied Scientist, Wayve Labs · CAD 100K-132K · Autoregressive models | Computer Vision | Data sets | Depth Estimation | Diffusion Models · Daily yoga | Enhanced parental leave | Flexible working hours | Large Social Budgets | Onsite bar · Mid-level Full Time · Vancouver · 1mo ago
-
Applied Scientist, Wayve Labs · GBP 80K-96K · Autoregressive models | Depth Estimation | Diffusion Models | Foundation Models | Human Feedback · Daily yoga | Enhanced parental leave | Flexible working hours | Onsite bar | Onsite chef · Mid-level Full Time · London · 1mo ago
-
A/B | A/B Testing | B testing | Data Pipelines | Fine Tuning · 401k retirement plan | Health insurance | Meal allowance | Paid flexible holidays | Paid parental leave · Senior-level Full Time · New York, NY · 1mo ago
-
Software Engineer - Machine Learning · USD 190K-220K · Adversarial Data | Adversarial Data Generation | Adversarial Training | Content Moderation | DPO · Mid-level Contract · Mountain View, CA · 1mo ago
-
Data Processing | Deep learning | Distributed Training | Generative Models | Human Feedback · Family leave | Free food and snacks | Health care plan | Life insurance | Long-term disability · Senior-level Full Time · Fremont · 1mo ago
-
Deep learning | GPU Computing | Language Models | Language Processing | Large Language Models · Entry-level Full Time Internship · US, CA, Santa Clara, United States · 1mo ago
-
Alignment | Benchmark design | Constitutional AI | Continued Pretraining | Data Curation · Senior-level Full Time · Dublin, CA (HQ) · 1mo ago
-
Alignment | Benchmark design | DPO | Data Curation | Data Deduplication · Senior-level Full Time · India/Bengaluru · 1mo ago
-
Constitutional AI | Continued Pretraining | DPO | Data Curation | Deduplication · Senior-level Full Time · Brazil/Remote · R · 1mo ago
-
Senior Applied AI Researcher (India) · INR 2500K-4500K · Artificial Intelligence | DPO | Data parallelism | DataLoader | DeepSpeed · Senior-level Full Time · India/Bengaluru · 1mo ago
-
Senior Applied AI Researcher (Brazil) · BRL 271K-370K · CI/CD | DPO | Data parallelism | Deep learning | DeepSpeed · Senior-level Full Time · Brazil/Remote · R · 1mo ago
-
Senior Applied AI Researcher (Dublin, CA) · USD 190K-300K · Automated testing | Continuous Evaluation | Data parallelism | Deep learning | DeepSpeed · Senior-level Full Time · Dublin, CA (HQ) · 1mo ago
-
Applied AI Researcher (India) · INR 2000K-3465K · AWS | Automated testing | Azure | CI/CD | Cloud Computing · Mid-level Full Time · India/Bengaluru · 1mo ago
-
Applied AI Researcher (Dublin, CA) · USD 239K-331K · CI/CD | Computer Vision | Data Preprocessing | Deep learning | Direct Preference Optimization · Mid-level Full Time · Dublin, CA (HQ) · 1mo ago