Find jobs in AI/ML, Data Science and Big Data
18 results for GRPO (Skill/Tech stack)
-
Adversarial Networks | Computer Vision | Cross-modal alignment | GRPO | Generative Adversarial Networks
Entry-level Internship
Seattle, Washington, United States
2d ago
-
Adversarial Robustness | Agent learning | Audio Processing | Computer Vision | Content Moderation
Career growth | Research mentorship
Full Time (experience level not specified)
San Jose, California, United States
2d ago
-
AIGC Detection | Adversarial Learning | Agentic Systems | Cross-modal alignment | GRPO
Full Time (experience level not specified)
Seattle, Washington, United States
2d ago
-
Adversarial Networks | Adversarial Training | Cross-modal alignment | GRPO | Generative Adversarial Networks
Entry-level Internship
San Jose, California, United States
2d ago
-
Applied Research - Evals & Data
USD 150K-300K
Accelerate | Data Pipelines | Data Versioning | Distributed Systems | Distributed tracing
Conference attendance | Professional development budget | Relocation support | Remote work | Team offsites
Senior-level Full Time
San Francisco
5d ago
-
Causal Inference | Cross-modal fusion | DPO | Data Modeling | Deep learning
Entry-level Full Time
San Jose, California, United States
7d ago
-
Causal Inference | Cross-modal fusion | DPO | Data Modeling | Deep learning
Mid-level Full Time
Seattle, Washington, United States
8d ago
-
Agent systems | Agentic AI | Artificial Intelligence | Benchmarking | Continual Learning
Diversity training | Flexible work options | GPU infrastructure access | International Conference Publishing Support | Paid time off
Entry-level Full Time
Dresden, DE, 01069
8d ago
-
Senior Software Engineer, RL Post-Training Frameworks
USD 184K-356K
Actor Based Programming | C# | C++ | Consistency models | DPO
Comprehensive benefits | Equity
Senior-level Full Time
US, CA, Santa Clara, United States
8d ago
-
Tech Lead, Robotic AI Model
USD 150K-180K
Action Chunking | Action Tokenization | Behavior Cloning | DPO | DeepSpeed
Senior-level Full Time
El Segundo, California, United States
9d ago
-
Entry-level Internship
Shanghai
9d ago
-
Mid-level Internship
Shanghai
9d ago
-
Agent systems | Attention Mechanism | CPU | Continuous Improvement | DPO
Dental insurance | Employee assistance program | Flexible Paid Vacation | Flexible paid sick leave | Flexible spending account
Senior-level Full Time
Palo Alto, CA
10d ago
-
Research Scientist, LLM Evaluation & Post-Training
USD 150K-160K
Benchmarking | Context evaluation | DPO | Data Processing | Error Analysis
Senior-level Full Time
Remote Work (USA), United States
10d ago
-
C++ | Deep learning | GPU clusters | GRPO | High Performance
Equity | Healthcare benefits | Paid time off | Retirement benefits
Senior-level Full Time
US, CA, Santa Clara, United States
16d ago
-
AI/ML Research Scientist, LLM Post-Training & Evaluation
USD 150K-160K
Alignment | Benchmarking | DPO | Data Processing | Error Analysis
Mid-level Full Time
Redmond, Washington, United States
17d ago
-
Tech Lead Manager- MLRE, ML Systems
USD 264K-331K
CUDA | Distributed Systems | Flash Attention | GRPO | Human Feedback
Commuter stipend | Generous PTO | Health, dental and vision coverage | Learning and development stipend | Retirement benefits
Senior-level Full Time
San Francisco, CA; New York, NY
1mo ago
-
Actor-critic | Data Curation | Deep learning | GRPO | Machine Learning
Career development opportunities | Global team | Remote work
Senior-level Full Time
Remote job
1mo ago