Director, Reinforcement Learning & Agentic Post-Training
Tasks
- Build agent training and reinforcement learning environments
- Create training data generation and preference pipelines
- Design evaluation frameworks for multi-step workflows
- Develop reward models and verifiers
- Improve policies using supervised fine-tuning, preference optimization, and reinforcement learning
- Lead reinforcement learning strategy for LLM agents
- Measure tool-call correctness and workflow completion
- Mentor machine learning engineers and review technical designs
- Partner with product and domain experts to build trainable agent environments
- Set engineering standards for experiment tracking and reproducibility
Perks/Benefits
- N/A
Skills/Tech-stack
AI Feedback | API Integration | Distributed Training | Environment Design | Evaluation | Experiment Tracking | Fine-Tuning | GRPO | Human Feedback | Language Models | Large Language Models | Learning from Human Feedback | Megatron | Megatron-LM | NeMo | NVIDIA NeMo | NVIDIA Nemotron | Observability | Offline Reinforcement Learning | PPO | Policy Optimization | Preference Optimization | PyTorch | Python | Ray | Reinforcement Learning | Reinforcement Learning from AI Feedback | Reinforcement Learning from Human Feedback | Rejection Sampling | Reproducibility | Reward Modeling | Reward Shaping | Rollback Safety | Supervised Fine-Tuning | Tool Use | vLLM
Education
N/A