Evaluation Program Manager, Human Data Operations
USA - Remote, United States
Full Time Mid-level / Intermediate USD 70K - 370K
- Remote-first
Netflix
Watch Netflix movies & TV shows online or stream right to your smart TV, game console, PC, Mac, mobile, tablet and more. Netflix is one of the world's leading entertainment services, with over 300 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and languages. Members can play, pause and resume watching as much as they want, anytime, anywhere, and can change their plans at any time.
Now, as Netflix explores a broader world of entertainment (expanding into Games, Ads, and AI), we are looking for an Evaluation Project Manager to help drive the growth of a new business unit at Netflix focused on how we evaluate and train ML and Generative AI models for use in our work.
About the Role
Netflix is building toward more intelligent and responsive systems, and thoughtful, high-quality evaluation is essential to making sure we're moving in the right direction. As an Evaluation Project Manager, you'll help stand up a new capability focused on defining and operationalizing how we assess the performance of AI and machine learning systems across the company.
This is a new role on a new team: Human Data Operations (HDO). We're creating the frameworks, tools, and workflows that ensure human judgment is applied with consistency, clarity, and care, whether we're evaluating helpfulness, tone, safety, relevance, or creative quality.
You'll not only shape how evaluations are designed but also own the day-to-day execution of these efforts. From scoping and planning to rater onboarding and calibration, you'll be accountable for driving delivery from start to finish. Just as critically, you'll act as a thought partner and influencer, bringing stakeholders along as you introduce new ways of working, build alignment across teams, and establish a shared language around quality. Your work will help ensure that AI features at Netflix are not only high-performing but also aligned with our values, our users, and the creative integrity that defines our brand.
You'll partner closely with the Human Data Operations Manager to ensure that evaluation designs are not only rigorous and aligned but also effectively resourced, scoped, and executed at scale.
The Opportunity
This is a rare opportunity to get in on the ground floor of a function that will shape how we measure and guide the performance of AI systems at Netflix. As an Evaluation Project Manager, you'll partner across research, product, UX, and engineering to develop frameworks, rubrics, and workflows that enable rigorous, scalable human evaluation. But beyond shaping the "what" and "how," you'll also lead the "when" and "done." You'll be responsible for keeping evaluation projects on track, ensuring consistent execution, timely delivery, and high rater alignment. If you're excited to bring structure to ambiguity and influence how Netflix develops responsible AI, while being accountable for tangible delivery, this is your chance to create meaningful impact from day one.
The ideal candidate:
The ideal candidate brings a rare combination of structure and flexibility. You know how to create evaluation frameworks that are rigorous and scalable, and you're also a driver who gets them out the door. You're skilled at translating vision into workflows, defining milestones, and delivering consistent results in a dynamic environment. You can steer teams across functions, keep timelines on track, and ensure rater quality without micromanaging. You thrive in spaces where there's no roadmap, and you take pride in making things real, not just possible.
Responsibilities:
Lead end-to-end execution of human evaluation initiatives, from intake and scoping to delivery
Develop and operationalize frameworks for evaluating GenAI and ML outputs
Collaborate across research, product, UX, and engineering to embed evaluation into model development cycles
Build and maintain project timelines, proactively manage blockers, and ensure timely execution
Develop clear, scalable guidelines and scoring rubrics to ensure consistent rater judgment
Oversee rater onboarding, calibration, and QA workflows
Define and monitor success metrics such as speed to inter-rater reliability (IRR), throughput, and task effectiveness
Pilot and refine evaluation tasks to improve clarity, inter-rater reliability, and feedback quality
Build foundational documentation and drive adoption of best practices across teams
Track evaluation health and communicate progress to stakeholders clearly and proactively
Anticipate and proactively resolve bottlenecks and blockers
Act as the connective tissue across multiple partners to ensure alignment and effective execution of evaluations at scale
Qualifications
4+ years of experience leading human evaluations or structured QA frameworks in an AI/ML environment
Strong understanding of evaluation design, including guidelines, rubrics, and scoring protocols
Experience running complex cross-functional projects end-to-end, with clear ownership of delivery
Proven ability to work across disciplines and align stakeholders toward shared outcomes
Excellent written and verbal communication skills
Experience in GenAI, LLMs, prompt evaluation, or similar ML-powered systems
Ability to synthesize feedback into clear recommendations and process improvements
Familiarity with responsible AI principles and how to embed them into evaluation design
Prior experience managing human annotation vendors, raters, or data labeling teams
Strong organizational skills and executional focus; ability to track details while seeing the bigger picture
Experience acting as a strategic and operational partner to senior leaders. Attributes include:
Shines in ambiguous situations
Sees around corners (anticipates bottlenecks, escalates effectively, and makes trade-offs)
Leads through influence rather than authority to promote alignment
Strategic thinker
Strong organizer and executor
Operates effectively across many teams
Generally, our compensation structure consists solely of an annual salary; we do not have bonuses. You choose each year how much of your compensation you want in salary versus stock options. To determine your personal top-of-market compensation, we rely on market indicators and consider your specific job family, background, skills, and experience within the market range. The range for this role is $70,000-$370,000.
Netflix provides comprehensive benefits including Health Plans, Mental Health support, a 401(k) Retirement Plan with employer match, Stock Option Program, Disability Programs, Health Savings and Flexible Spending Accounts, Family-forming benefits, and Life and Serious Injury Benefits. We also offer paid leave of absence programs. Full-time hourly employees accrue 35 days annually for paid time off to be used for vacation, holidays, and sick paid time off. Full-time salaried employees are immediately entitled to flexible time off.
Netflix has a unique culture and environment.
Inclusion is a Netflix value and we strive to host a meaningful interview experience for all candidates. If you want an accommodation/adjustment for a disability or any other reason during the hiring process, please send a request to your recruiting partner.
We are an equal-opportunity employer and celebrate diversity, recognizing that diversity builds stronger teams. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.
This job is open for no fewer than 7 days and will be removed when the position is filled.