Senior Technical Program Manager, AI/ML & Data Infrastructure, Central Technology
Redwood City, CA (Open to Flex)
Full Time | Senior-level / Expert | USD 178K - 267K
Chan Zuckerberg Initiative
The Chan Zuckerberg Initiative (CZI) is a new kind of philanthropy that’s on a mission to help build a more inclusive, just, and healthy future for everyone. CZI was founded by Priscilla Chan and Mark Zuckerberg in 2015 to help solve some of society’s toughest challenges, from eradicating disease and improving education to addressing the needs of our local communities.
The Team
Across our work in Science, Education, and within our communities, we pair technology with grantmaking, impact investing, and collaboration to help accelerate the pace of progress toward our mission. Our Central Operations & Partners team provides the support needed to push this work forward.
Central Operations & Partners consists of our Brand & Communications, Community, Facilities, Finance, Infrastructure/IT Operations/Business Systems, Initiative Operations, People, Real Estate/Workplace/Facilities/Security, Research & Learning, and Ventures teams. These teams provide the essential operations, services, and strategies needed to support CZI’s progress toward achieving its mission to build a better future for everyone.
Our Central Tech team provides technology and security support for CZI and our grantees. Engineering, IT, and Security are most effective when in sync and learning from each other daily. Across our three pillars of Infrastructure, Security, and Grantee & Partner Support, we enable our teams to achieve their goals faster and more securely. We leverage technology to automate manual processes, constantly innovate to optimize operations, provide first-class support, and build solutions to enable the scale and execution of our business partners' strategies and initiatives.
The Opportunity
We’re seeking a Senior Technical Program Manager to lead cross-functional programs that accelerate the effectiveness of our AI/ML and Data Infrastructure teams. This TPM will drive initiatives that improve how internal teams access, use, and scale compute and platform resources—including onboarding and offboarding workflows, access management systems, and infrastructure programs that support efficient, secure, and impactful research and development across the organization.
What You'll Do
- Lead AI/ML infrastructure programs: Drive execution of technical initiatives across GPU scheduling, platform enablement, observability, or workload orchestration.
- Lead access and lifecycle workflows: Own the end-to-end experience for users accessing shared infrastructure resources—including onboarding, offboarding, documentation, and support processes. Serve as the primary point of contact for researchers and internal teams navigating compute access, and collaborate with platform teams to ensure smooth transitions, including long-term model or data migration when needed.
- Coordinate infrastructure access requests: Manage intake and operational workflows for machine learning infrastructure access, including triage, tracking, and communication. Ensure alignment across engineering, research, and platform teams, and help evolve the process as usage scales and needs become more complex.
- Drive documentation systems: Own the structure, accuracy, and governance of internal documentation, onboarding guides, runbooks, and infrastructure wikis.
- Enhance visibility: Maintain and improve AI system dashboards and reporting systems for onboarding timelines, RFA volume, and infrastructure program milestones.
What You'll Bring
- 7+ years of experience in technical program management or infrastructure-focused operations in complex engineering environments.
- Proven ability to manage large-scale technical programs across multiple stakeholders and teams.
- High-level understanding of machine learning workflows and model training pipelines, with the ability to translate infrastructure needs between research and engineering teams.
- Strong organizational skills and experience leading cross-functional programs with tight timelines and multiple stakeholders.
- Excellent written and verbal communication skills, including the ability to align stakeholders at multiple levels.
- A passion for building efficient, secure, and inclusive systems to support cutting-edge science and research.
- Familiarity with on-prem/HPC and/or multi-cloud GPU infrastructure, orchestration tools, and platforms such as Slurm, Run:AI, MLflow, W&B, or similar systems is a huge plus.
Compensation
The Redwood City, CA base pay range for this role is $178,000.00 - $267,000.00. New hires are typically hired into the lower portion of the range, enabling employee growth in the range over time. Actual placement in range is based on job-related skills and experience, as evaluated throughout the interview process.
Benefits for the Whole You
We’re thankful to have an incredible team behind our work. To honor their commitment, we offer a wide range of benefits to support the people who make all we do possible.
- CZI provides a generous employer match on employee 401(k) contributions to support planning for the future.
- An annual benefit that employees can use in whatever way is most meaningful for them and their families, such as housing, student loan repayment, childcare, commuter costs, or other life needs.
- CZI Life of Service Gifts are awarded to employees to “live the mission” and support the causes closest to them.
- Paid time off to volunteer at an organization of your choice.
- Funding for select family-forming benefits.
- Relocation support for employees who need assistance moving to the Bay Area.
- And more!
If you’re interested in a role but your previous experience doesn’t perfectly align with each qualification in the job description, we still encourage you to apply as you may be the perfect fit for this or another role.
Explore our work modes, benefits, and interview process at www.chanzuckerberg.com/careers.
#LI-Hybrid