Principal Product Manager, ML Training
Sunnyvale, CA
Cerebras Systems
Cerebras is the go-to platform for fast and effortless AI training and inference. Cerebras Systems builds the world's largest AI chip, 57 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.
Cerebras' current customers include global corporations across multiple industries, national labs, and top-tier healthcare systems. Cerebras had an extraordinary 2024, highlighted by the launch of the third-generation CS-3 AI accelerator, groundbreaking inference capabilities, and world records in training and molecular dynamics simulations. With achievements like 70x faster inference speeds compared to GPUs and a 1 trillion parameter model trained on a single CS-3, Cerebras has redefined efficiency and scalability in AI. Recognized with prestigious awards and at global events, Cerebras is excited to continue revolutionizing AI in 2025 with bold innovation and collaboration.
The Role
In this role, you will design and own the primary interface that ML researchers and scientists use to train massive LLMs (1+ trillion parameters) on Cerebras chips. Speed of iteration is critical for accelerating breakthroughs, and you’ll be at the helm of ensuring we systematically reduce the “time-to-insight” for ML research across diverse domains.
You will work with a deeply technical product team to drive the vision and strategy for Cerebras’ ML training ecosystem, including CSTorch (our PyTorch-equivalent framework) and Model Zoo (a library of high-level abstractions and domain-specific tools for training and fine-tuning LLMs). You will lead the creation of a seamless platform that enables researchers to preprocess data, pre-train, fine-tune, and evaluate models effortlessly on Cerebras hardware. By building intuitive workflows, extensible tools, and integrated libraries, you’ll empower both cutting-edge ML research and domain-specific innovation.
As the Cerebras ML Training PM, you’ll play a pivotal role in advancing AI across industries, working with the most cutting-edge training techniques and collaborating with a world-class research and engineering team.
Your Impact
- Develop and share a deep intuition for the ML researcher's training workflow, including data preprocessing, training, fine-tuning, and evaluation.
- Define and execute the product roadmap for Model Zoo (our ML training library) and CSTorch (our ML framework), ensuring they form a flexible, beautifully designed, and extensible platform.
- Collaborate with our internal AppliedML team, as well as external ML researchers and domain scientists to design features that dramatically reduce time-to-insight and accelerate breakthroughs.
- Drive cross-functional collaboration to align product roadmaps and execute priorities across frameworks and libraries.
- Be the voice of the user! Define relevant success metrics and continuously incorporate both user feedback and emerging trends in ML to refine CSTorch and Model Zoo, maintaining leadership in the space.
- Work across Product, Engineering, and business leadership to help define our product go-to-market approach to maximize value to users and expand our user community over time.
- Communicate roadmaps, priorities, experiments, and decisions clearly across a wide spectrum of audiences from internal customers to executives.
Requirements
- Bachelor’s or Master’s degree in computer science, electrical engineering, physics, mathematics, a related scientific/engineering discipline, or equivalent practical experience
- 4-10+ years of product management experience in developer tools, ML frameworks, or software platforms.
- Strong understanding of typical training and fine-tuning workflows, including model development, iterative experimentation, and debugging.
- Familiarity with machine learning/deep learning concepts and techniques for training modern models
- Proven ability to collaborate across engineering, research, and user-facing teams to deliver impactful solutions.
- Experience working with a data science/ML stack, including TensorFlow and PyTorch
- Experience developing machine learning applications or building tools for machine learning application developers
- An entrepreneurial sense of ownership of overall team and product success, the ability to make things happen around you, and a bias toward getting things done, owning the solution, and driving problems to resolution
- Outstanding presentation skills with a strong command of verbal and written communication
Why Join Cerebras
People who are serious about software make their own hardware. At Cerebras we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:
- Build a breakthrough AI platform beyond the constraints of the GPU
- Publish and open source their cutting-edge AI research
- Work on one of the fastest AI supercomputers in the world
- Enjoy job stability with startup vitality
- Our simple, non-corporate work culture that respects individual beliefs
Read our blog: Five Reasons to Join Cerebras in 2024.
Apply today and join us at the forefront of groundbreaking advancements in AI.

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth, and the support of those around them.