AI/ML Engineer
United States - Remote
Blueprint
About Blueprint
At Blueprint, we’re on a mission to empower therapists with world-class tools so they can focus on what matters most—delivering exceptional mental health care.
Our AI assistant is purpose-built for therapists, automating the administrative tasks that slow them down and enabling them to operate at the top of their license. With Blueprint, therapists aren’t just managing their work; they’re supported by tools that understand the context of each client interaction. Compared to legacy software tools, Blueprint feels more like having the world’s best executive assistant at your side.
Today, over 50,000 therapists are on Blueprint, leveraging our platform to enhance care for hundreds of thousands of clients. We’ve found strong product-market fit and are scaling rapidly to meet demand.
Our organization is very flat and our team is intentionally small and talent-dense. We like people who are truth-seeking, creative, and passionate about improving mental health care.
We’re a remote-first company (US and Canada only, for now) and come together in person a few times a year to connect, have fun, and help shape the future of mental health care.
About the role
We’re looking for an experienced AI/ML Engineer to take ownership of evaluation and quality across our AI systems. At Blueprint, AI isn’t a bolt-on — it’s the foundation of our product. We use LLMs to automate clinical documentation, deliver clinical insights, and reimagine how therapists work.
This role is about making sure those systems work reliably, safely, and well. You’ll design the evaluation infrastructure that helps us measure what “good” looks like across subjective, human-centered workflows and build the tools to track, test, and improve model outputs over time.
You’ll work closely with engineering, product, and clinical leaders to define quality in practical, therapist-facing terms and make sure we have the systems in place to deliver it consistently.
This is a highly cross-functional, high-impact role. Your work will directly shape what tens of thousands of therapists experience when they use our product every day.
What You’ll Do
- Design and build our end-to-end evaluation infrastructure: LLM-as-a-judge, human QA pipelines, offline scoring, and more (see the sketch after this list)
- Define and implement application-specific quality metrics — not just accuracy, but tone, structure, clinical alignment, and more
- Collaborate with product and clinical leads to turn subjective requirements into structured evaluation criteria
- Monitor and analyze model performance across different therapist cohorts and workflows
- Build tools and processes to capture in-the-wild feedback from clinicians and route it back into model and product improvement loops
- Work closely with engineers to integrate evals into our CI, deployment, and iteration cycles
- Help shape data labeling, prompt evaluation, experiment design, and prompt tuning frameworks
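To make the first bullet above concrete, here is a minimal, illustrative sketch of an LLM-as-a-judge check for a generated session note. It assumes an OpenAI-compatible API via the `openai` Python SDK, a placeholder judge model, and a hypothetical three-criterion rubric; Blueprint's actual evaluation stack is not described in this posting and may look quite different.

```python
# Illustrative only: a tiny LLM-as-a-judge check for a generated therapy note.
# The rubric, model name, and prompt wording are assumptions, not Blueprint's.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical rubric mirroring the posting's "tone, structure, clinical alignment" framing.
RUBRIC = {
    "clinical_alignment": "Does the note accurately reflect the session content?",
    "tone": "Is the language professional, warm, and non-judgmental?",
    "structure": "Does the note follow the expected sections (e.g. SOAP)?",
}

def judge_note(transcript_excerpt: str, generated_note: str) -> dict:
    """Ask a judge model to score one generated note against the rubric (1-5 per criterion)."""
    prompt = (
        "You are evaluating an AI-generated therapy progress note.\n\n"
        f"Transcript excerpt:\n{transcript_excerpt}\n\n"
        f"Generated note:\n{generated_note}\n\n"
        "For each criterion below, give an integer score from 1 (poor) to 5 (excellent). "
        "Reply with a single JSON object mapping criterion name to score.\n"
        + json.dumps(RUBRIC, indent=2)
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model, not an actual product choice
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # request machine-parseable output
    )
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    scores = judge_note(
        transcript_excerpt="Client discussed progress with their sleep routine...",
        generated_note="Client reports improved sleep hygiene; reviewed coping skills...",
    )
    print(scores)  # e.g. {"clinical_alignment": 4, "tone": 5, "structure": 3}
```

In a real pipeline, scores like these would be logged per prompt version and therapist cohort so regressions surface in CI or offline scoring before deployment rather than in the wild.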
Who We’re Looking For
You’re a hands-on ML/AI practitioner who’s passionate about building high-quality systems that actually get used — not just optimizing for benchmark scores. You’ve worked with LLMs in production at scale and know the hard part is making outputs reliable, human-aligned, and easy to evaluate. You’re motivated by impact, comfortable with ambiguity, and thrive in early-stage, fast-paced environments.
You might be a fit if:
- You’ve built or owned evaluation infrastructure for LLMs or generative AI products
- You have experience designing QA workflows, human-in-the-loop systems, or LLM-as-a-judge pipelines
- You think in terms of feedback loops — and can turn fuzzy product goals into testable quality metrics
- You write code, ship experiments, and are comfortable working across the stack to get the right signals flowing
- You’re excited about working closely with product, design, and domain experts to define and refine what “good” means in a real-world AI application
Bonus if you have:
- Experience in healthcare, mental health, or other high-trust environments
- Familiarity with labeling, data QA, or prompt engineering at scale
- A strong POV on eval tools, metrics, or best practices — and a willingness to invent new ones where needed
Benefits
- Competitive salary and equity
- 100% remote – no office, no commuting
- Health, dental, and vision insurance, with 75% of your premium covered by Blueprint
- Semi-annual team gatherings (in Chicago!)
- Unlimited PTO
- Opportunities to grow with the company and shape our product
- Hardworking, mission-driven, friendly coworkers
Blueprint is an equal opportunity employer and does not discriminate on the basis of race, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition, or any other basis protected by law.