AI/ML Engineer
United States - Remote
Blueprint
About Blueprint
At Blueprint, we’re on a mission to empower therapists with world-class tools so they can focus on what matters most—delivering exceptional mental health care.
Our AI assistant is purpose-built for therapists, automating the administrative tasks that slow them down and enabling them to operate at the top of their license. With Blueprint, therapists aren’t just managing their work; they’re supported by tools that understand the context of each client interaction. Compared to legacy software tools, Blueprint feels more like having the world’s best executive assistant at your side.
Today, over 50,000 therapists are on Blueprint, leveraging our platform to enhance care for hundreds of thousands of clients. We’ve found strong product-market fit and are scaling rapidly to meet demand.
Our organization is very flat and our team is intentionally small and talent-dense. We like people who are truth-seeking, creative, and passionate about improving mental health care.
We’re a remote-first company (US and Canada only, for now) and come together in person a few times a year to connect, have fun, and help shape the future of mental health care.
About the role
We’re looking for an experienced AI/ML Engineer to take ownership of evaluation and quality across our AI systems. At Blueprint, AI isn’t a bolt-on — it’s the foundation of our product. We use LLMs to automate clinical documentation, deliver clinical insights, and reimagine how therapists work.
This role is about making sure those systems work reliably, safely, and well. You’ll design the evaluation infrastructure that helps us measure what “good” looks like across subjective, human-centered workflows and build the tools to track, test, and improve model outputs over time.
You’ll work closely with engineering, product, and clinical leaders to define quality in practical, therapist-facing terms and make sure we have the systems in place to deliver it consistently.
This is a highly cross-functional, high-impact role. Your work will directly shape what tens of thousands of therapists experience when they use our product every day.
What You’ll Do
- Design and build our end-to-end evaluation infrastructure: LLM-as-a-judge, human QA pipelines, offline scoring, and more (see the sketch after this list)
- Define and implement application-specific quality metrics — not just accuracy, but tone, structure, clinical alignment, and more
- Collaborate with product and clinical leads to turn subjective requirements into structured evaluation criteria
- Monitor and analyze model performance across different therapist cohorts and workflows
- Build tools and processes to capture in-the-wild feedback from clinicians and route it back into model and product improvement loops
- Work closely with engineers to integrate evals into our CI, deployment, and iteration cycles
- Help shape data labeling, prompt evaluation, experiment design, and prompt tuning frameworks
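To make the first bullet above concrete, here is a minimal, illustrative sketch of an LLM-as-a-judge check for a generated session note. It assumes an OpenAI-compatible API via the `openai` Python SDK, a placeholder judge model, and a hypothetical three-criterion rubric; Blueprint's actual evaluation stack is not described in this posting and may look quite different.

```python
# Illustrative only: a tiny LLM-as-a-judge check for a generated therapy note.
# The rubric, model name, and prompt wording are assumptions, not Blueprint's.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical rubric mirroring the posting's "tone, structure, clinical alignment" framing.
RUBRIC = {
    "clinical_alignment": "Does the note accurately reflect the session content?",
    "tone": "Is the language professional, warm, and non-judgmental?",
    "structure": "Does the note follow the expected sections (e.g. SOAP)?",
}

def judge_note(transcript_excerpt: str, generated_note: str) -> dict:
    """Ask a judge model to score one generated note against the rubric (1-5 per criterion)."""
    prompt = (
        "You are evaluating an AI-generated therapy progress note.\n\n"
        f"Transcript excerpt:\n{transcript_excerpt}\n\n"
        f"Generated note:\n{generated_note}\n\n"
        "For each criterion below, give an integer score from 1 (poor) to 5 (excellent). "
        "Reply with a single JSON object mapping criterion name to score.\n"
        + json.dumps(RUBRIC, indent=2)
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model, not an actual product choice
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # request machine-parseable output
    )
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    scores = judge_note(
        transcript_excerpt="Client discussed progress with their sleep routine...",
        generated_note="Client reports improved sleep hygiene; reviewed coping skills...",
    )
    print(scores)  # e.g. {"clinical_alignment": 4, "tone": 5, "structure": 3}
```

In a real pipeline, scores like these would be logged per prompt version and therapist cohort so regressions surface in CI or offline scoring before deployment rather than in the wild.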
Who We’re Looking For
You’re a hands-on ML/AI practitioner who’s passionate about building high-quality systems that actually get used — not just optimizing for benchmark scores. You’ve worked with LLMs in production at scale and know the hard part is making outputs reliable, human-aligned, and easy to evaluate. You’re motivated by impact, comfortable with ambiguity, and thrive in early-stage, fast-paced environments.
You might be a fit if:
- You’ve built or owned evaluation infrastructure for LLMs or generative AI products
- You have experience designing QA workflows, human-in-the-loop systems, or LLM-as-a-judge pipelines
- You think in terms of feedback loops — and can turn fuzzy product goals into testable quality metrics
- You write code, ship experiments, and are comfortable working across the stack to get the right signals flowing
- You’re excited about working closely with product, design, and domain experts to define and refine what “good” means in a real-world AI application
Bonus if you have:
- Experience in healthcare, mental health, or other high-trust environments
- Familiarity with labeling, data QA, or prompt engineering at scale
- A strong POV on eval tools, metrics, or best practices — and a willingness to invent new ones where needed
Benefits
- Competitive salary and equity
- 100% remote – no office, no commuting
- Health, dental, and vision insurance, with 75% of your premium covered by Blueprint
- Semi-annual team gatherings (in Chicago!)
- Unlimited PTO
- Opportunities to grow with the company and shape our product
- Hardworking, mission-driven, friendly coworkers
Blueprint is an equal opportunity employer and does not discriminate on the basis of race, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition, or any other basis protected by law.