Research Engineer, Language Model Evaluations

Palo Alto

Full Time Mid-level / Intermediate USD 100K - 295K *

Hippocratic AI

The First Safety Focused LLM for Healthcare

Posted 3 weeks ago

Hippocratic AI’s mission is to develop the first safety-focused Large Language Model (LLM) for healthcare. The company believes that a safe LLM can dramatically improve healthcare accessibility and health outcomes worldwide by bringing deep healthcare expertise to every human. No other technology has the potential to have this level of global impact on health. The company was co-founded by CEO Munjal Shah, alongside a group of physicians, hospital administrators, healthcare professionals, and artificial intelligence researchers from El Camino Health, Johns Hopkins, Washington University in St. Louis, Stanford, Google, and Nvidia. Hippocratic AI has received a total of $120M in funding and is backed by leading investors, including General Catalyst, Andreessen Horowitz, Premji Invest, and SV Angel.

About the role

We are looking for a Research Engineer to lead evaluations for Hippocratic AI’s 1 trillion+ parameter constellation of Large Language Models. Your job will be to design and implement evaluations that allow Hippocratic AI to measure the performance and safety of our models. As a Research Engineer focused on Evaluation, you’ll work closely with our research and applied science teams to design experiments and build evaluation infrastructure. You’ll help validate performance and safety across a wide range of important tasks. You’ll help ensure that our LLMs are well-benchmarked, with known performance and safety on a wide range of healthcare-related tasks, allowing us to compare against human feedback.

Requirements:

5+ years of Python programming / machine learning research experience
Experience using Large Language Models, preferably including training or fine-tuning large models
Comfort writing code
A desire to learn more about machine learning research
Care for patient safety
A desire to design and implement rigorous evaluations

Preferred:

Building user interfaces for data analysis
Developing robust evaluation metrics for language models
Handling textual dataset sourcing, curation, and processing tasks at scale
Statistics

Representative projects:

Designing and running a new evaluation that tests our model’s reasoning capabilities
Leading the vision of what it takes to rigorously evaluate patient safety in the world of Generative AI
Devising a consistent but representative evaluation suite for healthcare conversations
Running experiments to determine how prompting techniques affect results on industry benchmarks
Improving the tooling that researchers use to implement evaluations
Explaining our evaluations and their results to internal decision makers and stakeholders
Collaborating with a research team to develop a robust evaluation for a new model capability they are developing

* Salary range is an estimate based on our AI, ML, Data Science Salary Index 💰

Tags: Data analysis Generative AI LLMs Machine Learning Prompt engineering Python Research Statistics

Region: North America

Country: United States

Categories: Engineering Jobs NLP Jobs Research Jobs
