Research Engineer, Language Model Evaluations
Palo Alto
Hippocratic AI’s mission is to develop the first safety-focused Large Language Model (LLM) for healthcare. The company believes that a safe LLM can dramatically improve healthcare accessibility and health outcomes worldwide by bringing deep healthcare expertise to every human. No other technology has the potential to have this level of global impact on health. The company was co-founded by CEO Munjal Shah, alongside a group of physicians, hospital administrators, healthcare professionals, and artificial intelligence researchers from El Camino Health, Johns Hopkins, Washington University in St. Louis, Stanford, Google, and Nvidia. Hippocratic AI has received a total of $120M in funding and is backed by leading investors, including General Catalyst, Andreessen Horowitz, Premji Invest, and SV Angel.
About the role
We are looking for a Research Engineer to lead evaluations for Hippocratic AI’s constellation of Large Language Models, which together total more than 1 trillion parameters. Your job will be to design and implement evaluations that measure the performance and safety of our models. As a Research Engineer focused on evaluation, you’ll work closely with our research and applied science teams to design experiments and build evaluation infrastructure. You’ll help ensure that our LLMs are well benchmarked, with known performance and safety across a wide range of healthcare-related tasks, so that we can compare them against human feedback.
Requirements:
Have 5+ years of experience in Python programming or machine learning research
Have experience using Large Language Models, ideally including training or fine-tuning large models
Are comfortable writing code
Want to learn more about machine learning research
Care about patient safety
Want to design and implement rigorous evaluations
Preferred:
Building user interfaces for data analysis
Developing robust evaluation metrics for language models
Handling textual dataset sourcing, curation, and processing tasks at scale
Statistics
Representative projects:
Designing and running a new evaluation that tests our model’s reasoning capabilities
Leading the vision for how to rigorously evaluate patient safety in the era of generative AI
Devising a consistent but representative evaluation suite for healthcare conversations
Running experiments to determine how prompting techniques affect results on industry benchmarks
Improving the tooling that researchers use to implement evaluations
Explaining our evaluations and their results to internal decision makers and stakeholders
Collaborating with a research team to develop a robust evaluation for a new model capability they are developing