Data Scientist II

Roberts Ctr Pediatric Research

Children's Hospital of Philadelphia


SHIFT:

Day (United States of America)

Seeking Breakthrough Makers

Children’s Hospital of Philadelphia (CHOP) offers countless ways to change lives. Our diverse community of more than 20,000 Breakthrough Makers will inspire you to pursue passions, develop expertise, and drive innovation.

At CHOP, your experience is valued; your voice is heard; and your contributions make a difference for patients and families. Join us as we build on our promise to advance pediatric care—and your career.

CHOP’s Commitment to Diversity, Equity, and Inclusion

CHOP is committed to building an inclusive culture where employees feel a sense of belonging, connection, and community within their workplace. We are a team dedicated to fostering an environment that allows for all to be their authentic selves. We are focused on attracting, cultivating, and retaining diverse talent who can help us deliver on our mission to be a world leader in the advancement of healthcare for children.

We strongly encourage all candidates of diverse backgrounds and lived experiences to apply.


A Brief Overview

The mission of the Campbell Laboratory at the Children’s Hospital of Philadelphia is to ensure that every child with a rare genetic disease is diagnosed as quickly and accurately as possible. We believe that a promising approach is to use large language models (LLMs) to better understand our patients’ health and intervene in the electronic health record to facilitate diagnosis. We hope these approaches can help address inequities in the way genetic care is provided to patients from historically marginalized backgrounds. Our lab trains LLMs using both on-premises GPUs and cloud-based resources. We are seeking a data scientist to assist with the collection, standardization, and analysis of data from the electronic health record and to assist in the implementation of LLM training in the cloud. The data scientist will have the opportunity to work with physicians, nurses, and laboratory professionals to improve the care of children.

The data scientist will work closely and in person with the laboratory director, Dr. Ian Campbell, and with other data scientists and trainees to advance the mission of the laboratory. The laboratory’s LLMs are implemented in Python (PyTorch, JAX) using data extracted from electronic health record (Epic) relational databases. Thus, familiarity with Python and basic SQL is required. The data scientist will contribute to reproducible research by committing high-quality, well-documented code to the enterprise and public GitHub.

The Campbell Lab is committed to diversity and strives to create an equitable work environment for everyone. Individuals from historically marginalized backgrounds are strongly encouraged to apply.



What you will do

  • Perform exploratory data analysis in pediatric biomedical research using machine learning, statistics, and mathematical analysis incorporating heterogeneous and complex data types under direct supervision.
  • Contribute to assessing and implementing computational, algorithmic, and predictive analytic approaches to address biomedical research questions.
  • With guidance, contribute to the experimental design, execution, testing and critical evaluation of methods as applied to translational data science research projects.
  • Contribute to design and implementation of continuous validation plans for production systems that incorporate models and algorithms, providing guidelines and support for large-scale implementation.
  • Implement computational algorithms and experiments for test and evaluation; interpret data to assess algorithm performance.
  • Participate in communication of research methods, implementation, and results to a varied audience of clinicians, scientists, analysts, and programmers.
  • Work closely with hospital operations and electronic health record vendor teams to translate models and algorithms into production applications.
  • Contribute to manuscript writing for publication of results, author abstracts, and present at professional conferences.

Education Qualifications

  • Bachelor's Degree Required

  • Bachelor's Degree in Analytics, Data Science, Statistics, Mathematics, Computer Science or a related field Preferred

  • Master's or PhD in Analytics, Data Science, Statistics, Mathematics, Computer Science or a related field Preferred

Experience Qualifications

  • At least three (3) years of experience with progressively more complex data science, applied statistics, machine learning, or mathematical modeling projects Required
  • At least four (4) years of experience with progressively more complex data science, applied statistics, machine learning, or mathematical modeling projects Preferred
  • At least one year of experience with complex data science, applied statistics, machine learning, or mathematical modeling projects Preferred
  • Natural language processing experience, particularly in the biological and medical domains Preferred
  • Experience with transformer architecture and associated software (e.g., PyTorch, TensorFlow, JAX) Preferred
  • Experience using distributed computing technologies Preferred
  • Experience with cloud virtual machine environments Preferred

Skills and Abilities

  • Experience and demonstrated ability acquiring new technical/analytic skills and domain knowledge to support successful contribution to research and development projects is required.
  • Experience formulating or contributing to the formulation of analysis plans and selection of appropriate methods.
  • Experience using existing machine learning and analytic tools in either applied educational or professional projects is required.
  • Experience writing code in either applied educational or professional projects using Python is required.
  • Familiarity with relational databases (e.g. Postgres, MySQL) strongly preferred.
  • Strong verbal and written communication skills with the demonstrated ability to explain complex technical concepts to a lay audience.
  • Applied statistics or mathematical modeling experience preferred.
  • Natural language processing experience, particularly in the biological and medical domains, preferred.
  • Experience using distributed computing technologies (e.g. Akka, MapReduce, CUDA) preferred.
  • Familiarity with graph, key value, and document data stores (e.g. Neo4j, Hadoop, MongoDB) preferred.
  • Experience creating informative visualizations for complex, high-dimensional data preferred.


To carry out its mission, CHOP is committed to supporting the health of our patients, families, workforce, and global community. As a condition of employment, CHOP employees who work in patient care buildings or who have patient facing responsibilities must be fully vaccinated against COVID-19 and receive an annual influenza vaccine.

Employees may request exemptions for valid religious and medical reasons. Start dates may be delayed until candidates are immunized or exemption requests are reviewed.

EEO / VEVRAA Federal Contractor | Tobacco Statement

Category: Data Science Jobs

Tags: Architecture Computer Science CUDA Data analysis EDA GitHub Hadoop JAX LLMs Machine Learning Mathematics MongoDB MySQL Neo4j NLP PhD PostgreSQL Python PyTorch RDBMS Research SQL Statistics TensorFlow Testing

Perks/benefits: Career development Conferences

Region: North America
Country: United States
