Healthcare Data Engineer

San Francisco, CA, United States of America

We are seeking a Healthcare Data Engineer to architect, develop, and scale pipelines that harmonize and integrate EHR data across different datasets. In this hands-on role, you will design and maintain high-throughput ETL workflows, apply standards such as HL7 FHIR, OMOP, and SNOMED CT to ensure interoperability, and collaborate with bioinformatics, clinical, product, and engineering teams to deliver secure, research-ready data for our expanding disease-prediction pipeline.

Key Responsibilities

Data Standardization & Interoperability

  • Map heterogeneous data to HL7 FHIR, OMOP, SNOMED CT, ICD-10/11, LOINC, RxNorm, and related vocabularies.

  • Maintain high fidelity and minimal data loss through ontology-driven mapping and validation.

Design & Implement ETL Pipelines

  • Work with the engineering team to improve the workflows that ingest, de-identify, and harmonize clinical data from various EHR systems.

  • Integrate structured and unstructured data (clinical notes, imaging, lab results) into a unified schema.

Cloud Architecture & Scalability

  • Work with the engineering team to maintain a secure, cloud-based infrastructure capable of supporting petabyte-scale datasets.

  • Leverage distributed computing frameworks (e.g., Apache Spark, Databricks) for high-throughput data processing.

Privacy & Security

  • Ensure compliance with HIPAA, GDPR, and other applicable regulations.

  • Implement federated data-sharing patterns and robust encryption for data in transit and at rest.

Data Quality & Validation

  • Work with the engineering team to build automated anomaly-detection pipelines for real-time data quality checks.


Collaboration & Communication

  • Work with cross-functional teams (engineering, product, clinical, lab) to set timelines and roadmaps.

  • Share daily progress and surface blockers early while following established best practices in healthcare data engineering.

Skills and Experience

  • PhD in CS, Bioinformatics, or a related field; or 5+ years of experience in data engineering, with at least 2 years specific to healthcare or clinical informatics.

  • Hands-on knowledge of HL7 FHIR, OMOP, SNOMED CT, and other healthcare data standards.

  • Proficiency in SQL and one or more programming languages (Python, C++).

  • Experience with cloud platforms (AWS, Azure, or GCP) and distributed frameworks (Spark, Databricks).

  • Familiarity with privacy-preserving architectures, data encryption, and federated data models.

  • Demonstrated success in building ETL pipelines.

  • Strong communication skills to translate complex data requirements into actionable plans for cross-functional teams.

  • Nice to have: Familiarity with genomic data and/or NLP for clinical text.

Region: North America
Country: United States
