Healthcare Data Engineer
San Francisco, CA, United States of America
We are seeking a Healthcare Data Engineer to architect, develop, and scale pipelines that harmonize and integrate EHR data across disparate datasets. In this hands-on role, you will design and maintain high-throughput ETL workflows, apply standards such as HL7 FHIR, OMOP, and SNOMED CT to support interoperability, and collaborate with bioinformatics, clinical, product, and engineering teams to deliver secure, research-ready data for our expanding disease-prediction pipeline.
Key Responsibilities
Data Standardization & Interoperability
Map heterogeneous data to HL7 FHIR, OMOP, SNOMED CT, ICD-10/11, LOINC, RxNorm, and related vocabularies.
Maintain high fidelity and minimal data loss through ontology-driven mapping and validation.
Design & Implement ETL Pipelines
Work with the engineering team to improve the workflows that ingest, de-identify, and harmonize clinical data from various EHR systems.
Integrate structured and unstructured data (clinical notes, imaging, lab results) into a unified schema.
Cloud Architecture & Scalability
Work with the engineering team to maintain a secure, cloud-based infrastructure capable of supporting petabyte-scale datasets.
Leverage distributed computing frameworks (e.g., Apache Spark, Databricks) for high-throughput data processing.
Privacy & Security
Ensure compliance with HIPAA, GDPR, and other applicable regulations.
Implement federated data-sharing patterns and robust encryption for data in transit and at rest.
Data Quality & Validation
Work with the engineering team to build automated anomaly-detection pipelines for real-time data quality checks.
Collaboration & Communication
Work with cross-functional teams (engineering, product, clinical, lab) to set timelines and roadmaps.
Share daily progress and surface blockers early while following established best practices in healthcare data engineering.
Skills and Experience
PhD in CS, Bioinformatics, or a related field; or 5+ years of experience in data engineering, including at least 2 years specific to healthcare or clinical informatics.
Hands-on knowledge of HL7 FHIR, OMOP, SNOMED CT, and other healthcare data standards.
Proficiency in SQL and one or more programming languages (Python, C++).
Experience with cloud platforms (AWS, Azure, or GCP) and distributed frameworks (Spark, Databricks).
Familiarity with privacy-preserving architectures, data encryption, and federated data models.
Demonstrated success in building ETL pipelines.
Strong communication skills to translate complex data requirements into actionable plans for cross-functional teams.
Nice to have: Familiarity with genomic data and/or NLP for clinical text.
Tags: Architecture AWS Azure Bioinformatics Databricks Data quality Engineering ETL GCP HL7 LOINC NLP OMOP PhD Pipelines Privacy Python Research RxNorm Security SNOMED Spark SQL Unstructured data