Senior Data Management Engineer

South San Francisco, CA

Full Time | Senior-level / Expert | USD 173K - 180K

Calico Life Sciences

Posted 2 days ago

Who We Are:

Calico (Calico Life Sciences LLC) is an Alphabet-founded research and development company whose mission is to harness advanced technologies and model systems to increase our understanding of the biology that controls human aging. Calico will use that knowledge to devise interventions that enable people to lead longer and healthier lives. Calico’s highly innovative technology labs, its commitment to curiosity-driven discovery science, and, with academic and industry partners, its vibrant drug-development pipeline, together create an inspiring and exciting place to catalyze and enable medical breakthroughs.

Position Description:

As a Senior Data Management Engineer, you will work closely with Calico scientists, external collaborators, and contract research organizations to help store and provide access to large, complex, and diverse biological datasets. You will develop schemas to accurately capture and document experimental results and methods at an appropriate technical level. You will advise scientists on best practices for managing biological metadata and maintaining data provenance. You will assist with sanitizing and transforming project data and metadata. You must be able to learn and work independently yet collaborate well with coworkers and share their passion to advance Calico’s quest to understand aging and age-related disease.

Position Responsibilities:

Work with scientists and engineers to identify optimal ways to prepare, annotate, store and navigate their datasets, including pairing with engineers on data application design and improvement
Define and document best practices for capturing and entering experimental metadata, and educate scientists and collaborators about these standards
Perform data wrangling tasks including cleaning, transforming, and labeling datasets and developing relevant schemas for storing that data
Maintain quality control and integrity of current and archived data
Build data models and processes based on business and technical requirements to channel data from multiple inputs through data pipelines, ensuring successful processing and data validity

Position Requirements:

3+ years’ experience curating (organizing, cleaning, and efficiently manipulating) scientific datasets
Advanced knowledge of biology (degree in life sciences or computational biology, and/or experience working in a biology lab environment)
Detail-oriented with strong organizational, project management and analytical skills
Ability to work effectively with scientists and engineers to elucidate and translate data organization needs into written requirements and specifications
Ability to understand scientific literature, experimental procedures and their limitations, and current needs of the research community
Knowledge of SQL; familiarity with relational databases, relational data concepts and data modeling
Ability to clearly and concisely communicate technical, scientific and non-technical information, both verbally and in writing
Experience writing shell scripts and/or Python – including basic data extraction, transformation, loading, and analysis scripts
Must be willing to work onsite at least 4 days a week

Nice to Have:

Familiarity with controlled vocabularies and ontologies
Advanced knowledge of bioinformatics, genomics, and proteomics methods
Advanced knowledge of data structures and formats used in scientific approaches
Experience assisting clinical personnel in data and metadata submission
Understanding of current regulatory guidelines, Good Clinical Practice (GCP), and industry standards, practices, and terminologies regarding data management
Ability to provide product specification and review as part of software development
Experience with Unix tools for data manipulation
Familiarity with software development processes in a collaborative setting, e.g. reading and reviewing teammates’ code in GitHub or similar source control
Experience interacting with information systems programmatically via a web API
Experience with data quality assessment
Experience applying machine learning to the curation of historical or legacy lab data

The estimated base salary range for this role is $173,000 - $180,000. Actual pay will be based on a number of factors including experience and qualifications. This position is also eligible for two annual cash bonuses.

Category: Engineering Jobs

Tags: APIs, Bioinformatics, Biology, Data management, Data pipelines, Data quality, GCP, GitHub, Machine Learning, Pipelines, Python, RDBMS, Research, SQL