Senior Data Management Engineer
South San Francisco, CA
Calico Life Sciences
Who We Are:
Calico (Calico Life Sciences LLC) is an Alphabet-founded research and development company whose mission is to harness advanced technologies and model systems to increase our understanding of the biology that controls human aging. Calico will use that knowledge to devise interventions that enable people to lead longer and healthier lives. Calico’s highly innovative technology labs, its commitment to curiosity-driven discovery science, and, with academic and industry partners, its vibrant drug-development pipeline, together create an inspiring and exciting place to catalyze and enable medical breakthroughs.
Position Description:
As a Senior Data Management Engineer, you will work closely with Calico scientists, external collaborators, and contract research organizations to help store and provide access to large, complex, and diverse biological datasets. You will develop schemas to accurately capture and document experimental results and methods at an appropriate technical level. You will advise scientists on best practices for managing biological metadata and maintaining data provenance. You will assist with sanitizing and transforming project data and metadata. You must be able to learn and work independently yet collaborate well with coworkers and share their passion to advance Calico’s quest to understand aging and age-related disease.
Position Responsibilities:
- Work with scientists and engineers to identify optimal ways to prepare, annotate, store and navigate their datasets, including pairing with engineers on data application design and improvement
- Define and document best practices for capturing and entering experimental metadata, and educate scientists and collaborators about these standards
- Perform data wrangling tasks including cleaning, transforming, and labeling datasets and developing relevant schemas for storing that data
- Maintain quality control and integrity of current and archived data
- Build data models and processes based on business and technical requirements to channel data from multiple inputs through data pipelines, ensuring successful processing and data validity
Position Requirements:
- 3+ years’ experience curating (organizing, cleaning, and efficiently manipulating) scientific datasets
- Advanced knowledge of biology (degree in life sciences or computational biology, and/or experience working in a biology lab environment)
- Detail-oriented with strong organizational, project management and analytical skills
- Ability to work effectively with scientists and engineers to elucidate and translate data organization needs into written requirements and specifications
- Ability to understand scientific literature, experimental procedures and their limitations, and current needs of the research community
- Knowledge of SQL; familiarity with relational databases, relational data concepts and data modeling
- Ability to clearly and concisely communicate technical, scientific and non-technical information, both verbally and in writing
- Experience writing shell scripts and/or Python – including basic data extraction, transformation, loading, and analysis scripts
- Must be willing to work onsite at least 4 days a week
Nice to Have:
- Familiarity with controlled vocabularies and ontologies
- Advanced knowledge of bioinformatics, genomics, and proteomics methods
- Advanced knowledge of data structures and formats used in scientific approaches
- Experience assisting clinical personnel in data and metadata submission
- Understanding of current regulatory guidelines, Good Clinical Practice (GCP), and industry standards, practices, and terminology regarding data management
- Ability to provide product specification and review as part of software development
- Experience with Unix tools for data manipulation
- Familiarity with software development processes in a collaborative setting, e.g. reading and reviewing teammates’ code in GitHub or similar source control
- Experience interacting with information systems programmatically via a web API
- Experience with data quality assessment
- Experience applying machine learning to the curation of historical or legacy lab data
The estimated base salary range for this role is $173,000 - $180,000. Actual pay will be based on a number of factors including experience and qualifications. This position is also eligible for two annual cash bonuses.
Tags: APIs Bioinformatics Biology Data management Data pipelines Data quality GCP GitHub Machine Learning Pipelines Python RDBMS Research SQL