Data Engineer II (BTI, Bioinformatics)
District of Columbia-Washington
Full Time Mid-level / Intermediate USD 92K - 154K
Children's National Hospital
Children's National Hospital is ranked one of the top 10 pediatric hospitals in the nation by U.S. News & World Report. Serving the nation's children for over 150 years, we are the premier provider of pediatric care in the Washington, D.C.,...
Description:
The Brain Tumor Institute (BTI) Bioinformatics Core at Children’s National Hospital is seeking a highly skilled Bioinformatics Engineer to join our team. This position will play a critical role in advancing the research of multiple PIs to uncover oncogenic mechanisms in pediatric brain tumors and identify novel therapeutic targets.
The role involves close collaboration with the Director of the BTI, researchers, and clinicians at Children’s National. The successful candidate will lead implementation of Gabriella Miller Kids First workflows using AWS and/or CAVATICA, as well as new workflow development and benchmarking in CWL or Nextflow. The candidate will also lead data modeling and database generation for current and incoming BTI genomics data files and may be responsible for project-specific engineering needs, such as API/UI development.
Key Responsibilities:
- Responsible for implementation of Gabriella Miller Kids First workflows using AWS, CAVATICA, and/or other cloud-based tools.
- Collaborate with bioinformatics scientists and PIs to benchmark and optimize new production-scale analysis pipelines and workflows to generate high quality and high data integrity outputs
- Support project-specific engineering needs, such as database/API/UI development
- Collaborate with an AWS architect and/or IT to optimize resource use as needed
- Be responsible for AWS bucket organization and file storage management
- Create and maintain clear documentation for data engineering workflows, including codebases, data pipelines, validation, testing, and CI/CD processes.
- Create and implement a data model and database for BTI genomics data
Application Process:
This position may be remote or hybrid. Candidates should be prepared to share their GitHub handle and present a recent project as part of the interview process.
Qualifications
Required Qualifications:
- Bachelor’s degree in a computational discipline or systems engineering.
- At least three (3) years of experience in a production clinical or research bioinformatics data processing role is required.
- Experience with cloud and high-performance compute environments, including creation and use of virtual machines, virtual environments, and job submissions.
- Experience profiling performance, benchmarking, and optimizing multiple job types and scenarios in bioinformatics data processing.
- Experience with current standard parallel computing and data processing workflows is essential (e.g., Snakemake, Nextflow, CWL, WDL).
- Experience with reproducible pipeline development, including software version control, use and creation of Docker and/or Singularity images, and collaborative code review.
- Experience diagnosing and troubleshooting pipeline errors and unexpected behaviors. This includes taking initiative, whether through debugging, online searches, contacting software authors, or otherwise seeking assistance as needed.
- Willingness to contribute to open source software development for the purposes of improving/meeting community standards when appropriate.
- Attention to detail.
- Experience benchmarking and implementing harmonization workflows for a variety of NGS data types.
- Ability to independently plan and execute pipelines and workflows of high complexity is required.
- Ability to independently engineer systems relative to larger enterprise frameworks is required.
- Strong UNIX/LINUX expertise is required.
- Expertise in support mechanisms for applications written in common bioinformatics languages such as R, Python, Perl, Java, or similar is required.
- Expertise in common bioinformatics applications, data sources, and data formats is required.
- Knowledge of common NGS or other high-throughput data formats is required.
- Expertise with resources of genomic data sets and analysis tools, such as GATK, UCSC Genome Browser, Bioconductor, ENCODE, and NCBI databases, is required.
- Demonstrated ability to develop and implement best practices for bioinformatics systems integration, testing, and deployment is required.
- Ability to lead discussions with various information systems and technology owners to achieve desired bioinformatics outcomes is required.
Preferred Education:
- Master’s degree in a computational discipline or systems engineering
Organizational Accountabilities
Organizational Accountabilities (Staff)
Organizational Commitment/Identification
- Anticipate and respond to customer needs; follow up until needs are met
Teamwork/Communication
- Demonstrate collaborative and respectful behavior
- Partner with all team members to achieve goals
- Be receptive to others’ ideas and opinions
Performance Improvement/Problem-solving
- Contribute to a positive work environment
- Demonstrate flexibility and willingness to change
- Identify opportunities to improve clinical and administrative processes
- Make appropriate decisions, using sound judgment
Cost Management/Financial Responsibility
- Use resources efficiently
- Search for less costly ways of doing things
Safety
- Speak up when team members appear to exhibit unsafe behavior or performance
- Continuously validate and verify information needed for decision making or documentation
- Stop in the face of uncertainty and take time to resolve the situation
- Demonstrate accurate, clear and timely verbal and written communication
- Actively promote safety for patients, families, visitors and co-workers
- Attend carefully to important details - practicing Stop, Think, Act and Review in order to self-check behavior and performance
Primary Location: District of Columbia-Washington
Work Locations: Remote Work Location 111 Michigan Avenue NW Washington 20010
Job: Information Technology
Organization: Ctr Cancer & Immunology Rsrch
Position Status: R (Regular) - FT - Full-Time
Shift: Day
Work Schedule: 9:00 AM - 5:30 PM
Job Posting: Dec 27, 2024, 6:20:27 PM
Full-Time Salary Range: 92684.8 - 154460.8
Tags: APIs AWS Bioconductor Bioinformatics CI/CD Data pipelines Docker Engineering GitHub Java Linux Open Source Perl Pipelines Python R Research Testing