Senior Data Engineer | Genomic Surveillance

Wellcome Genome Campus, United Kingdom

Wellcome Sanger Institute

We are a world-leading genomics research institute in Cambridge. Our work helps improve human health and understand life on Earth.


Do you want to help us improve human health and understand life on Earth? Make your mark by shaping the future to enable or deliver life-changing science to solve some of humanity’s greatest challenges.

Do you want to apply your data engineering skills and explore genomic technologies to improve healthcare and economic growth? Grow your career contributing your skills and knowledge towards global health advancement, addressing critical gaps in applied research in genomic surveillance.

About the role:

As a Senior Data Engineer within the Genomic Surveillance Unit (GSU) you will be part of the Digital and Data team (DnD), processing, analysing and interpreting genomic surveillance data at scale to enable and support public-health decision-making. 

Working with Bioinformatics, Platform (DevOps), and Quality Assurance functions, you will deliver digital software products and services aggregating data to understand how disease migration and transmission are altered by climate change, focusing on:

  • The design and implementation of a data management and analytics platform supporting vector, pathogen, and viral projects effectively.

  • Development of end-to-end products and services for pathogen surveillance - with a particular focus on vector-borne disease and respiratory viruses

It is the responsibility of the data engineering team to support the journey from sample to data, data to observation and observation to insight to inform actionable health interventions for commercial and public health needs.

You will be responsible for:

  • Develop, maintain and operate systems and processes for integration, linking and harmonisation of data across a broad range of data sources and types.

  • Create innovative data pipelines with new technologies that enable quick queries and real time data feeds.

  • Actively maintain Data Engineering systems and ensure code, dependencies & operating systems are up to date and compatible, with maintenance roadmaps in place.

  • Maintain and promote industry-standard coding and documentation practice, and adopt best-practice software engineering techniques both within the team and externally

  • Identify data quality issues or pipeline errors and provide updates and fixes - play a key role in providing support (service and incident management) to existing data pipelines and outputs

  • Profile and improve data pipelines to ensure the optimisation of the infrastructure used

  • Maintain consistent progress and organisation of projects, ensuring outputs are delivered within time & quality constraints

  • Support the team and the organisation with ad hoc development and analytical projects

  • Recognise and respect data and project sensitivities, partners, collaborations and data access agreements

  • Work with the DevOps team to monitor capacity and utilisation within the cloud and on-premises infrastructure to support existing and new components of the data architecture

  • Provide technical mentorship and support for more junior colleagues

About You:

Well-versed in Data Engineering, with professional knowledge of technologies including Prefect, Spark, Hive, GitLab and Helm, you can quickly understand technical and process challenges and break down complex problems into actionable steps.

Essential Skills:

  • Experience of big data engineering and Agile principles

  • Understanding of big data engineering tools and how they can be used strategically (we use Spark, Hive, Trino, dbt, Parquet, Delta Lake, S3. It would be equally valuable if you knew similar technologies, such as Redshift, Athena, BigQuery, Databricks, Hudi, Apache Iceberg, Google Cloud Storage, etc.)

  • Python development experience: Pythonic standards, Object Oriented Programming

  • Experience with RESTful API technology

  • Experience in defining and operating systems that integrate data from multiple sources in an environment where data provenance is essential

  • Knowledge and experience with modern software development practices, including version control, continuous integration and workflow management tools such as Jira, GitLab, etc.

  • Familiarity with SQL and databases, including running and maintaining them as well as querying them

  • Experience working with modern data stacks underpinned by Data Lakes, Data Warehouses, Data Lakehouses

  • Experience of Cloud based technologies and management (e.g. OpenStack, AWS, Google)

  • Experience with Data Modelling

  • Experience with Orchestration tools (e.g. Airflow, Prefect, AWS Step Functions)

About Us:

One of the biggest challenges in the battle against infectious disease is that pathogens are continually evolving. Genomic surveillance involves sequencing the genetic material of pathogenic microbes and their vectors so that evolutionary changes that affect transmission, disease severity and susceptibility to treatment can be observed. The goal of the Wellcome Sanger Institute’s Genomic Surveillance Unit (GSU) is to enable the use of genomic surveillance as a practical tool for infectious disease control and pandemic preparedness.

Additional information

Applications: Please include a cover letter along with your CV. In your cover letter, include detail on how your knowledge, skills and experience match the requirements of the role described.

Salary: £49,000 - £58,900

Closing Date: Sunday 04-MAY

Contract duration: 3 years fixed-term

Role profile: For the full list of accountabilities and criteria, please click here.

View our 2023-2024 Institute Highlights here: bit.ly/SangerHighlights2023-24  

Hybrid Working at Wellcome Sanger:

We recognise that there are many benefits to Hybrid Working, including an improved work-life balance, with more focused time, as well as the ability to organise working time so that collaborative opportunities and team discussions are facilitated on campus. The hybrid working arrangement will vary for different roles and teams. The nature of your role and the type of work you do will determine if a hybrid working arrangement is possible.

Equality, Diversity and Inclusion:

We aim to attract, recruit, retain and develop talent from the widest possible talent pool, thereby gaining insight and access to different markets to generate a greater impact on the world. We have a supportive culture with the following staff networks: LGBTQ+, Parents and Carers, Disability, and Race Equity. These networks bring people together to share experiences, offer specific support and development opportunities, and raise awareness. The networks are also a place for allies to provide support to others.

We want our people to be whoever they want to be because we believe people who bring their best selves to work do their best work. That’s why we’re committed to creating a truly inclusive culture at Sanger Institute. We will consider all individuals without discrimination and are committed to creating an inclusive environment for all employees, where everyone can thrive.

Our Benefits:

We are proud to deliver an award-winning campus-wide employee wellbeing strategy and programme. Sanger Institute strongly recognises the importance of good health, of adopting a healthier lifestyle, and of the commitment to reduce work-related stress.

Sanger Institute became a signatory of the International Technician Commitment initiative in March 2018. The Technician Commitment aims to empower and ensure visibility, recognition, career development and sustainability for technicians working in higher education and research, across all disciplines.

Category: Engineering Jobs

Tags: Agile Airflow APIs Architecture Athena AWS Big Data BigQuery Bioinformatics Databricks Data management Data pipelines Data quality dbt DevOps Engineering GCP GitLab Google Cloud Helm Jira OpenStack Parquet Pipelines Python Redshift Research Spark SQL Step Functions

Perks/benefits: Career development Equity / stock options Health care

Region: Europe
Country: United Kingdom
