Data Engineer

Cambridge, England, United Kingdom

Full Time, Mid-level / Intermediate, GBP 60K - 111K

Origin Sciences

Origin Sciences is a medical technology company that develops innovative devices and tests to aid the detection and diagnosis of gastrointestinal diseases.

Posted 3 weeks ago

Origin Sciences is a start-up biotechnology company based in Granta Park, just south of Cambridge. We develop our own innovative medical devices, which we use in clinical trials to collect a biobank of mucus-based biospecimens. This biobank provides clinical material for our research and development streams to assist with diagnostic development.

Our primary disease area is Colorectal Cancer (CRC). We are creating a minimally invasive and accurate CRC diagnostic, which will allow the NHS to focus more resources on patients with serious pathologies. Our motivations are to reduce NHS waiting times, enable earlier CRC detection and reduce unnecessary investigations performed on healthy patients.

Our CRC diagnostic is analogous to blood-based liquid biopsies. However, we have the advantage of evaluating material that was collected closer to the pathology of interest.

The role:

We are seeking a Data Engineer to join our Data Team to manage our data infrastructure.

The Data Engineer plays a key role in integrating data between teams to streamline data flow across the organisation, and will be responsible for managing data flow and cloud infrastructure.

Origin Sciences uses state-of-the-art sequencing methodologies to analyse our mucus-based biospecimens. This sequencing produces large volumes of data, which must be processed through bioinformatics pipelines. The outputs of these pipelines need to be available to the Data Team for downstream analysis. The Data Engineer will be responsible for managing the cloud infrastructure that supports our sequencing projects, for both clinical analytics and BI reporting.

Main Duties & Responsibilities:

Manage organisational cloud resources to enable us to process large volumes of sequencing data.
Implement improvements to enhance usability and security of our infrastructure.
Implement automation to streamline data collection from laboratory instruments.
Use ETL processes to help centralise organisational data.
Respond to ad hoc requests for engineering support from the laboratory and clinical teams.

Requirements

Skills & Qualifications:

Bachelor of Science degree, equivalent technical degree, or equivalent experience.
Proficiency in Python, R or an equivalent programming language for data processing.
Experience configuring and managing cloud infrastructure and resources.
Understanding of cloud security best practices.
Familiarity with integrating APIs into existing infrastructure for process automation.
Experience with containerisation, such as Docker or Singularity.
Experience with schedulers, such as AWS Batch, GCP Batch or Slurm.
Experience with Git and version control.
Ability to critically evaluate data-handling practices in a commercial R&D environment.
Understanding of data management best practices.