Garvan Summer Scholarship Program 2024/25

Garvan / TKCC Sydney

The Summer Scholarship Program provides an exceptional opportunity for currently enrolled undergraduate students to engage in research projects during the summer of 2024/2025.

This program is designed to immerse highly talented undergraduates, particularly those in Science or related disciplines, in the research process. It aims to enrich your educational journey and inspire a deeper interest in research or related fields.

Participants will gain invaluable experience by working alongside our esteemed supervisors on meaningful research projects, providing a fantastic insight into the research process and helping you determine if a research career is right for you.

The program spans 8 weeks and offers 11 scholarship positions. Each scholarship is valued at up to $5,000, with funding allocated at a rate of $625 per week over the program's duration.

We encourage you to seize this opportunity to enhance your academic and professional development.

WHAT YOU WILL DO

Our range of projects for the Summer of 2024/2025:

Pan-Cancer analysis of fetal-like cells in tumours at spatial resolution

We discovered that fetal-like cells in liver cancer play an essential role in immune suppression (Sharma et al., Cell 2020) and impact clinical outcomes in patients treated with immunotherapy (Li et al., Nature Cancer 2024). However, whether such cells exist in other cancers and play any role in their response to therapy remains unknown. This project utilises pan-cancer spatial transcriptomic data to investigate which other tumours exhibit fetal-like properties in the microenvironment.

What you will learn In this project, you will learn how to explore spatial transcriptomic data to identify fetal-like cells in the tumour microenvironment. The project will also allow you to work at the interface of developmental biology, cancer genomics and computational science.

Prerequisites R, Python
 

Decoding the epigenetic principles of early embryonic development
During embryonic development, gene expression dynamics are tightly controlled by the epigenome. This includes cytosine DNA methylation, DNA wrapping around histone-containing nucleosomes, and post-translational histone modifications. These epigenetic marks function to activate or repress gene transcription, ultimately enabling the development and regulation of diverse cell and tissue types. Using the latest single-cell epigenome and transcriptome sequencing technologies, this project will investigate the fundamental epigenetic mechanisms driving the complex organization of tissues in a developing embryo. These findings will provide a necessary foundation for exploring how aberrant changes to the epigenome drive congenital disorders associated with improper organ development and positioning, such as heterotaxy syndrome and dextrocardia.


What you will learn This project would suit a motivated individual familiar with the basic principles of gene expression and next-generation sequencing, who has a keen interest in analysing data from cutting-edge sequencing technologies.
Prerequisites R, Unix shell

Evaluation and finetuning of phenotype concept recognition tools
Many genomics applications require clinical information in the form of HPO terms. The Human Phenotype Ontology (HPO) is a controlled vocabulary of more than 10,000 terms describing every imaginable clinical feature, e.g. ‘brachydactyly’ (short fingers). Various NLP methods have been developed to automatically extract HPO terms from case reports in the medical literature, but the field is hampered by the lack of a comprehensive ‘gold corpus’, a manually annotated reference data set, which is essential for both evaluation and training. We have recently created such a gold corpus, containing over 6,500 annotations on almost 300 case reports.

What you will learn In this project, you will use this dataset to evaluate the performance of current state-of-the-art HPO phenotype concept recognition tools. If time permits, we can also use this dataset to finetune an existing tool.
Prerequisites Python (required), familiarity with NLP and LLMs (helpful)
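
For applicants wondering what such an evaluation might look like in practice, the following is a minimal sketch only, not the project's actual method. It assumes document-level matching, where a predicted HPO identifier counts as correct if the gold corpus lists the same identifier for the same case report. The function name, the report IDs and the HPO identifiers used in the example are hypothetical placeholders; a real evaluation may instead match text spans or give partial credit for related ontology terms.

```python
from typing import Dict, Set, Tuple


def micro_prf(
    gold: Dict[str, Set[str]],
    predicted: Dict[str, Set[str]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over all case reports.

    Both arguments map a report ID to the set of HPO IDs attached to it.
    """
    tp = fp = fn = 0
    for report_id in gold.keys() | predicted.keys():
        gold_terms = gold.get(report_id, set())
        pred_terms = predicted.get(report_id, set())
        tp += len(gold_terms & pred_terms)   # correctly recognised terms
        fp += len(pred_terms - gold_terms)   # terms the tool invented
        fn += len(gold_terms - pred_terms)   # terms the tool missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical toy data: two case reports with placeholder HPO identifiers.
gold = {
    "report_1": {"HP:0001156", "HP:0001250"},
    "report_2": {"HP:0000252"},
}
predicted = {
    "report_1": {"HP:0001156"},
    "report_2": {"HP:0000252", "HP:0001263"},
}

precision, recall, f1 = micro_prf(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Micro-averaging pools the counts across all reports, so reports with many annotations carry more weight than those with few; averaging per report first would be an equally reasonable choice.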

Building a polygenic risk prediction pipeline for whole genome sequence data
Polygenic risk scores are an emerging genetic risk prediction instrument, with opportunities for clinical application across almost any common disease (e.g. cancer, diabetes, coronary artery disease, glaucoma). The project will leverage a range of genomic datasets generated across a series of technology platforms, including high- and low-coverage WGS, long-read WGS, genotyping array, and a blended genome-exome approach.


What you will learn In this project, the successful candidate will build and validate a pipeline to calculate polygenic risk scores (PRS) from whole genome sequence data, which will have valuable applications across multiple clinical cohorts for genetic diagnosis and risk prediction. This project would suit a motivated individual with strong computational skills, and a keen interest in genomics and personalised medicine.
Prerequisites Unix, Python/R
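
As background for applicants, a polygenic risk score is commonly computed as a weighted sum of effect-allele dosages across variants. The sketch below illustrates only that core calculation under simplifying assumptions: the function name and the variant IDs are hypothetical, the weights are made up, and variants missing from the genotype data are simply skipped. A real pipeline for whole genome sequence data would also need to handle allele matching, strand alignment, missing genotypes and score normalisation.

```python
from typing import Dict


def polygenic_risk_score(
    dosages: Dict[str, float],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of effect-allele dosages.

    dosages: variant ID -> number of effect alleles carried (0 to 2).
    weights: variant ID -> per-allele effect size (e.g. a log odds ratio).
    Variants without a dosage are skipped (an assumption of this sketch).
    """
    return sum(
        weight * dosages[variant]
        for variant, weight in weights.items()
        if variant in dosages
    )


# Hypothetical example: three scored variants, one of them not genotyped.
weights = {"variant_a": 0.2, "variant_b": -0.1, "variant_c": 0.05}
dosages = {"variant_a": 2, "variant_b": 1, "variant_d": 0}

score = polygenic_risk_score(dosages, weights)
print(f"PRS = {score:.3f}")
```

Here only variant_a and variant_b contribute, because variant_c has no genotype and variant_d has no weight.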

Developing a Novel Diagnostic Method for the Muscle Disease FSHD Using Long-Read Sequencing
Facioscapulohumeral muscular dystrophy (FSHD) is a hereditary disorder causing progressive muscle weakness. Current diagnostic methods are slow, complex, and often yield inconclusive results. This project aims to develop a streamlined diagnostic test for FSHD using Oxford Nanopore Technologies (ONT) long-read sequencing, which is the only technology equipped to read through the challenging D4Z4 repeat region on chromosome 4. In addition to genotyping key regions, ONT sequencing allows simultaneous profiling of DNA methylation, a critical feature for diagnosing FSHD. By integrating both genetic and epigenetic markers, we will design a single, comprehensive test that improves accuracy, reduces testing time, and resolves unsolved cases. The student will implement a bioinformatics pipeline for analysing both genetic and epigenetic data, helping to optimise the test and validate it using clinical samples.

What you will learn
Prerequisites GitHub for hosting the comparison and automation with GitHub Actions; Bash, Python, and R for running tools and visualisation; enthusiasm for genomics data analysis

Understanding the heterogeneity of breast cancer
Breast cancer is known to be genetically and clinically heterogeneous. Several omics-based methods have been developed to categorise breast cancer patients. Yet a conundrum in the field has been the lack of consistency between these subtyping methods (PAM50, SCMGENE, SCMOD, SSP, IntClust, AIMS), which has stalled widespread adoption of these omics-based approaches to categorise breast cancer in the clinic. The advent of more sophisticated biotechnologies such as single-cell and spatial omics provides an opportunity to explore these methods further using higher-resolution data.

What you will learn In this project, you will be given the opportunity to 1) apply several omics subtyping methods on breast cancer data generated using bulk and single-cell technologies; 2) compare the subtyping results between these methods; and 3) investigate the heterogeneity of breast cancer.
Prerequisites Experience in R is preferred

Development of REDCap Integration for CTRL using TypeScript and REST
The Garvan Data Science Platform maintains the open-source dynamic consent platform CTRL (https://github.com/Australian-Genomics/CTRL). We are currently developing a new version of CTRL, incorporating updated architectural elements and transitioning the codebase to TypeScript to broaden its open-source user base.


A key feature of the existing CTRL is its ability to integrate with various relevant services, including the widely used electronic data collection software REDCap. REDCap’s REST API is used to share study surveys and patient data.


What you will learn We are seeking a motivated university student in Computer Science, Engineering, or a related discipline to assist in developing the new REDCap integration. The ideal candidate will possess strong software development skills, a good understanding of TypeScript, and familiarity with REST APIs. This project will also offer an opportunity to engage with and learn about various aspects of CTRL, including its database, admin portal, and front-end.
Prerequisites TypeScript, REST API
 

Harnessing single-cell multi-omics and population genetics to identify novel regulatory elements for autoimmune diseases
Autoimmune diseases impose a great public health burden globally. Genetic variation plays an important role in the pathogenesis of autoimmune diseases. Understanding their genetic aetiology and how the disease-causal genes are regulated in different immune cell types is critical to developing curative therapies. Hundreds of genetic variants have been linked to autoimmune diseases, but their biological mechanisms are poorly understood. Only 20-30% of autoimmune disease risk variants are found to be associated with gene expression in immune cells, and little is known about how regulatory elements, such as promoters and enhancers, contribute to such limited overlap. This project aims to integrate a large-scale single-cell multi-omics dataset (i.e., TenK10K) to identify genetic variants that are associated with chromatin accessibility in cell-type-specific patterns.

What you will learn The candidate will obtain hands-on experience in large-scale genomics data analysis, high-performance computing, and software development.
Prerequisites Basic skills in Unix are required. Experience in R/Python is preferred. Experience in C/C++ will be given priority.

Multi-omics single-cell data integration and gene regulatory network inference using deep learning
Advances in single-cell sequencing technologies have made it possible to explore regulatory landscapes across multiple omics layers, including transcriptome (scRNA-seq), chromatin accessibility (scATAC-seq), and DNA methylation (snmC-seq, sci-MET). This offers a valuable opportunity to uncover the regulatory networks in various cell types. One of the significant computational challenges when integrating unpaired multi-omics data is the variation in feature spaces between the different modalities. This project will adopt state-of-the-art deep learning methods to integrate unpaired single-cell RNA-seq and chromatin accessibility datasets and then infer gene regulatory networks in different cell types.

What you will learn The candidate will learn new skills relevant to single-cell cohort curation, multi-omics integration, and gene regulatory network inference.
Prerequisites Basic skills in Unix are required. Experience in R/Python is preferred.

Identifying novel genetic markers for coronary artery disease by combining blood samples from both patients and healthy donors
Genetic variation plays a critical role in human complex diseases and explains a large proportion of the variation observed across the population. Genome-wide association studies (GWAS) have identified thousands of genetic loci associated with complex diseases. However, most of them are located in non-coding regions, making it hard to interpret their function and understand how they confer disease risk via regulatory roles. Gene expression data is often combined with GWAS results to understand the biological mechanisms behind these disease associations. However, most publicly available gene expression data is obtained from healthy donors, which could miss potential signals that only exist in patient samples. This project will investigate whether using gene expression data from patient blood samples improves the power to detect functionally relevant genes compared with data from healthy donors. Coronary artery disease will be used as a model trait, but the statistical framework will be applicable to all human complex diseases.

What you will learn The candidate will learn how to perform genome-wide association analysis, eQTL association, and statistical colocalisation. They will also obtain hands-on experience in statistical modelling and computer programming.
Prerequisites Basic skills in Unix are required. Experience in R/Python is preferred.

Genomic Research Results Return in Australasia
The Clinical Translation & Engagement platform runs a program called My Research Results (MyRR), which was developed and staffed by genetic counsellors. MyRR is available to Human Research Ethics Committee approved studies Australia-wide. It comprises genetic counselling services to notify research participants of clinically actionable research findings, support for researchers in developing an ethical strategy for managing research findings, and an online information platform. MyRR currently returns research findings for genomic research cohorts. However, many researchers and cohort owners in Australia do not know about the program, and it has never been offered to Contract Research Organisations.

What you will learn This project would involve desktop research to define the landscape for My Research Results and develop short questionnaires and interview schedules to further understand who (else) is returning results and the potential for My Research Results to partner with more studies and providers. The successful scholar would also undertake a rapid literature review and some supervised data analysis of return of results outcomes from the MyRR programs and contribute to publication of the data.
Prerequisites Microsoft Office/Google Workspace; research methods subjects including literature reviews, annotated bibliographies and an introduction to qualitative and/or quantitative research

HOW TO APPLY

To apply, you must complete both parts below:

Part One:

Your application via the Garvan Careers Site/Workday should include:

  • Copy of your CV/resume [no more than five (5) pages]

  • Cover letter outlining which project(s) you are applying for [one page only]

  • Copy of your academic transcript/s

[Note - Our system requires these documents to be compiled into one PDF document]

Part Two:

In addition to submitting your application via Workday, please complete the Student Applicant Form at: https://airtable.com/appoHuAqO2nvaFrzD/pagC4GNTOIb4cFdmM/form

Note:

All applications must be submitted via the Garvan Careers site.

Incomplete applications or applications without supporting documents will not be assessed.

CLOSING DATE

The position will remain open until filled. We will be reviewing applications as they are received, and so we encourage you to submit your application as soon as possible.

We aim to have positions filled by the end of October 2024 for project commencement in mid-November 2024. All applicants will be notified of the outcome of their application by the end of October 2024.
