Data Scientist AI intern

Basel Headquarter

Roche

As a pioneer in healthcare, we have been committed to improving lives since the company was founded in 1896 in Basel, Switzerland. Today, Roche creates innovative medicines and diagnostic tests that help millions of patients globally.

Roche fosters diversity, equity and inclusion, representing the communities we serve. When dealing with healthcare on a global scale, diversity is an essential ingredient to success. We believe that inclusion is key to understanding people’s varied healthcare needs. Together, we embrace individuality and share a passion for exceptional care. Join Roche, where every voice matters.

The Position

Developing Automated Agents for Dataset Cleaning and Generation with Generative AI and LLMs

Supervisors: Ercan Suekuer, Tatyana Doktorova

We are seeking an intern to support the development of a fully automated agent-based system for dataset cleaning and generation using generative AI and large language models (LLMs). The goal is to build intelligent agents capable of automatically identifying and correcting data inconsistencies, handling missing data, and generating high-quality datasets. These agents will leverage advanced AI models, including LLMs, to streamline data preparation processes.

As part of this role, you'll also collaborate with Microsoft Azure engineers, who will provide support in deploying and optimizing these systems on the Azure platform.

In this position you will: 

  • Collaborate on building an agent-based approach for automating dataset cleaning, preparation, and generation using LLMs and generative AI.

  • Design workflows that utilize AI models to identify, handle, and resolve data quality issues.

  • Explore machine learning techniques for data synthesis and augmentation to improve the availability and quality of datasets.

  • Integrate LangChain and LlamaParser into data processing workflows as part of the automation pipeline.

  • Work closely with team members to understand project needs and integrate the automated agents into existing workflows.

  • Support testing, validation, and optimization of the developed agents.

Qualifications Required:

  • You have completed your studies (Bachelor or Master) within the past 12 months, or you are currently pursuing a Master’s or PhD degree in computer science, bioinformatics, computational sciences, or a related field.

  • Strong programming skills, particularly in Python and R.

  • Familiarity with AI/ML frameworks (e.g., TensorFlow, PyTorch) and experience with generative AI and LLMs are highly desirable.

  • Experience with LangChain and LlamaParser is a plus.

  • Experience in data handling, cleaning, and preprocessing is essential.

  • Nice-to-have: Knowledge of clinical trials and CDISC data standards, and familiarity with Retrieval-Augmented Generation (RAG) approaches.

  • Strong problem-solving and communication skills, with the ability to work both independently and as part of a team.

 

You have very good interpersonal and communication skills, are able to build good working relationships, and are an outstanding teammate. Your experience and investigative attitude allow you to work independently, to design, perform, and interpret experiments, and to embark on new scientific methodologies.

Start: between January and March 2025

Duration: 6-9 Months

Workload: 100%

Due to regulations, non-EU/EFTA citizens must be enrolled at a university and must include in their application documents a certificate from the university stating that an internship is mandatory.

Who we are

At Roche, more than 100,000 people across 100 countries are pushing back the frontiers of healthcare. Working together, we’ve become one of the world’s leading research-focused healthcare groups. Our success is built on innovation, curiosity and diversity.

Basel is the headquarters of the Roche Group and one of its most important centres of pharmaceutical research. Over 10,700 employees from over 100 countries come together at our Basel/Kaiseraugst site, which is one of Roche's largest sites.

Besides extensive development and training opportunities, we offer flexible working options, 18 weeks of maternity leave and 10 weeks of gender-independent partnership leave. Our employees also benefit from multiple on-site services such as child-care facilities, medical services, restaurants and cafeterias, as well as various employee events.

We believe in the power of diversity and inclusion, and strive to identify and create opportunities that enable all people to bring their unique selves to Roche.

Roche is an Equal Opportunity Employer.

Tags: Azure Bioinformatics CDISC Computer Science Data quality Generative AI LangChain LLMs Machine Learning Pharma PhD Python PyTorch R RAG Research TensorFlow Testing

Perks/benefits: Career development Equity / stock options Flex hours Medical leave Team events

Region: Europe
Country: Switzerland
