Data Scientist
Madrid Osiris, Spain
Roche
As a pioneer in healthcare, we have been committed to improving lives since the company was founded in 1896 in Basel, Switzerland. Today, Roche creates innovative medicines and diagnostic tests that help millions of patients globally. At Roche you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued, accepted and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop and cure diseases and ensure everyone has access to healthcare today and for generations to come. Join Roche, where every voice matters.
The Position
We are looking for a highly skilled Data Scientist with expertise in building AI-powered applications. We will be building GenAI solutions end-to-end: from concept, through prototyping, productization, to operations. The ideal candidate will bring technical expertise in Natural Language Processing (NLP), especially leveraging Large Language Models (LLM) and proficiency in prompt engineering techniques.
Key Responsibilities:
Generative AI Application Development: Collaborate with AI engineers, product owners, business analysts and other developers in Agile teams to integrate LLMs into scalable, robust, fair and ethical end-user applications, focusing on user experience, relevance, and real-time performance
Algorithm Development: Design, develop, customize, optimize, and fine-tune LLM-based and other AI-infused algorithms tailored to specific use cases such as text generation, summarization, information extraction, chatbots, AI agents, code generation, document analysis, sentiment analysis, data analysis, etc.
Data Curation for LLMs: Design data pipelines to curate, preprocess, and structure datasets that improve the performance of LLM-based algorithms and reduce biases, with a focus on data quality and diversity
Exploratory Data Analysis (EDA): Perform thorough data exploration to understand dataset characteristics, uncover patterns, detect biases, and identify data quality issues; use statistical and visualization techniques to inform feature engineering, model selection, and optimization of LLM-based applications
Support in Prompt Engineering: Support prompt engineers, business analysts and subject matter experts in crafting and optimizing prompts to guide LLM outputs, enhancing performance for specific tasks; be ready to participate in prompt engineering when necessary
Experimentation and Validation: Conduct rigorous experimentation, including A/B testing, to evaluate algorithm performance against benchmarks and control groups; use metrics specific to generative AI as well as pre-GenAI techniques, as required
Software Development: Apply software development best practices, including writing unit tests; contribute to configuring CI/CD pipelines, containerizing applications, setting up APIs, and ensuring robust logging, experiment tracking, and model monitoring
Continuous Improvement: Work with other developers to monitor deployed algorithms, identify areas for improvement, and deliver updates that enhance performance
Stakeholder Communication: Translate complex technical results into clear, actionable insights for stakeholders, driving data-driven decision-making across the organization
Ethical AI and Bias Mitigation: Implement techniques to identify and mitigate biases in LLM outputs, ensuring responsible and ethical AI deployment
Pre-generative AI Application Development: Design and implement classical machine learning and NLP models (e.g., regression, classification, clustering, sequence modeling) when they provide a more efficient, interpretable, or cost-effective solution compared to LLMs; integrate these models into AI applications as needed
Requirements:
B.Sc., B.Eng., M.Sc., M.Eng., Ph.D. or D.Eng. in Computer Science, Physics, Statistics, Mathematics or equivalent degree and experience with Artificial Intelligence
Experience: 3+ years working with advanced machine learning algorithms
3+ years of hands-on experience working with language models, especially those based on Transformer architectures (e.g. BERT, T5, RoBERTa), and at least 1 year of experience with generative large language models (e.g. GPT, LLaMA, Claude, Cohere, etc.)
Technical Skills: Advanced proficiency in Python and experience with deep learning frameworks such as PyTorch or TensorFlow; expertise with Transformer architectures; hands-on experience with LangChain or similar LLM frameworks
Experience designing end-to-end RAG systems using state-of-the-art orchestration frameworks (hands-on experience with fine-tuning LLMs for specific tasks and use cases is considered an additional advantage)
Practical knowledge of and experience with AWS services for designing cloud solutions (familiarity with Azure is a plus); experience working with GenAI-specific services such as Azure OpenAI, Amazon Bedrock, Amazon SageMaker JumpStart, etc.
Data Skills: Strong skills in data manipulation, annotation, and crafting datasets that maximize LLM effectiveness; experience working with data stores such as vector, relational, and NoSQL databases and data lakes through APIs; experience with data augmentation techniques or synthetic data generation in the context of LLMs is considered a plus
Prompt Engineering: Hands-on experience with prompt design, zero-shot, and few-shot learning paradigms to optimize LLM performance without extensive training or fine-tuning
Evaluation Metrics: Deep understanding of generative model and pre-GenAI evaluation techniques
NLP Expertise: Solid foundation in natural language processing, including tokenization, embeddings, attention mechanisms, and transfer learning specific to LLMs
Statistical Knowledge: Strong background in statistics, machine learning algorithms, and optimization techniques
Classical Machine Learning & NLP: Experience with traditional NLP techniques and classical machine learning algorithms (e.g., decision trees, SVMs, random forests, gradient boosting) for text analysis and structured data applications
Pre-LLM Model Development: Hands-on experience developing and deploying machine learning models for tasks such as classification, clustering, regression, and sequence modeling using frameworks like Scikit-learn, XGBoost, or traditional NLP pipelines
Feature Engineering & Data Preprocessing: Strong skills in feature engineering, dimensionality reduction, text preprocessing, and structured data transformation to improve model performance
Deployment: Experience deploying LLMs on cloud platforms (AWS, Azure) and machine learning workbenches for robust and scalable productization
Proficiency in best practices of software engineering
Problem Solving: Excellent analytical skills and the ability to tackle complex challenges with innovative solutions
Communication: Strong verbal and written communication skills, with the ability to present complex findings clearly to both technical and non-technical audiences
The successful candidate should also:
be passionate about AI and stay up-to-date with the latest developments in LLMs, GenAI, and AI in general
be team-oriented, proactive, and collaborative
be an excellent problem solver and analytical thinker
be detail-oriented and highly organized
be willing to learn and expand their skill set
have the ability to work collaboratively in a fast-paced, dynamic environment
be able to communicate in English at C1 level or above
be located near the Central European time zone, or willing to work at a time consistent with the Central European time zone
A healthier future drives us to innovate. Together, more than 100’000 employees across the globe are dedicated to advancing science, ensuring everyone has access to healthcare today and for generations to come. Our efforts result in more than 26 million people treated with our medicines and over 30 billion tests conducted using our Diagnostics products. We empower each other to explore new possibilities, foster creativity, and keep our ambitions high, so we can deliver life-changing healthcare solutions that make a global impact.
Let’s build a healthier future, together.
Roche is an Equal Opportunity Employer.