Data Scientist II

NLD Amsterdam (Radarweg), Netherlands

Elsevier

Elsevier is a global information analytics company that helps institutions and professionals progress science, advance healthcare and improve performance.

About our Team: We are a diverse team of natural language processing and machine learning experts, taxonomy experts and scientific content experts in the biology and chemistry domains. We mainly develop best-in-class enrichment algorithms that deeply mine scientific literature (journals and patents) for Elsevier life science .com products such as Reaxys and Embase, as well as for key life sciences customers such as top pharma companies. These products allow our customers to create new medicines, detect potential issues with medical devices, fabricate new materials and expand scientific knowledge. We use state-of-the-art natural language processing (NLP) techniques, such as large language models, in our applications to advance our customers' life sciences research in accordance with Responsible AI principles.

About the Role: You will be responsible for building proof-of-concept (Gen)AI solutions that address customer needs across our product portfolio. Your work will span the entire data science project lifecycle—from design and implementation to productionization and ongoing support. You will deliver efficient, production-ready Python code and collaborate closely with the technology team to deploy and operationalize our data science pipelines. Partnering with content and taxonomy experts, you will help create training and test datasets and design quality assurance procedures. Additionally, you will work within cross-functional squads alongside domain experts, IT specialists, and product colleagues to pilot and develop innovative methods for extracting information that drives new product development and adds value for our customers. 

Responsibilities 

  • Data collection, data analysis, model development, defining quality metrics, quality assessment of models and regular presentations to stakeholders 

  • Creating production-ready Python packages for each component of our data science pipelines (such as pre-processing and model inference) and deploying them together with the technology team

  • Integration of data science components and end-to-end quality assessment 

  • Keeping our data science pipelines robust against model drift and ensuring continuous output quality; developing the tools and strategies needed for maintenance, such as automatic model re-training

  • Establishing a reporting process for pipeline performance and an automatic re-training strategy for the existing pipelines

Requirements 

  • Relevant applied experience and an MSc/MTech in computer science, data science, artificial intelligence, mathematics, statistics, bioinformatics or another quantitative field, or relevant experience. International working/education experience is a plus!

  • Strong hands-on knowledge of Python and the ability to write unit tests and production-ready code adhering to Python best practices and object-oriented programming principles.

  • Data processing, cleaning and analysis skills: experience with pandas, numpy, matplotlib, boto3 

  • Experience with SOTA deep learning approaches in the NLP domain such as LLMs, especially RAG and agentic workflows using GenAI solutions

  • Knowledge of fine-tuning for specific use cases such as named entity recognition and relation extraction

  • Experience with CI/CD, Git, PyTorch, LangChain, Ollama/ChatGPT and AWS services such as SageMaker. Experience with Spark/Databricks is a plus! 

  • Willingness to learn, analytical thinking, problem solving and communication skills; ability to translate complex requirements into practical solutions  

  • Experience in classical machine learning: classification, regression, clustering and text mining. You have an excellent understanding of neural networks, random forests, logistic regression, SVM, K-means, etc.

  • Experience in the later stages of the data science lifecycle, such as optimizing productionization (techniques such as parallelization and multithreading) and automated model re-training. Interest in and affinity for MLOps is a plus!

About Cell Press and Elsevier

Cell Press is an international organization with primary offices in Cambridge, Massachusetts; London; Amsterdam; Shanghai, and Beijing. We also support flexible working, and we are open to considering flexible work arrangements for this position. 

Cell Press offers an attractive salary and benefits package and a stimulating working environment. Cell Press is a part of Elsevier, the world's leading provider of scientific, technical and medical (STM) information, tools and resources. 

About the Business: A global company based in Amsterdam, Elsevier partners with scientists, researchers, healthcare providers, educators and decision-makers in academic institutions, governments and corporations to help them find, evaluate and use information. Our breadth of content is unparalleled, spanning virtually every STM field in the world, and includes such distinguished brands as Gray's Anatomy, The Lancet and Cell. Using innovative technology, we deliver our content through tools that help our customers be more productive and successful in their work. ScienceDirect delivers the world's leading journals electronically to over 11 million readers in 200 countries, and physicians in 95 percent of teaching hospitals rely on MD Consult to get critical information that can save lives. Elsevier employs over 7,000 people in more than 70 offices worldwide. We are an employer of choice, attracting and developing talented and creative people who thrive in a challenging and fast-paced environment.

Work in a way that works for you:

We promote a healthy work/life balance across the organisation. We offer an appealing working prospect for our people. With numerous wellbeing initiatives, shared parental leave, study assistance and sabbaticals, we will help you meet your immediate responsibilities and your long-term goals.

-----------------------------------------------------------------------

We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form or by calling 1-855-833-5120.

Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams here.

Please read our Candidate Privacy Policy.

We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.

USA Job Seekers:

EEO Know Your Rights.
