Senior Data Engineer (home-based)

Warsaw, Poland

IQVIA

Solutions to help life sciences organizations drive healthcare forward and get the right treatments to patients, faster.

At IQVIA™, we are continuously expanding the boundaries of what’s possible in clinical development through advanced analytics, cutting-edge technology, and deep scientific expertise. Within our Research & Development Solutions (R&DS) organization, we are enhancing our services with agentic systems—autonomous AI agents that can reason, plan, act, and learn—to further streamline clinical trial workflows and accelerate the delivery of new therapies. By embedding these capabilities into our service offerings for our customers and the clinical sites that we engage with to run clinical trials, we not only strengthen our leadership in AI-driven clinical research, but also bring life-changing treatments to patients faster and more efficiently.

We are seeking an experienced Senior Data Engineer to join our innovative AI team. In this role, you will lead the development and optimization of data infrastructure supporting our cutting-edge Agentic AI initiatives. You will collaborate with ML engineers, AI scientists, and product managers to architect, implement, and maintain robust data pipelines that power autonomous AI agents. As a senior data engineer in the R&DS AI Innovation Program, you will help shape our data strategy while ensuring our data solutions scale effectively to meet the demanding requirements of next-generation AI systems.

Key Responsibilities

Mandatory

  • Design, develop, and maintain scalable data pipelines and ETL processes to support AI research and development.
  • Collaborate with AI scientists and engineers to understand data requirements and ensure data availability and quality.
  • Implement data governance and security measures to protect sensitive information.
  • Monitor and troubleshoot data pipeline issues to ensure smooth operation.
  • Stay updated with the latest advancements in data engineering and AI technologies.
  • Design and implement scalable, resilient data architectures specifically tailored for AI agent training, fine-tuning, and inference workflows.
  • Develop and maintain high-performance data pipelines utilizing modern orchestration frameworks to support real-time agent interactions and feedback loops (a minimal orchestration sketch follows this list).
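
To illustrate the kind of orchestration work described above, here is a minimal sketch of an Airflow 2.x (2.4+) TaskFlow pipeline with extract, validate, and load steps. The DAG name, schedule, and sample records are illustrative assumptions, not part of the role description.

```python
# Minimal Airflow TaskFlow DAG: extract -> validate -> load (illustrative only).
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, tags=["etl"])
def clinical_data_etl():
    """Toy ETL pipeline; names and records are hypothetical placeholders."""

    @task
    def extract() -> list[dict]:
        # Placeholder for pulling raw records from a source system.
        return [{"site_id": "001", "visits": 12}, {"site_id": "002", "visits": None}]

    @task
    def validate(records: list[dict]) -> list[dict]:
        # Drop incomplete records; a real pipeline would quarantine and alert on them.
        return [r for r in records if all(v is not None for v in r.values())]

    @task
    def load(records: list[dict]) -> None:
        # Placeholder for writing to a warehouse or lakehouse table.
        print(f"Loading {len(records)} validated records")

    load(validate(extract()))


clinical_data_etl()
```

In practice the extract and load steps would target the platform's actual sources and lakehouse tables, with validation failures routed to monitoring rather than silently dropped.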

Preferred

  • Create specialized data storage and retrieval systems for efficient vector embeddings, knowledge graphs, and symbolic reasoning components used by AI agents (see the simplified retrieval sketch after this list).
  • Implement robust data validation, monitoring, and governance frameworks to ensure high-quality training data for AI systems while maintaining compliance with privacy regulations.
  • Continuously monitor and improve data system performance, focusing on reducing latency for agent decision-making processes.
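
The vector-embedding storage and retrieval mentioned in the responsibilities above can be pictured with a deliberately simplified, library-agnostic sketch. A production system would use a dedicated vector database such as Pinecone, Weaviate, or Milvus rather than this in-memory toy, and the document IDs and embeddings below are made up for illustration.

```python
# Toy in-memory vector store ranking documents by cosine similarity (illustrative only).
import numpy as np


class InMemoryVectorStore:
    def __init__(self, dim: int):
        self.dim = dim
        self.ids: list[str] = []
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def add(self, doc_id: str, embedding: np.ndarray) -> None:
        vec = embedding.astype(np.float32).reshape(1, self.dim)
        vec /= np.linalg.norm(vec)                      # store unit vectors
        self.ids.append(doc_id)
        self.vectors = np.vstack([self.vectors, vec])

    def query(self, embedding: np.ndarray, top_k: int = 3) -> list[tuple[str, float]]:
        q = embedding.astype(np.float32) / np.linalg.norm(embedding)
        scores = self.vectors @ q                       # cosine similarity against all docs
        best = np.argsort(scores)[::-1][:top_k]
        return [(self.ids[i], float(scores[i])) for i in best]


# Hypothetical usage with made-up document IDs and 4-dimensional embeddings.
store = InMemoryVectorStore(dim=4)
store.add("protocol-001", np.array([0.1, 0.9, 0.0, 0.2]))
store.add("protocol-002", np.array([0.8, 0.1, 0.1, 0.0]))
print(store.query(np.array([0.2, 0.8, 0.0, 0.1]), top_k=1))
```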

Qualifications

Mandatory

  • Education: Bachelor's or Master's degree in Computer Science, Data Engineering, or related field; advanced degree preferred.
  • Experience: 5+ years of professional experience in data engineering, with at least 2 years focused on ML/AI data infrastructure.
  • Programming & Technologies:
    • Advanced proficiency in Python and Scala; experience with Rust, Go, Java, or Julia valued.
    • Expert-level knowledge of SQL and NoSQL databases.
    • Hands-on experience with vector databases (e.g., Pinecone, Weaviate, Milvus).
    • Proficiency with modern data orchestration platforms (e.g., Airflow 2.x).
  • Cloud & Infrastructure:
    • Extensive experience with at least one major cloud platform (AWS, Azure, GCP).
    • Expertise in containerization and orchestration (Docker, Kubernetes).
    • Experience with Infrastructure as Code (e.g., Terraform).
  • Data Processing:
    • Experience with distributed computing frameworks (Spark, Dask, Ray).
    • Proficiency with streaming technologies (e.g., Kafka, Flink); see the streaming sketch after this list.
    • Knowledge of modern data lakehouse architectures.
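
As a concrete example of combining the distributed computing and streaming technologies listed above, below is a small PySpark Structured Streaming sketch that reads agent feedback events from Kafka. The broker address, topic name, and event schema are assumptions made purely for illustration, and the Kafka connector package must be available on the Spark classpath.

```python
# PySpark Structured Streaming: parse JSON agent-feedback events from a Kafka topic.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("agent-feedback-stream").getOrCreate()

# Assumed schema of the hypothetical feedback events emitted by AI agents.
event_schema = StructType([
    StructField("agent_id", StringType()),
    StructField("action", StringType()),
    StructField("ts", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker address
    .option("subscribe", "agent-feedback")                # assumed topic name
    .load()
)

events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("event"))
       .select("event.*")
)

# For illustration, stream parsed events to the console; a real pipeline would write
# to a lakehouse table (e.g. Delta or Iceberg) with checkpointing enabled.
query = events.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```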

Preferred

  • Certification in cloud platforms, big data technologies, data engineering, or ML operations.
  • Experience collaborating with ML engineers to implement CI/CD pipelines for data processing and model deployment, ensuring seamless integration between data infrastructure and AI development workflows.
  • Working knowledge of ML frameworks (e.g., PyTorch, TensorFlow).
  • Experience with feature stores and experiment tracking platforms.
  • Understanding of LLM fine-tuning data requirements and processing.
  • Experience developing data systems for autonomous AI agents or other agentic AI applications.
  • Background in prompt engineering or retrieval-augmented generation systems.
  • Experience with semantic caching and efficient storage/retrieval of AI-generated artifacts (see the caching sketch after this list).
  • Familiarity with LLM evaluation metrics and benchmarking frameworks.
  • Expertise in building hybrid data architectures combining traditional databases with vector stores.
  • Experience with RAG (Retrieval-Augmented Generation) systems and related data pipelines.
  • Knowledge of reinforcement learning from human feedback (RLHF) data workflows.
  • Experience mentoring junior engineers, establishing best practices, and contributing to architectural decisions across the organization's data infrastructure.
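
Semantic caching, mentioned among the preferred qualifications above, can be sketched in a few lines: cache responses keyed by query embedding and return a hit when a new query's embedding is sufficiently similar. The similarity threshold and toy embeddings below are illustrative assumptions; a real system would persist the cache and typically reuse the same vector index as the retrieval layer.

```python
# Toy semantic cache: reuse a stored response when embeddings are close enough (illustrative).
import numpy as np


class SemanticCache:
    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.entries: list[tuple[np.ndarray, str]] = []   # (unit embedding, cached response)

    def get(self, query_embedding: np.ndarray) -> str | None:
        q = query_embedding / np.linalg.norm(query_embedding)
        for emb, response in self.entries:
            if float(emb @ q) >= self.threshold:          # cosine-similarity cache hit
                return response
        return None

    def put(self, query_embedding: np.ndarray, response: str) -> None:
        q = query_embedding / np.linalg.norm(query_embedding)
        self.entries.append((q, response))


# Hypothetical usage with made-up 3-dimensional embeddings.
cache = SemanticCache(threshold=0.95)
cache.put(np.array([0.9, 0.1, 0.0]), "Cached agent answer")
print(cache.get(np.array([0.88, 0.12, 0.01])))  # similar query -> returns the cached answer
```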

This role can be performed fully remotely. However, if you prefer working from an office environment, we are happy to accommodate that as well.

IQVIA is a leading global provider of clinical research services, commercial insights and healthcare intelligence to the life sciences and healthcare industries. We create intelligent connections to accelerate the development and commercialization of innovative medical treatments to help improve patient outcomes and population health worldwide. Learn more at https://jobs.iqvia.com
