Lead Data Engineer
Antwerp, Flanders, Belgium
Board of Innovation
About the Role
We are hiring a Lead Data Engineer to design, build, and manage scalable data pipelines that support our AI-powered tools and applications, including agentic tools that adapt to user behavior. You will harmonize and transform data from disparate sources so it is ready for use in foundation model integrations. The ideal candidate has prior experience in a consulting or agency environment and thrives in project-based settings.
This is a hands-on role where you will build and implement systems from the ground up. You will write production-level code while defining the processes and best practices that support future team growth.
Responsibilities
- Develop and manage ETL pipelines to extract, transform, and load data from various internal and external sources into harmonized datasets.
- Design, optimize, and maintain databases and data storage systems (e.g. PostgreSQL, MongoDB, Azure Data Lake, or AWS S3).
- Implement scalable workflows for automating data harmonization and transformation using tools like Apache Airflow or AWS Glue.
- Collaborate with AI Application Engineers to prepare data for use in foundation model workflows (e.g. embeddings and retrieval-augmented generation setups).
- Ensure data integrity, quality, and security across all pipelines and workflows.
- Monitor, debug, and optimize data pipelines for performance and reliability.
- Support data needs for the development of adaptive agentic tools, including data structuring and optimization.
- Stay current on best practices in cloud data solutions and emerging technologies in data engineering.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
- A minimum of 6 years of professional experience in data engineering with a focus on building and managing ETL pipelines.
- Proven experience working in a consulting or agency environment on project-based work.
- Advanced proficiency in Python, SQL, and data transformation libraries like pandas or PySpark.
- Strong experience with cloud data platforms (e.g. AWS Glue, Azure Synapse, BigQuery).
- Hands-on experience with data pipeline orchestration tools like Apache Airflow or Prefect.
- Solid understanding of database design and optimization for relational and non-relational databases.
- Familiarity with API integration for ingesting and processing data.
- Advanced written and spoken English, with the ability to communicate effectively in an international team.
Preferred Qualifications
- Experience with vector database systems (e.g. Pinecone, Weaviate) and RAG workflows.
- Knowledge of data preparation requirements for foundation models.
- Strong understanding of data governance, privacy, and security best practices.