Lead Data Engineer
Antwerp, Flanders, Belgium
Board of Innovation
About the Role
We are hiring a Lead Data Engineer to design, build, and manage scalable data pipelines that support our AI-powered tools and applications, including agentic tools that adapt to user behavior. You will harmonize and transform data from disparate sources so it is ready for use in foundation model integrations. The ideal candidate has prior experience in a consulting or agency environment and thrives in project-based settings.
This is a hands-on role where you will build and implement systems from the ground up. You will write production-level code while defining the processes and best practices that support future team growth.
Responsibilities
- Develop and manage ETL pipelines to extract, transform, and load data from various internal and external sources into harmonized datasets.
- Design, optimize, and maintain databases and data storage systems (e.g. PostgreSQL, MongoDB, Azure Data Lake, or AWS S3).
- Implement scalable workflows for automating data harmonization and transformation using tools like Apache Airflow or AWS Glue.
- Collaborate with AI Application Engineers to prepare data for use in foundation model workflows (e.g. embeddings and retrieval-augmented generation setups).
- Ensure data integrity, quality, and security across all pipelines and workflows.
- Monitor, debug, and optimize data pipelines for performance and reliability.
- Support data needs for the development of adaptive agentic tools, including data structuring and optimization.
- Stay current on best practices in cloud data solutions and emerging technologies in data engineering.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
- A minimum of 6 years of professional experience in data engineering with a focus on building and managing ETL pipelines.
- Proven experience working in a consulting or agency environment on project-based work.
- Advanced proficiency in Python, SQL, and data transformation libraries like pandas or PySpark.
- Strong experience with cloud data platforms (e.g. AWS Glue, Azure Synapse, BigQuery).
- Hands-on experience with data pipeline orchestration tools like Apache Airflow or Prefect.
- Solid understanding of database design and optimization for relational and non-relational databases.
- Familiarity with API integration for ingesting and processing data.
- Advanced written and spoken English, with the ability to communicate effectively in an international team.
Preferred Qualifications
- Experience with vector database systems (e.g. Pinecone, Weaviate) and RAG workflows.
- Knowledge of data preparation requirements for foundation models.
- Strong understanding of data governance, privacy, and security best practices.