Lead Data Engineer – Data Ingestion
Gdansk, Poland
Hapag-Lloyd
We are looking for a hands-on Lead Data Engineer – Data Ingestion to guide the development and execution of our data ingestion processes into the Data Lakehouse. In this role, you will be responsible for designing scalable and reliable data pipelines, transforming source inputs into trusted Bronze-layer assets. You’ll work closely with data owners, source system teams, and the Product Manager to ensure successful onboarding of new sources and to drive continuous improvements in data quality, automation, and performance. You’ll lead a small team of engineers by example: actively coding, reviewing solutions, and setting engineering standards. This is a role for someone who is passionate about data integration and wants to shape the way data flows through our platform.
- Design and maintain scalable data ingestion pipelines from internal and external sources including Kafka, APIs, SFTP, and file-based systems.
- Establish and maintain best practices for ingesting and processing structured and semi-structured data (e.g., JSON, Avro, CSV).
- Align ingestion pipelines with enterprise data architecture and naming conventions.
- Manage batch and micro-batch processing, and contribute to the future transition towards streaming ingestion.
- Transform and validate ingested data on its way into the Bronze layer of the Data Lakehouse.
- Follow standardized onboarding scenarios and integrate new data sources in a consistent and governed way.
- Apply schema and metadata standards using Unity Catalog and Collibra, and ensure proper lineage tracking.
- Collaborate with Product Manager and domain teams to scope and deliver ingestion use cases.
- Ensure data quality and validation as part of the ingestion process.
- Implement improvements in pipeline performance, cost efficiency, and automation.
- Collaborate closely with teams using the ingested data to ensure usability and traceability.
- Mentor team members, perform code reviews, and contribute to internal engineering standards and documentation.
- Collaborate with the Data Governance team to ensure traceability, cataloging, and access control across new data domains.
- Define and maintain onboarding playbooks and reusable ingestion templates.
- Actively participate in backlog grooming, planning sessions, and technical refinement.
- Minimum 6 years of experience in data engineering with a focus on ingestion and transformation.
- Hands-on experience with Databricks (Spark), Python, Kafka, and cloud data processing on AWS (e.g., S3).
- Experience with orchestration and workflow management (Airflow on Astronomer).
- Strong SQL and Spark (preferably PySpark) skills.
- Working knowledge of metadata governance via Unity Catalog and Collibra.
- Familiarity with infrastructure-as-code tools (Terraform) and version control (GitLab).
- Proven experience building reliable and performant ingestion pipelines.
- Ability to collaborate with stakeholders, drive improvements, and document processes clearly.
- Comfortable in an agile, product-oriented environment.
- Experience working with semi-structured data and schema evolution techniques.
- Knowledge of distributed data systems and challenges related to ingestion at scale.
- Ability to lead technical conversations with external vendors or source system teams.
- Understanding of testing frameworks for data pipelines and CI/CD validation.
- Familiarity with tools for data profiling, schema validation, and automated lineage capture.
Perks/benefits: Team events