Lead Data Engineer – Data Ingestion

Gdansk, Poland


We are looking for a hands-on Lead Data Engineer – Data Ingestion to guide the development and execution of our data ingestion processes into the Data Lakehouse. In this role, you will design scalable, reliable data pipelines that transform source inputs into trusted Bronze-layer assets. You'll work closely with data owners, source system teams, and the Product Manager to onboard new sources and to drive continuous improvements in data quality, automation, and performance. You'll lead a small team of engineers by example: actively coding, reviewing solutions, and setting engineering standards. This is a role for someone who is passionate about data integration and wants to shape the way data flows through our platform.

Responsibilities:

  • Design and maintain scalable data ingestion pipelines from internal and external sources, including Kafka, APIs, SFTP, and file-based systems.
  • Establish and maintain best practices for ingesting and processing structured and semi-structured data (e.g., JSON, Avro, CSV). 
  • Align ingestion pipelines with enterprise data architecture and naming conventions. 
  • Manage batch and micro-batch processing; contribute to the future transition toward streaming ingestion.
  • Transform and validate ingested data into the Bronze layer within the Data Lakehouse. 
  • Follow standardized onboarding scenarios and integrate new data sources in a consistent and governed way. 
  • Apply schema and metadata standards using Unity Catalog and Collibra, and ensure proper lineage tracking.
  • Collaborate with the Product Manager and domain teams to scope and deliver ingestion use cases.
  • Ensure data quality and validation as part of the ingestion process. 
  • Implement improvements in pipeline performance, cost efficiency, and automation.  
  • Collaborate closely with teams using the ingested data to ensure usability and traceability. 
  • Mentor team members, perform code reviews, and contribute to internal engineering standards and documentation. 
  • Collaborate with the Data Governance team to ensure traceability, cataloging, and access control across new data domains. 
  • Define and maintain onboarding playbooks and reusable ingestion templates. 
  • Actively participate in backlog grooming, planning sessions, and technical refinement. 

Requirements:

  • Minimum 6 years of experience in data engineering with a focus on ingestion and transformation.
  • Hands-on experience with Databricks (Spark), Python, Kafka, and cloud data processing on AWS (e.g., S3). 
  • Experience with orchestration and workflow management (Airflow on Astronomer). 
  • Strong SQL and Spark (preferably PySpark) skills. 
  • Working knowledge of metadata governance via Unity Catalog and Collibra. 
  • Familiarity with infrastructure-as-code tools (Terraform) and version control (GitLab). 
  • Proven experience building reliable and performant ingestion pipelines. 
  • Ability to collaborate with stakeholders, drive improvements, and document processes clearly. 
  • Comfortable in an agile, product-oriented environment. 
  • Experience working with semi-structured data and schema evolution techniques. 
  • Knowledge of distributed data systems and challenges related to ingestion at scale. 
  • Ability to lead technical conversations with external vendors or source system teams. 
  • Understanding of testing frameworks for data pipelines and CI/CD validation. 
  • Familiarity with tools for data profiling, schema validation, and automated lineage capture. 

With a fleet of 287 modern container ships, a vessel capacity of 2.2 million TEU, and a container capacity of 3.2 million TEU, including one of the world's largest and most modern reefer container fleets, Hapag-Lloyd is one of the world's leading liner shipping companies. In the Liner Shipping segment, the company has around 13,500 employees and 400 offices in 139 countries, and transports around 11.9 million TEU per year. A total of 114 liner services worldwide ensure fast and reliable connections between more than 600 ports across the world. In the Terminal & Infrastructure segment, Hapag-Lloyd has stakes in 20 terminals in Europe, Latin America, the United States, India, and North Africa. The roughly 2,600 employees assigned to this segment handle terminal-related activities and provide complementary logistics services at selected locations.

Perks/benefits: Team events

