Lead Data Engineer – Core Data Platform
Gdansk, Poland
Hapag-Lloyd
We are seeking an experienced Lead Data Engineer – Core Data Platform to drive the development and stability of our cloud-native data infrastructure. This role is crucial in ensuring our data platform is secure, reliable, and scalable. You’ll be responsible for the provisioning and automation of key components including Databricks, Airflow (Astronomer), GitLab CI/CD, Terraform, and monitoring tools such as Grafana. You will collaborate closely with DevOps, Security, and Architecture teams and provide platform support to other data teams, including those building the new Data Warehouse in Databricks. As a hands-on technical leader, you’ll contribute to the platform codebase while defining standards, improving automation, and ensuring operational excellence across environments.
- Design and maintain the core data platform infrastructure including Databricks, Airflow (Astronomer), AWS, CI/CD pipelines, and monitoring.
- Manage and provision cloud environments using Terraform; maintain GitLab-based CI/CD workflows.
- Own the RBAC implementation and environment governance, ensuring secure and maintainable access controls.
- Monitor cost and performance metrics using observability tools.
- Troubleshoot infrastructure-level issues and support DataOps teams working on the Data Lakehouse.
- Collaborate closely with DevOps, Security, Architecture, and Product teams to align on platform standards.
- Support deployment processes from development to production for data workflows.
- Maintain documentation of architectural decisions, operational procedures, and tooling.
- Proactively identify and implement automation and improvements across platform components.
- Contribute directly to platform development, including coding, reviews, and mentoring.
- Maintain a modular and scalable Terraform repository structure for multi-environment deployments.
- Contribute to the development of internal tooling for platform automation and efficiency.
- Define tagging strategies and cost monitoring standards across environments and workspaces.
- Coordinate incident response and platform stability improvements.
- Partner with Security and Cloud Governance teams on policies, audits, and compliance initiatives.
- Track and address technical debt within platform infrastructure components.
- Minimum 6 years of experience in platform or DevOps engineering in a data context.
- Expertise in provisioning and automating cloud infrastructure using Terraform.
- Strong knowledge of AWS services (IAM, networking, S3, cost tracking) and Databricks.
- Hands-on experience with orchestrators such as Airflow (Astronomer) and CI/CD pipelines (GitLab).
- Familiarity with Databricks workspace and admin configurations.
- Experience implementing RBAC and securing access across environments.
- Experience in monitoring performance and setting up alerting/observability (Grafana, Prometheus).
- Strong coding skills in Python and PySpark.
- Ability to work closely with both technical and business stakeholders.
- Mindset focused on scalability, reliability, and documentation.
- Deep understanding of CI/CD strategies for data platforms and workflow-driven applications.
- Familiarity with observability stacks (logs, metrics, traces) and SRE practices.
- Experience in supporting multi-account cloud environments and cross-region deployments.
- Proficiency in debugging infrastructure-related issues in cloud-native data pipelines.
- Strong documentation habits and ability to produce clear, actionable technical runbooks.
- Familiarity with platform cost optimization tools and practices (e.g., AWS Cost Explorer, native tagging, budgeting).