Data Engineer
Norfolk, Virginia, United States; Suffolk, Virginia, United States; Charlotte, North Carolina, United States; Raleigh, North Carolina, United States
TowneBank
Checking, savings, lending, mortgages and more! TowneBank helps you with your personal or business financial needs and goals.
Primary Purpose:
We are seeking a forward-thinking Data Engineer with strong Databricks and cloud expertise to join our banking data team. In this mid-level role, you will design and optimize batch ETL workflows on a modern Databricks Lakehouse platform, delivering high-quality data pipelines that support critical banking functions. The ideal candidate is adept with large-scale data engineering in the cloud and understands the governance and security demands of a regulated financial environment.
Essential Responsibilities:
- Build and Integrate Data Pipelines: Design, integrate, and implement batch ETL processes for data from diverse source systems into our Databricks environment, contributing to the expansion and optimization of our cloud-based data lake (Lakehouse); see the illustrative sketch after this list.
- Data Quality and Integrity: Ensure pipelines meet high standards of data quality and integrity, implementing rigorous validation, cleansing, and enrichment processes on large volumes of banking data. Maintain historical data for auditability and regulatory compliance (leveraging Delta Lake’s ACID features for versioning).
- Performance Optimization: Optimize data processing performance on Databricks (e.g., efficient Spark SQL, partitioning techniques) and manage ETL job scheduling and dependencies to meet business SLAs for data timeliness.
- Governance and Compliance: Adhere to enterprise data governance policies and implement security best practices for sensitive financial data. Ensure compliance with banking regulations by enforcing access controls, encryption, and data lineage tracking across pipelines.
- Cross-Team Collaboration: Work closely with data architects, analysts, and business stakeholders to gather requirements and translate banking domain needs into scalable data solutions. Collaborate with BI, risk, and data science teams to support analytics and machine learning initiatives with robust data feeds.
- Continuous Improvement: Identify and implement improvements (including automating repeatable workflows) to enhance pipeline stability, efficiency, and future scalability. Keep the data platform up-to-date with industry best practices and emerging Databricks features.
- Adhere to applicable federal laws, rules, and regulations, including those related to Anti-Money Laundering (AML) and the Bank Secrecy Act (BSA).
- Other duties as assigned.
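To illustrate the pipeline-building and data-quality responsibilities above, here is a minimal PySpark sketch of one batch ETL step: read raw files from a landing zone, apply simple validation and cleansing, and append the result to a partitioned Delta table. The paths, database/table name, and column names (txn_id, amount, txn_date) are hypothetical placeholders, not details of TowneBank's environment.

```python
from pyspark.sql import SparkSession, functions as F

# On Databricks a SparkSession already exists as `spark`;
# getOrCreate() keeps the sketch runnable elsewhere too.
spark = SparkSession.builder.appName("daily_txn_batch").getOrCreate()

# Hypothetical raw landing zone in cloud object storage.
raw = (
    spark.read.format("csv")
    .option("header", "true")
    .load("/mnt/raw/transactions/2024-06-01/")
)

# Basic cleansing and validation: enforce types, drop records failing checks.
cleaned = (
    raw
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .withColumn("txn_date", F.to_date("txn_date"))
    .filter(F.col("txn_id").isNotNull() & F.col("amount").isNotNull())
    .dropDuplicates(["txn_id"])
)

# Append to a partitioned Delta table; Delta's transaction log retains
# prior versions, which supports auditability and regulatory review.
(
    cleaned.write.format("delta")
    .mode("append")
    .partitionBy("txn_date")
    .saveAsTable("banking.transactions_curated")
)
```

Earlier versions of the table can then be inspected with Delta time travel (for example, `SELECT * FROM banking.transactions_curated VERSION AS OF 3`), which is one way the versioning mentioned above supports audits.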
Minimum Required Skills & Competencies:
- Experience: Bachelor’s degree in Computer Science or related field (or equivalent practical experience). 3+ years of experience as a data engineer in complex, large-scale data environments, preferably in the cloud.
- Databricks & Spark Proficiency: Strong hands-on expertise with Databricks and the Apache Spark ecosystem (PySpark, Spark SQL) for building large-scale data pipelines. Experience working with Delta Lake tables and Lakehouse architectural patterns for data management.
- Databricks Delta Live Tables (DLT): Experience using Delta Live Tables to build automated, declarative ETL pipelines on Databricks (see the DLT sketch after this list).
- Programming & SQL: Proficient in Python (including PySpark) for data processing tasks. Solid coding skills in SQL for complex querying and transformation of data (Scala or Java experience is a plus).
- Cloud Platforms: Experience with at least one major cloud platform (AWS, Azure, or GCP) and its data services (e.g., S3, Azure Data Lake Storage, BigQuery). Familiarity with cloud-based ETL tools and infrastructure (e.g., Azure Data Factory, AWS Glue) for scalable storage and processing.
- Data Modeling: Strong understanding of data modeling and data warehousing concepts, including designing relational schemas and dimensional models (OLTP/OLAP, star schemas, etc.) for analytics.
- Pipeline Architecture: Experience designing end-to-end data pipeline architectures, including orchestration and workflow scheduling. Familiarity with pipeline orchestration tools (Databricks Jobs, Apache Airflow, or Azure Data Factory) to automate and manage complex workflows (see the orchestration sketch after this list).
- Data Quality & Testing: Hands-on experience implementing data quality checks (unit tests, data validation rules) and monitoring in ETL pipelines to ensure accuracy and consistency of data outputs.
- Data Governance & Security: Knowledge of data governance standards and security best practices for managing sensitive data. Understanding of compliance requirements in banking (e.g., encryption, PII handling, auditing) and ability to enforce data access controls and documentation of data lineage.
- Version Control & CI/CD: Experience using version control (Git) and CI/CD pipelines for code deployment. Comfortable with DevOps practices to package, test, and deploy data pipeline code in a controlled, repeatable manner.
- Analytical Mindset: Strong problem-solving skills with an ability to troubleshoot complex data issues. Capable of translating business requirements into efficient, reliable ETL solutions, and optimizing workflows for performance and cost-efficiency.
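As a reference for the Delta Live Tables and data-quality items above, the sketch below shows a small declarative DLT pipeline with expectations that drop rows failing validation rules. The source path, table names, and columns are illustrative assumptions; the module would be attached to a Databricks DLT pipeline rather than run as a standalone script.

```python
import dlt
from pyspark.sql import functions as F

# `spark` is provided automatically in a DLT pipeline's execution context.

@dlt.table(comment="Raw transactions landed from cloud storage (illustrative path).")
def transactions_raw():
    return spark.read.format("json").load("/mnt/raw/transactions/")

@dlt.table(comment="Validated, typed transactions for downstream consumers.")
@dlt.expect_or_drop("valid_txn_id", "txn_id IS NOT NULL")
@dlt.expect_or_drop("valid_amount", "amount IS NOT NULL AND amount >= 0")
def transactions_clean():
    return (
        dlt.read("transactions_raw")
        .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
        .withColumn("txn_date", F.to_date("txn_date"))
    )
```

DLT records how many rows each expectation rejects, which also speaks to the monitoring expectation listed above.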
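For the pipeline-architecture and orchestration item, here is a minimal Apache Airflow sketch that triggers an existing Databricks job nightly. It assumes the apache-airflow-providers-databricks package and a configured workspace connection; the job ID is a placeholder, and scheduling directly with Databricks Jobs/Workflows would be an equally valid approach.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

# Assumes an Airflow (2.4+) deployment with a connection named
# "databricks_default" pointing at the Databricks workspace.
with DAG(
    dag_id="daily_banking_etl",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",  # run nightly at 02:00
    catchup=False,
) as dag:
    run_curated_load = DatabricksRunNowOperator(
        task_id="run_curated_load",
        databricks_conn_id="databricks_default",
        job_id=12345,  # hypothetical Databricks job ID
    )
```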
Desired Skills & Competencies:
- Banking/Financial Domain Knowledge: Familiarity with the banking sector’s data and processes (e.g., retail banking transactions, investment trading data, fraud detection, risk analytics) is a strong plus. An understanding of financial services terminology, or prior experience on finance data projects, helps contextualize data engineering work.
- Streaming Data Pipelines: Exposure to real-time data streaming and event-driven architectures. Knowledge of Spark Structured Streaming or Kafka for ingesting and processing streaming data alongside batch workflows is a plus (see the streaming sketch after this list).
- Regulatory Data Pipelines: Experience building data pipelines for regulatory reporting or compliance use cases in finance. Familiarity with ensuring consistency, integrity, and timeliness of data in regulatory pipelines (e.g., for CCAR, AML, or Basel reporting) would set a candidate apart.
- DataOps/MLOps Practices: Understanding of DataOps techniques (automated testing, monitoring, and CI/CD for data pipelines) or MLOps integration to support machine learning data requirements. Experience with tools and frameworks that improve the automation and reliability of data workflows is a plus.
- Certifications: Relevant industry certifications can be an advantage – for example, Databricks Certified Data Engineer or cloud platform certifications in data engineering. These demonstrate a commitment to staying current with evolving data technologies.
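For the streaming-exposure item above, here is a brief Spark Structured Streaming sketch that ingests a Kafka topic and appends it to a Delta table. The broker address, topic, checkpoint path, and target table are placeholders, and the spark-sql-kafka connector must be available on the cluster.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("txn_stream").getOrCreate()

# Read a Kafka topic as an unbounded stream.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "transactions")               # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka keys/values arrive as bytes; cast to strings for downstream parsing.
parsed = events.select(
    F.col("key").cast("string").alias("txn_id"),
    F.col("value").cast("string").alias("payload"),
    "timestamp",
)

# Append to a Delta table; the checkpoint makes the stream restartable and
# gives exactly-once delivery into the sink.
query = (
    parsed.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/transactions_stream")
    .outputMode("append")
    .toTable("banking.transactions_stream")  # Spark 3.1+
)
```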
Physical Requirements:
- Express or exchange ideas verbally and in writing (for example, via email).
- Exert up to 10 pounds of force occasionally, use your arms and legs, and sit most of the time.
- Have close visual acuity to perform activities such as analyzing data, viewing a computer terminal, reading, and preparing documentation.
- Not substantially exposed to adverse environmental conditions.
- The physical demands described here are representative of those that must be met by an employee to successfully perform the essential responsibilities of this job. Reasonable accommodations may be made to enable individuals with disabilities to perform essential responsibilities.
Perks/benefits: Team events