Data Engineer
Overland Park, Kansas, United States
Shamrock Trading Corporation
- Develop & Maintain Scalable Data Pipelines: Build, optimize, and maintain ETL/ELT pipelines using Databricks, Apache Spark, and Delta Lake (a minimal batch sketch appears after this list).
- Optimize Data Processing: Implement performance tuning techniques to improve Spark-based workloads.
- Cloud Data Engineering: Work with AWS services (S3, Lambda, Glue, Redshift, etc.) to design and implement robust data architectures.
- Real-time & Streaming Data: Develop streaming solutions using Kafka and Databricks Structured Streaming (see the streaming sketch after this list).
- Data Quality & Governance: Implement data validation, observability, and governance best practices using Unity Catalog or other tools.
- Cross-functional Collaboration: Partner with analysts, data scientists, and application engineers to ensure data meets business needs.
- Automation & CI/CD: Implement infrastructure-as-code (IaC) and CI/CD best practices for data pipelines using tools like Terraform, dbt, and GitHub Actions.
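To give candidates a concrete sense of this kind of pipeline work, here is a minimal batch ETL sketch in PySpark writing to a Delta table. The bucket, paths, column names, and table name are hypothetical and simplified; this is an illustration of the pattern, not a description of Shamrock's actual pipelines.

```python
# Minimal batch ETL sketch, assuming a Spark/Databricks environment with
# Delta Lake available; bucket, paths, and table names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: read raw CSV files landed in S3 (hypothetical bucket/prefix)
raw = (
    spark.read
    .option("header", "true")
    .csv("s3://example-bucket/raw/orders/")
)

# Transform: basic typing, filtering, and a derived partition column
orders = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount") > 0)
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write to a Delta table, partitioned by date for partition pruning
(
    orders.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("analytics.orders")
)
```

Partitioning by a date column is one common Delta layout choice; the right layout depends on query patterns and data volume.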
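And a minimal Structured Streaming sketch reading from Kafka (for example, MSK) and appending to a Delta table. The broker address, topic, JSON schema, checkpoint location, and table name are assumptions for illustration; a production pipeline might use Avro with Schema Registry instead of JSON, as noted in the requirements below.

```python
# Minimal streaming sketch, assuming Kafka and Delta Lake are reachable from
# the cluster; broker, topic, schema, and paths are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("orders_stream").getOrCreate()

schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read the Kafka topic as a stream; the value column arrives as bytes
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Append to a Delta table; the checkpoint lets the stream recover its
# offsets and state after a restart
query = (
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://example-bucket/chk/orders/")
    .outputMode("append")
    .toTable("analytics.orders_stream")
)
query.awaitTermination()
```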
- Bachelor’s degree in computer science, data science, or a related technical field, or equivalent practical experience
- 2-5+ years of experience in data engineering, with a focus on cloud-based platforms.
- Strong hands-on experience with Databricks (including Spark, Delta Lake, and MLflow).
- Experience building and maintaining AWS-based data pipelines; the current stack includes AWS Lambda, Docker/ECS, MSK, Airflow, Databricks, and Unity Catalog
- Development experience utilizing two or more of the following:
  - Python (Pandas/NumPy, Boto3, SimpleSalesforce; a short example appears after this requirements list)
  - Databricks (PySpark, PySQL, DLT)
  - Apache Spark
  - Kafka and the Kafka Connect ecosystem (Schema Registry and Avro)
- Familiarity with CI/CD for data pipelines and infrastructure as code (Terraform, dbt)
- Strong SQL skills for data transformation and performance tuning.
- Understanding of data security and governance best practices.
- Enthusiasm for working directly with customer teams (business units and internal IT)
- Proven experience with relational and NoSQL databases (e.g., Postgres, Redshift, MongoDB)
- Experience with version control (git) and peer code reviews
- Familiarity with data lakehouse architectures and optimization strategies.
- Familiarity with data visualization techniques using tools such as Grafana, Power BI, Amazon QuickSight, and Excel.
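As a small, hypothetical illustration of the Python tooling listed above (boto3 plus pandas); the bucket, key, and column names are made up for the example.

```python
# Minimal sketch: pull a CSV extract from S3 and do light cleanup in pandas.
# Bucket, key, and column names are hypothetical.
import io

import boto3
import pandas as pd

s3 = boto3.client("s3")

# Fetch the object and load it into a DataFrame
obj = s3.get_object(Bucket="example-bucket", Key="exports/accounts.csv")
df = pd.read_csv(io.BytesIO(obj["Body"].read()))

# Basic typing and a quick aggregate before handing off downstream
df["created_date"] = pd.to_datetime(df["created_date"], errors="coerce")
summary = df.groupby("region", dropna=False)["annual_revenue"].sum()
print(summary)
```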
- Medical: Fully paid healthcare, dental and vision premiums for employees and eligible dependents
- Work-Life Balance: Competitive PTO and paid leave policies
- Financial: Generous company 401(k) contributions and employee stock ownership after one year
- Wellness: Onsite gym and discounted membership to select fitness centers. Jogging trails available at Overland Park offices