Data Engineer
Madrid
Daniel J Edelman Holdings
Edelman is a global communications firm that partners with businesses and organizations to evolve, promote and protect their brands and reputations.
At Edelman, we understand diversity, equity, inclusion and belonging (DEIB) transform our colleagues, our company, our clients, and our communities. We are in relentless pursuit of an equitable and inspiring workplace that is respectful of all, reflects and represents the world in which we live, and fosters trust, collaboration and belonging.
We are currently seeking a Data Engineer with 3-5 years’ experience. The ideal candidate will be able to work independently within an Agile working environment and will have experience working with cloud infrastructure, leveraging tools such as Apache Airflow, Databricks, and Snowflake. Familiarity with real-time data processing and AI implementation is advantageous.
Why You'll Love Working with Us:
At Edelman, we believe in fostering a collaborative and open environment where every team member’s voice is valued. Our data engineering team thrives on building robust, scalable, and efficient data systems to power insightful decision-making.
We are at an exciting point in our journey, focusing on designing and implementing modern data pipelines, optimizing data workflows, and enabling seamless integration of data across platforms. You’ll work with best-in-class tools and practices for data ingestion, transformation, storage and analysis, ensuring high data quality, performance, and reliability.
Our data stack leverages technologies like ETL/ELT pipelines, distributed computing frameworks, data lakes, and data warehouses to process and analyze data efficiently at scale. Additionally, we are exploring the use of Generative AI techniques to support tasks like data enrichment and automated reporting, enhancing the insights we deliver to stakeholders.
This role provides a unique opportunity to work on projects involving batch processing, streaming data pipelines, and automation of data workflows, with occasional opportunities to collaborate on AI-driven solutions.
If you’re passionate about designing scalable systems, building reliable data infrastructure, and solving real-world data challenges, you’ll thrive here. We empower our engineers to explore new tools and approaches while delivering meaningful, high-quality solutions in a supportive, forward-thinking environment.
Responsibilities:
- Design, build, and maintain scalable and robust data pipelines to support analytics and machine learning models, ensuring high data quality and reliability for both batch & real-time use cases.
- Design, maintain, and optimize data models and data structures in tooling such as Snowflake and Databricks.
- Leverage Databricks and Cloud-native solutions for big data processing, ensuring efficient management of Spark jobs and seamless integration with other data services.
- Utilize PySpark and/or Ray to build and scale distributed computing tasks, enhancing the performance of machine learning model training and inference processes.
- Monitor, troubleshoot, and resolve issues within data pipelines and infrastructure, implementing best practices for data engineering and continuous improvement.
- Diagrammatically document data engineering workflows.
- Collaborate with other Data Engineers, Product Owners, Software Developers and Machine Learning Engineers to implement new product features by understanding their needs and delivering on time.
Qualifications:
- Minimum of 3 years’ experience deploying enterprise-level, scalable data engineering solutions.
- Strong examples of data pipelines developed independently end-to-end, from problem formulation and raw data through implementation, optimization, and results.
- Proven track record of building and managing scalable cloud-based infrastructure on AWS (incl. S3, DynamoDB, EMR).
- Proven track record of implementing and managing the AI model lifecycle in a production environment.
- Experience using Apache Airflow (or equivalent), Snowflake, and Lucene-based search engines.
- Experience with Databricks (Delta format, Unity Catalog).
- Advanced SQL and Python knowledge with associated coding experience.
- Strong experience with DevOps practices for continuous integration and continuous delivery (CI/CD).
- Experience wrangling structured & unstructured file formats (Parquet, CSV, JSON).
- Understanding and implementation of best practices within ETL and ELT processes.
- Data Quality best practice implementation using Great Expectations.
- Real-time data processing experience using Apache Kafka (or equivalent) will be advantageous.
- Work independently with minimal supervision.
- Takes initiative and is action-focused.
- Mentor and share knowledge with junior team members.
- Collaborative with a strong ability to work in cross-functional teams.
- Excellent communication skills with the ability to communicate with stakeholders across varying interest groups.
- Fluency in spoken and written English.
We are dedicated to building a diverse, inclusive, and authentic workplace, so if you’re excited about this role but your experience doesn’t perfectly align with every qualification, we encourage you to apply anyway. You may be just the right candidate for this or other roles.
Tags: Agile Airflow AWS Big Data CI/CD CSV Databricks Data pipelines Data quality DevOps ELT Engineering ETL Generative AI JSON Kafka Machine Learning ML models Model training Parquet Pipelines PySpark Python Snowflake Spark SQL Streaming
Perks/benefits: Equity / stock options