Data Engineering Senior Associate
Bangalore (SDC) - Bagmane Tech Park, India
PwC
We are a community of solvers combining human ingenuity, experience and technology innovation to help organisations build trust and deliver sustained outcomes.

Line of Service: Advisory
Industry/Sector: Not Applicable
Specialism: Advisory - Other
Management Level: Senior Associate

Job Description & Summary
At PwC, our people in data and analytics engineering focus on leveraging advanced technologies and techniques to design and develop robust data solutions for clients. They play a crucial role in transforming raw data into actionable insights, enabling informed decision-making and driving business growth.

In data engineering at PwC, you will focus on designing and building data infrastructure and systems to enable efficient data processing and analysis. You will be responsible for developing and implementing data pipelines, data integration, and data transformation solutions.
Job Description and Key Responsibilities
- Design, develop, and maintain robust, scalable ETL pipelines using tools like Apache Spark, Kafka, and other big data technologies.
- Data architecture design: design scalable and reliable data architectures, including Lakehouse, hybrid batch/streaming, Lambda, and Kappa architectures.
- Demonstrate proficiency in Python, PySpark, and Spark, along with a solid understanding of software design principles (e.g., SOLID).
- Ingest, process, and store structured, semi-structured, and unstructured data from various sources.
- Cloud experience: hands-on experience setting up data pipelines using cloud offerings (AWS, Azure, GCP).
- Optimize ETL processes to ensure scalability and efficiency.
- Work with various file formats, such as JSON, CSV, Parquet, and Avro.
- Possess deep knowledge of RDBMS, NoSQL databases, and CAP theorem principles.
- Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and optimize data models for performance and scalability.
- Document data processes, architectures, and models comprehensively to facilitate cross-team understanding and maintenance.
- Implement and maintain CI/CD pipelines using tools like Docker, Kubernetes, and GitHub.
- Ensure data quality, integrity, and security across all systems and processes.
- Implement and monitor data governance best practices.
- Stay up-to-date with emerging data technologies and trends, and identify opportunities for innovation and improvement.
- Good to have: knowledge of other cloud data, integration, and orchestration platforms such as Snowflake, Databricks, and Azure Data Factory.
GenAI Skills
- Leverage Large Language Models (LLMs) to generate and manage synthetic datasets for training AI models.
- Integrate Generative AI tools into data pipelines while critically analyzing and validating Gen AI-generated solutions to ensure reliability and adherence to best practices.
Minimum experience required: 4-7 years of experience in a programming language (any of Python, Scala, or Java; Python preferred), Apache Spark, ADF, Azure Databricks, Postgres, ETL (batch/streaming), and Git, with familiarity with Agile; know-how of NoSQL is desirable.
Required Qualification: BE / Master's in Design / B.Design / B.Tech / HCI Certification (preferred)
Optional Skills
Accepting Feedback, Active Listening, Agile Scalability, Amazon Web Services (AWS), Analytical Thinking, Apache Hadoop, Azure Data Factory, Communication, Creativity, Data Anonymization, Database Administration, Database Management System (DBMS), Database Optimization, Database Security Best Practices, Data Engineering, Data Engineering Platforms, Data Infrastructure, Data Integration, Data Lake, Data Modeling, Data Pipeline, Data Quality, Data Transformation, Data Validation (+ 18 more)
Job Posting End Date
March 27, 2025