EY-GDS Consulting-AI and DATA-Data Engineer-Senior
Bengaluru, KA, IN, 560016
EY
With our four integrated service lines (Assurance, Tax, Consulting, and Strategy and Transactions) and our sector knowledge, we support our clients in...
At EY, you’ll have the chance to build a career as unique as you are, with the global scale, support, inclusive culture and technology to become the best version of you. And we’re counting on your unique voice and perspective to help EY become even better, too. Join us and build an exceptional experience for yourself, and a better working world for all.
We are seeking a highly skilled and motivated AWS Data Engineer with experience in AWS Glue, AWS Redshift, S3, and Python to join our dynamic team. As a Data Engineer, you will be responsible for designing, developing, and optimizing data pipelines and solutions that support business intelligence, analytics, and large-scale data processing. You will work closely with data scientists, analysts, and other engineering teams to ensure seamless data flow across our systems.
Key Responsibilities:
- Design and Develop ETL Pipelines: Leverage AWS Glue to design and implement scalable ETL (Extract, Transform, Load) processes that move and transform data from various sources into AWS Redshift or other storage systems.
- Engineer governed batch and near-real-time data pipelines using AWS-native technologies such as Direct Connect, S3, Lambda, Glue, Kinesis, and CloudTrail, or equivalent.
- Design and implement serverless data engineering workloads in the AWS ecosystem: take inputs from S3, RDS, and other cloud-based sources (e.g., SaaS data), apply business transformations using distributed compute (e.g., EMR, Glue, Spark), and persist insights in the target store (e.g., S3, Redshift, DynamoDB).
- Maintain, optimize, and scale AWS Redshift clusters to ensure efficient data storage, retrieval, and query performance.
- Utilize Amazon S3 to store raw data, manage large datasets, and integrate with other AWS services to ensure secure, scalable, and cost-effective data solutions.
- Create and manage AWS Glue crawlers and jobs to automate data cataloging and ingestion processes across various structured and unstructured data sources.
- Use Python (and PySpark within Glue) to write scripts for data transformation, integration, and automation tasks, ensuring clean, efficient, and reusable code (a minimal Glue job sketch follows this list).
- Ensure data accuracy and integrity by implementing data validation, cleansing, and error-handling processes in ETL pipelines.
- Optimize AWS Glue jobs, Redshift queries, and data flows to ensure optimal performance and reduce processing times and costs.
- Enable data consumption by reporting and analytics business applications using AWS services (e.g., QuickSight, SageMaker, JDBC/ODBC connectivity).
- Identify, define, and design logical data models, required entities, relationships, data constraints, and dependencies, with a focus on enabling reporting and analytics business use cases.
- Work closely with data scientists, analysts, and stakeholders to understand data requirements and provide solutions that enable data-driven decision-making.
- Monitoring and Troubleshooting: Develop and implement monitoring strategies to ensure data pipelines are running smoothly. Quickly troubleshoot and resolve any data-related issues or failures.
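As an illustration of the Glue and PySpark work described above, the following is a minimal sketch of a Glue job that reads raw CSV files from S3, applies basic cleansing, and writes curated Parquet data back to S3. Bucket paths, column names, and the transformation logic are hypothetical and would be adapted to the actual pipeline:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Standard Glue job bootstrap: resolve the job name and initialise contexts.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Hypothetical source and target locations in S3.
SOURCE_PATH = "s3://example-raw-bucket/orders/"
TARGET_PATH = "s3://example-curated-bucket/orders/"

# Read raw CSV data from the landing bucket.
orders = spark.read.option("header", "true").csv(SOURCE_PATH)

# Basic validation and cleansing: drop rows without an order id and
# cast the amount column to a numeric type.
cleaned = (
    orders
    .filter(F.col("order_id").isNotNull())
    .withColumn("order_amount", F.col("order_amount").cast("double"))
)

# Persist the curated dataset as Parquet, partitioned by order date,
# ready for loading into Redshift or querying in place.
cleaned.write.mode("overwrite").partitionBy("order_date").parquet(TARGET_PATH)

job.commit()
```

In practice a job like this would typically source its schema from the Glue Data Catalog and load the curated output into Redshift via a COPY command or the Redshift connector.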
Required Skills and Qualifications:
- 5+ years of experience in data engineering or a similar role, with a focus on AWS technologies.
- Strong experience with AWS Glue: building ETL pipelines, managing crawlers, and working with the Glue Data Catalog.
- Proficiency in AWS Redshift: designing and managing Redshift clusters, writing complex SQL queries, and optimizing query performance (see the table-design sketch after this list).
- Hands-on experience with Amazon S3: data storage, data lifecycle policies, and integration with other AWS services.
- Solid programming skills in Python, especially for data manipulation (using libraries like pandas) and automation of ETL jobs.
- Experience with PySpark within AWS Glue for large-scale data transformations.
- Proficiency in writing and optimizing SQL queries for data manipulation and reporting.
- Familiarity with data warehouse concepts: star schemas, partitioning, indexing, and data normalization.
- Strong problem-solving skills and attention to detail.
- Experience with version control systems such as SVN and Git.
- Experience with data streaming technologies such as Amazon Kinesis and Kafka implementations on AWS.
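As a purely illustrative example of the Redshift design and query-tuning skills listed above, the sketch below creates a simple star-schema fact table with an explicit distribution key and sort key via psycopg2. The cluster endpoint, credentials, and table definition are hypothetical:

```python
import psycopg2

# Hypothetical star-schema fact table; the distribution key supports joins on
# the customer dimension and the sort key supports range filters on sale_date.
FACT_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS sales_fact (
    sale_id      BIGINT,
    customer_key INT,
    product_key  INT,
    sale_date    DATE,
    amount       DECIMAL(12, 2)
)
DISTSTYLE KEY
DISTKEY (customer_key)
SORTKEY (sale_date);
"""

# Connection details are placeholders; in a real deployment these would come
# from Secrets Manager or environment configuration rather than source code.
conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="etl_user",
    password="example-password",
)
try:
    with conn, conn.cursor() as cur:
        cur.execute(FACT_TABLE_DDL)  # committed on successful exit of the block
finally:
    conn.close()
```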
Good to have:
- Knowledge of AWS IAM for managing secure access to data resources.
- Familiarity with DevOps practices and automation tools like Terraform or CloudFormation.
- Experience with data visualization tools like QuickSight, or integrating Redshift data with BI tools (Tableau, Power BI, etc.).
- AWS certifications such as AWS Certified Data Analytics – Specialty or AWS Certified Solutions Architect are a plus.
EY | Building a better working world
EY exists to build a better working world, helping to create long-term value for clients, people and society and build trust in the capital markets.
Enabled by data and technology, diverse EY teams in over 150 countries provide trust through assurance and help clients grow, transform and operate.
Working across assurance, consulting, law, strategy, tax and transactions, EY teams ask better questions to find new answers for the complex issues facing our world today.
Perks/benefits: Career development