PySpark Tech Lead
Bengaluru, Karnataka, India
ALLEN Digital
ALLEN - India's best coaching institute for NEET, IIT JEE, and classes 6 to 10, with 36 years of experience offering unparalleled, personalized guidance. Join today and embark on a journey to academic excellence.
Job Description:
We are looking for a skilled and experienced PySpark Tech Lead to join our dynamic engineering team. In this role, you will lead the development and execution of high-performance big data solutions using PySpark. You will work closely with data scientists, engineers, and architects to design and implement scalable data pipelines and analytics solutions.
As a Tech Lead, you will mentor and guide a team of engineers, ensuring the adoption of best practices for building robust and efficient systems while driving innovation in the use of data technologies.
Key Responsibilities:
Lead and Develop: Design and implement scalable, high-performance data pipelines and ETL processes using PySpark on distributed systems.
Tech Leadership: Provide technical direction and leadership to a team of engineers, ensuring the delivery of high-quality solutions that meet both business and technical requirements.
Architect Solutions: Develop and enforce best practices for architecture, design, and coding standards. Lead the design of complex data engineering workflows, ensuring they are optimized for performance and cost-effectiveness.
Collaboration: Collaborate with data scientists, analysts, and other stakeholders to understand data requirements, translating them into scalable technical solutions.
Optimization & Performance Tuning: Optimize large-scale data processing pipelines to improve efficiency and performance. Implement best practices for memory management, data partitioning, and parallelization in Spark.
Code Review & Mentorship: Conduct code reviews to ensure high-quality code, maintainability, and scalability. Provide guidance and mentorship to junior and mid-level engineers.
Innovation & Best Practices: Stay current on new data technologies and trends, bringing fresh ideas and solutions to the team. Implement continuous integration and deployment pipelines for data workflows.
Problem Solving: Identify bottlenecks, troubleshoot, and resolve issues related to data quality, pipeline failures, and performance optimization.
Skills and Qualifications:
Experience:
7+ years of hands-on experience in PySpark and large-scale data processing.
Technical Expertise:
Strong knowledge of PySpark, Spark SQL, and Apache Kafka.
Experience with cloud platforms like AWS (EMR, S3), Google Cloud, or Azure.
In-depth understanding of distributed computing, parallel processing, and data engineering principles.
Data Engineering:
Expertise in building ETL pipelines, data wrangling, and working with structured and unstructured data.
Experience with relational (SQL) and NoSQL databases such as MongoDB or DynamoDB.
Familiarity with data warehousing solutions and query optimization techniques.
Leadership & Communication:
Proven ability to lead a technical team, make key architectural decisions, and mentor junior engineers.
Excellent communication skills, with the ability to collaborate effectively with cross-functional teams and stakeholders.
Problem Solving:
Strong analytical skills with the ability to solve complex problems involving large datasets and distributed systems.
Education:
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience).