Data Engineer (PySpark, Kafka, and Cloud Technologies)
Pune - Hinjewadi (Ascendas)
Synechron
Synechron is an innovative global consulting firm delivering industry-leading digital solutions to transform and empower businesses.
Overview:
We are seeking a Data Engineer with over 7 years of professional experience in data engineering to join our dynamic team. The ideal candidate will possess strong expertise in PySpark and Spark development, along with exposure to cloud environments such as ECS and Kubernetes. If you're passionate about data processing, real-time messaging, and building robust data solutions, we encourage you to apply!
Overall Responsibilities:
Data Processing:
- Develop and optimize data processing pipelines using PySpark and associated Spark modules for handling large datasets efficiently.
- Work with Delta Lakehouse technology to ensure reliable and scalable data storage solutions.
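For illustration, a minimal sketch of this kind of pipeline is shown below: a PySpark batch job that deduplicates raw Parquet data and appends it to a Delta table. The paths, column names, and job name are hypothetical, and the delta-spark package is assumed to be on the classpath.
```python
# Hypothetical sketch of a PySpark -> Delta Lake batch pipeline, not actual project code.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily-trades-pipeline")  # hypothetical job name
    # Standard delta-spark configuration (assumes the delta-spark package is installed).
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

raw = spark.read.parquet("hdfs:///data/raw/trades/")  # hypothetical source path

clean = (
    raw.dropDuplicates(["trade_id"])              # keep re-runs idempotent
       .withColumn("ingest_date", F.current_date())
)

# Append to a partitioned Delta table for reliable, scalable storage.
(clean.write.format("delta")
      .mode("append")
      .partitionBy("ingest_date")
      .save("hdfs:///data/delta/trades/"))
```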
Cloud Environment Management:
- Utilize cloud technologies such as ECS and Kubernetes to deploy and manage data processing applications.
- Ensure efficient resource utilization and application performance within the cloud environment.
Real-Time Data Consumption:
- Implement solutions for consuming real-time messages using Kafka, handling streaming data to support various applications.
- Collaborate with stakeholders to integrate and process real-time data flows effectively.
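As a hedged sketch of what such a consumer might look like, the snippet below reads a Kafka topic with Spark Structured Streaming and parses the JSON payload. The broker address, topic name, and message schema are all hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.
```python
# Hypothetical Kafka consumer using Spark Structured Streaming; requires the
# spark-sql-kafka-0-10 connector package on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-trades-consumer").getOrCreate()

# Hypothetical schema for the incoming JSON messages.
schema = StructType([
    StructField("symbol", StringType()),
    StructField("price", DoubleType()),
])

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "trades")                     # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers raw bytes; cast the value to string and parse against the schema.
parsed = (
    stream.select(F.from_json(F.col("value").cast("string"), schema).alias("msg"))
          .select("msg.*")
)

query = (
    parsed.writeStream
    .outputMode("append")
    .format("console")  # stand-in sink; a real job would write to Delta or a database
    .option("checkpointLocation", "hdfs:///chk/trades/")  # tracks consumed offsets
    .start()
)
query.awaitTermination()
```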
Unix and HDFS Support:
- Apply Unix and HDFS knowledge to manage and manipulate large-scale data files, using support libraries such as PyArrow.
- Ensure data operations are performed smoothly within Hadoop and Spark environments.
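A minimal sketch of this kind of file handling, assuming a hypothetical namenode and path and that libhdfs is available on the node:
```python
# Hypothetical example of inspecting and reading HDFS files with PyArrow.
import pyarrow.fs as fs
import pyarrow.parquet as pq

hdfs = fs.HadoopFileSystem(host="namenode", port=8020)  # hypothetical namenode

# List what landed in the raw zone before kicking off downstream processing.
for info in hdfs.get_file_info(fs.FileSelector("/data/raw/trades/")):
    print(info.path, info.size)

# Read one file into an Arrow table for a quick sanity check of rows and schema.
table = pq.read_table("/data/raw/trades/part-0000.parquet", filesystem=hdfs)
print(table.num_rows, table.schema)
```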
Database Management:
- Maintain working knowledge of SQL databases to query and manipulate data as needed for analysis and application needs.
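As an illustrative sketch, the snippet below pulls reference data from a SQL database into Spark over JDBC and queries it. The URL, table, and credentials are hypothetical, and the matching JDBC driver JAR must be on the classpath.
```python
# Hypothetical JDBC read of reference data into Spark for SQL analysis.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ref-data-load").getOrCreate()

ref = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/markets")            # hypothetical DB
    .option("dbtable", "(SELECT symbol, sector FROM instruments) AS q")  # push down a query
    .option("user", "svc_etl")       # hypothetical service account
    .option("password", "REDACTED")  # in practice, load from a secret store
    .load()
)

ref.createOrReplaceTempView("instruments")
spark.sql("SELECT sector, COUNT(*) AS n FROM instruments GROUP BY sector").show()
```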
Version Control:
- Use BitBucket or an equivalent SCM tool for version control, collaboration, and code management throughout the development lifecycle.
Workflow Management:
- Knowledge of Apache Airflow is desirable for orchestrating complex data workflows and ensuring reliable task execution.
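A minimal sketch of such a DAG, assuming Airflow 2.4+ and a hypothetical spark-submit entry point:
```python
# Hypothetical Airflow DAG orchestrating a daily Spark job (Airflow 2.4+ syntax).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_trades_pipeline",   # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_spark = BashOperator(
        task_id="run_spark_job",
        bash_command="spark-submit --master yarn jobs/daily_trades.py",  # hypothetical job
        retries=2,  # absorb transient cluster failures before alerting
    )
```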
Skills and Experience:
Data Engineering Experience: 7+ years of experience in software development with a strong focus on data engineering.
PySpark Proficiency: Strong hands-on experience with PySpark and Spark development, particularly for data processing tasks.
Cloud Technologies: Familiarity with ECS and Kubernetes for deploying applications in a cloud environment.
Real-Time Messaging: Experience consuming real-time messages via Kafka to keep data pipelines responsive.
Unix and HDFS: Basic understanding of Unix and HDFS operations, along with relevant support libraries such as PyArrow.
Database Skills: Working knowledge of SQL databases, with the ability to query and handle data for processing.
Source Control: Proficient in using BitBucket or an equivalent source control management tool.
Team Collaboration: Ability to collaborate effectively in teams, building meaningful relationships to achieve common goals.
Time Management:
- Skilled at multi-tasking and delivering projects within tight deadlines.
- Working knowledge of databases (SQL), capable of querying and handling data for processing.
Source Control: Proficient in using BitBucket or equivalent source control management tool.
Team Collaboration: Ability to collaborate effectively in teams, building meaningful relationships to achieve common goals.
Time Management: Skilled at multi-tasking and delivering projects within tight deadlines.
If you are a dedicated Data Engineer with a passion for building data-driven solutions and driving change within an organization, we encourage you to apply for this exciting opportunity and join our innovative team!
SYNECHRON’S DIVERSITY & INCLUSION STATEMENT
Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and is an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture – promoting equality, diversity and an environment that is respectful to all. We strongly believe that a diverse workforce helps build stronger, successful businesses as a global company. We encourage applicants from across diverse backgrounds, race, ethnicities, religion, age, marital status, gender, sexual orientations, or disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.
All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.
Perks/benefits: Equity / stock options, flex hours