Senior PySpark Data Engineer (Big Data, Cloud Data Solutions, & Python)

Hyderabad, India

Synechron

Synechron is an innovative global consulting firm delivering industry-leading digital solutions to transform and empower businesses.

Job Summary

Synechron is seeking a skilled PySpark Data Engineer to design, develop, and optimize data processing solutions leveraging modern big data technologies. In this role, you will lead efforts to build scalable data pipelines, support data integration initiatives, and work closely with cross-functional teams to enable data-driven decision-making. Your expertise will contribute to enhancing business insights and operational efficiency, positioning Synechron as a pioneer in adopting emerging data technologies.

Software Requirements

Required Software Skills:

  • PySpark (Apache Spark with Python) – experience developing data pipelines (a minimal sketch follows this list)
  • Apache Spark ecosystem knowledge
  • Python programming (version 3.7 or higher)
  • SQL and relational database management systems (e.g., PostgreSQL, MySQL)
  • Cloud platforms (preferably AWS or Azure)
  • Version control: Git
  • Data workflow orchestration tools such as Apache Airflow
  • Data management tools: SQL Developer or equivalent
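
As a rough illustration of the pipeline work named above, the following is a minimal PySpark batch job sketch. The dataset, storage paths, and column names (an "orders" table, the s3a://example-bucket locations, and so on) are assumptions made purely for illustration:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders_daily_revenue").getOrCreate()

    # Read raw events from object storage (the path is a placeholder)
    orders = spark.read.parquet("s3a://example-bucket/raw/orders/")

    # Keep completed orders and aggregate revenue per customer per day
    daily_revenue = (
        orders
        .filter(F.col("status") == "COMPLETED")
        .withColumn("order_date", F.to_date("order_ts"))
        .groupBy("customer_id", "order_date")
        .agg(F.sum("amount").alias("daily_revenue"))
    )

    # Write results partitioned by date for cheap downstream reads
    (daily_revenue.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3a://example-bucket/curated/daily_revenue/"))

    spark.stop()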

Preferred Software Skills:

  • Experience with Hadoop ecosystem components
  • Knowledge of containerization (Docker, Kubernetes)
  • Familiarity with data lake and data warehouse solutions (e.g., AWS S3, Redshift, Snowflake)
  • Monitoring and logging tools (e.g., Prometheus, Grafana)

Overall Responsibilities

  • Lead the design and implementation of large-scale data processing solutions using PySpark and related technologies
  • Collaborate with data scientists, analysts, and business teams to understand data requirements and deliver scalable pipelines
  • Mentor junior team members on best practices in data engineering and emerging technologies
  • Evaluate new tools and methodologies to optimize data workflows and improve data quality
  • Ensure data solutions are robust, scalable, and aligned with organizational data governance policies
  • Stay informed on industry trends and technological advancements in big data and analytics
  • Support production environment stability and performance tuning of data pipelines (a tuning sketch follows this list)
  • Drive innovative approaches to extract value from large and complex datasets
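
As one concrete example of the performance-tuning responsibility above, a common Spark optimization is broadcasting a small dimension table to avoid a shuffle-heavy join. The tables, paths, and sizes below are illustrative assumptions, not project specifics:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("tuning_sketch").getOrCreate()

    facts = spark.read.parquet("s3a://example-bucket/curated/daily_revenue/")
    customers = spark.read.parquet("s3a://example-bucket/reference/customers/")  # assumed small

    # Broadcasting the small table ships it to every executor,
    # so the large fact table is never shuffled for the join
    enriched = facts.join(F.broadcast(customers), on="customer_id", how="left")

    # Coalesce to limit the number of small output files
    (enriched.coalesce(64)
        .write.mode("overwrite")
        .parquet("s3a://example-bucket/curated/daily_revenue_enriched/"))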

Technical Skills (By Category)

Programming Languages:

  • Required: Python, with a minimum of 2 years of PySpark experience
  • Preferred: Scala (for Spark), SQL, Bash scripting

Databases/Data Management:

  • Relational databases (PostgreSQL, MySQL)
  • Distributed storage solutions (HDFS, cloud object storage like S3 or Azure Blob Storage)
  • Data warehousing platforms (Snowflake, Redshift – preferred)

Cloud Technologies:

  • Required: Experience deploying and managing data solutions on AWS or Azure
  • Preferred: Knowledge of cloud-native services like EMR, Data Factory, or Azure Data Lake

Frameworks and Libraries:

  • Apache Spark (PySpark)
  • Airflow or similar orchestration tools
  • Data processing frameworks (Kafka, Spark Streaming – preferred; a streaming sketch follows this list)
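
The streaming sketch referenced above: a minimal Spark Structured Streaming job consuming JSON events from Kafka. The broker address, topic, schema, and checkpoint path are all assumptions, and the job would also need the spark-sql-kafka connector package on its classpath:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("orders_stream_sketch").getOrCreate()

    # Hypothetical event schema
    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("customer_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Kafka source; broker and topic are placeholders
    raw = (spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "orders-events")
        .load())

    # Kafka delivers payloads as bytes; parse the JSON value into columns
    events = (raw
        .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
        .select("e.*"))

    # Checkpointing lets the stream recover where it left off after a failure
    query = (events.writeStream.format("parquet")
        .option("path", "s3a://example-bucket/streams/orders/")
        .option("checkpointLocation", "s3a://example-bucket/checkpoints/orders/")
        .start())

    query.awaitTermination()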

Development Tools and Methodologies:

  • Version control with Git
  • Agile management tools: Jira, Confluence
  • Continuous integration/deployment pipelines (Jenkins, GitLab CI)

Security Protocols:

  • Understanding of data security, access controls, and GDPR compliance in cloud environments

Experience Requirements

  • 5+ years in data engineering, with hands-on PySpark experience
  • Proven track record of developing, deploying, and maintaining scalable data pipelines
  • Experience working with data lakes, data warehouses, and cloud data services
  • Demonstrated leadership in projects involving big data technologies
  • Experience mentoring junior team members and collaborating across teams
  • Prior experience in the financial, healthcare, or retail sectors is beneficial but not mandatory

Day-to-Day Activities

  • Develop, optimize, and deploy big data pipelines using PySpark and related tools (an orchestration sketch follows this list)
  • Collaborate with data analysts, data scientists, and business teams to define data requirements
  • Conduct code reviews, troubleshoot pipeline issues, and optimize performance
  • Mentor junior team members on best practices and emerging technologies
  • Design solutions for data ingestion, transformation, and storage
  • Evaluate new tools and frameworks for continuous improvement
  • Maintain documentation, monitor system health, and ensure security compliance
  • Participate in sprint planning, daily stand-ups, and project retrospectives to align priorities
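
The orchestration sketch referenced above: one plausible way to schedule such a pipeline with Apache Airflow (listed under Software Requirements). The DAG id, schedule, and spark-submit command are illustrative assumptions:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="orders_daily_revenue",  # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        run_job = BashOperator(
            task_id="spark_submit_orders_job",
            # The script path and deploy mode stand in for the real job
            bash_command="spark-submit --deploy-mode cluster jobs/orders_daily_revenue.py",
        )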

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Information Technology, or related discipline
  • Relevant industry certifications (e.g., AWS Data Analytics, GCP Professional Data Engineer) preferred
  • Proven experience working with PySpark and big data ecosystems
  • Strong understanding of software development lifecycle and data governance standards
  • Commitment to continuous learning and professional development in data engineering technologies

Professional Competencies

  • Analytical mindset and problem-solving acumen for complex data challenges
  • Effective leadership and team management skills
  • Excellent communication skills tailored to technical and non-technical audiences
  • Adaptability in fast-evolving technological landscapes
  • Strong organizational skills to prioritize tasks and manage multiple projects
  • Innovation-driven with a passion for leveraging emerging data technologies

Synechron’s Diversity & Inclusion Statement

Diversity and inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture – promoting equality and diversity in an environment that is respectful to all. As a global company, we strongly believe that a diverse workforce helps build stronger, more successful businesses. We encourage applicants of all backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and abilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.


All employment decisions at Synechron are based on business needs, job requirements, and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disability or veteran status, or any other characteristic protected by law.

Perks/benefits: Career development, flex hours
