Senior PySpark Data Engineer (Big Data, Cloud Data Solutions, & Python)

Hyderabad, India

Synechron

Synechron is an innovative global consulting firm delivering industry-leading digital solutions to transform and empower businesses.

Job Summary

Synechron is seeking a skilled PySpark Data Engineer to design, develop, and optimize data processing solutions leveraging modern big data technologies. In this role, you will lead efforts to build scalable data pipelines, support data integration initiatives, and work closely with cross-functional teams to enable data-driven decision-making. Your expertise will contribute to enhancing business insights and operational efficiency, positioning Synechron as a pioneer in adopting emerging data technologies.

Software Requirements

Required Software Skills:

  • PySpark (Apache Spark with Python) – experience developing data pipelines (a minimal sketch follows this list)
  • Apache Spark ecosystem knowledge
  • Python programming (version 3.7 or higher)
  • SQL and relational database management systems (e.g., PostgreSQL, MySQL)
  • Cloud platforms (preferably AWS or Azure)
  • Version control: Git
  • Data workflow orchestration tools such as Apache Airflow
  • Data management tools: SQL Developer or equivalent
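
As a rough illustration of the pipeline work named above, the following is a minimal PySpark batch job sketch. The dataset, storage paths, and column names (an "orders" table, the s3a://example-bucket locations, and so on) are assumptions made purely for illustration:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders_daily_revenue").getOrCreate()

    # Read raw events from object storage (the path is a placeholder)
    orders = spark.read.parquet("s3a://example-bucket/raw/orders/")

    # Keep completed orders and aggregate revenue per customer per day
    daily_revenue = (
        orders
        .filter(F.col("status") == "COMPLETED")
        .withColumn("order_date", F.to_date("order_ts"))
        .groupBy("customer_id", "order_date")
        .agg(F.sum("amount").alias("daily_revenue"))
    )

    # Write results partitioned by date for cheap downstream reads
    (daily_revenue.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3a://example-bucket/curated/daily_revenue/"))

    spark.stop()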

Preferred Software Skills:

  • Experience with Hadoop ecosystem components
  • Knowledge of containerization (Docker, Kubernetes)
  • Familiarity with data lake and data warehouse solutions (e.g., AWS S3, Redshift, Snowflake)
  • Monitoring and logging tools (e.g., Prometheus, Grafana)

Overall Responsibilities

  • Lead the design and implementation of large-scale data processing solutions using PySpark and related technologies
  • Collaborate with data scientists, analysts, and business teams to understand data requirements and deliver scalable pipelines
  • Mentor junior team members on best practices in data engineering and emerging technologies
  • Evaluate new tools and methodologies to optimize data workflows and improve data quality
  • Ensure data solutions are robust, scalable, and aligned with organizational data governance policies
  • Stay informed on industry trends and technological advancements in big data and analytics
  • Support production environment stability and performance tuning of data pipelines (a tuning sketch follows this list)
  • Drive innovative approaches to extract value from large and complex datasets
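
As one concrete example of the performance-tuning responsibility above, a common Spark optimization is broadcasting a small dimension table to avoid a shuffle-heavy join. The tables, paths, and sizes below are illustrative assumptions, not project specifics:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("tuning_sketch").getOrCreate()

    facts = spark.read.parquet("s3a://example-bucket/curated/daily_revenue/")
    customers = spark.read.parquet("s3a://example-bucket/reference/customers/")  # assumed small

    # Broadcasting the small table ships it to every executor,
    # so the large fact table is never shuffled for the join
    enriched = facts.join(F.broadcast(customers), on="customer_id", how="left")

    # Coalesce to limit the number of small output files
    (enriched.coalesce(64)
        .write.mode("overwrite")
        .parquet("s3a://example-bucket/curated/daily_revenue_enriched/"))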

Technical Skills (By Category)

Programming Languages:

  • Required: Python, with a minimum of 2 years of PySpark experience
  • Preferred: Scala (for Spark), SQL, Bash scripting

Databases/Data Management:

  • Relational databases (PostgreSQL, MySQL)
  • Distributed storage solutions (HDFS, cloud object storage like S3 or Azure Blob Storage)
  • Data warehousing platforms (Snowflake, Redshift – preferred)

Cloud Technologies:

  • Required: Experience deploying and managing data solutions on AWS or Azure
  • Preferred: Knowledge of cloud-native services like EMR, Data Factory, or Azure Data Lake

Frameworks and Libraries:

  • Apache Spark (PySpark)
  • Airflow or similar orchestration tools
  • Data processing frameworks (Kafka, Spark Streaming – preferred; a streaming sketch follows this list)
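
The streaming sketch referenced above: a minimal Spark Structured Streaming job consuming JSON events from Kafka. The broker address, topic, schema, and checkpoint path are all assumptions, and the job would also need the spark-sql-kafka connector package on its classpath:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("orders_stream_sketch").getOrCreate()

    # Hypothetical event schema
    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("customer_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Kafka source; broker and topic are placeholders
    raw = (spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "orders-events")
        .load())

    # Kafka delivers payloads as bytes; parse the JSON value into columns
    events = (raw
        .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
        .select("e.*"))

    # Checkpointing lets the stream recover where it left off after a failure
    query = (events.writeStream.format("parquet")
        .option("path", "s3a://example-bucket/streams/orders/")
        .option("checkpointLocation", "s3a://example-bucket/checkpoints/orders/")
        .start())

    query.awaitTermination()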

Development Tools and Methodologies:

  • Version control with Git
  • Agile management tools: Jira, Confluence
  • Continuous integration/deployment pipelines (Jenkins, GitLab CI)

Security Protocols:

  • Understanding of data security, access controls, and GDPR compliance in cloud environments

Experience Requirements

  • 5+ years in data engineering, with hands-on PySpark experience
  • Proven track record of developing, deploying, and maintaining scalable data pipelines
  • Experience working with data lakes, data warehouses, and cloud data services
  • Demonstrated leadership in projects involving big data technologies
  • Experience mentoring junior team members and collaborating across teams
  • Prior experience in the financial, healthcare, or retail sectors is beneficial but not mandatory

Day-to-Day Activities

  • Develop, optimize, and deploy big data pipelines using PySpark and related tools (an orchestration sketch follows this list)
  • Collaborate with data analysts, data scientists, and business teams to define data requirements
  • Conduct code reviews, troubleshoot pipeline issues, and optimize performance
  • Mentor junior team members on best practices and emerging technologies
  • Design solutions for data ingestion, transformation, and storage
  • Evaluate new tools and frameworks for continuous improvement
  • Maintain documentation, monitor system health, and ensure security compliance
  • Participate in sprint planning, daily stand-ups, and project retrospectives to align priorities
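
The orchestration sketch referenced above: one plausible way to schedule such a pipeline with Apache Airflow (listed under Software Requirements). The DAG id, schedule, and spark-submit command are illustrative assumptions:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="orders_daily_revenue",  # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        run_job = BashOperator(
            task_id="spark_submit_orders_job",
            # The script path and deploy mode stand in for the real job
            bash_command="spark-submit --deploy-mode cluster jobs/orders_daily_revenue.py",
        )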

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Information Technology, or related discipline
  • Relevant industry certifications (e.g., AWS Data Analytics, GCP Professional Data Engineer) preferred
  • Proven experience working with PySpark and big data ecosystems
  • Strong understanding of software development lifecycle and data governance standards
  • Commitment to continuous learning and professional development in data engineering technologies

Professional Competencies

  • Analytical mindset and problem-solving acumen for complex data challenges
  • Effective leadership and team management skills
  • Excellent communication skills tailored to technical and non-technical audiences
  • Adaptability in fast-evolving technological landscapes
  • Strong organizational skills to prioritize tasks and manage multiple projects
  • Innovation-driven with a passion for leveraging emerging data technologies

Synechron’s Diversity & Inclusion Statement

Diversity and inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture – promoting equality and diversity in an environment that is respectful to all. As a global company, we strongly believe that a diverse workforce helps build stronger, more successful businesses. We encourage applicants of all backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and abilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.


All employment decisions at Synechron are based on business needs, job requirements, and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disability or veteran status, or any other characteristic protected by law.

Perks/benefits: Career development, flex hours
