Data Engineer

Pune City/Hyderabad, India


Job Overview:

We are seeking a highly skilled Senior Python & PySpark Developer to join our team. The ideal candidate has extensive experience in Python development and PySpark, and has worked with large datasets in distributed computing environments. You will design, implement, and optimize data pipelines, ensure reliable data processing, and contribute to our overall data architecture.

Key Responsibilities:

  • Develop, maintain, and optimize scalable data processing pipelines using Python and PySpark (a brief illustrative sketch follows this list).
  • Collaborate with cross-functional teams to understand business requirements and translate them into technical specifications.
  • Work with large datasets to perform data wrangling, cleansing, and analysis.
  • Implement best practices for efficient distributed computing and data processing.
  • Optimize existing data pipelines for performance and scalability.
  • Conduct code reviews, mentor junior developers, and contribute to team knowledge sharing.
  • Develop and maintain technical documentation.
  • Troubleshoot, debug, and resolve issues related to data processing.
  • Collaborate with data engineers, data scientists, and analysts to deliver high-quality solutions.
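
As a brief illustration of the pipeline work described above, the following is a minimal sketch of a PySpark batch job that ingests raw records, cleanses them, and writes an aggregated output. The bucket paths, column names, and schema are hypothetical placeholders, not a description of any actual system.

    # Minimal PySpark pipeline sketch: ingest raw CSV, cleanse, aggregate, write Parquet.
    # Paths and column names below are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders-daily-revenue").getOrCreate()

    # Ingest: read raw order records (hypothetical path and columns).
    orders = spark.read.csv("s3://example-bucket/raw/orders/", header=True, inferSchema=True)

    # Cleanse: drop rows missing key fields, normalize types, de-duplicate.
    clean = (
        orders
        .dropna(subset=["order_id", "order_ts", "amount"])
        .withColumn("amount", F.col("amount").cast("double"))
        .withColumn("order_date", F.to_date("order_ts"))
        .dropDuplicates(["order_id"])
    )

    # Aggregate: daily revenue and order counts per country.
    daily_revenue = (
        clean
        .groupBy("order_date", "country")
        .agg(F.sum("amount").alias("revenue"), F.count("*").alias("orders"))
    )

    # Write: partitioned Parquet output for downstream consumers.
    daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(
        "s3://example-bucket/curated/daily_revenue/"
    )

    spark.stop()

A production pipeline of this kind would typically add an explicit schema, data-quality checks, and partition-scoped incremental writes rather than a full overwrite.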

Requirements:


  • Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
  • 5+ years of experience in Python programming.
  • 3+ years of hands-on experience with PySpark and distributed data processing frameworks.
  • Strong understanding of big data ecosystems (Hadoop, Spark, Hive).
  • Experience working with cloud platforms like AWS, GCP, or Azure.
  • Proficiency in SQL and relational databases.
  • Familiarity with ETL processes and data pipelines.
  • Strong problem-solving skills with the ability to troubleshoot and optimize code.
  • Excellent communication skills and the ability to work in a team-oriented environment.

Preferred Qualifications:

  • Experience with Apache Kafka or other real-time data streaming technologies (see the streaming sketch after this list).
  • Familiarity with machine learning frameworks (TensorFlow, Scikit-learn).
  • Experience with Docker, Kubernetes, or other containerization technologies.
  • Knowledge of DevOps tools (CI/CD pipelines, Jenkins, Git, etc.).
  • Familiarity with data warehousing solutions such as Redshift or Snowflake.
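
Since real-time streaming with Kafka is listed above, the following is a minimal sketch of a Spark Structured Streaming job that consumes a Kafka topic. The broker address and topic name are hypothetical, and the job assumes the spark-sql-kafka connector package is available on the cluster.

    # Minimal Structured Streaming sketch: consume a Kafka topic and print records.
    # Broker and topic are hypothetical; requires the spark-sql-kafka connector package.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("events-stream").getOrCreate()

    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        .load()
    )

    # Kafka keys and values arrive as bytes; cast to strings before further parsing.
    decoded = events.select(
        F.col("key").cast("string").alias("key"),
        F.col("value").cast("string").alias("value"),
        "timestamp",
    )

    # Console sink for illustration; a real job would write to a durable sink with checkpointing.
    query = decoded.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()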


Benefits:

Standard Company Benefits
