Data Engineer
Pune City/Hyderabad, India
DATAECONOMY
Job Overview:
We are seeking a highly skilled Senior Python & PySpark Developer to join our team. The ideal candidate will have extensive experience with Python development, PySpark, and working with large datasets in distributed computing environments. You will be responsible for designing, implementing, and optimizing data pipelines, ensuring seamless data processing, and contributing to our overall data architecture.
Key Responsibilities:
- Develop, maintain, and optimize scalable data processing pipelines using Python and PySpark (a minimal sketch follows this list).
- Collaborate with cross-functional teams to understand business requirements and translate them into technical specifications.
- Work with large datasets to perform data wrangling, cleansing, and analysis.
- Implement best practices for efficient distributed computing and data processing.
- Optimize existing data pipelines for performance and scalability.
- Conduct code reviews, mentor junior developers, and contribute to team knowledge sharing.
- Develop and maintain technical documentation.
- Troubleshoot, debug, and resolve issues related to data processing.
- Collaborate with data engineers, data scientists, and analysts to deliver high-quality solutions.
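For context on the day-to-day work, here is a minimal sketch of the kind of PySpark pipeline this role involves. It is an illustration only: the dataset paths, column names, and aggregation logic below are hypothetical and not part of the job description.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical pipeline: cleanse raw order events and write a daily revenue aggregate.
spark = SparkSession.builder.appName("daily-revenue-pipeline").getOrCreate()

# Placeholder input path; a real pipeline would read from the team's actual data lake.
orders = spark.read.parquet("s3://example-bucket/raw/orders/")

daily_revenue = (
    orders
    .dropDuplicates(["order_id"])                       # remove duplicate events
    .filter(F.col("amount") > 0)                        # drop invalid rows
    .withColumn("order_date", F.to_date("created_at"))  # normalize timestamp to date
    .groupBy("order_date")
    .agg(
        F.sum("amount").alias("revenue"),
        F.countDistinct("customer_id").alias("customers"),
    )
)

# Placeholder output path, partitioned by date for efficient downstream reads.
daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/daily_revenue/"
)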
Requirements:
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
- 5+ years of experience in Python programming.
- 3+ years of hands-on experience with PySpark and distributed data processing frameworks.
- Strong understanding of big data ecosystems (Hadoop, Spark, Hive).
- Experience working with cloud platforms such as AWS, GCP, or Azure.
- Proficiency with SQL and relational databases.
- Familiarity with ETL processes and data pipelines.
- Strong problem-solving skills with the ability to troubleshoot and optimize code.
- Excellent communication skills and the ability to work in a team-oriented environment.
Preferred Qualifications:
- Experience with Apache Kafka or other real-time data streaming technologies.
- Familiarity with machine learning frameworks (TensorFlow, scikit-learn).
- Experience with Docker, Kubernetes, or other containerization technologies.
- Knowledge of DevOps tools (CI/CD pipelines, Jenkins, Git, etc.).
- Familiarity with data warehousing solutions such as Redshift or Snowflake.
Benefits:
Standard company benefits, including career development.