PySpark Developer (Contract)

Bengaluru, Karnataka, India


About Us

Allen Digital was created as a strategic partnership between Allen Careers Institute and Bodhi Tree Systems to ensure tech enablement for millions of students. Allen Digital aims to build an EdTech platform that provides students with everything a classroom cannot. We are backed by some of the best names in media, business, education, and technology. We are a start-up with a reputed team, strong investors, and a legacy.

Job Summary

We are seeking a skilled PySpark Developer to join our data engineering team. The ideal candidate will have strong expertise in Apache Spark, Python programming, and big data technologies. You will be responsible for designing, developing, and maintaining scalable data pipelines and ETL processes to support our data analytics and business intelligence initiatives.

Key Responsibilities

  • Advanced PySpark Development:

    • Write Efficient PySpark Code: Develop high-quality, reusable, and scalable PySpark scripts to process large datasets.

    • Custom Transformations: Design and implement custom transformations and user-defined functions (UDFs) to meet specific data processing requirements.

    • DataFrame Manipulation: Utilize PySpark DataFrame APIs to perform complex data manipulations, aggregations, and joins.

    • Error Handling and Logging: Implement robust error handling and logging mechanisms within PySpark applications to ensure reliability and ease of troubleshooting.

  • Develop and Maintain Data Pipelines:

    • Design, build, and optimize robust and scalable data pipelines using PySpark.

    • Implement ETL processes to ingest, transform, and load data from various sources.

  • Performance Optimization:

    • Optimize PySpark jobs for performance and efficiency, including tuning Spark configurations and optimizing resource utilization.

    • Conduct code reviews and performance assessments to identify and implement improvements in PySpark codebases.

  • Data Processing and Analysis:

    • Perform data cleaning, validation, and transformation using PySpark to ensure data quality and consistency.

    • Collaborate with data analysts and scientists to understand data requirements and deliver PySpark-based solutions.

  • Integration and Deployment:

    • Integrate PySpark applications with other data tools and platforms, ensuring seamless data flow across systems.

    • Participate in the deployment and maintenance of PySpark applications in production environments, ensuring scalability and reliability.

  • Collaboration and Documentation:

    • Work closely with cross-functional teams including data engineers, software developers, and business stakeholders to deliver comprehensive data solutions.

    • Document PySpark code, processes, workflows, and technical specifications to ensure maintainability and knowledge sharing.
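To make the "custom transformations" and "error handling and logging" responsibilities above concrete, here is a minimal sketch of the per-record pattern involved. It is written in plain Python so it runs without a Spark cluster; the function name and column names are hypothetical, and the PySpark wiring is shown only in comments.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")

def safe_parse_amount(raw):
    """Parse a currency-like string to float; log and return None on bad input.

    This is the kind of per-record function a PySpark developer would wrap
    as a UDF. Returning None (rather than raising) keeps a single bad row
    from failing the whole job, while the log line preserves a trail for
    troubleshooting.
    """
    try:
        return float(str(raw).replace(",", "").strip())
    except ValueError:
        logger.warning("Could not parse amount: %r", raw)
        return None

# In PySpark this would typically be registered and applied as:
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import DoubleType
#   parse_udf = udf(safe_parse_amount, DoubleType())
#   df = df.withColumn("amount", parse_udf("raw_amount"))
print(safe_parse_amount("1,234.50"))  # → 1234.5
print(safe_parse_amount("n/a"))       # → None
```

Note that for simple casts, built-in functions such as `pyspark.sql.functions.col("x").cast(...)` are preferred over Python UDFs, since UDFs serialize rows between the JVM and Python and are considerably slower.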
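The "tuning Spark configurations" responsibility usually surfaces at job-submission time. The following is a hypothetical spark-submit invocation illustrating common knobs; the values are placeholders, not recommendations for any particular workload, and `etl_job.py` is an assumed script name.

```shell
# Illustrative only: executor sizing, shuffle parallelism, serializer,
# and dynamic allocation are the configs most often tuned first.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 8g \
  --conf spark.sql.shuffle.partitions=200 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.dynamicAllocation.enabled=true \
  etl_job.py
```

In practice these values are derived from the cluster's capacity and the job's shuffle sizes, and validated against the Spark UI rather than set once and forgotten.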

Qualifications

  • Education:

    • Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field.

  • Experience:

    • 1-3 years of experience in data engineering or a related role.

    • Proven experience with Apache Spark and PySpark.

  • Technical Skills:

    • Proficiency in Python programming, with a strong emphasis on PySpark for big data processing.

    • Strong understanding of big data technologies and frameworks (e.g., Hadoop, Hive, Kafka).

    • Experience with SQL and database systems (e.g., MySQL, PostgreSQL, NoSQL databases).

    • Familiarity with data warehousing concepts and tools.

  • Tools & Platforms:

    • Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) is a plus.

    • Knowledge of version control systems (e.g., Git).

Preferred Skills

  • Advanced Analytics:

    • Experience with machine learning libraries and frameworks.

  • DevOps Practices:

    • Familiarity with CI/CD pipelines and containerization technologies (e.g., Docker, Kubernetes).

  • Soft Skills:

    • Strong problem-solving abilities and attention to detail.

    • Excellent communication and teamwork skills.

    • Ability to manage multiple tasks and meet deadlines in a fast-paced environment.


Perks/benefits: Startup environment

Region: Asia/Pacific
Country: India
