Consultant
Pune, IN
Atos
We design digital solutions from the everyday to the mission critical, in artificial intelligence, hybrid cloud, infrastructure management, decarbonization and employee experience.
Eviden, part of the Atos Group, with annual revenue of circa €5 billion, is a global leader in data-driven, trusted and sustainable digital transformation. As a next-generation digital business with worldwide leading positions in digital, cloud, data, advanced computing and security, it brings deep expertise to all industries in more than 47 countries. By uniting unique high-end technologies across the full digital continuum with 47,000 world-class talents, Eviden expands the possibilities of data and technology, now and for generations to come.
What impact you can make
You will play a critical role in designing, building, and maintaining data pipelines and systems to process, store, and analyse data efficiently.
Role and Responsibilities
1. Data Pipeline Development
- Build and maintain scalable, reliable, and efficient ETL (Extract, Transform, Load) pipelines using Python and Airflow.
- Automate data ingestion and processing workflows from multiple sources.
2. Data Integration
- Integrate and transform data from disparate sources (e.g., APIs, third-party systems, legacy systems).
- Handle data standardization, validation, and quality assurance during integration.
3. Big Data Processing
- Utilize big data technologies such as Apache Spark and Snowflake for large-scale data processing.
- Write efficient and scalable Python scripts to process and validate the data.
4. Data Governance and Quality
- Implement data validation, cleaning, and transformation processes to ensure data accuracy and reliability.
- Enforce compliance with data governance policies and standards (e.g., GDPR, HIPAA).
5. Collaboration
- Work closely with other teams to understand data requirements.
- Collaborate with software engineers to integrate data workflows into applications.
6. Monitoring and Optimization
- Monitor the performance of data pipelines and systems.
- Debug and optimize data workflows to improve efficiency and reliability.
7. Scripting and Automation
- Develop reusable and modular Python scripts for repeated tasks.
- Automate workflows for recurring data processing jobs.
8. Documentation and Best Practices
- Document pipeline architecture.
Required Skills and Experience
- Experience with PySpark and Python.
- Experience with OLAP systems.
- Experience in SQL (able to write complex SQL queries).
- Experience with orchestration tools (Apache Airflow preferred).
- Experience with Hadoop (Spark and Hive, including optimization of Spark and Hive applications).
- Knowledge of Snowflake (good to have).
- Experience in data quality (good to have).
- Knowledge of file storage (S3 is good to have).
Let’s grow together.