Data Engineer - WGDT
New Delhi, India
Wadhwani Foundation
Wadhwani Foundation is a non-profit organization focused on accelerating job growth and paving the way for millions to earn family-sustaining wages. Join us!
The Role Context:
This is an exciting opportunity to join a dynamic and growing organization
working at the forefront of technology trends and developments in the social
impact sector. The Wadhwani Center for Government Digital Transformation (WGDT)
works with government ministries and state departments in India with a mission
of “Enabling digital transformation to enhance the impact of government policy,
initiatives and programs”.
We are seeking a highly motivated and detail-oriented individual to join our
team as a Data Engineer, with experience in designing, constructing, and
maintaining the architecture and infrastructure needed for data generation,
storage, and processing, and in contributing to the successful implementation
of digital government policies and programs. You will play a key role in
developing robust, scalable, and efficient systems that manage large volumes of
data, make it accessible for analysis and decision-making, drive innovation,
and optimize operations across government ministries and state departments in
India.
Key Responsibilities:
a. Data Architecture Design: Design, develop, and maintain scalable data pipelines and
infrastructure for ingesting, processing, storing, and analyzing large volumes
of data efficiently. This involves understanding business requirements and
translating them into technical solutions.
b. Data Integration: Integrate data from various sources such as databases, APIs, streaming
platforms, and third-party systems. Ensure data is collected reliably and
efficiently, maintaining data quality and integrity throughout the process as
per ministry and government data standards.
c. Data Modeling: Design and implement data models to organize and structure data for
efficient storage and retrieval, using techniques such as dimensional modeling,
normalization, and denormalization depending on the specific requirements of
the project.
d. Data Pipeline Development/ETL (Extract, Transform, Load): Develop data pipelines/ETL
processes to extract data from source systems, transform it into the desired
format, and load it into target data systems. This involves writing scripts or
using ETL tools to automate the process and ensure data accuracy and
consistency (a minimal sketch follows this list).
e. Data Quality and Governance: Implement data quality checks and data governance policies to
ensure data accuracy, consistency, and compliance with regulations. Design and
track data lineage, and support data stewardship, metadata management, the
business glossary, etc.
f. Data Lakes and Warehousing: Design and maintain data lakes and data warehouses to store and
manage structured data from relational databases, semi-structured data such as
JSON or XML, and unstructured data such as text documents, images, and videos
at any scale. Integrate with big data processing frameworks such as Apache
Hadoop, Apache Spark, and Apache Flink, as well as with machine learning and
data visualization tools.
g. Data Security: Implement security practices, technologies, and policies designed to
protect data from unauthorized access, alteration, or destruction throughout
its lifecycle. This includes data access controls, encryption, data masking and
anonymization, data loss prevention, and compliance with regulatory
requirements such as the DPDP Act and GDPR.
h. Database Management: Administer and optimize databases, both relational and NoSQL, to manage
large volumes of data effectively.
i. Data Migration: Plan and execute data migration projects to transfer data between
systems while ensuring data consistency and minimal downtime.
j. Performance Optimization: Optimize data pipelines and queries for performance and scalability.
Identify and resolve bottlenecks, tune database configurations, and implement
caching and indexing strategies to improve data processing speed and
efficiency.
k. Collaboration: Collaborate with data scientists, analysts, and other stakeholders to
understand their data requirements and provide them with access to the
necessary data resources. Work closely with IT operations teams to deploy and
maintain data infrastructure in production environments.
l. Documentation and Reporting: Document work including data models, data pipelines/ETL
processes, and system configurations. Create documentation and provide training
to other team members to ensure the sustainability and maintainability of data
systems.
m. Continuous Learning: Stay updated with the latest technologies and trends in data
engineering and related fields. Participate in training programs, attend
conferences, and engage with the data engineering community to enhance skills
and knowledge.
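To make the pipeline-development and data-quality responsibilities (items d and e) concrete, here is a minimal, hypothetical Python sketch. It assumes a CSV extract as the source and uses SQLite to stand in for the target warehouse; the file, table, column names, and validation rules are illustrative assumptions only, not part of the role definition.

```python
# etl_sketch.py -- hypothetical illustration of responsibilities (d) and (e):
# extract from a CSV source, apply basic transforms and quality checks, and
# load validated rows into a target table (SQLite stands in for a warehouse).
import csv
import sqlite3
from datetime import datetime, timezone


def extract(source_path: str) -> list[dict]:
    """Read raw records from a CSV extract of a source system."""
    with open(source_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def transform(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Normalise fields and split rows into valid records and quality rejects."""
    valid, rejected = [], []
    for row in rows:
        scheme_id = (row.get("scheme_id") or "").strip()
        try:
            amount = float(row.get("disbursed_amount", ""))
        except (TypeError, ValueError):
            amount = None
        if scheme_id and amount is not None and amount >= 0:
            valid.append({
                "scheme_id": scheme_id,
                "disbursed_amount": amount,
                "loaded_at": datetime.now(timezone.utc).isoformat(),
            })
        else:
            rejected.append(row)  # kept aside for data-quality reporting
    return valid, rejected


def load(rows: list[dict], db_path: str = "warehouse.db") -> None:
    """Load validated rows into the target table."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS scheme_disbursements "
        "(scheme_id TEXT, disbursed_amount REAL, loaded_at TEXT)"
    )
    con.executemany(
        "INSERT INTO scheme_disbursements VALUES "
        "(:scheme_id, :disbursed_amount, :loaded_at)",
        rows,
    )
    con.commit()
    con.close()


if __name__ == "__main__":
    raw = extract("source_extract.csv")  # hypothetical source file
    clean, rejects = transform(raw)
    load(clean)
    print(f"loaded={len(clean)} rejected={len(rejects)}")
```

In practice the same extract/transform/load structure would be orchestrated by a tool such as Apache Airflow or NiFi rather than run as a standalone script; the sketch only shows the shape of the work.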
Desired Skills/Competencies:
- Education: A Bachelor's or Master's degree in Computer
Science, Software Engineering, Data Science, or equivalent with at least 5
years of experience.
- Database Management: Strong expertise in working with databases,
such as SQL databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g.,
MongoDB, Cassandra).
- Big Data Technologies: Familiarity with big data technologies, such
as Apache Hadoop, Spark, and related ecosystem components, for processing
and analyzing large-scale datasets.
- ETL Tools: Experience with ETL tools (e.g., Apache NiFi,
Apache Airflow, Talend Open Studio, Pentaho, IBM InfoSphere) for designing
and orchestrating data workflows.
- Data Modeling and Warehousing: Knowledge of data modeling techniques and experience with data warehousing
solutions (e.g., Amazon Redshift, Google BigQuery, Snowflake).
- Data Governance and Security: Understanding of data governance principles and best practices for
ensuring data quality and security.
- Cloud Computing: Experience with cloud platforms (e.g., AWS,
Azure, Google Cloud) and their data services for scalable and
cost-effective data storage and processing.
- Streaming Data Processing: Familiarity with real-time data processing frameworks (e.g., Apache Kafka,
Apache Flink) for handling streaming data (a brief consumer sketch follows this list).
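Purely as an illustration of the streaming familiarity listed above, here is a minimal sketch assuming the kafka-python client is installed and a broker is reachable at localhost:9092; the topic name and event fields are invented for this sketch.

```python
# streaming_sketch.py -- hypothetical illustration of consuming a real-time
# event stream with kafka-python (assumed available). Topic and field names
# are invented for this sketch and are not part of the role description.
import json

from kafka import KafkaConsumer  # pip install kafka-python (assumption)

consumer = KafkaConsumer(
    "beneficiary-events",                # hypothetical topic
    bootstrap_servers="localhost:9092",  # assumed broker address
    group_id="wgdt-data-eng-sketch",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # A real pipeline would validate, enrich, and persist the event to a lake
    # or warehouse here; this sketch just prints a couple of fields.
    print(event.get("scheme_id"), event.get("event_type"))
```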
KPIs:
- Data Pipeline Efficiency: Measure
the efficiency of data pipelines in terms of data processing time,
throughput, and resource utilization. KPIs could include average time to
process data, data ingestion rates, and pipeline latency (a brief sketch of
computing a few such KPIs follows this list).
- Data Quality Metrics: Track data quality metrics such as
completeness, accuracy, consistency, and timeliness of data. KPIs could
include data error rates, missing values, data duplication rates, and data
validation failures.
- System Uptime and Availability: Monitor the uptime and availability of data infrastructure, including
databases, data warehouses, and data processing systems. KPIs could
include system uptime percentage, mean time between failures (MTBF), and
mean time to repair (MTTR).
- Data Storage Efficiency: Measure the efficiency of data storage systems in terms of storage
utilization, data compression rates, and data retention policies. KPIs
could include storage utilization rates, data compression ratios, and data
storage costs per unit.
- Data Security and Compliance: Track
adherence to data security policies and regulatory compliance requirements
such as DPDP, GDPR, HIPAA, or PCI DSS. KPIs could include security
incident rates, data access permissions, and compliance audit findings.
- Data Processing Performance: Monitor the performance of data processing tasks such as ETL (Extract,
Transform, Load) processes, data transformations, and data aggregations.
KPIs could include data processing time, CPU usage, and memory
consumption.
- Scalability and Performance Tuning: Measure the scalability and performance of data systems under varying
workloads and data volumes. KPIs could include scalability benchmarks,
system response times under load, and performance improvements achieved
through tuning.
- Resource Utilization and Cost Optimization: Track
resource utilization and costs associated with data infrastructure,
including compute resources, storage, and network bandwidth. KPIs could
include cost per data unit processed, cost per query, and cost savings
achieved through optimization.
- Incident Response and Resolution: Monitor the response time and resolution time for data-related incidents
and issues. KPIs could include incident response time, time to diagnose
and resolve issues, and customer satisfaction ratings for support
services.
- Documentation
and Knowledge Sharing: Measure the
quality and completeness of documentation for data infrastructure, data
pipelines, and data processes. KPIs could include documentation coverage,
documentation update frequency, and knowledge sharing activities such as
internal training sessions or knowledge base contributions.
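As a simple, hypothetical illustration of how a few of these KPIs might be computed from per-run pipeline statistics, consider the sketch below; the record structure, field names, and sample numbers are assumptions made for illustration only.

```python
# kpi_sketch.py -- hypothetical illustration of computing average processing
# time, error rate, and duplication rate from per-run pipeline statistics.
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PipelineRun:
    started_at: datetime
    finished_at: datetime
    rows_in: int          # rows ingested in this run
    rows_rejected: int    # rows failing validation (data errors)
    rows_duplicated: int  # duplicate rows detected


def compute_kpis(runs: list[PipelineRun]) -> dict[str, float]:
    """Aggregate run statistics into simple pipeline and data-quality KPIs."""
    total_in = sum(r.rows_in for r in runs)
    total_seconds = sum((r.finished_at - r.started_at).total_seconds() for r in runs)
    return {
        "avg_processing_minutes": total_seconds / len(runs) / 60,
        "error_rate": sum(r.rows_rejected for r in runs) / total_in,
        "duplication_rate": sum(r.rows_duplicated for r in runs) / total_in,
    }


if __name__ == "__main__":
    now = datetime.now()
    sample = [
        PipelineRun(now - timedelta(minutes=42), now,
                    rows_in=100_000, rows_rejected=120, rows_duplicated=35),
        PipelineRun(now - timedelta(minutes=35), now,
                    rows_in=80_000, rows_rejected=40, rows_duplicated=10),
    ]
    print(compute_kpis(sample))
```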
Years of experience of the current role holder: New Position
Ideal years of experience: 3 – 5 years
Career progression for this role: CTO WGDT (Head of Incubation Centre)
*******************************************************************************
Our Culture:
WF is a global not-for-profit that works like a start-up, at a fast-moving,
dynamic pace where change is the only constant and flexibility is the key to
success.
Three mantras that we practice across job roles, levels, functions, programs,
and initiatives are Quality, Speed, and Scale, in that order.
We are an ambitious and
inclusive organization, where everyone is encouraged to contribute and
ideate. We are intensely and insanely focused on driving excellence in
everything we do.
We want individuals with a drive for excellence and the passion to do whatever
it takes to deliver world-class outcomes to our beneficiaries. We set our own
standards, often more rigorous than what our beneficiaries demand, and we want
individuals who love it this way.
We have a creative and highly energetic environment, one in which we look to
each other to innovate new solutions not only for our beneficiaries but for
ourselves too. Individuals who are open to collaborating with a borderless
mentality, often going beyond hierarchy and siloed definitions of functional
KRAs, will thrive in our environment.
This is a workplace where expertise is shared with colleagues around the
globe. Individuals uncomfortable with change, constant innovation, and short
learning cycles, and those looking for stability and orderly working days, may
not find WF to be the right place for them.
Finally, we want individuals who want to do greater good for society,
leveraging their areas of expertise, skills, and experience.
The Foundation is an equal-opportunity organization with no bias based on
gender, race, colour, ethnicity, country, language, age, or any other dimension
that comes in the way of progress.
Join us and be a part of us!