Consultant

Bangalore, Karnataka, India

KPMG India

KPMG Delivery Network (KDN) AI Foundry is looking for a highly motivated, detail-oriented professional to join its growing team as a Data Validation & Pre-processing Tester. This role is pivotal in ensuring the integrity and readiness of the data used in AI model development across a wide range of business applications.

Key responsibilities include:

Data Quality Assessment

  • Conduct thorough audits of incoming datasets to identify missing, inconsistent, or duplicate records.
  • Validate data against predefined schemas and business rules to ensure structural integrity.
  • Collaborate with data owners to resolve anomalies and ensure data readiness for modeling.

Bias & Imbalance Detection

  • Analyze datasets for class imbalance, demographic representation, and potential sources of bias.
  • Apply statistical techniques to quantify and visualize bias across features and labels.
  • Recommend and implement rebalancing strategies such as resampling or synthetic data generation.

Validation Frameworks

  • Develop automated validation scripts to check data types, value ranges, and logical consistency.
  • Integrate validation steps into CI/CD pipelines to catch issues early in the ML lifecycle.
  • Maintain test coverage for edge cases and evolving data schemas.

Documentation & Reproducibility

  • Maintain detailed logs of data sources, transformation logic, and validation outcomes.
  • Create reproducible notebooks and scripts for auditability and knowledge transfer.
  • Ensure compliance with data governance and privacy standards through transparent documentation.

Educational qualifications

  • A Bachelor’s degree in Computer Science, Data Science, Engineering, or a related field is required. A Master’s degree or relevant certifications in data or AI technologies are preferred.

Work experience

  • Minimum 5 years of experience in testing, data validation, or AI/ML pipeline development.
  • Proven track record in building and maintaining scalable data pre-processing workflows.
  • Hands-on experience with data quality assessment tools and frameworks (e.g., Great Expectations, Pandera).
  • Strong understanding of cloud-based data platforms such as Azure, AWS, or GCP.
  • Experience collaborating with cross-functional teams including data scientists, engineers, and business stakeholders.

Skills

  • Proficiency in Python and data libraries such as Pandas, NumPy, and Scikit-learn.
  • Experience with data validation tools such as Great Expectations or Pandera.
  • Strong understanding of data pre-processing techniques including normalization, encoding, and feature engineering.
  • Familiarity with cloud platforms (Azure, AWS, GCP) and their data services.
  • Excellent analytical, problem-solving, and communication skills for cross-functional collaboration.

Category: Consulting Jobs

Tags: AWS Azure CI/CD Computer Science Data governance Data quality Engineering Feature engineering GCP Machine Learning ML models NumPy Pandas Pipelines Privacy Python Scikit-learn Statistics Testing

Region: Asia/Pacific
Country: India
