Consultant

Bangalore, Karnataka, India

Full Time | Mid-level / Intermediate | USD 40K - 75K (est.) *

KPMG India

Posted 12 hours ago

KPMG Delivery Network (KDN) AI Foundry is looking for a highly motivated and detail-oriented professional to join its growing team as a Data Validation & Pre-processing Tester. This role is pivotal in ensuring the integrity and readiness of data used in AI model development across a wide range of business applications.

Key responsibilities include:

Data Quality Assessment

Conduct thorough audits of incoming datasets to identify missing, inconsistent, or duplicate records.
Validate data against predefined schemas and business rules to ensure structural integrity.
Collaborate with data owners to resolve anomalies and ensure data readiness for modeling.

Bias & Imbalance Detection

Analyze datasets for class imbalance, demographic representation, and potential sources of bias.
Apply statistical techniques to quantify and visualize bias across features and labels.
Recommend and implement rebalancing strategies such as resampling or synthetic data generation.

Validation Frameworks

Develop automated validation scripts to check data types, value ranges, and logical consistency.
Integrate validation steps into CI/CD pipelines to catch issues early in the ML lifecycle.
Maintain test coverage for edge cases and evolving data schemas.

Documentation & Reproducibility

Maintain detailed logs of data sources, transformation logic, and validation outcomes.
Create reproducible notebooks and scripts for auditability and knowledge transfer.
Ensure compliance with data governance and privacy standards through transparent documentation.

Educational qualifications

Bachelor’s degree in Computer Science, Data Science, Engineering, or a related field is required. Master’s degree or relevant certifications in data or AI technologies are preferred.

Work experience

Minimum 5 years of experience in testing, data validation, or AI/ML pipeline development.
Proven track record in building and maintaining scalable data pre-processing workflows.
Hands-on experience with data quality assessment tools and frameworks (e.g., Great Expectations, Pandera).
Strong understanding of cloud-based data platforms such as Azure, AWS, or GCP.
Experience collaborating with cross-functional teams including data scientists, engineers, and business stakeholders.

Skills

Proficiency in Python and data libraries such as Pandas, NumPy, and Scikit-learn.
Experience with data validation tools such as Great Expectations or Pandera.
Strong understanding of data pre-processing techniques including normalization, encoding, and feature engineering.
Familiarity with cloud platforms (Azure, AWS, GCP) and their data services.
Excellent analytical, problem-solving, and communication skills for cross-functional collaboration.

* Salary range is an estimate based on our AI, ML, Data Science Salary Index.

Category: Consulting Jobs

Tags: AWS, Azure, CI/CD, Computer Science, Data governance, Data quality, Engineering, Feature engineering, GCP, Machine Learning, ML models, NumPy, Pandas, Pipelines, Privacy, Python, Scikit-learn, Statistics, Testing