Consultant
Bangalore, Karnataka, India
KPMG Delivery Network (KDN) AI Foundry is looking for a highly motivated, detail-oriented professional to join its growing team as a Data Validation & Pre-processing Tester. The role is pivotal in ensuring the integrity and readiness of the data used for AI model development across a wide range of business applications.
Key responsibilities include:
Data Quality Assessment
- Conduct thorough audits of incoming datasets to identify missing, inconsistent, or duplicate records.
- Validate data against predefined schemas and business rules to ensure structural integrity.
- Collaborate with data owners to resolve anomalies and ensure data readiness for modeling.
Bias & Imbalance Detection
- Analyze datasets for class imbalance, demographic representation, and potential sources of bias.
- Apply statistical techniques to quantify and visualize bias across features and labels.
- Recommend and implement rebalancing strategies such as resampling or synthetic data generation.
Validation Frameworks
- Develop automated validation scripts to check data types, value ranges, and logical consistency.
- Integrate validation steps into CI/CD pipelines to catch issues early in the ML lifecycle.
- Maintain test coverage for edge cases and evolving data schemas.
Documentation & Reproducibility
- Maintain detailed logs of data sources, transformation logic, and validation outcomes.
- Create reproducible notebooks and scripts for auditability and knowledge transfer.
- Ensure compliance with data governance and privacy standards through transparent documentation.
Educational qualifications
- Bachelor’s degree in Computer Science, Data Science, Engineering, or a related field is required. Master’s degree or relevant certifications in data or AI technologies are preferred.
Work experience
- Minimum 5 years of experience in testing, data validation, or AI/ML pipeline development.
- Proven track record in building and maintaining scalable data pre-processing workflows.
- Hands-on experience with data quality assessment tools and frameworks (e.g., Great Expectations, Pandera).
- Strong understanding of cloud-based data platforms such as Azure, AWS, or GCP.
- Experience collaborating with cross-functional teams including data scientists, engineers, and business stakeholders.
Skills
- Proficiency in Python and data libraries such as Pandas, NumPy, and Scikit-learn.
- Experience with data validation tools such as Great Expectations or Pandera.
- Strong understanding of data pre-processing techniques including normalization, encoding, and feature engineering.
- Familiarity with cloud platforms (Azure, AWS, GCP) and their data services.
- Excellent analytical, problem-solving, and communication skills for cross-functional collaboration.