Automated Feature Extraction and Clustering of Product Claims and Ingredients Using Machine Learning
Solna, Sweden
Knightec
Hi, we are Knightec, your strategic partner in product and service development, dedicated to creating positive change for the business of tomorrow.
High Level Description
Consumer products often advertise features like "protection against germs", "gentle on fabric", "natural ingredients only", "rich in vitamin C", or "high energy efficiency", making it crucial to analyse and compare these claims and ingredient lists across a wide range of product categories. The aim is to develop analytics and market insights to identify trends across various consumer products, including categories such as personal care, food and beverages, and household cleaning supplies. To achieve this, automated clustering of product claims and ingredients is needed, followed by feature extraction to identify key characteristics that define each group.
This thesis will involve developing and evaluating methods to first cluster similar product claims and ingredients into meaningful groups, and then extract important features from each cluster to identify trends, common themes, and unique selling points. This will provide valuable market insights and help assess how products are positioned relative to one another.
Project Description
The project will begin with two datasets: a product claims dataset containing around 300,000 claims from a variety of consumer products and an ingredients dataset containing lists of ingredients from these products.
The first step is to research and apply clustering techniques to group similar product claims. The focus will be on finding the most suitable clustering algorithms and optimizing them to ensure meaningful groupings. Various methods will be compared to determine which approach works best for the given data. Once the clustering is complete, feature extraction methods will be applied to identify key characteristics from the clusters of product claims. The goal is to derive relevant insights that are specific to each group, highlighting common keywords such as "protection", "natural ingredients", or "efficiency".
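As a rough illustration of the cluster-then-extract idea described above, the following is a minimal sketch, not the project's prescribed method. It assumes TF-IDF weighting and a small spherical k-means (cosine-based) written with the standard library only; the stopword list, the sample claims, the seed indices, and all function names are hypothetical. In practice a library such as scikit-learn would likely be used, and the choice of algorithm is exactly what the thesis is meant to investigate.

```python
import math
import re
from collections import Counter

# Assumed stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "against", "in", "of", "on", "only", "the", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in Counter(tokens).items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


def cosine(a, b):
    """Sparse dot product; equals cosine similarity for unit-length vectors."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def spherical_kmeans(vectors, seed_indices, iterations=10):
    """Cluster unit vectors by cosine similarity, starting from the given seeds."""
    centroids = [dict(vectors[i]) for i in seed_indices]
    labels = []
    for _ in range(iterations):
        # Assignment step: each vector goes to its most similar centroid.
        labels = [
            max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))
            for vec in vectors
        ]
        # Update step: centroid = normalised sum of the member vectors.
        for k in range(len(centroids)):
            total = {}
            for vec, label in zip(vectors, labels):
                if label == k:
                    for term, w in vec.items():
                        total[term] = total.get(term, 0.0) + w
            norm = math.sqrt(sum(w * w for w in total.values()))
            if norm > 0:  # keep the old centroid if the cluster is empty
                centroids[k] = {term: w / norm for term, w in total.items()}
    return labels, centroids


def top_terms(centroid, n=3):
    """Feature extraction: the highest-weighted terms of a cluster centroid."""
    ranked = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


# Hypothetical sample claims, loosely based on the examples in the posting.
claims = [
    "protection against germs",
    "kills germs, lasting protection",
    "natural ingredients only",
    "made with natural ingredients",
    "high energy efficiency",
    "energy efficient design",
]
vectors = tfidf_vectors(claims)
labels, centroids = spherical_kmeans(vectors, seed_indices=[0, 2, 4])
print(labels)  # [0, 0, 1, 1, 2, 2]
for k, centroid in enumerate(centroids):
    print(k, top_terms(centroid, n=2))
```

The seeds are fixed here only to keep the toy run deterministic; on the real dataset, initialisation strategy and the number of clusters would themselves be parameters to tune.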
The ingredients dataset will also be clustered to identify common groupings and standardize ingredient lists across different products (e.g. “distilled water” and “aqua” become “water”), allowing for clearer analysis and comparison.
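The standardization step could look roughly like the sketch below once groupings are known. It assumes a hand-written alias table (hypothetical; in the project such a table would presumably be derived from the clustering output) and uses `difflib` from the standard library as a fallback for spelling variants. The function name, the alias entries beyond the "water" example, and the similarity cutoff are illustrative assumptions.

```python
import difflib

# Hypothetical alias table: canonical name -> known variants.
CANONICAL = {
    "water": {"water", "aqua", "distilled water", "purified water"},
    "fragrance": {"fragrance", "parfum", "perfume"},
}
ALIAS_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in CANONICAL.items()
    for alias in aliases
}


def normalize_ingredient(raw, cutoff=0.85):
    """Map a raw ingredient string to its canonical name where one is known."""
    key = " ".join(raw.lower().split())  # lowercase and collapse whitespace
    if key in ALIAS_TO_CANONICAL:
        return ALIAS_TO_CANONICAL[key]
    # Fallback: tolerate small spelling differences against known aliases.
    close = difflib.get_close_matches(key, list(ALIAS_TO_CANONICAL), n=1, cutoff=cutoff)
    return ALIAS_TO_CANONICAL[close[0]] if close else key


print(normalize_ingredient("Distilled  Water"))  # water
print(normalize_ingredient("Aqua"))              # water
print(normalize_ingredient("glycerin"))          # glycerin (unknown, left unchanged)
```

Unknown ingredients are returned unchanged rather than forced into a group, so that unmatched names remain visible for later review.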
Throughout the project, the clustering and feature extraction methods will be compared using appropriate evaluation metrics. The research will involve identifying the most promising approaches and optimizing their parameters.
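The posting does not name specific metrics. As one possible example, the sketch below computes the mean silhouette coefficient, a common internal measure of cluster quality, and uses it to compare two candidate labelings. The Jaccard distance on word sets, the sample claims, and the function names are assumptions made for illustration.

```python
def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of the two strings' word sets."""
    set_a, set_b = set(a.split()), set(b.split())
    union = set_a | set_b
    return 1.0 - len(set_a & set_b) / len(union) if union else 0.0


def silhouette(items, labels, distance):
    """Mean silhouette coefficient over all items (ranges from -1 to 1)."""
    scores = []
    for i, (item, label) in enumerate(zip(items, labels)):
        # Collect distances from this item to every other item, per cluster.
        by_cluster = {}
        for j, (other, other_label) in enumerate(zip(items, labels)):
            if i != j:
                by_cluster.setdefault(other_label, []).append(distance(item, other))
        own = by_cluster.pop(label, [])
        if not own or not by_cluster:
            # Singleton cluster, or no other cluster to compare with: score 0.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                  # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())   # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical sample: two candidate labelings of the same four claims.
sample_claims = [
    "kills germs",
    "kills germs fast",
    "natural ingredients",
    "natural ingredients only",
]
good = silhouette(sample_claims, [0, 0, 1, 1], jaccard_distance)
bad = silhouette(sample_claims, [0, 1, 0, 1], jaccard_distance)
print(good > bad)  # True
```

A higher score indicates tighter, better-separated clusters, so the same function can rank the outputs of different algorithms or parameter settings on equal footing.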
Who are we looking for?
We are looking for a motivated student with a strong interest in machine learning, natural language processing (NLP), and data science. Knowledge of clustering techniques and feature extraction is beneficial. This thesis is suitable for students of computer science, data science, or a related field, with experience in Python and machine learning frameworks.
Purpose
The purpose of the thesis is to develop an automated system for clustering product claims and ingredient lists, followed by extracting key features from the product claims. The ultimate goal is to provide analytics and market insights that can be used to identify trends, compare product categories, and understand key differentiators in the market.
The thesis project can be published and used in your personal portfolio as well as in company marketing. Include your résumé/CV and portfolio in your application.
Tags: Clustering Computer Science Machine Learning NLP Python Research