Thesis Project: 30 credits (hp) - Cascaded machine learning models for anomaly detection at the edge

Södertälje, SE, 151 38

Full Time

Scania Group

Scania is a world-leading provider of transport solutions, including trucks and buses for heavy transport applications combined with an extensive product-related service offering.

Background
Scania is one of the world’s leading manufacturers of trucks and buses for heavy transport, as well as industrial and marine engines. Transport and logistics services make up an increasing part of our business, which guarantees Scania’s customers cost-efficient transport solutions and high availability. Over a million Scania vehicles are in active use in more than 100 countries.
In the Connected Systems department within Scania R&D, we develop new solutions for connected vehicles in our Internet of Things (IoT) platform, as part of Scania’s shift towards a sustainable transport system. Advanced data analysis capabilities are a cornerstone of this development.

Target/scope
Time series anomaly detection has the potential to provide automatic detection of faults and issues in a wide variety of technical systems. Deep learning models have shown promising results in this area. However, these models often require substantial computational resources, which makes it difficult to deploy them on edge devices, such as vehicles, where computational capabilities are constrained.
In this context, we consider a four-stage cascade of machine learning (ML) models. The aim of cascading is to reduce computational complexity without losing too much detection performance. Depending on the results of the investigation, the models may be trained either jointly or independently, but during inference they will act together.
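As a rough illustration of how such a cascade might act together during inference, the sketch below runs the stages in order of increasing cost and stops at the first stage that is confident enough. Everything in it is an assumption made for illustration, not part of the project specification: the `Stage` class, the `margin` rule for deferring to the next stage, the relative cost values, and the toy scorers that stand in for real trained models. Input windows are assumed to be non-empty and already standardised.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Stage:
    name: str
    score: Callable[[Sequence[float]], float]  # anomaly score in [0, 1]
    cost: float    # hypothetical relative compute cost of one call
    margin: float  # defer to the next stage when |score - 0.5| < margin


def cascade_predict(stages: List[Stage], window: Sequence[float]) -> Tuple[bool, float, str]:
    """Run stages from cheapest to most expensive until one is confident.

    Returns (is_anomaly, accumulated_cost, name_of_deciding_stage).
    The last stage always decides, whatever its margin.
    """
    if not stages:
        raise ValueError("cascade needs at least one stage")
    total_cost = 0.0
    score = 0.5
    decided_by = stages[0].name
    for stage in stages:
        score = stage.score(window)
        total_cost += stage.cost
        decided_by = stage.name
        if abs(score - 0.5) >= stage.margin:
            break  # confident enough: skip the more expensive stages
    return score >= 0.5, total_cost, decided_by


def toy_scorer(n_points: int, scale: float) -> Callable[[Sequence[float]], float]:
    """Stand-in for a trained model: mean |x| over the last n_points, capped at 1."""
    def score(window: Sequence[float]) -> float:
        tail = window[-n_points:]
        return min(sum(abs(x) for x in tail) / (len(tail) * scale), 1.0)
    return score


# Four illustrative stages; names, costs, and margins are made up.
DEMO_STAGES = [
    Stage("M1", toy_scorer(1, 6.0), cost=1.0, margin=0.3),
    Stage("M2", toy_scorer(2, 6.0), cost=4.0, margin=0.2),
    Stage("M3", toy_scorer(3, 6.0), cost=16.0, margin=0.1),
    Stage("M4", toy_scorer(4, 6.0), cost=64.0, margin=0.0),
]
```

With these toy stages, `cascade_predict(DEMO_STAGES, [0.0, 0.0, 0.0, 3.3])` returns `(False, 5.0, "M2")`: the first stage is unsure and defers, and the second stage decides. A fixed margin is only one possible activation rule; a learned or calibrated deferral rule could replace it.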
The goal of the project is to create a model cascade for anomaly detection at the edge, consisting of four models, with individual model computational complexity ranging from low to high, and individual performance ranging from acceptable to high. Performance should be measured using suitable metrics, such as precision and recall. The models should be trained on non-anomalous data. The performance and computational requirements of the cascade during anomaly detection will be evaluated using labelled test data. Datasets will be provided by Scania.
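To make the training and evaluation setup concrete, here is a minimal sketch, assuming a very simple detector: a mean and standard deviation are fitted on anomaly-free values only, and test points are flagged when their z-score exceeds a threshold. The z-score detector, the function names, and the default threshold of 3.0 are illustrative assumptions; the actual models and datasets are not specified here. Precision and recall are then computed against labelled test data.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def fit_normal_profile(normal_values: Sequence[float]) -> Tuple[float, float]:
    """Estimate mean and standard deviation from anomaly-free data only."""
    mu = fmean(normal_values)
    sigma = pstdev(normal_values)
    if sigma == 0.0:
        raise ValueError("training data has zero variance")
    return mu, sigma


def detect(values: Sequence[float], mu: float, sigma: float,
           z_threshold: float = 3.0) -> List[bool]:
    """Flag a value as anomalous when it lies more than z_threshold sigmas from mu."""
    return [abs(v - mu) / sigma > z_threshold for v in values]


def precision_recall(predicted: Sequence[bool], labels: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall for the anomalous (True) class."""
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

A typical use would be to call `fit_normal_profile` on the non-anomalous training split, run `detect` on the labelled test split, and pass the predictions and labels to `precision_recall`. The same evaluation function can be applied to each individual model and to the cascade as a whole.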

Description of the assignment

1. Literature study of cascaded ML methods and ML methods for anomaly detection.
2. Development of four models for anomaly detection. Note that the models do not all have to be of different types (e.g., both M3 and M4 might be DNN-based but differ in size):

Model   Performance                     Computational requirements
M1      Basic                           Low
M2      Good                            Moderate
M3      High                            Medium
M4      State of the art (or similar)   High

3. Development of a training and inference framework for a cascade consisting of four individual models. Examples of such frameworks can be found in, e.g., [1] and [2], but they would need to be modified given the context and goals of the present project. Special consideration should be given to the logic for model activation during inference (cf. [3] and [4]).

4. Evaluation of the results with respect to suitable baselines, using metrics such as detection performance, computational complexity, and memory requirements.

5. (Optional) Consideration of methods for automated real-time offloading decisions, i.e., determining if and when it is suitable to distribute the ML computations to other environments in a network, given system parameters such as communication costs, latency, predicted QoS, edge device load, etc.
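As one possible way to picture the optional offloading decision, the sketch below compares an estimated local latency with an estimated offload latency and offloads only when the latter is lower. The latency model is a deliberate simplification, and all names and parameters (`SystemState`, `base_ms`, `payload_kb`, `remote_ms`) are hypothetical; a real decision rule would likely also account for predicted QoS, energy, and communication cost.

```python
from dataclasses import dataclass


@dataclass
class SystemState:
    edge_load: float      # fraction of edge compute already in use, 0.0 to 1.0
    uplink_mbps: float    # available uplink bandwidth in Mbit/s
    round_trip_ms: float  # network round-trip time in milliseconds


def local_latency_ms(base_ms: float, state: SystemState) -> float:
    """Idle-device latency inflated by the current load (simplified assumption)."""
    load = min(max(state.edge_load, 0.0), 0.99)  # clamp to avoid division by zero
    return base_ms / (1.0 - load)


def offload_latency_ms(payload_kb: float, remote_ms: float, state: SystemState) -> float:
    """Round trip plus upload time plus remote compute time."""
    # payload_kb is in kilobytes; times 8 gives kilobits, and 1 Mbit/s equals 1 kbit/ms.
    transfer_ms = payload_kb * 8.0 / state.uplink_mbps
    return state.round_trip_ms + transfer_ms + remote_ms


def should_offload(base_ms: float, payload_kb: float, remote_ms: float,
                   state: SystemState) -> bool:
    """Offload only when the estimated remote path is strictly faster."""
    return offload_latency_ms(payload_kb, remote_ms, state) < local_latency_ms(base_ms, state)
```

In a cascade, a check like this might be applied only before the most expensive stages, since the cheap stages are unlikely to benefit from leaving the device.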

References
[1] S. Vargaftik et al., RADE: resource-efficient supervised anomaly detection using decision tree-based ensemble methods, https://link.springer.com/article/10.1007/s10994-021-06047-x
[2] C. Ferri et al., Delegating classifiers, https://www.researchgate.net/publication/221345841_Delegating_classifiers
[3] S. Enomoto and T. Eda, Learning to Cascade: Confidence Calibration for Improving the Accuracy and Computational Cost of Cascade Inference Systems, https://cdn.aaai.org/ojs/16900/16900-13-20394-1-2-20210518.pdf
[4] H. Narasimhan et al., Post-hoc Estimators for Learning to Defer to an Expert, https://openreview.net/pdf?id=_jg6Sf6tuF7

Education/line/direction
Education, programme, or specialisation: Master's programmes in Data Science, Machine Learning, Computer Science, Engineering Science, Applied Mathematics, or similar.
Number of students: 1-2
Start date for the Thesis project: January 2025
Estimated timescale: 20 weeks

Contact person and supervisor
Sophia Zhang Pettersson, Senior Data Scientist, 08-553 727 36, sophia.zhang.pettersson@scania.com
Juan Carlos Andresen, Group Manager, 08-553 835 16, juan-carlos.andresen@scania.com

Application
Your application should contain a CV, a cover letter, and copies of your grades.
Applicants will be assessed on a continuous basis until the position is filled. Do not wait until the last date to apply!

A background check might be conducted for this position. We are conducting interviews continuously and may close the recruitment earlier than the date specified.
