Master's Thesis 30 HP: Using machine learning to transcribe and classify audio

Huskvarna - Stensholmsvägen 20


Background

Saab AB, Training and Simulation, provides our customers with realistic ground combat training systems, from combined arms at brigade level down to individual soldiers. Our mission is to improve our customers' training and to develop new capabilities that enhance their ability to train. We want to explore the potential of applied AI to provide a specialized empirical basis for decision-making. It is important that the users of our systems have confidence in our predictions and explanations.

Introduction:

In recent years, automatic speech recognition (ASR) systems have advanced significantly, making it easier to convert audio into text for downstream tasks such as classification. However, traditional classification models often rely solely on keywords, without considering the semantic explanations of the categories they assign. This thesis proposes an integrated method that combines audio-to-text transcription with explanation-aware text classification. By transcribing the audio and using language models to embed both the transcribed text and the category explanations, the approach ensures that classification is driven by both the content of the audio and the semantic richness of the categories.
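A minimal sketch of the explanation-aware classification step, assuming each category is described by a short written explanation and a transcript is assigned to the category whose explanation embedding is most similar under cosine similarity. The bag-of-words `embed` function is a hypothetical stand-in for a pre-trained language-model encoder, and the category names and explanations are invented for illustration; none of this is taken from the thesis description.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a pre-trained language-model encoder:
    # a sparse bag-of-words vector (word -> count). A real system would
    # return a dense semantic embedding instead.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(transcript: str, explanations: dict[str, str]) -> str:
    # Embed the transcript once, compare it with the embedding of each
    # category explanation, and return the most similar category.
    text_vec = embed(transcript)
    scores = {label: cosine(text_vec, embed(expl)) for label, expl in explanations.items()}
    return max(scores, key=scores.get)


# Hypothetical tactical categories, each described by a short explanation.
CATEGORIES = {
    "support_by_fire": "A unit requests support by fire, covering fire from another unit onto an enemy position",
    "plan_change": "The plan is changed and orders are revised, a new route or objective replaces the previous one",
}

print(classify("Request support by fire on the enemy position at the tree line", CATEGORIES))  # prints support_by_fire
print(classify("Change of plan, new route north, objective replaced", CATEGORIES))  # prints plan_change
```

The toy embedding only matches identical surface words (it even treats "change" and "changed" as unrelated), which is exactly the limitation described above; swapping in a language-model encoder is what would let semantically related wording match the category explanations.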

This combination is especially useful in complex classification tasks where surface-level details alone may not capture the underlying nuances, and where explanations can provide deeper insight into the meaning of the categories.

Problem Statement:

Conducting a military exercise can be very complex, especially a large one. Many things can happen at the same time in different locations, which can make it difficult to keep track of what is actually going on. To make training more efficient, the aim is to use data analysis to highlight important actions taken during the exercise. Ultimately, the method aims to interpret the voice communication during a military exercise and classify the meaning of what is said into tactical categories. For example: Did the plan change, and if so, why? What action was taken? Did someone request support by fire?

Objectives:

The main objective is to evaluate how well the following approach works on audio recorded in various military environments, including high-stress situations and noisy conditions.

  • Implement a pipeline that integrates audio-to-text transcription with the downstream classification.

  • Develop a model that converts both transcribed text and category explanations into dense vector embeddings. The model shall either be pre-trained or trained in an unsupervised manner.

  • Classify the vector embeddings of the transcribed text into the corresponding category.

  • Evaluate the performance of the model in terms of both transcription quality and classification accuracy, using audio datasets.

  • Analyze how well the model performs in tasks where categories are nuanced and explanations are essential for correct classification.
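The evaluation objective could be sketched as follows, assuming transcription quality is measured with word error rate (WER), a common ASR metric defined as the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript, and classification quality with plain accuracy. The choice of metrics, the function names and the example sentence are assumptions for illustration, not requirements stated in the posting.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    # WER = (substitutions + deletions + insertions) / number of reference words.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Rolling DP rows: row i holds the edit distances between ref[:i]
    # and every hypothesis prefix hyp[:j]; row 0 is simply j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def accuracy(predicted: list[str], expected: list[str]) -> float:
    # Fraction of transcripts assigned to the correct category.
    if len(predicted) != len(expected) or not expected:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


# One dropped word out of four reference words.
print(word_error_rate("request support by fire", "request support fire"))  # prints 0.25
```

Reporting the two metrics separately makes it possible to tell whether a misclassification stems from a transcription error (high WER) or from the classifier itself, which matters when the audio is noisy.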

What you will be a part of

Saab is a leading defence and security company with an enduring mission: to help nations keep their people and society safe. Empowered by its 22,000 talented people, Saab constantly pushes the boundaries of technology to create a safer and more sustainable world.

Saab designs, manufactures and maintains advanced systems in aeronautics, weapons, command and control, sensors and underwater systems. Saab is headquartered in Sweden. It has major operations all over the world and is part of the domestic defence capability of several nations.


Tags: ASR Classification Data analysis Machine Learning NLP Security

Region: Europe
Country: Sweden
