A confidence score for LLM answers

CDR (Amsterdam - Cedar), Netherlands

PhD thesis project: research on confidence scoring for LLM answers

We propose a PhD thesis project on obtaining confidence scores for answers generated by Large Language Models (LLMs), in particular for answers generated with Retrieval Augmented Generation (RAG) pipelines.
 

Project:

Generative AI models such as Large Language Models (LLMs) generate answers with varying levels of accuracy, and in some cases the answers are factually wrong. As a result, the validation and evaluation of LLM results is an emerging field of interest to many users, developers and researchers. At ING Wholesale Banking Advanced Analytics (WBAA) we want to research and develop techniques for calculating confidence scores for the answers an LLM gives to question prompts.

At ING we deal with a lot of documents, and we are running multiple Generative AI projects based on Retrieval Augmented Generation (RAG) to help process them efficiently. A RAG pipeline couples a search engine to an LLM. This allows one to ask questions about documents and retrieve answers, known as generative question-answering (QA). Specifically, the LLM answers each question based on relevant text passages retrieved from a document. Generated answers are grounded in the retrieved text, which substantially reduces the risk of hallucinations. The RAG projects aim to automate the extraction of information from unstructured documents. Typically, a fixed set of questions needs to be answered for a large batch of similar documents. We use the answers for automated form-filling, resulting in a structured summary dataset.

The problem at hand: are the generated answers reliable? LLMs are not designed to return confidence scores for the answers they generate, and those answers are not necessarily correct. The proposal is to research and develop a reliable confidence score that can be applied in one or more of ING's data extraction projects. To do so we shall use ground-truth datasets that have been manually labelled by expert analysts.
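
As an illustration of how a confidence score could be checked against such a labelled dataset, the sketch below computes the expected calibration error (ECE), a standard calibration measure. It is a minimal sketch and not a description of the method to be developed in the project; the function name, the choice of ten equal-width bins, and the example values are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and observed accuracy.

    confidences: scores in [0, 1], one per generated answer (hypothetical input).
    correct: booleans, True where the answer matches the expert label.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one answer is required")

    # Group answers into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / total) * abs(avg_conf - accuracy)
    return ece


# Made-up example values, not project data.
example_ece = expected_calibration_error(
    [0.9, 0.9, 0.1, 0.1], [True, True, False, False]
)
```

A score is well calibrated when answers given a confidence of, say, 0.8 turn out to be correct about 80% of the time; a lower ECE means the score is closer to that ideal.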

Research considerations:

  • Whether network weights and next-token probabilities are available in popular commercial models such as OpenAI's ChatGPT.
  • How to account for the random component in generated answers.
  • Multi-class answers. More than one answer to the same question can be correct, for example when answers are extracted from different document pages.
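
To make the first two considerations concrete, the sketch below shows two simple signals that are often used as a starting point. The first turns next-token log-probabilities into a length-normalised score, and only applies when the model exposes them. The second asks the same question several times and measures how often the answers agree, which needs no access to model internals. Both functions, their names, and the answer normalisation are assumptions for illustration, not the method the project will settle on.

```python
import math
from collections import Counter


def mean_token_probability(token_logprobs):
    """Geometric mean of the token probabilities of one generated answer.

    token_logprobs: natural-log probabilities of the generated tokens,
    assumed to be supplied by the model API when it exposes them.
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def normalise(answer):
    """Lower-case and collapse whitespace so trivially different strings match."""
    return " ".join(answer.lower().split())


def agreement_confidence(sampled_answers):
    """Return the most frequent answer and the share of samples that agree with it.

    sampled_answers: answers to the same question from repeated generations.
    """
    if not sampled_answers:
        raise ValueError("sampled_answers must not be empty")
    counts = Counter(normalise(a) for a in sampled_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(sampled_answers)


# Made-up example values.
score = mean_token_probability([math.log(0.5), math.log(0.5)])
top_answer, agreement = agreement_confidence(["Acme B.V.", "acme  b.v.", "Other Ltd"])
```

Exact string matching after normalisation is a deliberate simplification. With multiple correct answers per question, as in the third consideration, agreement would have to be measured against a set of accepted answers rather than a single string.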

Other details:

The PhD student will join an ongoing data extraction project part-time as a satellite data scientist, to ensure the research is embedded and applied. This also facilitates collaboration with data scientists and subject matter experts.

Reference blog from our team:

https://medium.com/p/c668844d52c8

The aim is to apply the research to real-world use cases in our department. We have a RAG setup in which we need to extract answers to a set of questions from a large set of documents.
The intern will be asked to study the latest research developments on confidence scores for LLMs, adapt them where needed to our practical use cases, and define and build a hands-on prototype.

    The Wholesale Banking Advanced Analytics (WBAA) team is a large team of data scientists, data engineers, software developers and many more, who are focused on bringing data, machine learning and statistical modeling into the products that we build for our clients and internal users. The data scientists in WBAA also have a strong desire to keep up with and be part of the latest developments in AI, tooling and statistics. They do so by working closely with master’s students on a variety of topics to solve academic yet practical problems.

    Our team has extensive experience with PhD student supervision.

    How to succeed

    We hire smart people like you for your potential. Our biggest expectation is that you’ll stay curious. Keep learning. Take on responsibility. In return, we’ll back you to develop into an even more awesome version of yourself.

    Are you a PhD student looking for a thesis project and are you interested in this one? Do you furthermore:

    • Have solid experience with Python
    • Have machine learning experience
    • Have solid skills in statistics and linear algebra (matrix rank, singular values, matrix decomposition, …)
    • Get at least six months to do your thesis project
    • Aim to go for a publication
    • Bring good vibes to your fellow data scientists

    Rewards and benefits

    This is a great opportunity to train with highly skilled people who are experts in their field. You’ll do a lot and learn a lot – not only about your specialist area and the bank, but also about yourself and whether this type of environment is right for you.

    You’ll also benefit from:

    • Internship allowance of 700 EUR based on a 36-hour work week

    • Your own work laptop

    • Hybrid working to blend home working for focus and office working for collaboration and co-creation

    • Personal growth and challenging work with endless possibilities

    • An informal working environment with innovative colleagues

    For the duration of your internship at ING, you must be enrolled at a Dutch university (or at an EU university for EU passport holders).

    Questions?

    Contact the recruiter attached to the advertisement. Want to apply directly? Please upload your CV and motivation letter by clicking the ‘Apply’ button.

    About our internships

    Every year, more than 350 students join our internship program. While there are no guarantees about your future, many of our former interns move into a permanent role or onto our International Talent Programme (traineeship).

    Whatever happens, an internship at ING is the ideal opportunity to meet a wide variety of people, to build up your own network, and to learn about many different aspects of banking – put simply, it’s a great start to your career.
