Internship - Vision Language Models for Luggage Recognition

Courbevoie, FR, 92400

IDEMIA

We make it safer and easier for people to pay, connect, be identified, access, travel and stay safe in the physical and digital worlds.


Since its founding, IDEMIA has been on a mission to unlock the world and make it safer through cutting-edge identity technologies. Our technology leadership makes us the partner of choice for hundreds of governments and thousands of enterprises in over 180 countries, including some of the biggest and most influential brands in the world. By applying our unique expertise in biometrics and cryptography, we enable our clients to unlock simpler and safer ways to pay, connect, access, identify, travel and protect public places, at scale and in total security.

 

Our teams work from 5 continents and speak 100+ different languages. We strongly believe that our diversity is a key driver of innovation and performance.

 

Context & Purpose

Bags and luggage are objects of interest in many travel and video surveillance applications. Abandoned bag detection and lost luggage identification are subjects on which the URT (IDEMIA's Research and Technology Unit) has been working for several years. For example, in partnership with Air France, IDEMIA developed ALIX, a solution that enables airport operators to identify bags that have lost their tags, using only a photo of the bag. How does it work?

Once a bag is handled by the airport's baggage handling system (BHS), the ALIX Arch module captures high-quality images of all sides of the luggage to create its augmented digital tag. If the traditional printed tag is torn off during the bag's journey, the operator picks up the bag, takes a new set of images of it and uses ALIX Core to retrieve its augmented digital tag, thus identifying the lost luggage.

ALIX's recognition algorithms are mainly AI-based computer vision models. This internship aims to contribute to the development of smarter, more intuitive and possibly explainable systems that can assist travelers and airport personnel alike by combining visual data with natural language processing.

Key Missions

The Video Analytics team is seeking a motivated candidate with a solid background in software development to strengthen our team over a period of 5 to 6 months. Our studies cover all aspects of scientific research, from exploration of the state of the art (SOTA), data collection and adaptation, and algorithm design and implementation, to the publication of research papers or patents. The approach we want to investigate specifically in this internship is the use of Vision Language Models (VLMs) to process textual information in addition to the visual data we usually use. First, the focus will be on using or finetuning existing models. If the initial results are promising, the goal is to delve into the details of training VLMs, which makes this a perfect research-oriented yet solution-focused internship.

 

The main objectives of the project:

  • Understand the problem of Lost Luggage Identification, its challenges, and IDEMIA’s solutions,
  • [Research] Study the SOTA of VLMs (some call them Multimodal LLMs or even Foundation Models),
  • Design a modular software architecture to use existing [open source] VLMs,
  • Apply the chosen VLM to different applications related to Lost Luggage Identification:
    • Visual Question Answering: are these two images of the same bag?
    • Dense Image Captioning: describe the bag in the image.
    • Text-Image Retrieval: find the bag image that best matches a text description.
  • [Research] Adapt or finetune the solution to IDEMIA’s use case (improve alignment, etc.),
  • [Research] Measure the performance of the proposed solution and compare it to existing ones,
  • [Research] Dive into VLM training techniques (contrastive, masking, etc.), compare them, and adapt them to IDEMIA data to get the best results,
  • Present results and document findings (report, paper, patent).
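
To illustrate the Text-Image Retrieval task listed above, here is a minimal sketch. It is not IDEMIA's method: it assumes a VLM has already mapped the text description and each bag image to vectors in a shared embedding space, and it simply ranks the bags by cosine similarity. The embedding values, the bag identifiers and the function names (cosine_similarity, rank_bags) are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_bags(text_embedding, image_embeddings):
    """Return (bag_id, score) pairs sorted from best to worst match."""
    scores = [
        (bag_id, cosine_similarity(text_embedding, embedding))
        for bag_id, embedding in image_embeddings.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy gallery: in practice these vectors would come from a VLM image encoder.
gallery = {
    "bag_a": [0.9, 0.1, 0.0],
    "bag_b": [0.1, 0.9, 0.1],
    "bag_c": [0.0, 0.2, 0.9],
}

# Toy query: in practice this vector would come from the VLM text encoder,
# e.g. for a description such as "red hard-shell suitcase".
query = [0.8, 0.2, 0.1]

ranking = rank_bags(query, gallery)
best_bag_id = ranking[0][0]
print(best_bag_id)
```

In this toy setup the query vector lies closest to bag_a, so that bag is ranked first. Contrastive training, one of the techniques named in the objectives, aims to produce embeddings for which this kind of ranking is reliable.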

 

Profile & Other Information

  • Student in an engineering school or a Master's (M2) student specializing in Computer Vision, Image Processing, or Deep Learning
  • Strong knowledge of Deep Learning
    • At least two [school or personal] projects / experiences in the field of Computer Vision or NLP
    • Experience with VLMs or LLMs is a plus
  • Proficiency in Python and PyTorch (or similar frameworks)
  • Solid training in data analysis and software development
  • Proficient in English (e.g., reading scientific articles, presenting work)
  • Curious, proactive, and autonomous
  • Results-oriented

 

By choosing to work at IDEMIA, you will join a unique tech company, offering a wide range of growth opportunities. You will contribute to a safer world, collaborating with an international and global community. We value the diversity of our teams and welcome people from all walks of life, regardless of how they look, where they come from, who they love, or what they think.

 

We deliver cutting-edge, future-proof innovation that reaches the highest technological standards, and we’re transforming fast to stay a leader in a world that’s changing fast, too.

 

At IDEMIA, people can develop their expertise and feel a sense of ownership and empowerment, in a global environment, as part of a company with the ambition and the ability to change the world.

 

Visit our website to learn more about the leader in Identity Technologies

www.idemia.com

Tags: Architecture, Computer Vision, Data analysis, Deep Learning, Engineering, LLMs, NLP, Open Source, Python, PyTorch, Research, Security

Perks/benefits: Career development, Startup environment

Region: Europe
Country: France
