8-bit floating-point formats for deep learning (M/F)

Grenoble

CEA

The CEA is a major player in research, serving the State, the economy and citizens. It provides concrete solutions to their needs in four main areas: the energy transition, the digital transition, technologies...


General information

Organisation

The French Alternative Energies and Atomic Energy Commission (CEA) is a key player in research, development and innovation in four main areas:
• defence and security,
• nuclear energy (fission and fusion),
• technological research for industry,
• fundamental research in the physical sciences and life sciences.

Drawing on its widely acknowledged expertise, and thanks to its 16,000 technicians, engineers, researchers and staff, the CEA actively participates in collaborative projects with a large number of academic and industrial partners.

The CEA is established in ten centers spread throughout France.
  

Reference

2024-32990  

Unit description

The LSTA laboratory (Advanced Technologies and Systems-on-Chip Laboratory) develops innovative chips for various application domains: artificial intelligence, high-performance computing (HPC) and quantum computing.
Within this lab, the AI team designs chips that implement AI algorithms efficiently and, conversely, designs AI algorithms suited to specific hardware.

Position description

Category

Mathematics, scientific information, software

Contract

Internship

Job title

8-bit floating-point formats for deep learning (M/F)

Subject

The general goal of the proposed internship is to implement complete training of neural networks on diverse tasks using fp8 formats, and to compare the results with 32-bit floating point (fp32), 16-bit floating point (fp16) and 8-bit fixed point (int8). If time allows, it may also encompass a C++ implementation, energy measurements, cache miss/hit measurements, and/or the implementation of other, more unusual numerical formats.

Contract duration (months)

6

Job description

By default, computations in a deep neural network are done with numbers represented in the 32-bit floating-point format (fp32). This format can represent a great variety of real-valued numbers but requires 4 bytes to store each number, which can be a problem in memory-constrained environments such as embedded systems. 8-bit fixed point (int8) is a common format for deep neural network inference [1] and enables large compression with little loss of accuracy [2]. Training a neural network in reduced precision, however, is much less common: during training, 8-bit fixed point suffers from its relatively small dynamic range, which causes a significant degradation in accuracy.
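
To make the dynamic-range argument concrete, the short Python snippet below (an illustrative sketch only, not part of the internship deliverables) compares the ratio between the largest and smallest nonzero magnitudes representable in int8 fixed point and in the two fp8 formats described in [4], E4M3 and E5M2:

    def fp_range(exp_bits, man_bits):
        """Largest normal and smallest subnormal magnitude of an IEEE-like format."""
        bias = 2 ** (exp_bits - 1) - 1
        max_normal = (2 - 2 ** -man_bits) * 2 ** (2 ** exp_bits - 2 - bias)
        min_subnormal = 2.0 ** (1 - bias - man_bits)
        return max_normal, min_subnormal

    # E5M2 follows the usual IEEE 754 conventions directly.
    e5m2_max, e5m2_min = fp_range(5, 2)          # 57344.0 and ~1.5e-5
    # E4M3 as defined in [4] reclaims most NaN encodings, raising the maximum from 240 to 448.
    e4m3_max, e4m3_min = 448.0, 2.0 ** -9

    # int8 fixed point has 256 evenly spaced codes: whatever the chosen scale,
    # the ratio between the largest and smallest nonzero magnitude is only 127.
    print("int8 ratio:", 127)
    print("E4M3 ratio:", e4m3_max / e4m3_min)    # ~2.3e5
    print("E5M2 ratio:", e5m2_max / e5m2_min)    # ~3.8e9

With only 256 codes, int8 trades range for uniform precision, whereas fp8 trades precision for range, which is what the widely varying magnitudes of weights, activations and gradients call for during training.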

To correct this flaw, some authors [3, 4] proposed to perform all computations of the learning phase in an 8-bit floating-point format (fp8). They claim that it yields networks with the same performance as networks trained in full precision on various tasks (language modelling, image classification). Yet, despite these promises, no library is publicly available to perform deep learning in 8 bits.
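
A recurring ingredient of fp8 training recipes such as [4] is per-tensor scaling: each tensor is rescaled so that its values occupy the representable fp8 range before the cast, and the scale is undone afterwards. Below is a minimal sketch of the idea, assuming a recent PyTorch (>= 2.1, which ships the torch.float8_e4m3fn dtype); the helper name is hypothetical:

    import torch

    FP8_E4M3_MAX = 448.0  # largest finite E4M3 value, per [4]

    def scaled_fp8_roundtrip(x: torch.Tensor) -> torch.Tensor:
        # Rescale so the largest magnitude lands near the fp8 maximum,
        # cast to fp8, then undo the scale in the original precision.
        scale = FP8_E4M3_MAX / x.abs().max().clamp(min=1e-12)
        x8 = (x * scale).to(torch.float8_e4m3fn)
        return x8.to(x.dtype) / scale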

During this internship, the intern will:

  • Produce a research bibliography on numerical formats for deep learning
  • Develop Python deep learning modules simulating the behaviour of fp8 (a rough sketch of the idea is given after this list)
  • Run experiments on datasets and compare the results with other numerical formats
  • (optional) Implement the fp8 modules in C++
  • (optional) Measure energy consumption and cache miss/hit rates
  • (optional) Extend the work to other unusual numerical formats
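
As a rough illustration of what such a simulation module could look like (again assuming PyTorch >= 2.1 for the torch.float8_e4m3fn dtype; the class name and its placement are hypothetical, not an existing CEA library), activations can be rounded through fp8 on the forward pass while gradients pass through unchanged, a standard straight-through estimator:

    import torch
    import torch.nn as nn

    class FakeFP8(nn.Module):
        """Simulate fp8 (E4M3) storage of activations inside an fp32 network."""
        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x8 = x.to(torch.float8_e4m3fn).to(x.dtype)  # round-trip through fp8
            # Straight-through estimator: the forward pass sees the rounded
            # values, the backward pass sees the identity.
            return x + (x8 - x).detach()

    # Example: insert the simulated fp8 rounding after each layer of a small MLP.
    model = nn.Sequential(
        nn.Linear(784, 128), FakeFP8(),
        nn.ReLU(),
        nn.Linear(128, 10), FakeFP8(),
    )
    out = model(torch.randn(32, 784))

Weights and gradients can be wrapped in a similar way; comparing the resulting accuracy against fp32, fp16 and int8 baselines is the core of the experimental work.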

What comes with the offer:

  • An office in Grenoble, France, a world-class nanotech hub, with high-level experts all around
  • A unique quality of life, with quick access to the mountains: skiing, cycling, trail running, hiking and paragliding spots can all be reached in less than an hour by car
  • Subsidized lunch
  • Employee benefits: culture, sports events, a free-of-charge music room, subsidized activities…

Start date is flexible: the internship may start during the second semester of the 2024-25 academic year.

 

References

[1] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, "Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations".

[2] S. Han, H. Mao, and W. J. Dally, "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding", arXiv:1510.00149, Feb. 2016.

[3] N. Wang, J. Choi, D. Brand, C.-Y. Chen, and K. Gopalakrishnan, "Training Deep Neural Networks with 8-bit Floating Point Numbers".

[4] P. Micikevicius et al., "FP8 Formats for Deep Learning", arXiv:2209.05433, Sept. 2022.

Methods / Means

Linux / Slurm / Python / C++ (optional)

Applicant Profile

The ideal candidate should:

  • Be enrolled in the final year of an engineering school or a university master's degree with a strong focus on computer science;
  • Be comfortable with Python and with deep learning fundamentals;
  • Have experience with deep learning libraries in Python, preferably PyTorch;
  • Have experience with C++ (a plus);
  • Be curious and eager to solve complex problems;
  • Be fluent in English or French.

In line with CEA's commitment to integrating people with disabilities, this job is open to all.

Position location

Site

Grenoble

Job location

France, Auvergne-Rhône-Alpes, Isère (38)


Candidate criteria

Languages

  • English (Intermediate)
  • French (Intermediate)

Prepared diploma

Bac+5 - Master 2

Recommended training

Engineering school / University (Computer Science / Applied Maths)

PhD opportunity

Yes

Position start date

09/01/2025
