8-bit floating-point formats for deep learning (M/F)
Grenoble
Applications have closed
CEA
The CEA is a major player in research, serving the State, the economy and citizens. It provides concrete solutions to their needs in four main areas: energy transition, digital transition, technologies...
General information
Organisation
The French Alternative Energies and Atomic Energy Commission (CEA) is a key player in research, development and innovation in four main areas:
• defence and security,
• nuclear energy (fission and fusion),
• technological research for industry,
• fundamental research in the physical sciences and life sciences.
Drawing on its widely acknowledged expertise, and thanks to its 16,000 technicians, engineers, researchers and staff, the CEA actively participates in collaborative projects with a large number of academic and industrial partners.
The CEA is established in ten centers spread throughout France.
Reference
2024-32990
Unit description
The LSTA laboratory (Advanced Technologies and Systems-on-chip Laboratory) develops innovative chips for various application domains: Artificial Intelligence, High Performance Computing (HPC) and Quantum computing.
In this lab, the AI team works on designing chips that implement AI algorithms efficiently and, conversely, on designing AI algorithms suited to specific hardware.
Position description
Category
Mathematics, information, scientific, software
Contract
Internship
Job title
8-bit floating-point formats for deep learning (M/F)
Subject
The general goal of the proposed internship is to implement complete training of neural networks on diverse tasks using 8-bit floating-point (fp8) formats, and to compare the results with 32-bit floating-point (fp32), 16-bit floating-point (fp16) and 8-bit fixed-point (int8). If time allows, it may also encompass a C++ implementation, energy measurements, cache miss/hit measurements, and/or the implementation of other, more unusual numerical formats.
Contract duration (months)
6
Job description
By default, computations in a deep neural network are done with numbers represented in the 32-bit floating-point format (fp32). This format can represent a wide range of real values but requires 4 bytes to store each number, which can be a problem in memory-constrained environments such as embedded systems. 8-bit fixed-point (int8) is a common format for deep neural network inference [1]; it enables substantial compression with little loss in accuracy [2]. Training a neural network in reduced precision, however, is much less common. During training, 8-bit fixed-point suffers from its relatively small dynamic range, which causes a significant loss of accuracy.
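The dynamic-range limitation can be illustrated with a minimal sketch. The function `quantize_int8`, the scale and the two sample magnitudes below are hypothetical choices made for illustration only; they are not taken from the cited papers.

```python
def quantize_int8(x, scale):
    """Round x to the nearest multiple of `scale`, saturating at the int8 limits."""
    q = round(x / scale)          # nearest integer code
    q = max(-128, min(127, q))    # saturate to the signed 8-bit range
    return q * scale              # value actually represented

# Assumed scale: the largest positive code (127) maps to about 1.0.
scale = 1.0 / 127

activation = 0.5    # illustrative forward-pass magnitude
gradient = 1e-4     # illustrative backward-pass magnitude

print(quantize_int8(activation, scale))  # close to 0.5
print(quantize_int8(gradient, scale))    # 0.0: the small value is lost
```

With a single scale, values much smaller than one quantization step round to zero, which is one way small gradients can vanish during fixed-point training.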
To correct this flaw, some authors [3, 4] have proposed performing all training computations in an 8-bit floating-point format (fp8). They claim that the resulting networks match the performance of networks trained in full precision on various tasks (language modelling, image classification). Yet, despite these promises, no library is publicly available to perform deep learning in 8 bits.
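As a rough illustration of what simulating fp8 in software can look like, the sketch below rounds a Python float to the nearest value on an 8-bit floating-point grid. The function name `round_to_fp8` and its parameters are hypothetical. The default values correspond to the E4M3 format (3 mantissa bits, smallest normal exponent -6, largest value 448) and the `E5M2` dictionary to the E5M2 format (2 mantissa bits, smallest normal exponent -14, largest value 57344) described in [4]. Saturating out-of-range inputs is a simplifying assumption of this sketch.

```python
import math

def round_to_fp8(x, mantissa_bits=3, min_exp=-6, max_value=448.0):
    """Round x to the nearest value of an fp8-like format (saturating)."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), max_value)          # saturate instead of overflowing
    _, e = math.frexp(mag)                # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, min_exp)             # clamp so subnormals share one step
    step = 2.0 ** (exp - mantissa_bits)   # spacing between representable values
    return sign * round(mag / step) * step  # round() breaks ties to even

# Parameters for the E5M2 variant (more range, less precision).
E5M2 = dict(mantissa_bits=2, min_exp=-14, max_value=57344.0)

print(round_to_fp8(0.1))           # nearest E4M3 value to 0.1
print(round_to_fp8(1000.0))        # saturates at the E4M3 maximum
print(round_to_fp8(1e-4, **E5M2))  # small value kept as a nonzero E5M2 number
```

Because the spacing between representable values scales with the exponent, the format keeps a roughly constant relative precision over a much wider range than a fixed-point grid with a single scale.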
During this internship, the intern will:
- Produce a research bibliography on numerical formats for deep learning
- Develop Python deep learning modules simulating the behaviour of fp8
- Run experiments on datasets and compare results with other numerical formats
- (optional) Implement fp8 modules in C++
- (optional) Measure energy consumption and cache miss/hit rate
- (optional) Extend the previous work to other unusual numerical formats
What comes with the offer:
- Office in Grenoble, France, a world-class nanotech hub, with high-level experts all around
- A unique quality of life, with quick access to the mountains: skiing, cycling, trail-running, hiking and paragliding spots can be reached in less than one hour by car
- Subsidized lunch
- Employee benefits: cultural and sports events, free-of-charge music room, subsidized activities, etc.
Start date is flexible: the internship may start during the second semester of the 2024-25 academic year.
References
[1] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, "Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations".
[2] S. Han, H. Mao, and W. J. Dally, "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding", arXiv:1510.00149, Feb. 2016.
[3] N. Wang, J. Choi, D. Brand, C.-Y. Chen, and K. Gopalakrishnan, "Training Deep Neural Networks with 8-bit Floating Point Numbers".
[4] P. Micikevicius et al., "FP8 Formats for Deep Learning", arXiv:2209.05433, 29 Sept. 2022.
Methods / Means
Linux / Slurm / Python / C++ (optional)
Applicant Profile
The ideal candidate should:
- Be enrolled in the final year of an engineering school or a university master's degree with a strong focus on computer science;
- Be comfortable with Python and deep learning fundamentals;
- Have experience with deep learning libraries in Python, preferably PyTorch;
- Have experience with C++ (a plus);
- Be curious and eager to solve complex problems;
- Be fluent in either English or French.
In line with CEA's commitment to integrating people with disabilities, this job is open to all.
Position location
Site
Grenoble
Job location
France, Auvergne-Rhône-Alpes, Isère (38)
Location
Grenoble
Candidate criteria
Languages
- English (Intermediate)
- French (Intermediate)
Prepared diploma
Bac+5 - Master 2
Recommended training
Engineering school / University (Computer Science / Applied Maths)
PhD opportunity
Yes
Requester
Position start date
09/01/2025
Tags: Classification Computer Science Deep Learning Engineering HPC Industrial Linux Mathematics PhD Python PyTorch R Research Security
Perks/benefits: Career development Team events