Research Engineer - AI Computing System
Vancouver, British Columbia, Canada
Huawei Technologies Canada Co., Ltd.
Huawei is a leading global provider of information and communications technology (ICT) infrastructure and smart devices. Huawei Canada has an immediate permanent opening for an Engineer.
About the team:
The Advanced Computing and Storage Lab, currently part of the Vancouver Research Centre, explores adaptive computing system architectures to address the challenges posed by flexible and variable application loads. It helps ensure the stability and quality of training clusters, builds solvers for dynamic cluster configuration strategies, and establishes precision control systems to create stable and efficient compute clusters. The lab also focuses on key industry AI application scenarios such as large model training and inference. Building on key technologies such as low-precision training, multi-modal training, and reinforcement learning, it is responsible for bottleneck analysis and for designing and developing optimization solutions that improve training and inference performance as well as usability.
About the job:
Targeting key industry AI application scenarios such as large model training and inference, and building on key technologies such as low-precision training, parallel strategy tuning, and training resource tuning, analyze bottlenecks in the AI software system on the Ascend platform and design and develop optimization solutions that improve training and inference performance and ease of use.
Design and develop optimization solutions for AI training and inference systems. Based on the system requirements of AI algorithms, optimize the architecture across computing, I/O, scheduling, and related areas to build large-model training frameworks, operator libraries, acceleration libraries, and other software frameworks and acceleration features that provide a foundation for the next generation of architectural innovation.
Track the latest research progress and technology trends in AI computing cluster architecture design, training acceleration, and inference acceleration across industry and academia, and continuously improve the competitiveness of AI computing cluster systems.
The base salary for this position ranges from $100,000 to $170,000, depending on education, experience, and demonstrated expertise.
Requirements
About the ideal candidate:
Master's or PhD degree in Computer Science, Computer Engineering, or a related major such as artificial intelligence, software, automation, electronics, communications, or robotics.
Familiar with the common architectures of large models such as DeepSeek and Llama, with foundational experience in large model training and inference optimization in areas such as LLMs, MoE, and multimodality.
Familiar with the hardware architecture and programming model of AI accelerators such as GPUs and NPUs, with experience in optimizing AI systems through software-hardware co-design.
Candidates with any of the following are preferred:
1) Solid programming foundation, familiarity with Python/C/C++, and good architecture design and programming habits;
2) Ability to work independently and solve problems; good communication, willingness to collaborate, enthusiasm for new technologies, a habit of summarizing and sharing, and a preference for hands-on practice;
3) Experience developing AI training frameworks or AI inference engines, or related algorithm and hardware experience;
4) Strong research capabilities in new technologies and new architectures, with the ability to quickly track and gain insight into the most cutting-edge AI technologies in the industry and to drive sustained leadership in system architecture innovation.
Tags: Architecture Computer Science Engineering GPU LLaMA LLMs Model training PhD Python Reinforcement Learning Research Robotics
Perks/benefits: Career development