Research Engineer - MLLM Serving Optimization

Vancouver, British Columbia, Canada

Huawei Technologies Canada Co., Ltd.

Huawei is a leading global provider of information and communications technology (ICT) infrastructure and smart devices.


Huawei Canada has an immediate, permanent opening for a Research Engineer.


About the Team:

The Big Data and Intelligence Platform lab focuses on advancing core AI technologies for the Cloud, applying large language models (LLMs) to complex real-world challenges across a range of sectors. Composed of researchers with advanced degrees from top Canadian universities, the lab specializes in integrating LLMs with operations research, analytical databases, and data systems, and in optimizing efficiency within LLM architectures. The lab also prioritizes responsible AI practices, including data watermarking and federated learning. Committed to academic excellence, the team publishes its findings at leading conferences, shaping the future of AI technology and contributing to the scientific community.


About the Job:

  • Design, implement, and optimize a high-performance serving platform for multimodal large language models (MLLMs).
  • Integrate state-of-the-art (SOTA) open-source serving frameworks such as vLLM, sglang, or lmdeploy.
  • Develop techniques for efficient resource utilization and low-latency inference for MLLMs in serverless environments.
  • Optimize memory usage, scalability, and throughput of the serving platform.
  • Conduct experiments to evaluate and benchmark MLLM serving performance.
  • Contribute novel ideas to improve serving efficiency and publish findings when applicable.

Requirements

What you’ll bring to the team:

  • Bachelor’s degree or higher in Computer Science, Electrical and Computer Engineering (ECE), or a related field.
  • Experience with one or more SOTA LLM serving frameworks such as vLLM, sglang, or lmdeploy.
  • Familiarity with distributed systems, serverless architectures, and cloud computing platforms.
  • Experience with inference optimization for large-scale AI models.
  • Familiarity with multimodal architectures and serving requirements.
  • Previous experience in deploying AI platforms on cloud services.

Perks/benefits: Conferences

