Co-op Researcher - Multimodal Large Language Model (MLLM) Serving Optimization

Vancouver, British Columbia, Canada

Huawei Technologies Canada Co., Ltd.

Huawei is a leading global provider of information and communications technology (ICT) infrastructure and smart devices.



Huawei Canada has an immediate co-op opening for a Researcher.

About the team:

The Intelligent Cloud Infrastructure Lab aims to innovate technologies, algorithms, systems, and platforms for next-generation cloud infrastructure. The lab addresses scalability, performance, and resource utilization challenges in existing cloud services while preparing for future challenges with appropriate technologies and architectures. Additionally, the lab aims to understand industry dynamics and technology trends to create a robust ecosystem.

About the job:

  • Design, implement, and optimize a high-performance serving platform for MLLMs.

  • Integrate state-of-the-art (SOTA) open-source serving frameworks such as vLLM, sglang, or lmdeploy.

  • Develop techniques for efficient resource utilization and low-latency inference for MLLMs in serverless environments.

  • Optimize memory usage, scalability, and throughput of the serving platform.

  • Conduct experiments to evaluate and benchmark MLLM serving performance.

  • Contribute novel ideas to improve serving efficiency and publish findings when applicable.

  • Work with cross-functional teams, including researchers and engineers, to ensure seamless deployment and integration of the platform.

  • Provide technical guidance and support for platform users.
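For a flavor of the benchmarking work listed above, a serving evaluation typically reduces per-request timings to throughput and tail-latency statistics. The sketch below is purely illustrative (not part of the role description); the function names and the nearest-rank percentile choice are assumptions.

```python
def summarize_latencies(durations_s):
    """Compute mean, p50, and p99 latency (in seconds) from per-request durations."""
    xs = sorted(durations_s)
    n = len(xs)

    def pct(p):
        # Nearest-rank percentile: index of the ceil(p% * n)-th smallest value.
        idx = min(n - 1, max(0, int(round(p / 100 * n)) - 1))
        return xs[idx]

    return {
        "mean_s": sum(xs) / n,
        "p50_s": pct(50),
        "p99_s": pct(99),
    }


def throughput_rps(num_requests, wall_clock_s):
    """Requests served per second over the measurement window."""
    return num_requests / wall_clock_s
```

In practice such a harness would wrap calls to the serving platform under load and record wall-clock timings per request; p99 latency and sustained requests-per-second are the headline numbers in most LLM serving benchmarks.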

The base salary for this position ranges from $56,000 to $79,000, depending on education, experience, and demonstrated expertise.

Requirements

About the ideal candidate:

  • Bachelor’s degree or higher in Computer Science, Electrical and Computer Engineering (ECE), or a related field.

  • Strong proficiency in Python and PyTorch, with the ability to work in other programming languages as needed.

  • Experience with one or more SOTA LLM serving frameworks such as vLLM, sglang, or lmdeploy, as well as with inference optimization for large-scale AI models.

  • Familiarity with distributed systems, serverless architectures, and cloud computing platforms, as well as with multimodal model architectures and their serving requirements.

  • Strong analytical and problem-solving abilities.

  • Previous experience in deploying AI platforms on cloud services.

  • Excellent communication and teamwork skills.

