Principal Engineer - AI Inference Performance
Waterloo, Ontario, Canada
Huawei Technologies Canada Co., Ltd.
Huawei is a leading global provider of information and communications technology (ICT) infrastructure and smart devices.
Our team has an immediate permanent opening for a Principal Engineer.
Responsibilities:
- Develop and maintain real-time and historical performance monitoring tools for AI inference workloads, including profiling tools for various AI model types (small models, LLMs, VLMs, and multimodal systems) in applications like conversational AI, video processing, and real-time analytics.
- Analyze and classify inference workloads based on characteristics such as prefill/decode behavior, pre-/post-processing overheads, and computational complexity to develop tailored optimization strategies.
- Develop performance models that consider the systematic factors of AI inference, including model size, architecture (e.g., transformers, CNNs), application-specific constraints (e.g., latency for conversational AI), and compute resource characteristics (GPU, TPU, CPU, and specialized accelerators).
- Optimize inference workloads across various hardware resources by reducing latency, minimizing memory overhead, and improving throughput. Techniques include quantization, pruning, fusion, and caching. Ensure that models can scale efficiently across diverse compute platforms, from edge devices to large-scale cloud infrastructures.
- Lead efforts in creating benchmarks for different types of inference tasks. Utilize tools such as NVIDIA Nsight, PyTorch Profiler, and TensorBoard to gain insights into inference performance across diverse hardware platforms.
- Conduct benchmarking and performance comparisons across various hardware platforms (e.g., GPUs, TPUs, edge accelerators) to identify bottlenecks and optimization opportunities. Provide recommendations for software and hardware improvements based on inference throughput, latency, and power consumption.
- Work closely with AI research, software engineering, and DevOps teams to improve the end-to-end AI inference pipeline, ensuring optimized deployments across different production environments. Collaborate with system architects to incorporate resource-aware optimizations into design practices.
- Develop strategies to ensure the scalability of inference workloads in production environments, considering both model performance and resource scaling, whether in on-premises environments, cloud infrastructure, or edge computing devices.
Requirements
What you’ll bring to the team:
- Ph.D. or Master’s degree in Computer Science, Electrical Engineering, Machine Learning, or a related field.
- 5+ years of experience in AI/ML engineering with a focus on inference performance, workload analysis, and system optimization.
- Extensive experience with AI frameworks (e.g., TensorFlow, PyTorch, ONNX) and model optimization techniques (e.g., quantization, pruning, kernel fusion, and hardware-aware tuning).
- Proficient with profiling tools (e.g., TensorBoard, PyTorch Profiler, NVIDIA Nsight) and workload analysis for diverse AI models and applications.
- Expertise in optimizing small models, large language models (LLMs), vision-language models (VLMs), and multimodal models for inference.
- Strong programming skills in Python, C++, CUDA, and experience with low-level hardware performance tuning.
- Familiarity with performance modeling methodologies and frameworks for predicting inference workload performance under varying conditions.
- Proven expertise in data parallelism, model parallelism, pipeline parallelism, and other distributed-computing techniques for performance improvements at scale.