ML Appliance Architect
Tel Aviv-Yafo, Tel Aviv District, IL
NeuReality
NeuReality is the first complete, system-level solution designed to address the challenges of optimizing, deploying, managing, and scaling AI workflows.
Description
We are seeking a highly skilled Hardware-Software Researcher to drive performance evaluation and optimization of Machine Learning (ML) algorithms on GPUs and specialized hardware accelerators in server environments. In this role, you will analyze algorithmic performance bottlenecks, suggest architectural improvements, and propose innovative solutions to maximize hardware utilization and efficiency.
You’ll collaborate closely with ML researchers, hardware engineers, and software developers to bridge the gap between algorithmic design and hardware execution, ensuring optimal performance across diverse workloads.
Key Responsibilities:
- Analyze and benchmark the performance of ML algorithms running on GPUs and other accelerators (an illustrative sketch follows this list).
- Identify hardware and software bottlenecks affecting latency, throughput, and efficiency.
- Develop performance models to predict system behavior under varying workloads.
- Propose software and hardware optimizations to improve performance.
- Suggest algorithmic changes or re-architectures to better align with hardware capabilities.
- Implement proof-of-concept optimizations and evaluate their impact.
- Provide insights into how future hardware capabilities can address current limitations.
- Work closely with ML engineers, hardware engineers, and software teams to align performance goals.
- Stay updated on the latest advancements in GPUs, accelerator hardware, and ML frameworks.
- Propose innovative solutions to address emerging performance challenges.
- Prepare performance reports and present results to stakeholders and technical teams.
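As a purely illustrative sketch of the benchmarking work described above (not a prescribed method), the snippet below times GPU inference latency and throughput for a PyTorch model using CUDA events. The choice of model (torchvision ResNet-50), batch size, and iteration counts are placeholder assumptions, not part of this posting.

```python
# Hedged example: measure average GPU inference latency and throughput.
# Model, batch size, and iteration counts are arbitrary placeholders.
import torch
import torchvision.models as models

model = models.resnet50().eval().cuda()
x = torch.randn(32, 3, 224, 224, device="cuda")  # batch of 32, assumed shape

# Warm up so kernel launches and caching don't skew the timing.
with torch.no_grad():
    for _ in range(10):
        model(x)
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
iters = 100

start.record()
with torch.no_grad():
    for _ in range(iters):
        model(x)
end.record()
torch.cuda.synchronize()

latency_ms = start.elapsed_time(end) / iters      # average latency per batch (ms)
throughput = x.shape[0] * 1000.0 / latency_ms     # samples per second
print(f"avg latency: {latency_ms:.2f} ms, throughput: {throughput:.1f} samples/s")
```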
Requirements
- Education: Bachelor’s degree in Computer Science, Electrical Engineering, Computer Engineering, or a related field (Master’s or PhD is an advantage).
- Experience:
- 3+ years of experience in performance analysis and optimization for ML workloads.
- Hands-on experience with GPUs (e.g., NVIDIA CUDA, cuDNN) or other hardware accelerators.
- Technical Skills:
- Proficiency in C++, Python, and parallel programming models (e.g., CUDA, OpenCL).
- Experience with ML frameworks like TensorFlow, PyTorch, or ONNX.
- Strong understanding of hardware architectures, including memory hierarchies, pipelining, and compute engines.
- Analytical Skills:
- Proven ability to identify and resolve performance bottlenecks at the intersection of hardware and software.
- Experience with performance profiling tools (e.g., NVIDIA Nsight, VTune, perf); a minimal profiling sketch follows the requirements list.
- Soft Skills:
- Strong problem-solving and analytical thinking.
- Excellent communication and technical documentation skills.
- Ability to work in a collaborative, multidisciplinary environment.
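To give a concrete sense of the profiling skills listed above, here is a minimal sketch that uses PyTorch's built-in profiler to surface the hottest GPU operators before drilling down further with Nsight or VTune. The model and input shape are placeholder assumptions, not requirements of the role.

```python
# Hedged example: rank operators by GPU time to find candidates for optimization.
# Model and input shape are arbitrary placeholders.
import torch
from torch.profiler import profile, ProfilerActivity
import torchvision.models as models

model = models.resnet50().eval().cuda()
x = torch.randn(32, 3, 224, 224, device="cuda")

with torch.no_grad(), profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    record_shapes=True,
) as prof:
    for _ in range(10):
        model(x)

# The top entries by CUDA time point at the kernels worth inspecting more deeply.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```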
Preferred Qualifications:
- Experience with cloud GPU/accelerator environments (e.g., AWS, Azure, GCP).
- Familiarity with low-level hardware design.
- Background in designing scalable systems for large ML workloads.