ML Appliance Architect
Tel Aviv-Yafo, Tel Aviv District, IL
NeuReality
NeuReality is the first complete, system-level solution designed to address the challenges of optimizing, deploying, managing, and scaling AI workflows.
Description
We are seeking a highly skilled Hardware-Software Researcher to drive performance evaluation and optimization of Machine Learning (ML) algorithms on GPUs and specialized hardware accelerators in server environments. In this role, you will analyze algorithmic performance bottlenecks, suggest architectural improvements, and propose innovative solutions to maximize hardware utilization and efficiency.
You’ll collaborate closely with ML researchers, hardware engineers, and software developers to bridge the gap between algorithmic design and hardware execution, ensuring optimal performance across diverse workloads.
Key Responsibilities:
- Analyze and benchmark the performance of ML algorithms running on GPUs and other accelerators (an illustrative sketch follows this list).
- Identify hardware and software bottlenecks affecting latency, throughput, and efficiency.
- Develop performance models to predict system behavior under varying workloads.
- Propose software and hardware optimizations to improve performance.
- Suggest algorithmic changes or re-architectures to better align with hardware capabilities.
- Implement proof-of-concept optimizations and evaluate their impact.
- Provide insights into how future hardware capabilities can address current limitations.
- Work closely with ML engineers, hardware engineers, and software teams to align performance goals.
- Stay updated on the latest advancements in GPUs, accelerator hardware, and ML frameworks.
- Propose innovative solutions to address emerging performance challenges.
- Prepare performance reports and present results to stakeholders and technical teams.
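As a purely illustrative sketch of the benchmarking work described above (not a prescribed method), the snippet below times GPU inference latency and throughput for a PyTorch model using CUDA events. The choice of model (torchvision ResNet-50), batch size, and iteration counts are placeholder assumptions, not part of this posting.

```python
# Hedged example: measure average GPU inference latency and throughput.
# Model, batch size, and iteration counts are arbitrary placeholders.
import torch
import torchvision.models as models

model = models.resnet50().eval().cuda()
x = torch.randn(32, 3, 224, 224, device="cuda")  # batch of 32, assumed shape

# Warm up so kernel launches and caching don't skew the timing.
with torch.no_grad():
    for _ in range(10):
        model(x)
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
iters = 100

start.record()
with torch.no_grad():
    for _ in range(iters):
        model(x)
end.record()
torch.cuda.synchronize()

latency_ms = start.elapsed_time(end) / iters      # average latency per batch (ms)
throughput = x.shape[0] * 1000.0 / latency_ms     # samples per second
print(f"avg latency: {latency_ms:.2f} ms, throughput: {throughput:.1f} samples/s")
```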
Requirements
- Education: Bachelor’s degree in Computer Science, Electrical Engineering, Computer Engineering, or a related field (Master’s or PhD is an advantage).
- Experience:
- 3+ years of experience in performance analysis and optimization for ML workloads.
- Hands-on experience with GPUs (e.g., NVIDIA CUDA, cuDNN) or other hardware accelerators.
- Technical Skills:
- Proficiency in C++, Python, and parallel programming models (e.g., CUDA, OpenCL).
- Experience with ML frameworks like TensorFlow, PyTorch, or ONNX.
- Strong understanding of hardware architectures, including memory hierarchies, pipelining, and compute engines.
- Analytical Skills:
- Proven ability to identify and resolve performance bottlenecks at the intersection of hardware and software.
- Experience with performance profiling tools (e.g., NVIDIA Nsight, VTune, perf); a minimal profiling sketch follows the requirements list.
- Soft Skills:
- Strong problem-solving and analytical thinking.
- Excellent communication and technical documentation skills.
- Ability to work in a collaborative, multidisciplinary environment.
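To give a concrete sense of the profiling skills listed above, here is a minimal sketch that uses PyTorch's built-in profiler to surface the hottest GPU operators before drilling down further with Nsight or VTune. The model and input shape are placeholder assumptions, not requirements of the role.

```python
# Hedged example: rank operators by GPU time to find candidates for optimization.
# Model and input shape are arbitrary placeholders.
import torch
from torch.profiler import profile, ProfilerActivity
import torchvision.models as models

model = models.resnet50().eval().cuda()
x = torch.randn(32, 3, 224, 224, device="cuda")

with torch.no_grad(), profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    record_shapes=True,
) as prof:
    for _ in range(10):
        model(x)

# The top entries by CUDA time point at the kernels worth inspecting more deeply.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```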
Preferred Qualifications:
- Experience with cloud GPU/accelerator environments (e.g., AWS, Azure, GCP).
- Familiarity with low-level hardware design.
- Background in designing scalable systems for large ML workloads.