Senior Deep Learning Performance Engineer

Poland, Remote

NVIDIA

NVIDIA invented the graphics processor and drives advances in AI, HPC, gaming, creative design, autonomous vehicles, and robotics.

We are seeking senior engineers with a passion for performance analysis and optimization to join our team in advancing groundbreaking technologies for deep learning compilers and automated kernel generation. At NVIDIA, you will collaborate across the full hardware/software stack, from GPU architecture to deep learning frameworks, to push the boundaries of AI performance. This role provides an outstanding opportunity to shape both hardware and software roadmaps at a company that is at the forefront of the AI revolution. You will work alongside world-class engineers to implement innovative deep learning models and optimize end-to-end performance for NVIDIA’s DL software and hardware ecosystem. You'll have the chance to work on powerful, enterprise-grade GPU clusters delivering hundreds of PetaFLOPS, and gain access to unreleased hardware that is shaping the future of AI.

What you’ll be doing:

  • Profile, analyze, and optimize the performance of deep learning models and workloads on groundbreaking hardware and software platforms.

  • Develop tooling for profiling and microbenchmarking DL workloads that run compiled models, in order to uncover optimization opportunities.

  • Collaborate with teams across NVIDIA to provide performance insights and recommendations that improve the design and efficiency of DL frameworks and workloads.

  • Own the development and implementation of standard methodologies for compiling, testing, and deploying high-performance deep learning models.

  • Conduct performance benchmarking on enterprise-grade GPU clusters and pre-release hardware, driving improvements to NVIDIA’s DL software stack and hardware roadmap.

What we need to see:

  • 5+ years of experience in deep learning model implementation, software development, and performance optimization.

  • BSc, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Mathematics, Physics, or a related technical field, or equivalent practical experience.

  • Proficiency in Python, with extensive hands-on experience using at least one major deep learning framework (e.g., PyTorch, TensorFlow, JAX).

  • Strong problem-solving and analytical skills, with a proven track record in debugging, performance tuning, and workload optimization.

  • Experience with deep learning compilers (e.g., PyTorch’s torch.compile, XLA, or similar technologies).

Ways to stand out from the crowd:

  • Experience running large-scale workloads on HPC clusters.

  • Knowledge of and passion for DevOps/MLOps practices in the development of deep learning-based products.

  • Solid understanding of Linux environments and containerization technologies such as Docker.

  • Familiarity with GPU programming or parallel computing.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most hard-working and forward-thinking people in the world working for us. If you're creative and autonomous, we want to hear from you! We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

#deeplearning

Tags: Architecture Computer Science Deep Learning DevOps Docker Engineering GPU HPC JAX Linux Mathematics MLOps PhD Physics Python PyTorch TensorFlow Testing

Perks/benefits: Career development

Regions: Remote/Anywhere Europe
Country: Poland
