Principal Software Engineer, Kernel Development & Optimization

Belgrade, Belgrade, Serbia

Tenstorrent

Tenstorrent is a next-generation computing company that builds computers for AI. Headquartered in the U.S. with offices in Austin, Texas, and Silicon Valley, and global offices in Toronto, Belgrade, Seoul, Tokyo, and Bangalore, Tenstorrent...

Tenstorrent is leading the industry on cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. With AI redefining the computing paradigm, solutions must evolve to unify innovations in software models, compilers, platforms, networking, and semiconductors. Our diverse team of technologists has developed a high-performance RISC-V CPU from scratch and shares a passion for AI and a deep desire to build the best AI platform possible. We value collaboration, curiosity, and a commitment to solving hard problems. We are growing our team and looking for contributors of all seniorities.

As part of our Kernel Development and Optimization team, you will develop and optimize a set of machine learning operations (e.g., matrix multiplication, convolutions), contributing to an open-source project. This includes writing GPU-style kernels for a range of Tenstorrent AI hardware, writing host-side code, and developing parallelization strategies, with a focus on performance. You will work closely with a team of highly skilled engineers, driving technical discussions and providing guidance to ensure our software runs at peak efficiency and delivers high-quality results to our clients and users.

This role is onsite, based out of Belgrade, Serbia.

 

Responsibilities:

  • Software Development: Participate in the design, development, and maintenance of specific Tenstorrent software components (connected to the hardware platform) for our applications. Develop and optimize kernels and kernel libraries for efficient machine learning and HPC applications.
  • Special Program Optimization: Analyze and optimize low-level code to improve the performance and efficiency of our software, with a strong emphasis on tensor optimization.
  • Machine Learning Integration: Collaborate with machine learning engineers and data scientists to integrate optimized kernels and low-level routines into machine learning frameworks and pipelines.
  • Performance Profiling: Identify performance bottlenecks, conduct performance profiling, and develop strategies to address and resolve them.
  • Testing and Debugging: Write comprehensive unit tests, conduct thorough debugging, and ensure the stability and reliability of kernel-level code. Identify process and project issues and develop and lead sub-projects to implement relevant solutions.
  • Documentation: Create clear and concise documentation for code, APIs, and best practices to facilitate collaboration within the team.
  • Research and Innovation: Stay up-to-date with the latest developments in kernel development, tensor optimization, and machine learning to propose innovative solutions and improvements.
  • Product Software Engineering: Collaborate with product managers on requirements for model implementation.
  • Leadership: Drive team collaboration, function as a technical lead for specific projects, help onboard new teammates, and offer mentorship to more junior colleagues depending on ongoing projects or company needs.

 

Experience & Qualifications:

  • Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field. Equivalent industry experience will also be considered.
  • Extensive experience designing and building performance-critical software systems, ideally in the ML or systems domain.
  • Deep understanding of machine learning frameworks, compiler architectures, and related optimization techniques.
  • Proven expertise in performance profiling and low-level optimization across diverse hardware platforms.
  • Strong programming skills in C/C++, with the ability to drive architectural decisions and mentor others on best practices.

 

Preferred Qualifications:

  • Demonstrated leadership or ownership in the development of GPU kernels or compiler backends, with a strong focus on low-level optimizations and tensor optimization.
  • Hands-on experience with GPU programming (CUDA, OpenCL, or similar), including an understanding of hardware architecture and memory hierarchy.

 

Tenstorrent offers a highly competitive compensation package and benefits, and we are an equal opportunity employer.

Due to U.S. Export Control laws and regulations, Tenstorrent is required to ensure compliance with licensing regulations when transferring technology to nationals of certain countries that have had licensing conditions set by the U.S. government.

As this position will have direct and/or indirect access to information, systems, or technologies that are subject to U.S. Export Control laws and regulations, please note that citizenship/permanent residency, asylee and refugee information and supporting documentation will be required and considered as a condition of employment.

If a U.S. export license is required, employment will not begin until a license with acceptable conditions is granted by the U.S. government. If a U.S. export license with acceptable conditions is not granted by the U.S. government, then the offer of employment will be rescinded.

Category: Engineering Jobs

Tags: APIs Architecture Computer Science CUDA Engineering GPU HPC Machine Learning Open Source Pipelines Research Testing

Perks/benefits: Career development Competitive pay

Region: Europe
Country: Serbia
