Staff Engineer, Server Inference
Belgrade, Belgrade, Serbia
Tenstorrent
Tenstorrent is a next-generation computing company that builds computers for AI. Headquartered in the U.S. with offices in Austin, Texas, and Silicon Valley, and global offices in Toronto, Belgrade, Seoul, Tokyo, and Bangalore, Tenstorrent is leading the industry on cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. With AI redefining the computing paradigm, solutions must evolve to unify innovations in software models, compilers, platforms, networking, and semiconductors. Our diverse team of technologists has developed a high-performance RISC-V CPU from scratch and shares a passion for AI and a deep desire to build the best AI platform possible. We value collaboration, curiosity, and a commitment to solving hard problems. We are growing our team and looking for contributors of all seniorities.
Join our Inference Server Technologies team, where we develop the software that powers state-of-the-art AI inference on Tenstorrent’s cutting-edge hardware. Our team builds the layer that sits on top of the Tenstorrent ML libraries: designing APIs, deploying workloads, and benchmarking end-to-end inference speed. You’ll help us shape how developers consume and scale model execution on Tenstorrent’s stack.
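As a rough illustration of the kind of serving layer described above (not Tenstorrent’s actual API, which this posting does not describe), here is a minimal request-batching sketch in Python. The names and numbers are assumptions for illustration: run_model_batch stands in for a call into the underlying ML libraries, and the batch size and wait window are arbitrary.

```python
import queue
import threading
import time
from dataclasses import dataclass, field
from typing import List, Optional


def run_model_batch(prompts: List[str]) -> List[str]:
    # Hypothetical placeholder for a batched call into the underlying ML libraries.
    time.sleep(0.01)  # simulated device latency
    return [p[::-1] for p in prompts]


@dataclass
class _Request:
    prompt: str
    done: threading.Event = field(default_factory=threading.Event)
    result: Optional[str] = None


class BatchingServer:
    """Queues incoming requests and executes them in small batches."""

    def __init__(self, max_batch: int = 8, max_wait_s: float = 0.005):
        self._queue: "queue.Queue[_Request]" = queue.Queue()
        self._max_batch = max_batch
        self._max_wait_s = max_wait_s
        threading.Thread(target=self._serve_loop, daemon=True).start()

    def submit(self, prompt: str) -> str:
        # Called by request handlers; blocks until the batch containing this request runs.
        req = _Request(prompt)
        self._queue.put(req)
        req.done.wait()
        return req.result

    def _serve_loop(self) -> None:
        while True:
            batch = [self._queue.get()]  # block until at least one request arrives
            deadline = time.monotonic() + self._max_wait_s
            # Opportunistically gather more requests until the batch is full
            # or the wait window closes.
            while len(batch) < self._max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            for req, out in zip(batch, run_model_batch([r.prompt for r in batch])):
                req.result = out
                req.done.set()


if __name__ == "__main__":
    server = BatchingServer()
    print(server.submit("hello"))  # placeholder model simply reverses the string
```

The idea is simply that grouping concurrent requests into one device call trades a small per-request wait for higher overall throughput.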
This role is hybrid, based in Belgrade, Serbia.
We welcome candidates at various experience levels. During the interview process, candidates will be assessed for the appropriate level, and offers will align with that level, which may differ from the one in this posting.
Who You Are
- An engineer who enjoys designing modern APIs and improving how ML models are deployed in production.
- Curious about performance gains through techniques like batching, caching, and model parallelism.
- Passionate about clean software architecture and effective abstraction layers.
- Motivated to deliver backend systems that developers trust and rely on.
What We Need
- Backend engineers who enjoy eliminating performance bottlenecks and scaling infrastructure.
- Experience with web technologies, protocols, and system design.
- Familiarity with Python, Docker, and Linux-based environments.
- Strong coding practices and a clear ability to break down complex problems into high-quality, maintainable code.
What You Will Learn
- How to optimize end-to-end ML inference on custom silicon (see the benchmarking sketch after this list).
- Strategies for building scalable, reliable software interfaces for real-world AI applications.
- How to shape the experience developers have when using Tenstorrent’s hardware for AI workloads.
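As a small, hedged example of what benchmarking end-to-end inference speed can look like in practice, the sketch below times a placeholder run_inference call and reports throughput and latency percentiles. run_inference and its simulated delay are assumptions for illustration, not Tenstorrent code or measured numbers.

```python
import statistics
import time
from typing import List


def run_inference(prompt: str) -> str:
    # Hypothetical placeholder for one end-to-end call through the serving stack.
    time.sleep(0.002)  # simulated per-request latency
    return prompt.upper()


def benchmark(n_requests: int = 200) -> None:
    latencies: List[float] = []
    start = time.perf_counter()
    for i in range(n_requests):
        t0 = time.perf_counter()
        run_inference(f"request-{i}")
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start

    latencies.sort()
    p99 = latencies[min(len(latencies) - 1, int(0.99 * len(latencies)))]
    print(f"throughput : {n_requests / total:.1f} req/s")
    print(f"p50 latency: {1000 * statistics.median(latencies):.2f} ms")
    print(f"p99 latency: {1000 * p99:.2f} ms")


if __name__ == "__main__":
    benchmark()
```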
Tenstorrent offers a highly competitive compensation package and benefits, and we are an equal opportunity employer.
Due to U.S. Export Control laws and regulations, Tenstorrent is required to ensure compliance with licensing regulations when transferring technology to nationals of certain countries that are subject to licensing conditions set by the U.S. government.
As this position will have direct and/or indirect access to information, systems, or technologies that are subject to U.S. Export Control laws and regulations, please note that citizenship/permanent residency, asylee and refugee information and supporting documentation will be required and considered as a condition of employment.
If a U.S. export license is required, employment will not begin until a license with acceptable conditions is granted by the U.S. government. If a U.S. export license with acceptable conditions is not granted by the U.S. government, then the offer of employment will be rescinded.