Principal Software Infrastructure Lead
Belgrade, Belgrade, Serbia
Tenstorrent
Tenstorrent is a next-generation computing company that builds computers for AI. It is headquartered in the U.S. with offices in Austin, Texas, and Silicon Valley, and has global offices in Toronto, Belgrade, Seoul, Tokyo, and Bangalore. Tenstorrent is leading the industry on cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. With AI redefining the computing paradigm, solutions must evolve to unify innovations in software models, compilers, platforms, networking, and semiconductors. Our diverse team of technologists has developed a high-performance RISC-V CPU from scratch, and shares a passion for AI and a deep desire to build the best AI platform possible. We value collaboration, curiosity, and a commitment to solving hard problems. We are growing our team and looking for contributors of all seniorities.
In this role, you will define, develop, and own the software infrastructure that supports implementing, testing, and releasing ML models on Tenstorrent hardware. Your work will focus on integrating state-of-the-art models (that is, their implementations on existing Tenstorrent hardware and the tests for them) with CI/CD systems, to ensure productivity for engineers and quality for customers. This role is ideal for candidates passionate about enabling customer success, improving engineering productivity, coding in Python, and contributing to open-source software.
This role is hybrid, based out of Belgrade, Serbia.
Responsibilities:
- Define and implement highly available and reliable software infrastructure that supports running implemented ML models on Tenstorrent hardware, both end-to-end models and their parts as individual tests.
- Optimize CI/CD workflows for seamless integration, testing, and reliability. Streamline processes to enhance release and developer efficiency.
- Build tools to automate running ML model tests, measure performance and accuracy metrics, create dashboards for status and triaging, and automate model packaging for customer releases.
- Ensure enough capacity to run the fleet of ML models and tests within a desired timeframe, in collaboration with the hardware and CI infrastructure engineers.
- Collaborate closely with internal teams and occasionally with customer engineers, with the infrastructure initially serving internal teams before becoming a product for external customers.
Experience & Qualifications:
- BSc or a more advanced degree in Computer Engineering, Computer Science, Software Engineering, Electronics or a related field.
- 5+ years of experience in software infrastructure engineering, release engineering, software development, or related roles. Experience with Docker and GitHub pipelines specifically is a plus.
- Strong understanding of release processes, including CI/CD pipelines, versioning strategies, and release branching models.
- Experience with Python software development and Linux scripting.
- Excellent communication and collaboration skills, with the ability to work effectively across cross-functional teams and interact with customers.
- Proven track record of driving continuous improvement and delivering results in a fast-paced, dynamic environment.
Tenstorrent offers a highly competitive compensation package and benefits, and we are an equal opportunity employer.
Due to U.S. Export Control laws and regulations, Tenstorrent is required to ensure compliance with licensing regulations when transferring technology to nationals of certain countries that have had licensing conditions set by the U.S. government.
As this position will have direct and/or indirect access to information, systems, or technologies that are subject to U.S. Export Control laws and regulations, please note that citizenship/permanent residency, asylee and refugee information and supporting documentation will be required and considered as a condition of employment.
If a U.S. export license is required, employment will not begin until a license with acceptable conditions is granted by the U.S. government. If a U.S. export license with acceptable conditions is not granted by the U.S. government, then the offer of employment will be rescinded.
Tags: CI/CD Computer Science Docker Engineering GitHub Linux Machine Learning ML models Open Source Pipelines Python Testing
Perks/benefits: Competitive pay