Infrastructure Developer, Kernel Development
Toronto, Ontario, Canada
Tenstorrent
Tenstorrent is a next-generation computing company that builds computers for AI. Headquartered in the U.S. with offices in Austin, Texas, and Silicon Valley, and global offices in Toronto, Belgrade, Seoul, Tokyo, and Bangalore, Tenstorrent... Tenstorrent is leading the industry on cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. With AI redefining the computing paradigm, solutions must evolve to unify innovations in software models, compilers, platforms, networking, and semiconductors. Our diverse team of technologists has developed a high-performance RISC-V CPU from scratch, and shares a passion for AI and a deep desire to build the best AI platform possible. We value collaboration, curiosity, and a commitment to solving hard problems. We are growing our team and looking for contributors of all seniorities.
This role involves defining, developing, and managing the software infrastructure for implementing and testing acceleration kernels and ML models on Tenstorrent hardware. You will play a crucial role in developing automation infrastructure and standalone tools to analyze test data, automate quality checks and develop insights to maximize the performance of our ML accelerators.
The focus is on developing automation infrastructure for functional and performance testing, and developing tools to gain insights into ways to improve results. The ideal candidate will be passionate about improving engineering productivity, CI/CD workflows, software performance measurement, Python coding, and developing automation infrastructure using local and cloud technologies.
This role is hybrid, based out of Toronto, ON.
We welcome candidates at various experience levels for this role. During the interview process, candidates will be assessed for the appropriate level, and offers will align with that level, which may differ from the one in this posting.
Responsibilities:
- Build tools to automate running ML acceleration kernel and model tests, measure performance and accuracy metrics, create dashboards for status and triaging, and automate documentation and packaging for customer releases.
- Develop and maintain our test management system, using Python and cloud tools, to provide an easy-to-use developer experience.
- Work with our DevOps team to integrate our testing needs into CI/CD pipelines, using in-house flows and GitHub workflow infrastructure.
- Develop tools to analyze performance data and ensure performance is always improving.
- Conduct experiments, alongside other developers, to gain insights into how to improve quality, productivity and performance.
- Identify and analyze quality regressions, and work with developers to resolve them.
- Collaborate closely with developers in our team and in other teams to improve our test system, expand it to meet company goals, and communicate key test results to developers and senior management.
Experience & Qualifications:
- BSc or a more advanced degree in Computer Engineering, Computer Science, Software Engineering, Electronics or a related field.
- 2+ years of experience in software development, software test engineering, software infrastructure engineering, release engineering, or related roles.
- Experience with Python software development and Linux scripting.
- Experience with databases, cloud infrastructure, and visualization tools such as Superset.
- Familiarity with implementing machine learning models and/or acceleration on parallel systems such as GPUs is a plus.
- Experience with C++ is beneficial.
- Experience with Docker and GitHub pipelines specifically is a plus.
- Excellent communication and collaboration skills, and the ability to work effectively across cross-functional teams and interact with customers.
- Proven track record of driving continuous improvement and delivering results in a fast-paced, dynamic environment.
Tenstorrent offers a highly competitive compensation package and benefits, and we are an equal opportunity employer.
Due to U.S. Export Control laws and regulations, Tenstorrent is required to ensure compliance with licensing regulations when transferring technology to nationals of certain countries that have had licensing conditions set by the U.S. government.
As this position will have direct and/or indirect access to information, systems, or technologies that are subject to U.S. Export Control laws and regulations, please note that citizenship/permanent residency, asylee and refugee information and supporting documentation will be required and considered as a condition of employment.
If a U.S. export license is required, employment will not begin until a license with acceptable conditions is granted by the U.S. government. If a U.S. export license with acceptable conditions is not granted by the U.S. government, then the offer of employment will be rescinded.
Tags: Computer Science DevOps Docker Engineering GitHub Linux Machine Learning ML models Pipelines Python Superset Testing
Perks/benefits: Career development Competitive pay