Senior Systems Software Manager - TAO Build, Automation and Release
US, CA, Santa Clara
NVIDIA
NVIDIA invented the GPU and drives advances in AI, HPC, gaming, creative design, autonomous vehicles, and robotics. NVIDIA is hiring a Senior Systems Software Manager for Build, Automation, Release, and Optimizations to join the TAO Toolkit Deep Learning Architectures team. The toolkit provides scalable, easy-to-use modules for training, fine-tuning, and optimizing Computer Vision and Multi-Modal AI models, helping to advance the state of the art while improving performance. If you have a passion for pioneering technologies and a commitment to developing scalable, optimized, and ethical AI, we invite you to join our strong team at NVIDIA.
In this role, you will lead and supervise the development, implementation, and optimization of continuous integration and continuous deployment (CI/CD) pipelines and release management processes. This role is critical to the efficient and reliable delivery of software solutions. The ideal candidate will bring a deep understanding of modern DevOps practices, including automation, orchestration, and infrastructure, along with the leadership experience needed to drive a high-performing engineering team.
What you’ll be doing:
Lead a team of developers to improve CI/CD tool integration and operations, and to fully automate CI and testing.
Lead efforts to resolve production issues and implement necessary integrations.
Lead the ongoing design, implementation, and maintenance of systems and tools across the toolkit stack.
Design, implement, and manage cloud infrastructure for continuous integration, delivery, and deployment.
Partner with cross-functional teams, including engineering, product, and QA, to improve development workflows, reduce bottlenecks, manage and minimize risks, and increase software delivery speed and quality.
Lead the development of robust processes and infrastructure for writing and maintaining documentation.
Communicate effectively with technical and non-technical partners to set shared expectations and ensure visibility around the release and deployment process.
Collaborate with diverse software, research, and hardware teams across geographies to analyze the interplay of hardware and software architectures, solve critical problems, and enable future applications.
What we need to see:
Bachelor’s/Master’s degree or equivalent experience in Computer Science, Information Systems, Engineering, or other related fields
8+ years of overall experience in software engineering, DevOps, or release management, including at least 3 years in a leadership or managerial role.
Proven experience with automation and orchestration tools, including Jenkins, Bazel, GitLab, Docker, and Kubernetes.
Strong expertise in cloud platforms such as AWS, Azure, or GCP.
Proven experience developing production-quality software pipelines for AI, computer vision, or multi-modal algorithms, especially LLMs and multi-modal foundation models.
Expertise in release management, version control systems and configuration management.
Strong programming skills in Python and/or C++, and experience developing integrated AI solutions.
Proven track record of leading projects, managing timelines, and delivering results in an Agile/Scrum environment.
Strong analytical and problem-solving skills with a focus on practical and scalable AI solutions.
Strong interpersonal skills and ability to work in a collaborative environment.
Ways to stand out from the crowd:
Knowledge of tools such as Ansible, Terraform, and Puppet for automating repetitive tasks and infrastructure provisioning.
Proven experience automating the build and deployment of software for AI infrastructure.
Experience with security practices and trustworthy AI
Background with NVIDIA SDKs such as TensorRT, RAPIDS, CUDA, and cuDNN.
NVIDIA is widely considered to be one of the technology industry's most desirable employers. We have some of the most forward-thinking and hardworking people on our engineering teams. If you're a creative engineer with a real passion for building scalable and robust infrastructure, we want to hear from you.
The base salary range is 272,000 USD - 419,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.
Tags: Agile Ansible Architecture AWS Azure Bazel CI/CD Computer Science Computer Vision CUDA cuDNN Deep Learning DevOps Docker Engineering GCP GitLab Jenkins Kubernetes LLMs ML infrastructure Pipelines Puppet Python Research Scrum Security TensorRT Terraform Testing
Perks/benefits: Career development Equity / stock options