Senior Cloud Test Developer Architect
US, CA, Santa Clara, United States
NVIDIA
NVIDIA invented the graphics processor and drives advances in AI, HPC, gaming, creative design, autonomous vehicles, and robotics.
We are in search of a highly skilled Senior Test Developer Architect to join our Enterprise Software QA team. This role presents an outstanding opportunity to shape the design, optimization, and testing of large-scale cloud infrastructure for foundational NVIDIA Unified Cloud Services and Data Center offerings. We are seeking a cloud infrastructure expert with expertise in distributed systems, test automation, and cloud architectures.
What You’ll Be Doing:
Leverage AI-powered testing tools to improve test automation, increase coverage, and accelerate testing cycles for cloud-based infrastructure.
Collaborate with product engineering teams to deeply understand cloud service architectures and provide mentorship to SWQA teams on testing cloud-native applications at scale.
Craft and develop end-to-end test strategies for validating cloud infrastructure, including compute, storage, networking, security, and orchestration layers.
Lead NVIDIA Cloud bring-up activities from a software quality assurance perspective, ensuring scalability, reliability, and performance.
Architect and implement cloud-native test automation frameworks to validate multi-cloud (AWS, Azure, Google Cloud) and hybrid-cloud environments.
Develop scalable and resilient infrastructure automation by using Infrastructure as Code (IaC), Configuration Management, and optimization techniques.
Improve observability and monitoring through AI-powered anomaly detection, predictive analytics, and intelligent alerting.
Ensure resilience and failover testing of cloud-based microservices and distributed architectures.
Collaborate with internal teams and cloud service partners to ensure alignment with industry-standard methodologies and real-world use cases.
What We Need to See:
Master’s or Ph.D. in Computer Science, Cloud Computing, or a related field, or equivalent experience.
4+ years of hands-on experience in cloud-native cluster management, including Docker, Slurm, Kubernetes, OpenShift, and Ansible.
8+ years of experience working with cloud infrastructure platforms like AWS, Azure, and Google Cloud, with deep expertise in multi-cloud and hybrid-cloud architectures.
Strong hands-on experience with Cloud Networking (VPCs, Load Balancers, Service Mesh, API Gateways) and Storage Technologies (EBS, S3, Azure Blob, GFS).
Advanced proficiency in Infrastructure as Code (IaC) and Configuration Management tools (e.g., Terraform, CloudFormation, Pulumi, Ansible).
Deep expertise in Kubernetes administration, service mesh technologies (Istio, Linkerd), and container security.
Proficiency in Python, Go, or Java for cloud automation, testing frameworks, and infrastructure scripting.
Expertise in CI/CD pipelines using GitOps models, GitLab, Jenkins, ArgoCD, and Spinnaker for automated cloud deployments.
Hands-on experience with cloud observability and monitoring tools (Prometheus, Grafana, CloudWatch, Thanos, Datadog, New Relic).
Strong cloud security knowledge, including Kubernetes security, IAM policies, encryption, and vulnerability management.
Proven track record of debugging complex cloud infrastructure issues involving DNS, HTTP, Linux, cloud networking, and containers.
Ways to Stand Out from the Crowd:
A true innovator who isn't afraid to challenge the status quo and bring fresh ideas to the table. You're always looking for ways to improve existing systems and processes.
Passion and curiosity about the latest technologies and trends in cloud infrastructure and distributed systems. You're not just familiar with the tools; you understand the underlying principles and can apply this knowledge to make strategic decisions.
Committed to personal and professional growth. You actively seek opportunities to learn new skills and deepen your expertise.
Deep expertise in AI-powered cloud testing, applying machine learning to predictive failure analysis, anomaly detection, and self-healing infrastructure.
Strong knowledge of Kubernetes Operators, Helm charts, and custom controllers for automating cloud operations.
Familiarity with Confidential Computing, Zero Trust Security models, and cloud-native security frameworks.
Excitement for the latest cloud architectures, such as edge computing, AI-driven infrastructure, and serverless computing.
By joining our team, you will be part of a forward-thinking company that values innovation and creativity. We offer a competitive salary and benefits package, a flexible work environment, and the opportunity to work with some of the industry's leading experts. If you're ready to take your career to the next level, we'd love to hear from you.
The base salary range is 200,000 USD - 391,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
Tags: Ansible APIs Architecture AWS Azure CI/CD CloudFormation Computer Science Distributed Systems Docker Engineering GCP GitLab Google Cloud Grafana Helm Java Jenkins Kubernetes Linux Machine Learning Microservices Pipelines Python Security Terraform Testing
Perks/benefits: Career development Competitive pay Equity / stock options Flex hours