Manager of Global Solution Architecture, Customer Platform Engineering - EST (Remote)
United States
- Remote-first
Weights & Biases
Weights & Biases: developer tools for machine learning
Weights & Biases is a Series C company with $250M in funding and over 200 employees. We proudly serve over 1,000 customers and more than 30 foundation model builders, including OpenAI, NVIDIA, Microsoft, and Toyota.
We are seeking a highly skilled and experienced SA/SRE (Solutions Architect / Site Reliability Engineer) Manager to lead our efforts in supporting customer-managed deployments, improving on-premise deployments, streamlining upgrades, and building scalable systems and processes through automation. This role requires a strong technical background, leadership capabilities, and a deep understanding of deployment automation and reliability engineering.
Responsibilities:
- Lead and manage a team of SA engineers focused on supporting and scaling customer-managed and on-premise deployments.
- Design, implement, and enhance deployment architectures to improve reliability, scalability, and security.
- Develop and optimize upgrade processes to minimize downtime and operational risk.
- Build and maintain automation frameworks to streamline deployment, monitoring, and incident management.
- Collaborate closely with product and engineering teams to enhance software deliverability and maintainability for on-premise environments.
- Establish and enforce best practices for configuration management, infrastructure as code (IaC), and CI/CD pipelines.
- Lead incident response and root cause analysis for critical production issues, ensuring continuous improvement and proactive problem prevention.
- Drive a culture of operational excellence, automation, and continuous improvement across the organization.
- Communicate with customer stakeholders in a timely and empathetic way; customer empathy is vital to this role.
Requirements:
- 7+ years of experience in SRE, DevOps, or Solutions Architecture roles, with at least 2 years in a managerial or leadership capacity.
- Strong background in managing on-premise and customer-managed deployments at scale.
- Proficiency in infrastructure as code (Terraform, Ansible, or similar tools) and CI/CD automation.
- Experience with Kubernetes, Docker, and cloud/on-prem hybrid architectures.
- Expertise in monitoring, logging, and alerting tools (Prometheus, Grafana, ELK, etc.).
- Strong scripting and programming skills (Python, Go, Bash, etc.).
- Experience with security and compliance considerations in enterprise software deployments.
- Excellent communication and stakeholder management skills, with the ability to influence technical and business decisions.
- Experience working in SaaS and enterprise environments is a plus.
Why Join Us?
- Opportunity to drive large-scale transformation in enterprise software deployment and automation.
- Work with cutting-edge technology and a team of talented engineers.
- Competitive salary, benefits, and career growth opportunities.
Our Benefits:
- 🏝️ Flexible time off
- 🩺 Medical, dental, and vision coverage for employees and their families
- 🏠 Remote-first culture with in-office flexibility in San Francisco
- 💵 Home office budget with a new high-powered laptop
- 🥇 Truly competitive salary and equity
- 🚼 12 weeks of parental leave (U.S. specific)
- 📈 401(k) (U.S. specific)
- Supplemental benefits may be available depending on your location