Senior Solutions Architect, Platform Infrastructure - PST (Remote)
United States
- Remote-first
Weights & Biases
Weights & Biases: developer tools for machine learning
Weights & Biases is a Series C company with $250M in funding and over 200 employees. We proudly serve over 1,000 customers and more than 30 foundation model builders, including OpenAI, NVIDIA, Microsoft, and Toyota.
The Senior Solutions Architect role at Weights & Biases is a unique hybrid, blending the technical expertise of a Site Reliability Engineer (SRE) with the communication and advisory skills of a Solutions Architect. In this role, you will focus on all aspects of the Weights & Biases Platform, managing customer deployments across various cloud infrastructures and on-prem environments to ensure scalability, reliability, and operational excellence.
You will work closely with customers to debug issues, provide best practices, and help them unlock the full potential of Weights & Biases. Additionally, you will produce technical content such as blog posts, documentation updates, and internal enablement material to support the Field Engineering team. This role requires deep collaboration with Support, Product, and Engineering teams to drive product improvements based on customer insights.
Responsibilities:
- Deployment & Operations:
  - Work with customer operations teams to provision Weights & Biases services in Dedicated Cloud, Private Cloud, and on-prem environments.
  - Manage complex infrastructure implementations, partnering with highly skilled customer engineers.
  - Monitor and ensure the reliability, performance, and scalability of customer deployments using SRE best practices.
- Debugging & Troubleshooting:
  - Diagnose and resolve issues in customer environments, documenting resolutions to accelerate future problem-solving.
  - Provide hands-on support for containerized and distributed systems using Docker, Kubernetes, and related technologies.
- Customer Engagement:
  - Lead technical discussions with customers, acting as a trusted advisor for infrastructure reliability and operational excellence.
  - Deliver training sessions, product demos, and workshops to help customers maximize the value of Weights & Biases.
  - Collaborate with customers to uncover desired outcomes and recommend solutions tailored to their needs.
- Enablement & Collaboration:
  - Partner with AI Solution Engineers to streamline post-sales processes, including onboarding, adoption, and training.
  - Collaborate with Sales Engineering to ensure a seamless transition from POC to onboarding.
  - Provide insights to the Product team based on customer feedback to influence the product roadmap.
Requirements:
- Based in the Pacific Standard Time (PST) time zone.
- A proven track record of systematically diagnosing and resolving infrastructure issues.
- Prior experience in a customer-facing technical role.
- Expertise with Docker, Kubernetes, Helm charts, networking, and managed cloud services (e.g., MySQL, object stores).
- Strong fundamentals in Infrastructure as Code (IaC), preferably Terraform.
- Proficiency with at least one cloud platform (AWS, GCP, Azure); experience with multiple platforms is a plus.
- Strong Linux/Unix command line experience.
- Basic proficiency in Python and familiarity with ML workflows or tools.
- Exceptional communication skills, both written and verbal, with the ability to simplify complex topics for diverse audiences.
- Proven ability to prioritize and manage multiple competing tasks in a dynamic environment.
Strong plus:
- Deep proficiency in Kubernetes design patterns, including Operators.
- Familiarity with data engineering and MLOps tooling.
- Experience as an educator or facilitator for technical training sessions, workshops, or demos.
- SaaS, web service, or distributed systems operations experience.
Our Benefits:
- 🏝️ Flexible time off
- 🩺 Medical, dental, and vision coverage for employees and their families
- 🏠 Remote-first culture with in-office flexibility in San Francisco
- 💵 Home office budget with a new high-powered laptop
- 🥇 Truly competitive salary and equity
- 🚼 12 weeks of parental leave (U.S. specific)
- 📈 401(k) (U.S. specific)
- Supplemental benefits may be available depending on your location