Tech Lead Manager - Infrastructure

San Francisco Bay Area

Baseten

Effortlessly serve optimized open source & custom models on the fastest, most reliable model delivery network

ABOUT BASETEN

We’re a growing team of builders backed by top-tier investors, including IVP, Spark Capital, Greylock, and Sarah Guo at Conviction. ML teams at enterprises and category-defining AI-native companies like Descript, Bland.ai, Patreon, Writer, and Robust Intelligence use Baseten to power their core production workloads with best-in-class performance, security, and reliability. We’ve found product-market fit and secured Series B funding, but the ML infrastructure market is massive, and we’re just getting started. If you’re excited to work on engaging and relevant problems while building something new from the ground up, come join us!

THE ROLE

Are you passionate about building robust, scalable infrastructure that powers cutting-edge machine learning applications? We are looking for a Tech Lead Manager - Infrastructure to lead our infrastructure team in designing, developing, and optimizing the core systems that support our ML platform. This is an ideal role for someone with a deep technical background in infrastructure engineering who enjoys mentoring and leading a team. If you’re excited about the challenges of scaling infrastructure for ML workloads in a fast-paced startup environment, we’d love to meet you.

RESPONSIBILITIES:

  • Lead, manage, and mentor the infrastructure engineering team responsible for building the backbone of Baseten’s ML platform.

  • Define and drive the technical strategy for infrastructure, ensuring performance, security, and scalability of core systems.

  • Collaborate closely with ML teams and cross-functional stakeholders to ensure smooth integration of models into production environments.

  • Design and implement scalable infrastructure solutions, including CI/CD pipelines, container orchestration, and cloud infrastructure (AWS, GCP, etc.).

  • Dive deep into performance optimization of our systems, identifying and addressing bottlenecks to improve overall infrastructure efficiency.

  • Own end-to-end project management for infrastructure initiatives, from planning and execution to monitoring and maintenance.

  • Promote engineering best practices and a culture of continuous improvement within the team.

REQUIREMENTS:

  • Bachelor’s, Master’s, or Ph.D. in Computer Science, Engineering, or related field.

  • 5+ years of professional experience in infrastructure or software engineering, with at least 2 years in a technical leadership role.

  • Expertise in infrastructure design, including containerization (Docker), orchestration (Kubernetes), and cloud platforms (AWS, GCP).

  • Strong experience with CI/CD pipelines, infrastructure as code (Terraform, Ansible), and monitoring systems.

  • Solid understanding of networking, security, and high-availability infrastructure design.

  • Experience managing and scaling infrastructure for machine learning or similar high-performance workloads.

  • Proven track record of leading teams and delivering large-scale, production-level infrastructure solutions.

  • Excellent problem-solving skills and the ability to drive technical projects from idea to completion.

BONUS POINTS:

  • Experience optimizing infrastructure for machine learning workloads, including GPU utilization and distributed computing.

  • Familiarity with multi-cloud strategies and hybrid cloud deployments.

  • Deep understanding of security best practices in cloud-native environments.

  • Previous experience in a fast-paced startup environment, particularly in the ML or AI space.

BENEFITS:

  • Competitive compensation package (unlimited PTO, 401(k), covered healthcare premiums).

  • Opportunity to lead a talented infrastructure team in one of the most exciting engineering fields.

  • An inclusive and supportive work culture that fosters growth and continuous learning.

  • Exposure to cutting-edge ML infrastructure technologies and collaboration with top-tier ML teams and organizations.

Category: Leadership Jobs

Tags: Ansible AWS CI/CD Computer Science Docker Engineering GCP GPU Kubernetes Machine Learning ML infrastructure Pipelines Security Spark Terraform

Perks/benefits: Career development Competitive pay Startup environment Unlimited paid time off

Region: North America
Country: United States
