Staff Compute Platform Engineer

Hybrid / Redwood City, CA

Snorkel AI

Unlock the power of programmatic AI data development to build production AI applications with Snorkel Flow—100x faster!

We’re on a mission to democratize AI by building the definitive AI data development platform. The AI landscape has changed dramatically from 2016, when Snorkel started as a research project in the Stanford AI Lab, to the generative AI breakthroughs of today. But one thing has remained constant: the data you use to build AI is the key to achieving differentiation, high performance, and production-ready systems. We work with some of the world’s largest organizations to empower scientists, engineers, financial experts, product creators, journalists, and more to build custom AI with their data faster than ever before. Excited to help us redefine how AI is built? Apply to be the newest Snorkeler!

As a Staff Compute Platform Engineer, you will be a technical leader shaping the core infrastructure that powers Snorkel Flow’s compute capabilities. Your expertise in orchestration, MLOps, SDK maintainability, and infrastructure components such as data connectors and distributed systems (e.g., Ray) will drive scalable and reliable solutions for the platform. In this role, you will lead initiatives, mentor engineers, and collaborate across teams to deliver solutions that support Snorkel Flow’s mission to simplify AI development.

Main Responsibilities

Technical Leadership:

  • Architect and lead the development of the compute platform, focusing on scalability, reliability, and performance.
  • Drive best practices in MLOps and orchestration, ensuring seamless integration with AI and data pipelines and workflows.
  • Work with other technical leads to define the technical roadmap for the compute layer, aligning with Snorkel Flow’s broader platform strategy.
  • Identify opportunities to optimize compute workflows and implement solutions that reduce complexity while increasing efficiency.

Orchestration and Infrastructure:

  • Design and implement robust orchestration workflows using tools like Prefect and Ray to support distributed and parallel computing.
  • Build and scale infrastructure for integrating with data sources such as S3, Snowflake, and Databricks.
  • Ensure fault-tolerant, high-availability compute systems that can handle diverse workloads efficiently.

SDK Development:

  • Oversee the design and development of the Snorkel Flow SDK, ensuring it is intuitive, extensible, and aligned with user needs.
  • Collaborate with cross-functional teams to expose compute platform functionalities through APIs and SDK interfaces.
  • Drive improvements in SDK maintainability, including versioning, documentation, and developer tooling.

Cross-Team Collaboration:

  • Partner with AI and Data Platform teams to ensure seamless interoperability between compute workflows and other platform layers.
  • Work closely with the Application team to design APIs that enable advanced user workflows and orchestration capabilities.
  • Act as a technical advisor to other teams, providing expertise in compute infrastructure and orchestration.

Observability and Optimization:

  • Define and implement observability strategies, including monitoring tools, dashboards, and logging frameworks, to track compute platform performance.
  • Optimize system performance by identifying bottlenecks and implementing efficient compute and data processing solutions.
  • Establish metrics and reporting to ensure continuous improvement of compute workflows.

Mentorship and Growth:

  • Mentor engineers across all levels, fostering technical growth and a culture of excellence.
  • Lead code and design reviews, ensuring adherence to high-quality engineering standards.
  • Advocate for innovative solutions and inspire the team to push technical boundaries.

Required Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field.
  • 8+ years of experience in backend or infrastructure engineering, with significant expertise in MLOps and SDK development for AI applications.
  • Proven track record of architecting and deploying scalable infrastructure for distributed systems, preferably with tools like Prefect, Ray, or Airflow.
  • Strong expertise in Python and backend frameworks (e.g., FastAPI, Flask).
  • Extensive experience with CI/CD pipelines, containerization (Docker), and orchestration (Kubernetes).
  • In-depth knowledge of distributed computing, resource scheduling, and fault-tolerant system design.
  • Exceptional problem-solving skills, with the ability to design solutions for complex compute challenges.
  • Knowledge of AI/ML workflows.

Preferred Qualifications

  • Hands-on experience with vector databases, data processing frameworks, and data connectors for enterprise systems.
  • Familiarity with monitoring and observability tools like Prometheus, Grafana, or DataDog.
  • Experience in scaling SDKs and APIs for large user bases.
  • Strong understanding of distributed system design patterns and performance optimization.
  • Understanding of the latest AI models and pipeline needs.

What We Offer

  • A leadership role with the opportunity to influence the technical direction of Snorkel Flow.
  • Competitive salary and benefits tailored to your experience.
  • Hybrid work environment with 3 days per week at our Redwood City HQ and SF office.
  • "No Meeting" Tuesdays and Thursdays to focus on deep work.
  • The chance to work on cutting-edge infrastructure and drive impactful change in an innovative, fast-paced environment.

 

Be Your Best At Snorkel

Snorkel AI is on a mission to make machine learning practical for everyone, and it starts with building a team that welcomes, represents, and gives opportunity to all. We work at the frontier of AI and software engineering, and believe that underrepresented communities need to play a part in shaping the future of these fields. At Snorkel AI, we actively work to create an environment that values end-to-end ownership, diverse forms of impact, and opportunities for personal growth.

Snorkelers are supported by an amazing team and an amazing set of benefits. For full-time employees, we offer comprehensive medical, dental, and vision plans for Snorkelers and their families, plus a yearly wellness stipend. Our 401k program lets Snorkelers plan for their future, and our parental leave program lets new parents take up to 20 weeks of paid time off. Learn more about these and other benefits, such as our workstation setup allowance, on our Careers page.

Snorkel AI is proud to be an Equal Employment Opportunity employer and is committed to building a team that represents a variety of backgrounds, perspectives, and skills. Snorkel AI embraces diversity and provides equal employment opportunities to all employees and applicants for employment. Snorkel AI prohibits discrimination and harassment of any type on the basis of race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local law. All employment is decided on the basis of qualifications, performance, merit, and business need.

We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.

Tags: Airflow APIs CI/CD Computer Science Databricks Data pipelines Distributed Systems Docker Engineering FastAPI Flask Generative AI Grafana Kubernetes Machine Learning MLOps Pipelines Python Research Snowflake

Perks/benefits: 401(k) matching Career development Competitive pay Flex vacation Health care Medical leave Parental leave Wellness

Region: North America
Country: United States
