Senior Software Engineer — Infrastructure
Hybrid / San Francisco, CA or Redwood City, CA
Snorkel AI
Unlock the power of programmatic AI data development to build production AI applications with Snorkel Flow, 100x faster!
We’re on a mission to democratize AI by building the definitive AI data development platform. The AI landscape has changed dramatically from 2016, when Snorkel started as a research project in the Stanford AI Lab, to the generative AI breakthroughs of today. But one thing has remained constant: the data you use to build AI is the key to achieving differentiation, high performance, and production-ready systems. We work with some of the world’s largest organizations to empower scientists, engineers, financial experts, product creators, journalists, and more to build custom AI with their data faster than ever before. Excited to help us redefine how AI is built? Apply to be the newest Snorkeler!
As a Senior Software Engineer on the Infrastructure team, you'll accelerate the Snorkel AI team and our customers by improving our developer platform and services for user and data management across the stack. You’ll work closely with other engineers, researchers, and product management to align on the highest leverage improvements for CI/CD, cloud infrastructure, deployment, security, authentication/authorization, and more.
Main Responsibilities
- Design, build, and maintain services and deployment for Snorkel’s enterprise platforms
- Design, build, and improve observability and alerting for Snorkel’s enterprise platforms
- Contribute to Snorkel’s in-house deployment management software to support installation and upgrades of various deployments for Snorkel’s enterprise customers
- Build and maintain Snorkel’s production and staging infrastructure; own our k8s and cloud strategy
- Work closely with various engineering teams to define test strategies and build the infrastructure to execute them
- Deploy and optimize CI/CD pipelines across multiple environments and continuously improve development and deployment best practices
- Collaborate with enterprise customers to understand product use cases, translate them into engineering specifications, and deliver high-quality solutions
- Participate in on-call rotations, post-incident reviews, and other operational duties to ensure service delivery quality
- Work a hybrid schedule with three days per week in our Redwood City HQ or the SF office
Minimum Qualifications
- Bachelor's degree in Computer Science or related field, or equivalent demonstrated experience
- Strong development and debugging skills in Python
- 5+ years of software development experience in distributed systems and cloud-native applications
- Strong experience with cloud platforms and infrastructure as code (Terraform, CloudFormation, Helm)
- Practical experience with Docker containerization and clustering (Kubernetes/EKS/GKE)
- Proficiency in code and system health, diagnosis, resolution and software test engineering
- Strong communication and coding skills
- Regularly follow software engineering best practices and hold a high bar for the team by leading design, code, and test plan reviews
Preferred Skills
- Extremely well versed in building and managing cloud infrastructure for enterprise platforms on AWS, GCP, or Azure, using services such as EC2, EKS, and VPC
- Experience with one or more build tools such as Bazel, Gradle, or Make. Extra points for hands-on experience building and managing large codebases with these tools
- Designed and implemented developer-friendly APIs or tools to boost developer productivity
- Familiarity with deployment, monitoring, and maintenance of large-scale enterprise software products
- Familiarity with developing and releasing infrastructure software for SaaS and on-prem platforms
- [Nice to have]: Hands-on experience setting up and operating Kubernetes clusters at scale
- [Nice to have]: Experience with large-scale distributed computing systems for ML training or serving, e.g., Ray, Spark, TensorFlow
- [Nice to have]: Hands-on experience in creating and maintaining metrics and dashboards on observability platforms such as New Relic, DataDog, Chronosphere, or similar tools
- [Nice to have]: Experience building services and infrastructure for Machine learning and AI Systems
- [Nice to have]: Experience in cloud networking, security, and service meshes such as Istio
Be Your Best At Snorkel
Snorkel AI is on a mission to make machine learning practical for everyone, and it starts with building a team that welcomes, represents and gives opportunity to all. We work at the frontier of AI and software engineering, and believe that underrepresented communities need to play a part in shaping the future of these fields. At Snorkel AI, we actively work to create an environment that values end-to-end ownership, diverse forms of impact, and opportunities for personal growth.
Snorkelers are supported by an amazing team and an amazing set of benefits. For full-time employees, we offer comprehensive medical, dental, and vision plans for Snorkelers and their families, plus a yearly wellness stipend. Our 401k program lets Snorkelers plan for their future and our parental leave program lets new parents take up to 20 weeks of paid time off. Learn more about these and other benefits, like our workstation setup allowance, on our Careers page.
Snorkel AI is proud to be an Equal Employment Opportunity employer and is committed to building a team that represents a variety of backgrounds, perspectives, and skills. Snorkel AI embraces diversity and provides equal employment opportunities to all employees and applicants for employment. Snorkel AI prohibits discrimination and harassment of any type on the basis of race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local law. All employment is decided on the basis of qualifications, performance, merit, and business need. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.
Tags: APIs AWS Azure Bazel CI/CD CloudFormation Clustering Computer Science Data management Distributed Systems Docker EC2 Engineering GCP Generative AI Helm Kubernetes Machine Learning Pipelines Python Research Security Spark TensorFlow Terraform
Perks/benefits: 401(k) matching Career development Health care Medical leave Parental leave Wellness