Senior ML Infrastructure Engineer - MLOps

United States, Remote

Full Time | Senior-level / Expert | USD 200K - 250K

Quince

Quince brings luxury products like Mongolian Cashmere, Italian Leather, Turkish Cotton and Washable Silk to everyone at radically low prices. Shop premium essentials with no middleman.

OUR STORY

Quince was started to challenge the existing idea that nice things should cost a lot. Our mission was simple: create items of equal or greater quality than the leading luxury brands and sell them at a much lower price.

OUR VALUES

Customer First. Customer satisfaction is our highest priority.

High Quality. True quality is a combination of premium materials and high production standards that everyone can feel good about.

Essential Design. We don’t chase trends, and we don’t sell everything. We’re expert curators who find the very best and bring it to you at the lowest prices.

Always a Better Deal. Through innovation and real price transparency, we want to offer the best deal to both our customers and our factory partners.

Environmentally and Socially Conscious. We’re committed to sustainable materials and sustainable production methods. That means a cleaner environment and fair wages for factory workers.

OUR TEAM AND SUCCESS

Quince is a retail and technology company co-founded by a team with extensive experience in retail, technology, and building early-stage companies. You’ll work with a team of world-class talent from Stanford GSB, Google, D.E. Shaw, Stitch Fix, Urban Outfitters, Wayfair, McKinsey, Nike, and more.

THE IDEAL CANDIDATE

We are seeking passionate individuals eager to revolutionize the way people purchase essential goods by leveraging cutting-edge ML and AI solutions. Our centralized data science team is dedicated to optimizing and automating decision-making processes while delivering valuable, actionable business insights. As an ML Infrastructure Engineer at Quince, you will play a critical role in shaping our ML development ecosystem. You will build and own the foundational ML development processes, operational pipelines, and production infrastructure necessary to support a scalable, efficient, and impactful ML practice. Your contributions will directly enhance our ability to drive meaningful business outcomes and innovation.

RESPONSIBILITIES:

Design, Build, and Maintain ML Pipelines: Develop and optimize end-to-end machine learning pipelines, including data ingestion, model training, validation, deployment, and monitoring.
Implement Continuous Integration/Continuous Deployment (CI/CD) for ML Models: Establish robust CI/CD processes to automate the testing, deployment, and monitoring of machine learning models in production environments.
Build and Own Production Infrastructure for Serving ML Models: Design, deploy, and maintain the production infrastructure necessary for real-time and batch serving of machine learning models, ensuring high availability, scalability, and reliability.
Build and Own the Feature Store: Design, implement, and manage the feature store to ensure efficient and scalable storage, retrieval, and versioning of features used in machine learning models, enabling consistent and reusable feature engineering across teams.
Collaborate with Data Scientists and Engineers: Work closely with data scientists, data engineers, and software engineers to ensure seamless integration of ML models into production systems, aligning models with business goals.
Monitor and Optimize Model Performance: Implement monitoring solutions to track the performance of ML models in production, identifying and addressing any issues such as data drift, model degradation, or system bottlenecks.
Ensure Scalability and Reliability: Design and implement scalable and reliable ML infrastructure, leveraging cloud platforms, containerization tools such as Docker, and orchestration tools such as Kubernetes.
Automate Data and Model Management: Develop automated solutions for version control, model registry, and experiment tracking to manage the lifecycle of ML models efficiently.
Optimize Resource Utilization: Manage and optimize the use of computational resources, such as GPUs and cloud instances, to balance performance with cost-effectiveness.
Conduct Root Cause Analysis and Troubleshooting: Diagnose and resolve issues in ML pipelines, including debugging data, code, and model performance problems.
Document Processes and Systems: Create and maintain comprehensive documentation of ML pipelines, deployment processes, and operational workflows to ensure knowledge sharing and continuity.

DESIRED SKILLS:

Bachelor’s degree in computer science, engineering, or a related field.
5+ years of experience in ML Infrastructure or ML engineering.
Hands-on expertise in:
- building and maintaining ML pipelines
- building and managing scalable ML production infrastructure
- AWS or other major cloud services
Strong knowledge of CI/CD practices for ML models.
Familiarity with DevOps principles and tools.
Familiarity with TensorFlow, PyTorch, or similar frameworks.
Proficient in Python and Java (or Scala).
Excellent communication skills.
Move fast, be a team player, and be kind.

We rely on market indicators and consider your specific job family, background, skills, and experience to determine your compensation in the market range. Bonus eligibility varies by role and is determined based on the position’s impact and contribution to our strategic goals.

Pay Range: $200,000 - $250,000 USD

Quince provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran or military status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws.

Security Advisory: Beware of Fraud

At Quince, we're dedicated to recruiting top talent who share our drive for innovation. To safeguard candidates, Quince follows legitimate recruitment practices. Initial communication is primarily via official Quince email addresses and LinkedIn; be wary of contact through any other channel. Personal data and sensitive information will not be solicited during the application phase. Interviews are conducted by phone, in person, or through the approved platforms Google Meet or Zoom, never via messaging apps or other calling services. Offers are merit-based, communicated verbally, and followed up in writing. If personal information is requested to initiate the hiring process, rest assured it will be collected through secure and protected means.