Machine Learning Platform Engineer

San Francisco HQ

Alembic

Uncover marketing success with Alembic's AI-driven analytics. Predict revenue outcomes, optimize media spend, and gain actionable insights in real-time.

About Alembic

Alembic is pioneering a revolution in marketing, proving the true ROI of marketing activities. The Alembic Marketing Intelligence Platform applies sophisticated algorithms and AI models to finally solve this long-standing problem. When you join the Alembic team, you’ll help build the tools that provide unprecedented visibility into how marketing drives revenue, helping a growing list of Fortune 500 companies make more confident, data-driven decisions.

About the Role

Alembic is looking for a Machine Learning Platform Engineer to build and scale the infrastructure that powers our AI-driven products. In this role, you’ll work closely with our ML, data science, and platform teams to enable scalable training, deployment, and monitoring of models in production.

You’ll design the tools and systems that make ML experimentation fast, reproducible, and reliable—helping us deliver accurate, real-time insights to enterprise customers.

Key Responsibilities

  • Design, build, and maintain infrastructure to support the full ML lifecycle—from data ingestion to model deployment and monitoring

  • Implement tools and workflows for ML experimentation, feature engineering, training pipelines, and hyperparameter tuning

  • Build and run scalable and versioned deployment systems using modern frameworks (e.g., MLflow, Kubeflow, Airflow)

  • Collaborate with data scientists to productionize ML-based applications, accelerate training cycles, and streamline model validation

  • Integrate CI/CD practices into the ML workflow to ensure robust and automated model deployment

  • Establish monitoring, alerting, and logging for models in production (e.g., drift detection, performance metrics)

  • Champion best practices around reproducibility, traceability, and model governance

Must-Have Qualifications

  • 8+ years of experience in platform engineering, with some experience in machine learning infrastructure or MLOps

  • Deep understanding of the ML lifecycle, including data pipelines, training, model serving, and observability

  • Strong experience with cloud platforms and cloud-native tooling (e.g., Kubernetes, Docker), as well as Infrastructure as Code tools (e.g., Ansible)

  • Deep expertise with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog)

  • Familiarity with ML workflow orchestration (e.g., Airflow, Temporal, Kubeflow, Flyte)

  • Proficiency in Python and experience integrating with ML libraries and frameworks

  • Strong collaboration skills and the ability to support cross-functional data and ML teams

Nice-to-Have

  • Experience deploying and managing GPU-based workloads for model training and inference

  • Familiarity with real-time inference and low-latency serving architectures

  • Understanding of data versioning (e.g., DVC, Delta Lake) and metadata management

  • Experience with C++

  • Knowledge of model governance, compliance, or auditing in enterprise environments

What You’ll Get

  • The opportunity to shape and scale the infrastructure behind a production ML platform

  • Close collaboration with data scientists and product teams tackling complex causal inference problems

  • The satisfaction of enabling better decisions for large enterprise customers through reliable ML delivery

  • A mission-driven, collaborative team focused on impact, innovation, and integrity

Tags: Airflow, Ansible, Architecture, Causal inference, CI/CD, Data pipelines, Docker, Engineering, Feature engineering, GPU, Grafana, Kubeflow, Kubernetes, Machine Learning, MLflow, ML infrastructure, MLOps, Model deployment, Model training, Pipelines, Python

Region: North America
Country: United States
