Machine Learning Platform Engineer
San Francisco HQ
Full Time | Senior-level / Expert | USD 218K - 240K
Alembic
Uncover marketing success with Alembic's AI-driven analytics. Predict revenue outcomes, optimize media spend, and gain actionable insights in real time.
About Alembic
Alembic is pioneering a revolution in marketing, proving the true ROI of marketing activities. The Alembic Marketing Intelligence Platform applies sophisticated algorithms and AI models to finally solve this long-standing problem. When you join the Alembic team, you’ll help build the tools that provide unprecedented visibility into how marketing drives revenue, helping a growing list of Fortune 500 companies make more confident, data-driven decisions.
About the Role
Alembic is looking for a Machine Learning Platform Engineer to build and scale the infrastructure that powers our AI-driven products. In this role, you’ll work closely with our ML, data science, and platform teams to enable scalable training, deployment, and monitoring of models in production.
You’ll design the tools and systems that make ML experimentation fast, reproducible, and reliable—helping us deliver accurate, real-time insights to enterprise customers.
Key Responsibilities
Design, build, and maintain infrastructure to support the full ML lifecycle—from data ingestion to model deployment and monitoring
Implement tools and workflows for ML experimentation, feature engineering, training pipelines, and hyperparameter tuning
Build and run scalable and versioned deployment systems using modern frameworks (e.g., MLflow, Kubeflow, Airflow)
Collaborate with data scientists to productionize ML-based applications, accelerate training cycles, and streamline model validation
Integrate CI/CD practices into the ML workflow to ensure robust and automated model deployment
Establish monitoring, alerting, and logging for models in production (e.g., drift detection, performance metrics)
Champion best practices around reproducibility, traceability, and model governance
Must-Have Qualifications
8+ years of experience in platform engineering, with some experience in machine learning infrastructure or MLOps
Deep understanding of the ML lifecycle, including data pipelines, training, model serving, and observability
Strong experience with cloud and cloud-native tooling (e.g., Kubernetes, Docker), as well as experience with Infrastructure as Code tools (e.g., Ansible)
Deep expertise with monitoring and observability tools (e.g., Prometheus, Grafana, DataDog)
Familiarity with ML workflow orchestration (e.g., Airflow, Temporal, Kubeflow, Flyte)
Proficiency in Python and experience integrating with ML libraries and frameworks
Strong collaboration skills and the ability to support cross-functional data and ML teams
Nice-to-Have
Experience deploying and managing GPU-based workloads for model training and inference
Familiarity with real-time inference and low-latency serving architectures
Understanding of data versioning (e.g., DVC, Delta Lake) and metadata management
Experience with C++
Knowledge of model governance, compliance, or auditing in enterprise environments
What You’ll Get
The opportunity to shape and scale the infrastructure behind a production ML platform
Close collaboration with data scientists and product teams tackling complex causal inference problems
The satisfaction of enabling better decisions for large enterprise customers through reliable ML delivery
A mission-driven, collaborative team focused on impact, innovation, and integrity