MLOps Engineer
Toronto, Ontario, Canada
Benevity
Benevity's corporate purpose software offers the only integrated suite of community investment, employee, customer and nonprofit engagement solutions.
Meet Benevity
Benevity is the way the world does good, providing companies (and their employees) with technology to take social action on the issues they care about. Through giving, volunteering, grantmaking, employee resource groups and micro-actions, we help most of the Fortune 100 brands build better cultures and use their power for good. We’re also one of the first B Corporations in Canada, meaning we’re as committed to purpose as we are to profits. We have people working all over the world, including Canada, Spain, Switzerland, the United Kingdom, the United States and more!
We’re seeking an experienced MLOps Engineer to lead operational excellence and infrastructure development within our AI team, focusing on the full machine learning lifecycle across classical ML and deep learning systems. You’ll be instrumental in designing, deploying, and managing scalable ML pipelines and platforms in our B2B SaaS environment, ensuring that our ML services are production-ready, secure, reliable, and observable.
This role operates within a Scrum team and involves close collaboration with ML researchers, data scientists, platform engineers, and DevOps teams to build robust ML solutions integrated into Benevity’s product ecosystem.
What you’ll do:
ML/AI Platform Engineering & Operations
- Design and manage cloud-native infrastructure for ML model training, evaluation, deployment, and monitoring on platforms like Azure ML, SageMaker, Vertex AI, or Databricks.
- Build and maintain Infrastructure-as-Code (IaC) using tools such as Terraform to support reproducible, scalable, and auditable ML deployments.
- Develop end-to-end MLOps pipelines supporting continuous integration and delivery (CI/CD), model versioning, automated testing, and retraining workflows.
- Implement observability practices including logging, monitoring, and alerting to ensure model and system performance in production.
- Optimize infrastructure for cost-efficiency, model latency, throughput, and reliability.
- Ensure security of ML pipelines and services through authentication, authorization, rate-limiting, and auditing mechanisms.
Operational Excellence & Observability
- Instrument ML systems with metrics, traces, logs, and dashboards to support performance monitoring and issue detection.
- Participate in incident management, including on-call rotations, writing operational runbooks, and conducting postmortems to drive continuous improvement.
- Apply security and compliance best practices to data handling, model outputs, and system operations, aligning with regulatory standards.
Integration & Collaboration
- Work closely with data scientists to move models from experimentation to production.
- Collaborate with software engineers to integrate ML capabilities into core products such as recommendation engines, personalization, or predictive analytics.
- Partner with DevOps, Security, and SRE teams to maintain compliance (e.g., SOC2, GDPR) and platform readiness.
- Engage in architectural reviews and contribute to design decisions around machine learning infrastructure and APIs.
Scrum Delivery & Continuous Improvement
- Actively participate in scrum ceremonies, including sprint planning, standups, and retrospectives.
- Provide effort estimates, contribute to backlog grooming, and deliver quality features and improvements in a continuous delivery cycle.
- Maintain clear documentation of ML infrastructure, processes, and decisions for transparency and collaboration.
Innovation & Learning
- Stay current with advancements in GenAI infrastructure, large language models, and emerging patterns like Retrieval-Augmented Generation (RAG), vector search, and agent-based architectures.
- Stay informed about emerging trends in MLOps, model deployment, monitoring, and data-centric AI practices.
- Contribute to the evaluation and benchmarking of deployed models for accuracy, fairness, and efficiency.
- Share insights, tools, and methodologies to support the broader AI/ML engineering community at Benevity.
What you’ll bring:
- A degree in Computer Science, Engineering, or a related field.
- 3+ years of experience in DevOps, MLOps, or SRE roles with hands-on responsibility for ML model deployment and lifecycle management.
- Experience with cloud ML platforms such as AWS SageMaker, GCP Vertex AI, Azure ML, or Databricks.
- Proficiency in IaC tools (Terraform, CloudFormation) and workflow orchestration (Airflow, Kubeflow, or MLflow).
- Strong Python skills for scripting, automation, and interaction with ML APIs and orchestration tools.
- Familiarity with observability tools like Prometheus, Grafana, Datadog, or cloud-native monitoring (CloudWatch, GCP Monitoring, Azure Monitor).
- Experience implementing CI/CD pipelines for ML using GitHub Actions, Jenkins, ArgoCD, or similar.
- Solid understanding of data security, model governance, and compliance in the context of ML systems.
- Ability to diagnose complex issues across infrastructure, models, and data flows.
- Excellent communication skills and a collaborative mindset to work cross-functionally in scrum teams.
Technical Skills & Expertise:
- Cloud Platforms: Azure ML, GCP Vertex AI, AWS SageMaker, Databricks
- MLOps Tooling: MLflow, Kubeflow Pipelines, Airflow, TFX, DVC, Docker, Kubernetes, Triton Inference Server
- CI/CD & Infrastructure: Terraform, GitHub Actions, Jenkins, ArgoCD, GitOps
- Monitoring & Observability: Prometheus, Grafana, OpenTelemetry, Datadog, cloud-native monitoring tools
- Languages: Python (primary), Bash. Bonus: Go, Rust, or Java for backend systems
- APIs & Streaming: REST, gRPC, Kafka, Pub/Sub, Kinesis
- Security & Compliance: IAM, Kubernetes RBAC, audit logging, TLS/SSL, VPC configurations, KMS, OPA, and compliance standards like SOC2, GDPR, and HIPAA
Discover your purpose at work
We’re not employees, we’re Benevity-ites: people from all locations, backgrounds and walks of life who deserve more …
Innovative work. Growth opportunities. Caring co-workers. And a chance to do work that fills us with a sense of purpose.
If the idea of working on tech that helps people do good in the world lights you up ... If you want a career where you’re valued for who you are and challenged to see who you can become …
It’s time to join Benevity. We’re so excited to meet you.
Where we work
At Benevity, we embrace a flexible hybrid approach to where we work that empowers our people in a way that supports great work, strong relationships, and personal well-being. For those located near one of our offices, while there’s no set requirement for in-office time, we do value the moments when coming together in person helps us build connection and collaboration. Whether it’s for onboarding, project work, or a chance to align and bond as a team, we trust our people to make thoughtful decisions about when showing up in person matters most.
Join a company where DEIB isn’t a buzzword
Diversity, equity, inclusion and belonging are part of Benevity’s DNA. You’ll see the impact of our massive investment in DEIB daily — from our well-supported employee resource groups to the exceptional diversity on our leadership and tech teams.
We know that diverse backgrounds, experiences, skills and passions are what move our business and our people forward, so we're committed to creating a culture of belonging with equal opportunities for everyone to shine.
That starts with a fair and accessible hiring process. If you want to feel seen, heard and celebrated, you belong at Benevity.
Candidates with disabilities who may require accommodations throughout the hiring or assessment process are encouraged to reach out to accommodations@benevity.com.