MLOps Engineer - Machine Learning Platform - Toronto

Toronto, Ontario, Canada

Goldman Sachs

The Goldman Sachs Group, Inc. is a leading global investment banking, securities, and asset and wealth management firm that provides a wide range of financial services.


What We Do  

At Goldman Sachs, our Engineers don’t just make things – we make things possible.  Change the world by connecting people and capital with ideas.  Solve the most challenging and pressing engineering problems for our clients.  Join our engineering teams that build massively scalable software and systems, architect low latency infrastructure solutions, proactively guard against cyber threats, and leverage machine learning alongside financial engineering to continuously turn data into action.  Create new businesses, transform finance, and explore a world of opportunity at the speed of markets. 

Engineering, which comprises our Technology Division and global strategists’ groups, is at the critical center of our business, and our dynamic environment requires innovative strategic thinking and immediate, real solutions.  Want to push the limit of digital possibilities?  Start here.

 

Who We Look For 

We are seeking a skilled and motivated engineer to join our Artificial Intelligence Platforms organization as an MLOps Engineer on our Machine Learning Services team. In this role, you will be part of an expert team responsible for our firmwide model registry and real-time serving products in the cloud. A key focus of this position will be the implementation and optimization of Large Language Models (LLMs), which are pivotal to achieving our Generative AI agenda.

 
Key Responsibilities: 

Deliver scalable, efficient, secure and automated processes for building, deploying and monitoring Machine Learning models 

Enable solutions that provide business customers with the ability to leverage the latest and greatest AI/ML infrastructure, frameworks, and tooling to deliver high impact outcomes 

Develop and demonstrate deep subject matter expertise on how to optimize machine learning model deployments to scale to the specific needs of each business customer 

Deliver high-quality, production-ready code leveraging CI/CD best practices

Author and maintain high-quality documentation for both the engineering team and business customers

Remain up to date with the latest advancements in AI/ML frameworks and related technologies 

 

Basic Qualifications:

2+ years of experience in building production software using Python 

1+ years of experience as an MLOps Engineer supporting the production implementation of models

1+ years of experience working with containers (e.g. Docker) 

1+ years of experience with Unix-based systems 

1+ years of experience delivering solutions in a public cloud (e.g. AWS, GCP) 

Strong desire to keep learning and stay up to date with the latest and greatest developments in the model inference domain, especially for Large Language Models (LLMs) 

Strong problem-solving skills and the ability to work effectively in a fast-paced and collaborative environment 

 

Preferred Qualifications: 

Strong understanding of the end-to-end Model Development Lifecycle (MDLC) 

Strong understanding of Python frameworks, packages and tools 

Experience building Machine Learning models with frameworks such as PyTorch and TensorFlow  

Experience building containerized runtime environments for model serving (e.g. vLLM, SGLang, TensorRT, Triton, AWS Multi Model Server) 

Experience with infrastructure-as-code tools, such as Terraform or CloudFormation 

Experience with Kubernetes and other container orchestration platforms in the public cloud (e.g. AWS, GCP) 

Excellent communication skills and the ability to articulate complex technical concepts to both technical and non-technical stakeholders




Region: North America
Country: Canada
