Senior Machine Learning Developer - LLMOps
Province of Quebec (Canada)
Play a key role in shaping LLMOps best practices
As a Senior Machine Learning Developer on the ML Platform team, you will play a key role in supporting the teams of applied scientists responsible for creating and training Large Language Models (LLMs) and other ML models at scale.
Your primary responsibility will be to create and maintain a suite of tools and workflows that enable the efficient development of robust, scalable, and maintainable ML models. You will also work closely with Applied Scientists to accelerate the iteration and experimentation process, ensuring faster and more effective model development.
Here is what makes this opportunity exciting:
The ML unit at Coveo focuses on finding ways to apply the latest advances in Recommender Systems, Ranking Optimization, LLMs and NLP to build innovative solutions in e-commerce, self-service and other business verticals.
We solve real problems with real data, for hundreds of large enterprise clients all around the world, on a modern platform that serves over 100M requests and automatically trains thousands of ML models on a daily basis.
Here is a glimpse at your responsibilities:
- Provide end-to-end ML tooling, from data exploration to production deployment.
- Facilitate development, deployment, automated testing, monitoring and debugging of ML models.
- Analyze and improve the performance of our models and ML Platform to help meet critical SLOs for training models at scale and low-latency inference.
- Facilitate the adoption and usage of ML platform and observability resources and provide guidelines to improve operational efficiency and service reliability.
- Engage with your community of peers to challenge the status quo, improve our shared ways of working, and influence overall architecture decisions.
- Learn and evolve our modern tech stack, which includes Python, AWS, Kubernetes, PyTorch, Terraform, Snowflake, Honeycomb and others.
Here is what will qualify you for the role:
- You have 5+ years of Machine Learning industry experience.
- You have operationalized, instrumented and supported LLMs and other ML models in production at a non-trivial scale.
- You are fluent in good data and software engineering practices, and you are able to develop the tools and culture which enable ML teams to deliver reliable production code in an efficient manner.
- You enjoy collaborating with scientists on a daily basis to understand their pain points and figure out how to improve their tools and increase their efficiency. You also have experience working in cross-functional teams.
Here is what will make you stand out:
- You master best practices in MLOps, ML engineering, and large-scale deployment of ML models.
- You have experience maintaining and evangelizing internal resources and libraries.
- You have considerable MLOps experience hosting models at scale, having built tooling to facilitate data exploration and experimentation and having automated and orchestrated complex, efficient training pipelines.
- You are recognized for your communication skills and for your ability to present complex technical subjects to audiences with different levels of technical proficiency.
Do you think you can bring this role to life?
You don’t need to check every single box; passion goes a long way and we appreciate that skillsets are transferable.
Send us your CV; we want to get to know you! Join the #Coveolife!
We encourage all qualified candidates to apply regardless of, for example, age, gender, disability, gaps in CV, national or ethnic background. We know that applying for a new role is a lot of work and we really appreciate your time.
Tags: Architecture AWS E-commerce Engineering Kubernetes LLMOps LLMs Machine Learning ML models MLOps NLP Pipelines Python PyTorch Recommender systems Snowflake Terraform Testing