Senior Machine Learning Engineer, RAG

Toronto, Ontario

Applications have closed

Klue



👋 Klue Engineering is hiring!

We're looking for a Senior Machine Learning Engineer to join our ML team in Toronto, focusing on building and optimizing state-of-the-art RAG (Retrieval-Augmented Generation) systems. You'll be joining us at an exciting time as we reinvent our RAG systems, making this an excellent opportunity for someone with strong ML and IR fundamentals who wants to dive deep into practical LLM applications.

💡 FAQ

Q: Klue who?

A: Klue is a VC-backed, capital-efficient, growing SaaS company. Tiger Global and Salesforce Ventures led our US$62M Series B in the fall of 2021. We're creating the category of competitive enablement: helping companies understand their market and outmaneuver their competition. We benefit from an experienced leadership team working alongside several hundred risk-taking builders who raise the bar every day.

We’re one of Canada’s Most Admired Corporate Cultures by Waterstone HC, a Deloitte Technology Fast 50 & Fast 500 winner, and recipient of both the Startup of the Year and Tech Culture of the Year awards at the Technology Impact Awards.

Q: What are the responsibilities, and how will I spend my time? 

A: In this role, you'll focus on optimizing our RAG systems with scientific rigor and reproducible results. You'll measure and improve retrieval systems across the spectrum from BM25 to semantic search, using comprehensive evaluation metrics including Recall@K and Precision@K. A key challenge will be developing optimal chunking and enrichment strategies for diverse data sources, including news articles, website changes, documents, CRM entries, call recordings, and internal communications. You'll explore how different data types and formats impact retrieval performance and develop strategies to maintain high relevance across all sources.
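For illustration, Recall@K and Precision@K for a single query can be computed along these lines (the function names and toy data below are our own sketch, not Klue's code):

```python
def precision_at_k(retrieved, relevant, k):
    """Share of the top-k retrieved documents that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant) / len(top_k)


def recall_at_k(retrieved, relevant, k):
    """Share of all relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)


# Toy example: a retriever's ranked results vs. a labeled relevance set.
retrieved = ["doc2", "doc7", "doc1", "doc9"]
relevant = {"doc1", "doc2", "doc5"}

print(precision_at_k(retrieved, relevant, 2))  # 0.5 (doc2 is relevant, doc7 is not)
print(recall_at_k(retrieved, relevant, 2))     # 1/3 of the relevant set found
```

In practice these would be averaged over a labeled query set, which is what makes comparisons between BM25 and semantic retrievers reproducible.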

Beyond RAG and retrieval, you'll work on prompt engineering to effectively utilize the retrieved context. This includes developing zero-shot and few-shot prompts with structured inputs/outputs, and implementing tight iteration loops with the right evaluation metrics. 
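As a sketch of what a few-shot prompt with structured inputs/outputs might look like (the labels, examples, and helper below are illustrative assumptions, not Klue's actual prompts):

```python
import json

# Illustrative few-shot examples paired with a structured (JSON) output format.
FEW_SHOT = [
    {"text": "Competitor X cut its enterprise tier price by 20%.", "label": "pricing"},
    {"text": "Competitor X announced a new CRM integration.", "label": "product"},
]


def build_prompt(passage: str) -> str:
    """Assemble a few-shot classification prompt that asks for JSON-only output."""
    parts = [
        "Classify the passage into one label.",
        'Reply with JSON only, e.g. {"label": "pricing"}.',
        "",
    ]
    for ex in FEW_SHOT:
        parts.append(f"Passage: {ex['text']}")
        parts.append(json.dumps({"label": ex["label"]}))
        parts.append("")
    parts.append(f"Passage: {passage}")
    parts.append("JSON:")
    return "\n".join(parts)
```

Constraining the model to JSON keeps the iteration loop tight: replies can be parsed and scored automatically, so each prompt variant gets a number rather than a vibe.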

You'll also work on training and fine-tuning smaller, more efficient models that can match the performance of large LLMs at a fraction of the cost. This includes creating labeled datasets (sometimes using prompts), conducting careful hyperparameter optimizations, and building automated training pipelines. You'll also deploy and monitor these models in production, optimize their latency, and implement comprehensive offline/online metrics to track their performance. 
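A careful hyperparameter optimization often starts from an exhaustive grid over the training configuration; a minimal sketch (the search space below is invented for illustration, not a recommended recipe):

```python
from itertools import product

# Hypothetical search space for fine-tuning a small transformer model.
SEARCH_SPACE = {
    "learning_rate": [1e-5, 3e-5, 5e-5],
    "batch_size": [16, 32],
    "num_epochs": [2, 3],
}


def grid_configs(space):
    """Yield every hyperparameter combination in the search space."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(grid_configs(SEARCH_SPACE))
print(len(configs))  # 3 * 2 * 2 = 12 training runs
```

Each config would then drive one run of the training pipeline, with offline metrics logged per run so the comparison against the large-LLM baseline is apples to apples.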

Throughout all this work, you'll apply your deep understanding of the latest breakthroughs in the field to connect new research advances to practical improvements in our systems. Working closely with backend engineers, you'll help build scalable, production-ready systems that turn cutting-edge ML experiments into reliable business value.

Q: What experience are we looking for? 

  • Master's or PhD in Machine Learning, NLP, or a related field

  • 2+ years building and optimizing retrieval systems

  • 2+ years training/fine-tuning transformer models

  • Strong foundation in evaluating RAG systems, covering both retrieval and generation

  • Deep understanding of retrieval metrics and their trade-offs

  • Strong grasp of embedding models, semantic similarity techniques, and clustering similar content

  • Knowledge of query augmentation and content enrichment strategies

  • Expertise in automated LLM evaluation, including LLM-as-judge methodologies

  • Skilled at prompt engineering, including zero-shot, few-shot, and chain-of-thought techniques

  • Experience deploying models to production and monitoring both system health and prediction quality

  • Knowledge of ML infrastructure, model serving, and observability best practices

  • Proven ability to balance scientific rigor with driving business impact

  • Track record of staying current with ML research and breakthrough papers
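To make the LLM-as-judge item concrete: a common pattern is to ask a judge model for a JSON verdict and parse it defensively. A hedged sketch follows (the prompt wording and 1-5 scale are our own assumptions):

```python
import json

# Illustrative judge prompt; double braces survive .format() as literal JSON braces.
JUDGE_PROMPT = (
    "You are grading a RAG answer for faithfulness to the retrieved context.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer: {answer}\n"
    'Reply with JSON only: {{"score": <integer 1-5>, "reason": "<short string>"}}'
)


def build_judge_prompt(question, context, answer):
    """Fill the judge template for one (question, context, answer) triple."""
    return JUDGE_PROMPT.format(question=question, context=context, answer=answer)


def parse_verdict(reply: str):
    """Parse the judge's JSON reply, clamping the score into the 1-5 range."""
    data = json.loads(reply)
    score = max(1, min(5, int(data["score"])))
    return score, data.get("reason", "")
```

The clamp-and-parse step matters: judge models occasionally return out-of-range scores or extra fields, and automated evaluation only works if malformed verdicts are normalized rather than silently trusted.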

Q: What makes you thrive at Klue? 

A: We're looking for builders who:

  • Take ownership and run with ambiguous problems

  • Jump into new areas and rapidly learn what's needed to deliver solutions

  • Bring scientific rigor while maintaining a pragmatic delivery focus

  • See unclear requirements as an opportunity to shape the solution

Q: What technologies do we use? 

  • LLM platforms: OpenAI, Anthropic, open-source models

  • ML frameworks: PyTorch, Transformers, spaCy

  • Search/Vector DBs: Elasticsearch, Pinecone, PostgreSQL

  • MLOps tools: Weights & Biases, MLflow, Langfuse

  • Infrastructure: Docker, Kubernetes, GCP

  • Development: Python, Git, CI/CD

How We Work at Klue:

  • Hybrid. Best of both worlds (remote & in-office)

  • Our main Canadian hubs are in Vancouver and Toronto. Ideally, this role would be located in Toronto.

  • You and your team will be in office at least 2 days per week.


