LLM Operations Engineer
Etobicoke, Ontario, Canada
Prophix
Prophix is a fast-growing global leader in financial performance management. Ambitious finance teams use Prophix One™, our Financial Performance Platform, to improve the speed and accuracy of their decision-making with a harmonized user experience, stepping confidently into the next generation of finance.
Headquartered in Etobicoke, Ontario, with offices in 16 cities, we work with a global network of partners across North America, South America, the UK, Europe, and Asia to serve thousands of finance leaders across nearly one hundred countries.
The LLM Operations Engineer serves as the DevOps specialist within our AI team, focusing on managing the operational aspects of our AI platform, particularly our Large Language Models (LLMs) that power Prophix One Intelligence. You will build and maintain robust LLM workflows, implement monitoring systems, integrate feedback loops, and optimize the performance of our AI solutions. You'll work closely with AI Engineers and Product Owners to ensure our AI systems are reliable, secure, observable, and continuously improving.
What You Will Do
- Design, implement, and maintain LLM operations workflows using tools like Langfuse to monitor performance, track usage, and create feedback loops for continuous improvement
- Develop and maintain infrastructure-as-code for AI deployments using Terraform and AWS services (Lambda, SQS, API Gateway, OpenSearch, CloudWatch)
- Build and enhance monitoring, logging, and alerting systems to ensure optimal performance and reliability of our LLM infrastructure
- Collaborate with AI engineers to design and implement evaluation frameworks (including LLM-as-judge systems) to measure and improve model performance
- Manage prompt versioning, testing, and deployment pipelines through Concourse CI/CD and custom tooling
- Implement and maintain security guardrails for LLM interactions, ensuring compliance with best practices
- Create comprehensive documentation for LLM operations, including runbooks for production incidents
- Participate in on-call rotations to support mission-critical AI systems
- Drive innovation in LLM operations by researching and implementing best practices and emerging tools in the rapidly evolving GenAI space
What You Will Bring
To succeed in this role, you will need a combination of experience, technology skills, personal qualities, and education.
Required Qualifications
- 3+ years of experience in DevOps, SRE, or similar roles, with at least 1 year specifically working with LLMs or AI systems in production
- Strong hands-on experience with AWS cloud services, particularly Bedrock, Lambda, SQS, API Gateway, OpenSearch, and CloudWatch
- Experience with infrastructure-as-code using Terraform, CloudFormation, or similar tools
- Proficiency in Python and experience building automation tooling and pipelines
- Familiarity with LLMOps platforms such as Langfuse for LLM observability and evaluation
- Experience with CI/CD pipelines using Concourse or similar tools
- Knowledge of logging, monitoring, and alerting systems
- Understanding of security best practices for AI systems, including prompt injection mitigation techniques
- Excellent troubleshooting and problem-solving skills
- Strong communication skills and ability to work effectively with cross-functional teams
- Must be legally entitled to work in the country where the role is located. Must be able to travel to the United States, Canada and/or internationally, and have a valid passport.
Preferred Qualifications
- Experience with prompt engineering and testing tools like Promptfoo
- Familiarity with vector databases and retrieval-augmented generation (RAG) systems
- Knowledge of serverless architectures and event-driven systems
- Experience with Amazon Bedrock Guardrails for LLM security
- Background in data engineering or machine learning operations
- Understanding of financial systems and data security requirements in the finance industry
- Familiarity with implementing technical solutions to meet compliance requirements outlined in SOC 2, ISAE 3402, and ISO 27001
Why join?
A solid foundation - and a bright future
Prophix has been a pioneer in finance technology for 35 years and counting. And to further our mission and vision, we’re proud to work with our investors, Hg Capital, to grow our business and expand our market share.
Community, culture, and purpose
Phixers (the extraordinary team at Prophix) pursue excellence by creating wins for all, driving continuous innovation, and building purposeful solutions for our customers and partners.
We reward hard work, but we also know that life outside of work is vital. That’s why we provide highly competitive compensation, vacation, and benefits packages, and encourage you to get involved in our many charitable, sports, or knowledge clubs and seasonal celebrations.
Through our Corporate Social Responsibility (CSR) program, we aim to create a lasting impact on the global community with meaningful programs and initiatives. Participate in fundraising activities and get paid to volunteer for causes that matter to you. Our CSR committee also collaborates with local and international charities to donate $50,000 to deserving projects each quarter we meet our profit goals.
Learn more about us on our Careers Page!
Apply now!
Prophix promotes a diverse, inclusive, and accessible workplace. If you feel like you are a great fit for this role, please apply. While we can’t guarantee an interview, we will consider the full breadth of your experience and background.
At Prophix, we are committed to creating a working environment that is barrier-free. Please advise our Recruitment team if you require reasonable accommodation during the interview and assessment process, and we will work with you to meet your needs.
#LI-HYBRID #LI-BL1