MLOps Engineer
Düsseldorf, North Rhine-Westphalia, Germany - Remote
About Cognigy
Cognigy is transforming the customer service industry with the most advanced AI Agent platform for enterprise contact centers. Its award-winning solution, Cognigy.AI, empowers enterprises to deliver instant, hyper-personalized, multilingual service on any channel. By integrating Generative and Conversational AI to create Agentic AI, Cognigy delivers AI Agents that redefine customer experiences, drive satisfaction, and support contact center employees in real time.
Our skilled #CognigyCrew are the people behind our leading technology and we are now looking for more talented people to join our global team.
Why you’ll love working at Cognigy - Our promise to you
We empower our people to be successful as part of a diverse, passionate and respectful team who are proud to be enabling customer and employee service that is loved by everyone.
We do this by challenging each other to succeed and by enabling everyone to do their best work. Encouraging and supporting growth is at the heart of our success, founded on a culture of mutual respect and trust – always! It’s no wonder that the values that inspire and drive our #CognigyCrew are our 4Ts - Team, Trust, Transparency, Technology.
Your new role – MLOps Engineer
Location: On-site in Düsseldorf or remote in Germany
We are looking for a skilled and ambitious MLOps Engineer to join our Engineering team and take ownership of building and operating scalable, secure infrastructure for Large Language Models (LLMs). You will support our Machine Learning, Product, and SRE teams in deploying and maintaining production-grade AI workloads on Kubernetes using cutting-edge technologies like KubeRay.
You’ll help ensure optimal performance, reliability, observability, and cost-efficiency of Cognigy’s AI infrastructure, automating processes and championing modern MLOps best practices.
Your responsibilities will include
- Build & Operate LLM Infrastructure – Design and maintain scalable LLM-serving systems using Kubernetes and KubeRay.
- Automate & Optimize – Automate deployments, rollbacks, and scaling of LLMs while optimizing resource usage and performance.
- Enhance Observability – Ensure robust monitoring, logging, and alerting for LLM operations (Prometheus, Grafana, etc.).
- Support AI Teams – Empower ML and product engineers with self-service pipelines and scalable infrastructure.
- Prioritize Security – Enforce secure deployments, compliance practices, and robust incident response strategies.
- Improve Documentation – Create and maintain technical documentation to streamline knowledge sharing and onboarding.
- Drive Innovation – Evaluate, adopt, and integrate the latest MLOps and LLM-serving technologies.
- Reduce SRE Toil – Eliminate repetitive tasks and improve operational efficiency across the platform.
Growth Potential
At Cognigy we are committed to your professional growth. This role offers significant opportunities for career development, including access to ongoing training and involvement in high-impact projects that allow you to showcase and advance your unique skills and experience.
Requirements
About you
- Hands-on experience running production ML or LLM workloads in Kubernetes
- Familiarity with distributed ML frameworks such as KubeRay, Ray Serve, or similar
- Deep understanding of Kubernetes internals, especially GPU scheduling, autoscaling, and multi-tenant environments
- Proficiency with CI/CD systems for ML models and versioned deployment strategies
- Strong experience with cloud platforms (AWS, GCP, or Azure), networking, and security best practices
- Skilled in monitoring and observability for ML workloads (e.g., Prometheus, Grafana)
- Passion for automation, performance tuning, and cost optimization for LLM workloads
- Clear communicator and proactive team player who thrives in fast-paced, cross-functional environments
- MLOps or DevOps certifications (nice to have)
Benefits
Life at Cognigy - What we offer you
We are an ambitious and international tech company with a great culture, and we make sure that everyone feels welcome. Our excellent benefits make us a fantastic place to work. These include:
- Attractive and performance-oriented salary
- Company Pension Scheme
- 25 days paid leave, plus 5 floating days, plus public holidays
- Unique opportunity to help build and shape the company, with little hierarchy
- Flexible working options
- Colleague recognition, reward and celebration events
- Global Employee Assistance Program
- ClassPass membership, giving you access to a variety of fitness and wellness experiences
- Ongoing learning and development opportunities, including Udemy
- One paid ‘Giving Back Day’ each year, so you can volunteer for a charity or community activity of your choice
- Subscription to the Calm app for you plus five friends/family members, giving you access to guided meditation, sleep stories, music, masterclasses, and much more
Equal Opportunity Employer Statement - Cognigy does not discriminate on the basis of race, sex, color, religion, age, national origin, marital status, disability, veteran status, genetic information, sexual orientation, gender identity or any other reason prohibited by law in provision of employment opportunities and benefits.