AI/ML DevOps Engineer
London - 62 Buckingham Gate, United Kingdom
Millennium
Millennium is a global, diversified alternative investment firm with the mission to deliver high-quality returns for our investors.
Our Infrastructure AI and Data Engineering Team is responsible for providing the foundational firm-wide AI Enablement platform. We are transitioning this platform onto Kubernetes (K8s) and are seeking an experienced DevOps Engineer to lead this effort. The ideal candidate will help drive our cloud-native infrastructure initiatives and lead the implementation of DevOps best practices across our organization. This is a unique opportunity not only to join one of the leading hedge funds in the world, but also to provide leadership on the core AI Enablement platform, which is used by every part of the business on a daily basis.
Key Responsibilities:
- Design and implement high-availability solutions for critical AI infrastructure
- Partner with AI/ML teams to optimize platform performance and scalability
- Drive architectural decisions for the next generation of the AI platform
- Lead the development and maintenance of CI/CD pipelines using tools like Jenkins or GitHub Actions
- Architect and implement Infrastructure as Code (IaC) solutions using Terraform or similar tools
- Optimize container orchestration platforms (Kubernetes) and microservices architecture
- Improve and maintain monitoring, alerting and incident response systems (Datadog, OpsGenie)
- Lead incident response and participate in on-call rotation
- Mentor junior team members and contribute to technical documentation
- Collaborate with development teams to improve deployment processes and system reliability
Required Qualifications:
- 5+ years of experience in DevOps, Site Reliability Engineering, or similar roles
- Strong experience with cloud platforms (AWS/GCP/Azure)
- Expert knowledge of containerization (Docker) and orchestration (Kubernetes and Helm)
- Proficiency in Infrastructure as Code and configuration management tools
- Experience with high-performance, low-latency systems
- Track record of successfully delivering large-scale infrastructure projects
- Experience with CI/CD tools and methodologies
- Deep understanding of networking, security, and system architecture
- Excellent troubleshooting and analytical skills
- Strong communication skills to collaborate with various stakeholders
Preferred Qualifications:
- Experience in a financial services or hedge fund environment
- Experience with Python (FastAPI)
- Knowledge of machine learning operations (MLOps)
- Experience with data processing frameworks and big data technologies
- Experience with multi-cloud and/or on-prem Kubernetes
- Experience running CUDA-enabled accelerated workloads
Tags: Architecture AWS Azure Big Data CI/CD CUDA DevOps Docker Engineering FastAPI GCP GitHub Helm Jenkins Kubernetes Machine Learning Microservices ML infrastructure MLOps Pipelines Python Security Terraform
Perks/benefits: Career development