Lead DevOps Engineer
Hyderabad, TS, India
Blend360
Company Description
Blend is a premier AI services provider, committed to co-creating meaningful impact for its clients through the power of data science, AI, technology, and people. With a mission to fuel bold visions, Blend tackles significant challenges by seamlessly aligning human expertise with artificial intelligence. The company is dedicated to unlocking value and fostering innovation for its clients by harnessing world-class people and data-driven strategy. We believe that the power of people and AI can have a meaningful impact on your world, creating more fulfilling work and projects for our people and clients. For more information, visit www.blend360.com.
Job Description
We are seeking a highly skilled Lead DevOps Engineer with strong on-premise infrastructure expertise to join our team and drive the end-to-end deployment, scalability, and operationalization of machine learning models in production. You will collaborate closely with data scientists, data engineers, and DevOps teams to ensure seamless CI/CD, reproducibility, monitoring, and governance of ML pipelines.
Key Responsibilities
Design, implement, and maintain CI/CD pipelines for deploying and monitoring microservices efficiently in on-premise environments.
Manage infrastructure as code using Terraform (or equivalent on-prem solutions) for repeatable and scalable provisioning.
Deploy and optimize containerized applications using Docker across on-premise environments, integrating with systems such as Harbor (or other private registries), Vault, and on-prem messaging/file storage solutions.
Apply best practices for securing Docker images, including vulnerability scanning, reducing image size, and optimizing build efficiency.
Implement and maintain centralized logging, monitoring, and alerting systems (e.g., Prometheus, Grafana, ELK stack) to ensure system reliability and observability.
Ensure security best practices across on-prem environments, including secrets management, access control, and compliance with organizational policies.
(Nice to have) Design and manage multi-client architectures within shared pipelines and storage solutions (e.g., NFS, Object Storage).
Qualifications
6+ years of experience in DevOps or MLOps with a strong focus on production-grade ML solutions in on-premise infrastructure.
Strong expertise in CI/CD tooling, containerization and orchestration (Docker, on-prem Kubernetes clusters), and on-premise infrastructure security.
Proficiency in Terraform (or Ansible, Puppet, or similar tools) for infrastructure automation.
Deep understanding of Docker, including best practices for securing, optimizing, and managing images.
Experience implementing centralized logging and monitoring using on-prem tools (e.g., ELK, Prometheus, Grafana).
Experience with security best practices, including secrets management, role-based access, and compliance in an on-premise environment.
Experience with Docker Compose for local development and multi-container orchestration.
Additional Information
Experience with Databricks on private cloud or equivalent on-prem data processing platforms.
Experience deploying, securing, and managing vector databases.
Hands-on experience with MLflow for model tracking and deployment.
Familiarity with best practices for multi-client architecture in shared on-prem pipelines and storage.
Python experience for microservices development, if you are interested in contributing to application code.