DevOps SRE Engineer
Bengaluru, Karnataka, India
This role is for one of Weekday's clients.
We are looking for a proactive DevOps Engineer with a strong focus on automation, resilience, and stability. In this role, you will play a key part in ensuring system reliability and robustness while enhancing development efficiency. In the short term, you will drive improvements to developer experience and system uptime, while in the long term, you will help establish best practices in MLOps, Infrastructure as Code (IaC), and advanced database management.
Key Responsibilities
System Resilience, Stability & Developer Experience
- System Resilience & Stability
  - Monitor and enhance system resilience across multi-cloud environments (AWS, Azure, GCP).
  - Develop incident response, disaster recovery, and failover strategies to minimize downtime.
  - Deploy automation and monitoring tools to detect and resolve performance issues proactively.
- Developer Experience
  - Optimize CI/CD pipelines using GitLab to streamline deployments and improve efficiency.
  - Collaborate with development teams to identify and resolve pain points within the software development lifecycle (SDLC).
  - Implement automated testing frameworks to maintain high code quality.
MLOps, Infrastructure as Code & Database Management
- MLOps Initiatives
  - Implement efficient MLOps practices for model development, deployment, and retraining.
  - Evaluate frameworks that support technologies such as Azure OpenAI models.
- Infrastructure as Code (IaC) & Cloud Automation
  - Design and implement IaC solutions using Terraform or CloudFormation for consistent cloud resource management.
  - Automate routine tasks to ensure secure, scalable, and reproducible environments.
- Database Management
  - Manage and scale MongoDB for optimized NoSQL operations.
  - Maintain Qdrant to support vector search and ML-driven data operations.
  - Ensure automation, monitoring, and alignment of database solutions with application requirements.
Containerization & Orchestration
- Use Docker and Kubernetes to containerize applications and run scalable, resilient deployments.
- Collaborate with development teams to design and refine microservices architectures.
Monitoring, Logging & Quality Assurance
- Establish robust monitoring, logging, and alerting systems to proactively address issues.
- Analyze operational metrics to drive continuous improvements in reliability and system stability.
- Champion best practices in operational resilience and automation.
Collaboration & Best Practices
- Work closely with development teams to foster a stability-focused culture.
- Document processes, architectures, and best practices for effective communication and onboarding.
- Advocate for DevOps best practices, ensuring security, automation, and scalability are prioritized.