DevOps SRE Engineer

Bengaluru, Karnataka, India

This role is for one of Weekday's clients.

We are looking for a proactive DevOps Engineer with a strong focus on automation, resilience, and stability. In this role, you will play a key part in ensuring system reliability and robustness while enhancing development efficiency. In the short term, you will drive improvements to developer experience and system uptime, while in the long term, you will help establish best practices in MLOps, Infrastructure as Code (IaC), and advanced database management.

Requirements

Key Responsibilities

System Resilience, Stability & Developer Experience

  • System Resilience & Stability
    • Monitor and enhance system resilience across multi-cloud environments (AWS, Azure, GCP).
    • Develop incident response, disaster recovery, and failover strategies to minimize downtime.
    • Deploy automation and monitoring tools to detect and resolve performance issues proactively.
  • Developer Experience
    • Optimize CI/CD pipelines using GitLab to streamline deployments and improve efficiency.
    • Collaborate with development teams to identify and resolve pain points within the software development lifecycle (SDLC).
    • Implement automated testing frameworks to maintain high code quality.

MLOps, Infrastructure as Code & Database Management

  • MLOps Initiatives
    • Implement efficient MLOps practices for model development, deployment, and retraining.
    • Evaluate frameworks that support technologies such as Azure OpenAI models.
  • Infrastructure as Code (IaC) & Cloud Automation
    • Design and implement IaC solutions using Terraform or CloudFormation for consistent cloud resource management.
    • Automate routine tasks to ensure secure, scalable, and reproducible environments.
  • Database Management
    • Manage and scale MongoDB for optimized NoSQL operations.
    • Maintain Qdrant to support vector search and ML-driven data operations.
    • Ensure automation, monitoring, and alignment of database solutions with application requirements.

Containerization & Orchestration

  • Utilize Docker and Kubernetes to containerize applications and manage scalable, resilient deployments.
  • Collaborate with development teams to design and refine microservices architectures.

Monitoring, Logging & Quality Assurance

  • Establish robust monitoring, logging, and alerting systems to proactively address issues.
  • Analyze operational metrics to drive continuous improvements in reliability and system stability.
  • Champion best practices in operational resilience and automation.

Collaboration & Best Practices

  • Work closely with development teams to foster a stability-focused culture.
  • Document processes, architectures, and best practices for effective communication and onboarding.
  • Advocate for DevOps best practices, ensuring security, automation, and scalability are prioritized.

Category: Engineering Jobs

Tags: Architecture AWS Azure CI/CD CloudFormation DataOps DevOps Docker GCP GitLab Kubernetes Machine Learning Microservices ML models MLOps MongoDB NoSQL OpenAI Pipelines SDLC Security Terraform Testing

Region: Asia/Pacific
Country: India
