Manager Site Reliability Engineer
Tasks
- Architect data infrastructure
- Automate failover mechanisms
- Build observability stack for database health
- Conduct RCA for incidents
- Coordinate incident response for critical outages
- Implement automated self-healing platforms
- Improve database reliability and availability
- Lead SRE team
- Lead backlog planning and continuous improvement
- Manage incident troubleshooting
- Mentor and develop engineers
- Reduce MTTR
Perks/Benefits
Skills/Tech-stack
AWS | Agile | Aurora | Automation | Cloud SQL | Database Administration | Database performance tuning | Datadog | Distributed Systems | EC2 | Failover | GCP | GKE | Grafana | Helm | High Availability | Kubernetes | Linux | Mean Time To Resolution | Networking | Observability | Percona Monitoring and Management | Performance Tuning | Prometheus | Query Optimization | RDS | Replication | Root Cause Analysis | SLAs | SLI | SLO | SQL | Scrum | Site Reliability Engineering
Education