Manager Site Reliability Engineer
Tasks
- Architect data infrastructure
- Automate failover mechanisms
- Build observability stack for database health
- Conduct root cause analysis (RCA) for incidents
- Coordinate incident response for critical outages
- Implement automated self-healing platforms
- Improve database reliability and availability
- Lead SRE team
- Lead backlog planning and continuous improvement
- Manage incident troubleshooting
- Mentor and develop engineers
- Reduce mean time to resolution (MTTR)
Skills/Tech-stack
AWS | Agile | Aurora | Automation | Cloud SQL | Database Administration | Database Performance Tuning | Datadog | Distributed Systems | EC2 | Failover | GCP | GKE | Grafana | Helm | High Availability | Kubernetes | Linux | Mean Time To Resolution | Networking | Observability | Percona Monitoring and Management | Prometheus | Query Optimization | RDS | Reliability Engineering | Replication | Root Cause Analysis | SLAs | SLI | SLO | SQL | Scrum | Site Reliability Engineering