Senior Site Reliability Engineer

London

Full Time | Senior-level / Expert | GBP 80K - 148K * (est.)

Xtremepush

Turn visitors into loyal players with real-time data, AI and gamification. Channels include web, app and social engagement.

Posted 1 month ago

We are seeking a Senior SRE with experience of working with scaled SaaS production infrastructure. The successful candidate will work as part of a team focused on site reliability, security, and scalability, as we manage our rapid growth.

The ideal candidate will be a proactive and driven individual, who excels at understanding and working on complex technical solutions requiring performance and optimisation at scale. Our core technologies include PHP, MySQL, Vue.js and AWS. Participating in an on-call roster is required as part of this role.

This is a hybrid role (2 days in the office).

Act as a senior member of the SRE team, supporting activities such as managing the team's backlog and workload, scoping requirements, peer-reviewing code, and providing feedback to the rest of the team.
Represent the team in management and stakeholder meetings. Ensure best practices are followed, and suggest improvements to our development processes where you see gaps.
Investigate, test, and resolve technical problems, working closely with other engineers to deliver core product functionality.
Define SLOs, SLIs, and SLAs for key metrics that indicate the health, security, stability and uptime of the production, staging and development environments.
Monitor these environments and react to alerts and issues that arise in day-to-day operation of the product line.
Participate in an on-call rota for priority-1 alarms with the rest of the Platform teams.
Carry out ongoing upgrades and improvements to operational processes to optimise performance, stability and cost.
Work with the platform engineering team to plan how we carry out application and infrastructure releases and configuration changes.
Interact with internal teams and external 3rd party vendors to troubleshoot and resolve complex problems

5+ years' experience in an engineering role responsible for supporting a scaled SaaS platform running on Linux in a cloud environment
Experience working with high-performance systems and solving complex engineering problems at scale (our platform processes ~100 billion messages per year)
Understanding of distributed systems design – including asynchronous tasks, event driven architecture, scheduling, caching and queue processing
Ability to apply distributed systems design knowledge to resolve scaling constraints, and to carry out performance tuning across the API, application and database layers of the platform
Strong communication skills and ability to explain complex technical solutions simply to others
Strong understanding of PHP, Go, MySQL, OpenTelemetry and Prometheus
Experience with cloud and DevOps technologies (AWS, Terraform, CI/CD, etc.)
Experience with specific technologies in our stack: ClickHouse, Kafka, Pulsar, Python
Experience with networking and security concepts
Interest in or experience with marketing technologies
Interest in or experience with big data, data analytics, AI and machine learning

Ireland (Dublin) or UK (London or Milton Keynes)

* Salary range is an estimate based on our AI, ML, Data Science Salary Index 💰

Categories: Big Data Jobs Engineering Jobs

Tags: APIs Architecture AWS Big Data CI/CD Data Analytics DevOps Distributed Systems Engineering Golang Kafka Linux Machine Learning MySQL PHP Pulsar Python Security Terraform Vue