Site Reliability Engineer, Observability
The United States
Summary
The Observability Squad is responsible for maintaining the tooling used by engineers and customers to monitor Pismo services. The squad also develops guidance and standards used by engineers to create observability of their systems and gain insights from that observability.
What you'll do
As a site reliability engineer in the Observability Squad, you will develop guidance and best practices that help engineers create observability for their systems. You will develop methods and patterns for monitoring the many aspects of Pismo systems, and you will use your knowledge of observability and related technologies to advise engineers on how to monitor their business, client, application, and system telemetry. You will also lead teams through incrementally improving their monitoring over time, defining and tuning alarms and alerts to reduce operator fatigue.
- Manage and improve the observability services provided to Pismo engineers
- Develop standards and best practices for adoption by engineers to create observability for their APIs
- Guide engineers on tuning their monitoring to improve signal-to-noise ratios
- Provide guidance to engineers on how to apply machine learning to their observability
- Help engineers to conduct root cause analysis using observability tooling
Minimum Qualifications
Technical Skills:
- Has a background in software development and is familiar with integrated development environments such as Eclipse, IntelliJ IDEA, VS Code, or similar.
- Has developed software professionally using languages such as Python, Golang, Java, or JavaScript.
- Has experience designing distributed systems, has a working understanding of concepts such as enterprise integration patterns and REST, and can identify the benefits and drawbacks of any potential solution.
- Has used monitoring tools to gain observability into distributed systems, including creating dashboards and alarms and searching logs and traces. Can explain the difference between business metrics, client experience metrics, and application metrics.
- Is familiar with OpenTelemetry and has experience using it. Has configured and managed OpenTelemetry Collectors and has instrumented code to use OpenTelemetry in a programming language such as Python, Golang, Java, or JavaScript.
Desirable Qualifications
- Experienced in using the tools and technologies associated with Amazon Web Services (AWS) and the Cloud Native Computing Foundation (CNCF).
- Familiar with building systems on Amazon Web Services, knows the set of services AWS offers, and can make informed decisions about which service best fits a system need.
- Has used tools such as Terraform or OpenTofu to manage cloud services as code, using GitHub and the GitOps methodology.
- Has experience building dashboards in Grafana, or similar tools, for different audiences (executives, operations, development) and can advise others on best practices for creating and optimising useful dashboards.
--
Pismo is an Equal Employment Opportunity employer that proudly pursues and hires a diverse workforce. Pismo does not make hiring or employment decisions on the basis of race, color, religion or religious belief, ethnic or national origin, nationality, sex, gender, gender identity, sexual orientation, disability, age or any other basis protected by applicable laws or prohibited by company policy. Pismo also strives for a healthy and safe workplace and strictly prohibits harassment of any kind.
Tags: APIs AWS Distributed Systems GitHub Golang Grafana Java JavaScript Machine Learning Python Terraform
Perks/benefits: Career development