Senior Site Reliability Engineer (SRE)

Israel - hybrid

Full Time, Senior-level / Expert, USD 87K - 162K (est.)*

Viz.ai

Viz.ai is the leading AI care coordination platform for disease detection and workflow optimization. Trusted by 1,400 hospitals.


Posted 3 weeks ago

About Viz.ai

Viz.ai is the pioneer in the use of AI algorithms and machine learning to increase the speed of diagnosis and care across 1,700+ hospitals and health systems in the U.S. and Europe. The AI-powered Viz.ai One™ is an intelligent care coordination solution that identifies more patients with a suspected disease, informs critical decisions at the point of care, optimizes care pathways, and helps improve outcomes. Backed by real-world clinical evidence, Viz.ai One delivers significant value to patients, providers, and pharmaceutical and medical device companies. For more information, visit Viz.ai.

About the role:

We are seeking a skilled Site Reliability Engineer (SRE) to join our team and help build, maintain, and improve the reliability, scalability, and performance of our systems. As an SRE, you will be responsible for owning observability tools, driving incident management processes, and implementing automation to enhance our infrastructure. This role involves collaborating across teams to ensure a robust and efficient technology stack supporting mission-critical systems.

You will:

Proactively enhance system reliability, scalability, and performance through automation, monitoring, and capacity planning.
Develop and maintain observability systems, including distributed tracing, logging, and metrics platforms.
Establish and maintain organizational standards for monitoring, leveraging tools like Prometheus, Grafana, and OpenTelemetry.
Drive incident management, root cause analysis, and continuous improvement initiatives.
Partner with development teams to integrate reliability best practices into the software development lifecycle.
Manage infrastructure at scale on cloud services (AWS is an advantage) and platforms like Kubernetes or ECS.
Optimize resource utilization to reduce costs while maintaining service quality.

What success looks like:

You will have reduced the frequency and impact of production incidents by building resilient systems and improving incident response processes.
You will have improved observability: key metrics, logs, and traces are available and actionable for all critical services, empowering teams to quickly detect and resolve issues.
You will be actively engaged in proactive problem solving: you identify and resolve systemic issues before they impact customers, and you continuously refine SLOs/SLIs to reflect evolving business needs.
You will demonstrate leadership and mentorship: you are seen as a reliable thought leader within the organization, mentoring others and helping shape the future of our SRE practices.

We are looking for:

At least 3 years of experience as an SRE.
Strong experience with observability tools: proficiency with OpenTelemetry, Grafana, Prometheus, and the ELK stack (Elasticsearch, Logstash, Kibana).
Experience with cloud platforms: in-depth knowledge of AWS services, including EC2, S3, and RDS, along with CloudFormation or Terraform for infrastructure as code.
Proficiency in scripting and/or development languages like Bash or Python.
Thorough understanding of CI/CD pipelines and automation tools.
Understanding of infrastructure as code and strong experience with automation tools like Terraform and/or Ansible.
Solid troubleshooting and debugging skills.
A team player with a strong can-do mentality.

Why should you join us?

If you are looking to make an impact, join our mission to develop life-saving products.
If you want to be part of an amazing team, our people are at the heart of everything we do.
If you are a self-starter, naturally motivated, and passionate about innovative technologies in the healthcare sector, this may be the place for you!

Location:

We are located in San Francisco and Tel Aviv. This position is based in Tel Aviv.

Our Tel Aviv office is located at Menachem Begin 150, within walking distance of the Arlozorov and Ha'Shalom train stations.


* Salary range is an estimate based on our AI, ML, Data Science Salary Index 💰


Categories: Big Data Jobs Engineering Jobs

Tags: Ansible AWS CI/CD CloudFormation EC2 ECS Elasticsearch ELK Grafana Kibana Kubernetes Logstash Machine Learning Pharma Pipelines Python Terraform