Platform Reliability Engineer

Dallas, TX, United States

Platform Reliability Engineers (PREs) at Homecare Homebase ensure that our most critical healthcare services remain reliable, resilient, and high-performing at scale. Blending software engineering with systems operations, PREs focus on automation, observability, incident response, and the continuous reduction of toil across complex distributed platforms.

 

This role calls for confident execution in high-stakes, high-visibility scenarios, particularly during major incidents, alongside proactive efforts to harden existing systems and improve service health over time. Ideal candidates thrive in complex environments, take ownership of production reliability, and find purpose in creating systems that recover gracefully and support exceptional care delivery.

 

Platform Reliability Engineers work closely with HCHB’s Architects, Product & Development teams, System Administrators, Platform Engineers, DBAs, and Product Support in the execution of their responsibilities.

 

RESPONSIBILITIES

  • Deliver solutions that enhance the overall reliability of the platform and/or reduce toil.
  • Establish and implement modern observability patterns.
  • Monitor overall platform health and manage uptime and availability.
  • Operationalize services, including system testing, instrumentation, monitoring, capacity model development, training, and transition to operations teams.
  • Manage deployments of major releases.
  • Lead and coordinate resolution efforts during major incidents by serving as the incident commander.
  • Participate in an equitable 24×7 on-call rotation, serving as first responder for production alerts and as an escalation point for other teams.

MINIMUM QUALIFICATIONS

  • Bachelor’s degree in Computer Science, Systems Engineering, Mathematics, or a related field required (equivalent experience considered).
  • 3+ years of experience in a 24×7 production enterprise-class environment as an SRE or in a comparable role.
  • 1+ years of Kubernetes administration/support in a production environment.
  • 1+ years of Azure or comparable cloud PaaS, IaaS, and resource administration/support in a production environment.
  • Demonstrated composure and effectiveness in situations requiring rapid analysis, clear prioritization, and decisive action – particularly in incidents with significant business or customer impact.
  • Excellent problem-solving and analytical skills, with attention to detail and the ability to drive issues to resolution.
  • Experience solving problems via automation using orchestration platforms such as Ansible, Azure Automation, and ServiceNow Flows.
  • Proficient with scripting languages (multiple preferred): Bash, PowerShell, Python, and JavaScript. 
  • Proficient with data tier languages: T-SQL.
  • Proficient with the following monitoring solutions (multiple preferred): Splunk, Prometheus/Grafana, ThousandEyes, Application Insights, Azure Monitor, and Microsoft SCOM.
  • Proficient with modern SRE and observability concepts (e.g., OTel, service level management).

 

PREFERRED QUALIFICATIONS

  • Academic coursework in Algorithms, Data Structures, Distributed Systems, and Information Security. 
  • 1+ year(s) serving as incident commander for major incidents.
  • Proficient with networking and troubleshooting (e.g., addressing, routing, DNS, load balancing, mesh networking).
  • Ability to debug and optimize infrastructure-as-code pipelines using Ansible, Terraform, and Azure Resource Manager (ARM) templates.
  • Proficient with ITSM/ITIL practices such as service management, change management, incident management, and problem management, particularly in ServiceNow.
  • Experience designing large-scale distributed systems.
  • Experience designing and developing software oriented towards systems or network automation.
  • Proficient with administration, automation, and orchestration of large-scale Windows and Linux environments using configuration management solutions such as DSC and Ansible.
  • Experience operating in large SQL databases with complex business logic.
  • Experience using ML/AI technologies to accelerate your work.
  • Experience with healthcare industry HIPAA regulations (experience in a similarly regulated industry considered, e.g., PCI, SOX).
  • Experience working in an Agile and/or SAFe environment.

CERTIFICATION / TRAINING

  • Candidates with relevant certifications are preferred, including but not limited to the following:
    • ITIL Foundations
    • Configuration: RHCE-Ansible
    • Kubernetes: CKA, KCSP
    • Linux: RHCE, CompTIA Linux+, GCUX, LPI
    • Microsoft: Azure Administrator, Azure DevOps Engineer, MCSE

 

About Us

Founded in 1999, Homecare Homebase, a subsidiary of Hearst Corporation, is a market leader in healthcare software development, providing mobile cloud-based solutions for clinical, operational, and financial improvement of home-based care throughout the United States. Our software enables real-time solutions for wireless information exchange and communication between the office and clinicians in the field.

Our success is fueled by our talented teams that are driven by their passion to make a difference in patient care. Our employees work in a culture that is guided by our CARES values: Care, Act, Respect, Excel, and Smile (a positive attitude). If you want to work in a role where your skills have a direct influence on empowering patient care, Homecare Homebase is the next step in your career. 

 

What You Can Expect from Us 

At Homecare Homebase, we don't just help our clients succeed; we help our employees succeed. Competitive pay, robust benefits, and professional development opportunities are a few of the many reasons that Homecare Homebase is a great place to build your career. 

 

Our Team Members Also Enjoy

Meaningful work. Our employees often tell us that their work gives them a sense of purpose because it makes a difference in the lives of clinicians and home-based care staff, as well as the patients they serve. 

Leaders who care. President Luke Rutledge has continued the mission to create a culture that cares – one that appreciates and looks after its people. As a result, being an employee of HCHB feels like being a member of the family. 

Flexibility. We value work-life balance because we know that happy employees create happy clients. That's why Homecare Homebase offers both full-time and part-time career opportunities to fit life's unique demands.

A company that gives back. Every year, Homecare Homebase proudly supports numerous charitable fundraising initiatives that align with our mission of empowering exceptional care and helping others in need.

 

Sound like a good fit? We’d love to hear from you.

 

This position does not provide sponsorship. All applicants should either be US Citizens or Permanent Residents eligible to work in the US without immigration restrictions.

#LI-Hybrid


Tags: Agile Ansible Azure Computer Science DevOps Distributed Systems Engineering Excel Grafana ITIL JavaScript Kubernetes Linux Machine Learning Mathematics ML models Pipelines Python Security Splunk SQL Terraform Testing T-SQL

Perks/benefits: Career development Competitive pay Health care

Region: North America
Country: United States
