Senior Instrumentation & Observability Engineer
Remote Worker
AdaptHealth
With 672 locations in 47 states and over 37,000 home deliveries each day, AdaptHealth empowers patients to live their best lives.
Position Summary:
The Senior Instrumentation and Observability Engineer leads the design, implementation, and maintenance of advanced observability strategies for complex distributed systems. This role is responsible for architecting sophisticated monitoring solutions, establishing observability standards, and driving the adoption of best practices across the organization. As a senior technical contributor, you will provide deep expertise in telemetry data collection, analysis, and visualization while mentoring team members and influencing observability practices throughout the engineering organization.
Essential Functions and Job Responsibilities:
Observability Architecture & Strategy
- Architect comprehensive observability solutions for complex distributed systems and microservice architectures.
- Define the long-term technical vision and strategy for observability across the organization.
- Establish best practices, standards, and patterns for instrumenting applications and infrastructure.
- Lead cross-team initiatives to improve system observability and reduce mean time to detect/resolve issues.
- Design scalable telemetry data collection, processing, and storage systems capable of handling high volumes.
Advanced Monitoring & Analysis
- Design sophisticated monitoring systems with advanced alerting logic and minimal alert fatigue.
- Develop comprehensive SLI/SLO frameworks aligned with business objectives.
- Create advanced visualization systems providing actionable insights into system behavior and performance.
- Implement anomaly detection and predictive analytics to identify potential issues before they impact users.
- Lead post-incident reviews with data-driven analysis to prevent recurrence of issues.
Leadership & Mentorship
- Serve as a technical leader and subject matter expert on observability across the organization.
- Mentor junior engineers and provide guidance on observability practices and techniques.
- Collaborate with engineering leadership to define and implement observability roadmaps.
- Lead the evaluation and adoption of new observability technologies and approaches.
- Partner with product and development teams to integrate observability considerations into the design phase.
Platform Innovation
- Architect and develop custom observability solutions for unique business and technical requirements.
- Lead the design and implementation of observability data pipelines with sophisticated processing capabilities.
- Create advanced self-service tools that empower teams to manage their observability configurations.
- Develop integrations between observability systems and other enterprise platforms.
- Optimize observability infrastructure for cost-effectiveness and efficiency.
- Maintain patient confidentiality and function within the guidelines of HIPAA.
- Complete assigned compliance training and other educational programs as required.
- Maintain compliance with AdaptHealth’s Compliance Program.
- Perform other related duties as assigned.
- Assist in vendor contract reviews with managers and legal.
Competency, Skills, and Abilities:
- Exceptional problem-solving abilities with a systematic approach to complex issues.
- Outstanding communication skills with the ability to explain technical concepts to varied audiences.
- Strategic thinking with the ability to balance immediate needs and long-term vision.
- Demonstrated leadership with the ability to influence without direct authority.
- Proactive mindset with a focus on continuous improvement.
- Ability to mentor others and share knowledge effectively.
- Experience building and leading observability teams or functions.
- Expertise in multiple cloud platforms and their native observability solutions.
- Contributions to open-source observability projects or published work on observability topics.
- Experience implementing observability in high-compliance environments (finance, healthcare).
- Background in statistical analysis, data science, or machine learning as applied to system monitoring.
- Experience with AIOps and automated remediation systems.
- Observability Platforms: Expert-level knowledge of multiple platforms (Prometheus, Grafana, Datadog, New Relic, Elastic Stack, Splunk)
- Distributed Tracing: Deep experience with OpenTelemetry, Jaeger, Zipkin and trace-based analysis
- Programming: Advanced proficiency in Python, Go, Java, or similar languages
- Infrastructure: Expert knowledge of Kubernetes, service mesh, and cloud platforms (AWS, GCP, Azure)
- Data Engineering: Advanced experience with time-series databases, streaming data, and data processing pipelines
- Automation: Expert-level CI/CD knowledge and Infrastructure as Code practices
Education and Experience Requirements:
- Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience); Master's degree preferred.
- 7+ years of experience in software engineering, DevOps, or site reliability engineering.
- 4+ years of specialized experience with observability platforms and practices.
- Expert-level knowledge of observability tools (Prometheus, Grafana, Datadog, New Relic, Elastic Stack, Splunk).
- Advanced proficiency in at least one programming language (Go, Python, Java, etc.).
- Extensive experience with distributed tracing systems and implementation patterns.
- Deep understanding of cloud-native technologies and containerized environments.
- Proven track record of leading technical initiatives and influencing engineering practices.
Physical Demands and Work Environment:
- Must be able to bend, stoop, stretch, stand, and sit for extended periods.
- Ability to perform repetitive motions of wrists, hands, and/or fingers due to extensive computer use.
- The work environment may be stressful at times, as overall office activities and work levels fluctuate.
- Subject to prolonged periods of sitting and exposure to computer screens.
- Ability to utilize a personal computer and other office equipment.
- Must be able to lift 30 pounds as needed.
- Physical and mental ability to analyze, solve problems and lead others.
- Excellent ability to communicate both verbally and in writing.
Perks/benefits: Career development Team events