Expert Observability Engineer

Cairo, Cairo Governorate, Egypt

SWATX

We are seeking an experienced Observability subject-matter expert (SME) with deep expertise in observability architectures and leading monitoring platforms. This role is responsible for designing, implementing, and optimizing end-to-end observability solutions for applications, infrastructure, and networks. The ideal candidate has extensive hands-on experience with platforms such as ELK (Elasticsearch, Logstash, Kibana), Dynatrace, BMC TrueSight, and SolarWinds, and can deliver seamless monitoring, alerting, and analytics that enhance IT operations and service reliability.

Key Responsibilities:

·       Observability Strategy & Architecture: Design and implement comprehensive observability solutions to monitor applications, infrastructure, and network performance.

·       Monitoring Tool Implementation & Optimization: Deploy and fine-tune monitoring solutions using ELK, Dynatrace, BMC TrueSight, and SolarWinds.

·       Log Management & Analysis: Establish centralized logging, log parsing, and correlation for improved event detection and troubleshooting.

·       Metrics & Performance Monitoring: Define KPIs, dashboards, and alerts for proactive IT service monitoring.

·       Incident Management & Root Cause Analysis: Collaborate with IT operations, DevOps, and SRE teams to diagnose and resolve performance issues.

·       Automation & Integration: Integrate monitoring tools with ITSM platforms, AIOps solutions, and automation frameworks for enhanced efficiency.

·       Capacity Planning & Optimization: Analyze historical trends and real-time data to optimize resource allocation and performance.

·       Stakeholder Collaboration: Work closely with developers, network engineers, system administrators, and business units to ensure observability best practices are followed.

·       Continuous Improvement: Stay updated on emerging observability technologies and recommend improvements to existing processes and tools.

Qualifications:

·       Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).

·       Expertise in Observability & Monitoring Platforms: 8+ years of hands-on experience with ELK Stack, Dynatrace, BMC TrueSight, SolarWinds, and similar platforms.

·       Strong Knowledge of Infrastructure & Application Monitoring: Experience monitoring cloud, on-premises, and hybrid environments.

·       Experience with Log & Event Correlation: Ability to configure and analyze logs for anomaly detection and security insights.

·       Automation & Scripting: Proficiency in scripting languages such as Python, PowerShell, or Bash for automation.

·       Cloud & DevOps Understanding: Experience with cloud platforms (AWS, Azure, GCP) and CI/CD pipelines.

·       ITIL & Incident Management Exposure: Understanding of ITIL processes and IT service management (ITSM) practices.

·       Networking & Security Awareness: Knowledge of network monitoring, SNMP, and security monitoring practices.

·       Excellent Communication & Documentation Skills: Ability to present findings, create technical documentation, and train teams on observability best practices.

Preferred Qualifications:

·       Certifications in Dynatrace, ELK, BMC TrueSight, or SolarWinds.

·       Experience with AIOps, Machine Learning for Anomaly Detection, or AI-driven Observability.

·       Background in Site Reliability Engineering (SRE) or DevOps.

·       Familiarity with Infrastructure as Code (IaC) tools such as Terraform or Ansible.

Tags: AIOps Ansible Architecture AWS Azure CI/CD Computer Science DevOps Elasticsearch ELK Engineering GCP ITIL Kibana KPIs Logstash Machine Learning Pipelines Python Security Terraform

Region: Middle East
Country: Egypt
