Site Reliability Architect
USD 155K-190K (estimate) | Senior-level | Full Time
Tasks
- Define and manage SLIs and SLOs
- Design and implement unified observability dashboards
- Enable GenAI for incident summarization and runbook recommendations
- Implement error budgets
- Implement static and dynamic alerting
- Integrate OpenTelemetry-based telemetry pipelines
- Leverage AI/ML for anomaly detection and incident prediction
- Monitor microservices and downstream APIs
- Operate Dynatrace for metrics, traces, and logs
- Perform root cause analysis
- Propose auto-remediation actions
- Reduce alert noise using alert correlation
- Troubleshoot distributed system dependency issues
- Use ELK or EFK for log analytics
- Use Prometheus and Grafana for monitoring
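The posting does not describe how the team implements these tasks. The Python sketch below only illustrates two of the listed concepts in their common textbook form, under stated assumptions: an error budget for a request-based availability SLO (allowed failures = (1 - SLO target) x total requests), and a dynamic alert threshold computed as the trailing-window mean plus k standard deviations, which is one simple baseline model among many. The function names, the window size, and k are hypothetical choices for illustration; tools such as Dynatrace or Prometheus use their own methods.

```python
from statistics import mean, stdev


def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Error budget for a request-based availability SLI (hypothetical helper).

    slo_target is a fraction such as 0.999; the budget is the number of
    failed requests the SLO tolerates over the measurement window.
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    remaining = allowed_failures - failed_requests
    return {
        "sli": 1.0 - failed_requests / total_requests,  # measured success ratio
        "allowed_failures": allowed_failures,
        # Fraction of the budget still unspent; negative means the SLO is breached.
        "remaining_fraction": remaining / allowed_failures if allowed_failures else 0.0,
    }


def dynamic_threshold_alerts(series: list[float], window: int = 5, k: float = 3.0) -> list[int]:
    """Return indices whose value exceeds trailing mean + k * stdev (hypothetical helper).

    Unlike a static threshold, the limit moves with the recent baseline.
    A perfectly flat baseline has stdev 0, so any increase would alert;
    a real system would usually add a minimum sigma or seasonality model.
    """
    alerts = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]  # the previous `window` points only
        mu, sigma = mean(baseline), stdev(baseline)
        if series[i] > mu + k * sigma:
            alerts.append(i)
    return alerts


if __name__ == "__main__":
    print(error_budget(0.999, 1_000_000, 250))
    latency_ms = [100, 102, 98, 101, 99, 100, 180, 101]
    print(dynamic_threshold_alerts(latency_ms))
```

With a 99.9% target over one million requests, 250 failures leave roughly 75% of the budget unspent, and only the 180 ms spike at index 6 crosses the moving threshold.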
Perks/Benefits
- N/A
Skills/Tech-stack
AIOps | AWS | Alerting | Anomaly Detection | Azure | Baseline Modeling | Data platforms | Dependency Mapping | Distributed Systems | Dynamic Thresholds | EFK | ELK | Error budget | GenAI | Google Cloud | Grafana | Infrastructure as Code | JSON | Kafka | Language Models | Large Language Models | Log Analytics | Machine Learning | Microservices | OpenTelemetry | Prometheus | Reliability Engineering | Root Cause Analysis | Runbook Automation | SLI | SLO | Seasonality Detection | Site Reliability | Site Reliability Engineering | Streaming Data | Streaming Data Platforms | Telemetry enrichment | Terraform | Time Series | Time Series Analysis | Trace Correlation | Unified Observability | “as-code”
Education
N/A