Principal Reliability Engineer - EDS
USD 152K-229K | Senior-level | Full Time
Tasks
- Architect cloud reliability patterns
- Architect data observability frameworks
- Build anomaly detection and alert correlation
- Conduct post-incident reviews
- Create SLI/SLO frameworks
- Define operational readiness and runbook quality
- Define reliability best practices for data pipelines
- Define reliability engineering strategy
- Design logging, metrics, tracing, and distributed profiling
- Develop AI-driven operations automation
- Develop intelligent runbooks and troubleshooting agents
- Drive elimination of toil
- Embed idempotency and checkpointing patterns
- Establish incident response patterns
- Establish reliability engineering roadmaps
- Implement CI/CD automation
- Implement autonomous remediation
- Implement data quality and timeliness checks
- Implement predictive capacity management
- Mentor engineering teams and production support
- Oversee reliability controls and fail-safe patterns
- Represent reliability engineering in architectural reviews
- Serve as technical escalation point
- Set and enforce Infrastructure as Code standards
Skills/Tech-stack
AI Operations | AIOps | AWS | AWS Bedrock | Alert Correlation | Amazon EMR | Anomaly Detection | Apache Spark | Autonomous remediation | Availability | CI/CD | Checkpointing | CloudFormation | Data Governance | Data Lineage | Data Quality | Disaster Recovery | Distributed profiling | GCP | Hadoop | Idempotency | Infrastructure as Code | Kubernetes | LLM | Lineage Informed Alerting | Logging | Machine Learning | Metadata | Metrics | Observability | OpenTelemetry | Performance Engineering | Predictive Modeling | Prompt engineering | Python | Reliability Engineering | Resilience Engineering | SLI | SLO | SageMaker | Scripting | Snowflake | Terraform | Tracing | Vertex AI | “as-code”
Education
N/A