Principal Site Reliability Engineer (Intelligent Automation)
USD 162K-302K | Senior-level | Full Time
Tasks
- Automate deployment of ML pipelines and HPC clusters
- Build resilient, highly available architectures for ML and HPC
- Design and implement infrastructure-as-code solutions
- Develop automation scripts and workflows for infrastructure management
- Ensure security governance and regulatory compliance
- Implement disaster recovery and business continuity
- Lead AIOps incident management
- Mentor and train engineers in IaC and HPC
- Monitor and optimize cloud usage and costs
- Partner with cross-functional teams to align solutions with business goals
- Provide technical leadership to engineers
- Provision and manage cloud infrastructure for ML and HPC workloads
- Run chaos engineering experiments
- Set up monitoring, logging, and alerting
Perks/Benefits
- N/A
Skills/Tech-stack
AIOps | AWS | Auto Scaling | Azure | Bash | Business Continuity | CI/CD | Chaos Engineering | Cloud platform | CloudFormation | Data Pipelines | Datadog | Disaster Recovery | Distributed Systems | ELK Stack | Feature Store | GPU Computing | Go | Google Cloud | Google Cloud Platform | Grafana | HPC | Infrastructure as Code | Kubernetes | Load Balancing | MLOps | Machine Learning | Machine Learning Pipelines | Model Deployment | Model Monitoring | Model versioning | Multi-region | Multi-region Deployments | NVIDIA CUDA | Prometheus | Pulumi | Python | Reliability Engineering | Serverless computing | Site Reliability | Site Reliability Engineering | Spacelift | Storage Systems | TensorFlow | Terraform
Education
Bachelor of Engineering | Bachelor of Science | Master of Science