Principal Site Reliability Engineer (Intelligent Automation)
USD 162K-302K | Senior-level | Full Time
Tasks
- Automate deployment of ML pipelines and HPC clusters
- Build resilient, highly available architectures for ML and HPC
- Design and implement infrastructure-as-code solutions
- Develop automation scripts and workflows for infrastructure management
- Ensure security, governance, and regulatory compliance
- Implement disaster recovery and business continuity
- Lead AIOps incident management
- Mentor and train engineers in IaC and HPC
- Monitor and optimize cloud usage and costs
- Partner with cross-functional teams to align solutions to business goals
- Provide technical leadership to engineers
- Provision and manage cloud infrastructure for ML and HPC workloads
- Run chaos engineering experiments
- Set up monitoring, logging, and alerting
Perks/Benefits
- N/A
Skills/Tech-stack
AIOps | AWS | Auto Scaling | Azure | Bash | Business Continuity | CI/CD | Chaos Engineering | Cloud platform | CloudFormation | Data Pipelines | Datadog | Disaster Recovery | Distributed Systems | ELK Stack | Feature Store | GPU Computing | Go | Google Cloud | Google Cloud Platform | Grafana | HPC | Infrastructure as Code | Kubernetes | Load Balancing | MLOps | Machine Learning | Machine Learning Pipelines | Model Deployment | Model Monitoring | Model versioning | Multi-region | Multi-region Deployments | NVIDIA CUDA | Prometheus | Pulumi | Python | Reliability Engineering | Serverless computing | Site Reliability | Site Reliability Engineering | Spacelift | Storage Systems | TensorFlow | Terraform | “as-code”
Education
Bachelor of Engineering | Bachelor of Science | Master of Science