Staff Cloud SRE – AI/ML Platform & GPU Compute
Tasks
- Build automation for cluster operations, training workflows, and scaling
- Build dashboards for platform health
- Define SLOs, SLIs, and error budgets
- Design and operate monitoring, logging, tracing, and alerting
- Harden CI/CD and release processes
- Implement self-healing patterns
- Reduce alert noise
- Improve change management, validation, and rollback
- Lead incident triage, escalation, communications, and root cause analysis
- Own reliability, availability, and performance
- Participate in 24/7 on-call
- Perform post-incident root cause analysis
- Support infrastructure as code and policy guardrails
Skills/Tech-stack
AWS | Azure | CI/CD | Datadog | GCP | Go | Grafana | Infrastructure as Code | Kubernetes | Linux | MLOps | OpenTelemetry | Prometheus | Python | SLI | SLO | Terraform
Education
N/A