Site Reliability Engineer - Platform

London

InstaDeep

InstaDeep delivers AI-powered decision-making systems for the Enterprise. With expertise in both machine intelligence research and concrete business deployments, we provide a competitive advantage to our customers in an AI-first world.

InstaDeep, founded in 2014, is a pioneering AI company at the forefront of innovation. With strategic offices in major cities worldwide, including London, Paris, Berlin, Tunis, Kigali, Cape Town, Boston, and San Francisco, InstaDeep collaborates with giants like Google DeepMind and prestigious educational institutions like MIT, Stanford, Oxford, UCL, and Imperial College London. We are a Google Cloud Partner and a select NVIDIA Elite Service Delivery Partner. In 2022, Statista and the Financial Times listed us among notable players in AI, fast-growing companies, and Europe's 1000 fastest-growing companies. Our recent acquisition by BioNTech has further solidified our commitment to leading the industry.
Join us to be a part of the AI revolution!
As a vital member of the Platform Squad and of the Infrastructure/SRE team at InstaDeep, you will work alongside various stakeholders and report directly to the squad's Lead SRE. Your primary responsibility will be deploying and maintaining our platform engineering and production systems, ensuring that our services remain consistently reliable and performant. You will also contribute significantly to enhancing our platform engineering capabilities, including observability and incident management. This role requires bringing forward thoughtful proposals and working independently, backed by strong experience in production environments.
Technical stack:
  • CSPs: Google Cloud (mainly), AWS and Azure.
  • Observability: Prometheus, Grafana, Alertmanager, Mimir, Loki and Tempo.
  • Core: Kubernetes, Crossplane, FluxCD and ArgoCD.
  • Development: mainly Python and Bash.

Responsibilities
  • Participate in the architecture design of the internal Platform Engineering Framework.
  • Develop and maintain custom Crossplane functions and compositions using Python.
  • Design, deploy and maintain production-grade observability components across various environments.
  • Define SLOs for products and projects together with stakeholders, and ensure they are met to maintain reliability.
  • Participate in provisioning and managing infrastructure through Infrastructure as Code, applying GitOps throughout.

Required Qualifications
  • MSc degree in Computer Science or a similar engineering discipline.
  • Eligibility to work in France or the UK.
  • 5+ years of experience in the technology industry (typically as an SRE, DevOps or MLOps engineer).
  • Strong experience with Kubernetes and containerized architectures.
  • Strong expertise with at least one of the major CSPs (GCP, AWS or Azure).
  • Knowledge of observability with Prometheus and Grafana, including monitoring Kubernetes services, or a similar stack.
  • Familiarity with infrastructure tooling such as Helm, Terraform, or similar.
  • Familiarity with GitOps workflows.
  • Comfortable with UNIX/Linux and systems operations.
  • Solid Python development skills (beyond basic scripting) for automating infrastructure.
  • Experience with scripting languages such as Bash or Python.

Preferred Qualifications (optional)
  • Comfortable with SRE principles.
  • Familiarity with Crossplane or comparable declarative infrastructure management solutions.
  • Understanding of virtualization technologies (KVM, VMware, OpenStack).
  • A Kubernetes certification (CKA, CKAD, CKS or KCNA).

Our commitment to our people
We empower individuals to celebrate their uniqueness here at InstaDeep. Our team comes from all walks of life, and we're proud to continue encouraging and supporting applicants from underrepresented groups across the globe. Our commitment to creating an authentic environment comes from our ability to learn and grow from our diversity, and how better to experience this than by joining our team? We operate on a hybrid work model, with guidance to work from the office three days per week to encourage close collaboration and innovation. We continue to review this arrangement with the well-being of InstaDeepers at the forefront of our minds.
Right to work: Please note that you will require the legal right to work in the location you are applying for.

Region: Europe
Country: United Kingdom