AI-Ops & Cloud Platform Engineer

Tel Aviv-Yafo, Tel Aviv District, IL

KPMG Israel

KPMG is one of the world's four leading firms in audit, tax, advisory, and technology services. The firm operates in 146 countries and employs hundreds of thousands of partners and employees who are leaders in their fields. In Israel, the firm has about 1,600 multidisciplinary experts.

Description

We are KPMG’s technology arm in Israel. KPMG dives headfirst into emerging technologies and scientific breakthroughs to craft solutions, projects, and products for companies facing complex business challenges in today’s continuously changing world. By uniting groundbreaking technology with industry expertise, we harness the potential of cloud, AI, ML, digital, and cyber to design and implement top-of-the-line tailored solutions.

Join our Platform Team as an AI-Ops & Cloud Engineer, leading the automation, monitoring, and reliability of infrastructure across both traditional systems and LLM-powered applications. This role combines cloud engineering, DevOps, and AI-driven observability to create smart, self-healing platforms that scale with our AI solutions and business needs.


What You’ll Do

· Monitor and optimize LLM application performance (latency, token usage, drift, failures)

· Automate anomaly detection and remediation using Python and ML-based tooling

· Design and manage cloud infrastructure (AWS, Azure, or GCP) using Terraform

· Build dashboards, alerts, and predictive models to ensure system reliability

· Ensure infrastructure is scalable, secure, and cost-effective

Requirements

· 3+ years of experience in DevOps, SRE, or Cloud Engineering

· Proficient in at least one major cloud provider: AWS, Azure, or GCP

· Hands-on experience with Terraform and Python automation

· Proven ability to design and implement cloud-native architectures

· Experience building secure Landing Zones that follow network and security best practices

· Experience with monitoring tools such as Prometheus, Datadog, or ELK

· Comfortable with Kubernetes, Docker, and serverless infrastructure

· CI/CD experience using Azure DevOps, GitHub Actions, or GitLab


Bonus Points

· Experience with LLMOps and vector databases (e.g., Pinecone, Weaviate)

· Background in anomaly detection or AI/ML-based alerting systems

· Knowledge of FinOps practices and cloud cost optimization

Tags: Architecture AWS Azure CI/CD DevOps Docker ELK Engineering GCP GitHub GitLab Kubernetes LLMOps LLMs Machine Learning Pinecone Python Security Terraform Weaviate

Region: Middle East
Country: Israel
