AI-Ops & Cloud Platform Engineer
Tel Aviv-Yafo, Tel Aviv District, IL
KPMG Israel
KPMG is one of the world's four leading firms in audit, tax, advisory, and technology services. The firm operates in 146 countries and employs hundreds of thousands of partners and employees who are leaders in their fields. In Israel, the firm has about 1,600 multidisciplinary experts.
Description
We are KPMG’s technology arm in Israel. KPMG delves headfirst into the power of emerging technologies and scientific breakthroughs to craft solutions, projects, and products for companies facing complex business challenges in today’s continuously changing world. By uniting groundbreaking technology with industry expertise, we are able to harness the potential of cloud, AI, ML, digital, and cyber to design and implement top-of-the-line tailored solutions.
Join our Platform Team as an AI-Ops & Cloud Engineer, leading the automation, monitoring, and reliability of infrastructure across both traditional systems and LLM-powered applications. This role combines cloud engineering, DevOps, and AI-driven observability to create smart, self-healing platforms that scale with our AI solutions and business needs.
What You’ll Do
· Monitor and optimize LLM application performance (latency, token usage, drift, failures)
· Automate anomaly detection and remediation using Python and ML-based tooling
· Design and manage cloud infrastructure (AWS, Azure, or GCP) using Terraform
· Build dashboards, alerts, and predictive models to ensure system reliability
· Ensure infrastructure is scalable, secure, and cost-effective
Requirements
· 3+ years of experience in DevOps, SRE, or Cloud Engineering
· Proficient in at least one major cloud provider: AWS, Azure, or GCP
· Hands-on experience with Terraform and Python automation
· Proven ability to design and implement cloud-native architectures
· Experience building secure Landing Zones that follow network and security best practices
· Experience with monitoring tools such as Prometheus, Datadog, or ELK
· Comfortable with Kubernetes, Docker, and Serverless infrastructures
· CI/CD experience using Azure DevOps, GitHub Actions, or GitLab
Bonus Points
· Experience with LLMOps and vector databases (e.g., Pinecone, Weaviate)
· Background in anomaly detection or AI/ML-based alerting systems
· Knowledge of FinOps practices and cloud cost optimization
Tags: Architecture AWS Azure CI/CD DevOps Docker ELK Engineering GCP GitHub GitLab Kubernetes LLMOps LLMs Machine Learning Pinecone Python Security Terraform Weaviate