Senior Cloud & AI Platforms Engineer - Cloud & AI Platforms Services (Spain or Ireland)

Remote Spain

Red Hat

Red Hat is the world’s leading provider of enterprise open source solutions, including high-performing Linux, cloud, container, and Kubernetes technologies.

Red Hat is seeking a proficient Cloud & AI Platforms Engineer with a strong background in OpenShift and Kubernetes cluster deployment and configuration to join our Cloud & AI Platforms team in Spain or Ireland.

The Red Hat Cloud & AI Platforms Services organization is responsible for the strategy, design, and implementation of the end-to-end customer experience with Red Hat Cloud Services and AI Platforms. Our team delivers world-class support to customers and partners worldwide for our Platform as a Service (PaaS) and Software as a Service (SaaS) offerings. As a global team, we cultivate a transparent and inclusive environment that welcomes different perspectives. We embrace a blameless culture, learning from our failures to drive continuous improvement. This role offers an exciting opportunity to join one of the fastest-growing enterprise software and services companies and a leader in open-source software.

In this role, you will serve as the first line for mitigating customer issues and, at times, lead processes to manage customer incidents effectively. This often involves coordination across diverse teams to ensure a fast and efficient resolution, restoring customer operations swiftly. Your primary focus will be troubleshooting issues related to both Red Hat OpenShift managed clusters and the Red Hat OpenShift AI stack. You will provide clear communications on the status of ongoing incidents and prepare detailed post-incident analyses, including causes, team involvement, and mitigation steps. You will also offer recommendations to prevent similar incidents in the future, which you may share with senior leadership, partners, and customers.

Candidates must be located in Spain or Ireland. Preference will be given to candidates based near one of our major offices (Barcelona, Madrid, Cork, Waterford).

What you will do

  • Provide an exceptional customer experience through professional communication, product knowledge, and deep troubleshooting, taking direct action in cluster environments to resolve issues.
  • Contribute to global initiatives and projects to constantly reduce customer effort, improve tooling, and design and write automation software to improve efficiency.
  • Act as the direct contact, adviser, and mentor for customer inquiries and issues with their Cloud & AI Platforms Services through our Customer Portal, conference calls, and remote access.
  • Proactively analyze cluster status and identify single points of failure and other high-risk architecture issues; propose and implement more resilient solutions.
  • Record customer interactions, including investigation, troubleshooting, and resolution of issues, to document diagnostic steps and issue resolution and create reusable solutions for future incidents.
  • Partner with internal teams and external parties to deliver seamless infrastructure support for Red Hat’s Cloud Services & AI Platforms.
  • Demonstrate a strong work ethic, work well as part of a team, and stay focused on customers and resolving their issues.
  • Be available to perform weekend shift duties on a rotational schedule.

What you will bring

  • Proven experience in Infrastructure Implementation, Deployment, Administration, and Production Support of container technologies and orchestration platforms (cri-o, Kubernetes, xKS, Docker, OpenShift Container Platform).
  • Exceptional technical, analytical, and troubleshooting skills using tools like curl, strace, oc (kubectl), and Wireshark analysis to investigate and form precise action plans for issue remediation with components such as networking, system performance issues, Kubernetes, OpenShift Container Platform, Service Mesh, and RESTful API calls.
  • Experience working in a Technical Support role that interfaces with Site Reliability Engineers (SRE), Development Engineering teams, and partner vendors to resolve customer issues.
  • Strong background in DevOps and/or MLOps, agile concepts, application development, and deployment tools.
  • Experience with application development, ideally with Python or other languages like Go, Java, and C/C++.
  • Knowledge of training, tuning, and serving ML models using tools like PyTorch, TensorFlow, Ray, Kubeflow Pipelines, Jupyter, or similar.
  • Solid customer-centric focus: the ability to balance technical expertise with customer interaction, manage competing priorities effectively, learn and teach modern technologies, and excel in technical communication and collaboration.

The following is considered a plus:

  • Experience developing and deploying large-scale AI applications and generative AI applications.
  • Experience in training, tuning, and serving ML models using tools like PyTorch, TensorFlow, Ray, KServe, ModelMesh, Kubeflow Pipelines, or similar.
  • Knowledge of machine learning algorithms and concepts (e.g., supervised learning, unsupervised learning, deep learning) as applied to generative AI.
  • Experience supporting, tuning, and troubleshooting Jupyter environments on Kubernetes systems in production.
  • Experience as a Site Reliability Engineer (SRE), customer-facing or otherwise, or knowledge of SRE procedures, including incident management, monitoring and alerting, capacity planning, and automation of operational tasks.

About Red Hat
Red Hat is the world’s leading provider of enterprise open source software solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies. Spread across 40+ countries, our associates have the flexibility to choose the work environment that suits their needs from in-office to fully remote to office-flex. Red Hatters are encouraged to bring their best ideas, no matter their title or tenure. We're a leader in open source because of our open and inclusive environment. We hire creative, passionate people ready to contribute their ideas, help solve complex problems, and make an impact. Opportunities are open. Join us.

Diversity, Equity & Inclusion at Red Hat
Red Hat’s culture is built on the open source principles of transparency, collaboration, and inclusion, where the best ideas can come from anywhere and anyone. When this is realized, it empowers people from diverse backgrounds, perspectives, and experiences to come together to share ideas, challenge the status quo, and drive innovation. Our aspiration is that everyone experiences this culture with equal opportunity and access, and that all voices are not only heard but also celebrated. We hope you will join our celebration, and we welcome and encourage applicants from all the beautiful dimensions of diversity that compose our global village.

Equal Opportunity Policy (EEO)
Red Hat is proud to be an equal opportunity workplace and an affirmative action employer. We review applications for employment without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, veteran status, genetic information, physical or mental disability, medical condition, marital status, or any other basis prohibited by law.


Red Hat does not seek or accept unsolicited resumes or CVs from recruitment agencies. We are not responsible for, and will not pay, any fees, commissions, or any other payment related to unsolicited resumes or CVs except as required in a written contract between Red Hat and the recruitment agency or party requesting payment of a fee.


Red Hat supports individuals with disabilities and provides reasonable accommodations to job applicants. If you need assistance completing our online job application, email application-assistance@redhat.com. General inquiries, such as those regarding the status of a job application, will not receive a reply.


Tags: Agile APIs Architecture CX Deep Learning DevOps Docker Engineering Generative AI Java Jupyter KServe Kubeflow Kubernetes Linux Machine Learning ML models MLOps Open Source Pipelines Python PyTorch Teaching TensorFlow Unsupervised Learning

Perks/benefits: Career development Transparency

Regions: Remote/Anywhere Europe
Country: Spain
