AI Systems Engineer – LLM Execution

Pozuelo de Alarcón, Spain


OpenNebula Systems

OpenNebula is an Open Source Cloud Computing Platform to build and manage Enterprise Clouds. OpenNebula provides unified management of IT infrastructure and applications, avoiding vendor lock-in and reducing complexity, resource consumption and...


For over a decade now, OpenNebula Systems has been leading the development of the European open source technology that helps organizations around the world to manage their corporate data centers and build their Enterprise Clouds.

If you want to join an established leader in the cloud infrastructure industry and the global open source community, keep reading: you can now join a team of exceptionally passionate and talented colleagues whose mission is to help the world's leading enterprises implement their next-generation edge and cloud strategies. We are hiring!

Since 2019, and thanks to the support from the European Commission, OpenNebula Systems has been leading the edge computing innovation in Europe, investing heavily in research and open source development, and playing a key role in strategic EU initiatives such as the IPCEI-CIS and the “European Alliance for Industrial Data, Edge and Cloud”.

OpenNebula’s new AI Factory product line delivers sovereign, edge-to-cloud AI infrastructure, enabling enterprises and governments to deploy, orchestrate, and optimize next-generation AI workloads with full control. This role is key to building the execution layer that powers that vision. We are currently looking for an AI Systems Engineer to join us in Europe as part of the new team developing the AI Factory product line.


Job Description

We are looking for a highly skilled AI Systems Engineer with hands-on experience in executing, tuning, and scaling Large Language Models (LLMs) across multi-GPU infrastructures. This role is central to the development of our new AI Factory product line, which enables open, sovereign, and disaggregated AI infrastructure across cloud and edge environments.
You will help design and optimize LLM execution pipelines, working at the intersection of inference engines, orchestration platforms, and LLM model catalogs. Your responsibilities will also include communicating with users, addressing their needs, troubleshooting, and providing step-by-step solutions.


Responsibilities

  • Design, implement, and optimize LLM inference pipelines for multi-GPU and multi-node environments.
  • Integrate with cutting-edge inference engines (e.g., vLLM, TensorRT-LLM, DeepSpeed, etc.).
  • Tune execution parameters for latency, throughput, and memory efficiency across heterogeneous infrastructures.
  • Work closely with orchestration frameworks such as Ray, NVIDIA NeMo/Dynamo, and others to coordinate LLM serving at scale.
  • Integrate with LLM catalogs and registries such as HuggingFace, NVIDIA NIM, and internal repositories.
  • Collaborate with product and platform teams to shape a modular, portable AI Factory execution layer.
  • Interact with users on their use cases, providing systems support, defining system architectures, making recommendations based on user needs, and handling the implementation, testing, user training, and deployment of open source solutions.
  • Troubleshoot incidents, identify root causes, fix and document problems, and implement preventive measures.
  • Deliver quality performance indicators within the scope of the assigned project, including project journals, status reports, and other standard documentation.
  • Work with other companies in the cloud-edge ecosystem within international projects and open-source communities, with availability for occasional travel and participation in international events and meetings.
  • Write and maintain software documentation and project reports.

Experience required

Academic Background and Certifications

  • Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field.

Professional Experience

  • Strong hands-on experience deploying and optimizing LLMs in production environments.
  • Experience with inference frameworks such as vLLM, TensorRT, Triton Inference Server, DeepSpeed-Inference, etc.
  • Hands-on experience with orchestration tools like Ray, NVIDIA NeMo/Dynamo, or KServe.
  • Experience deploying LLM workloads on hybrid or sovereign cloud environments.
  • Contributions to open-source LLM or inference projects.

Technical Experience

  • Deep knowledge of multi-GPU systems and GPU memory management.
  • Solid understanding of distributed systems and networking bottlenecks in model serving.
  • Programming experience in Python, with knowledge of CUDA and model quantization a plus.
  • Familiarity with LLM catalogs (e.g., HuggingFace, NGC, NIM).
  • Familiarity with open-source MLOps or AI workload orchestration platforms.

Language Skills

  • English fluency at a professional or native-equivalent level, with excellent clarity and expression in both writing and speech.

Soft Skills & Collaboration

  • Strong customer service mindset, with a focus on responsiveness and user satisfaction.
  • Clear communication and documentation with strong written and verbal English, async collaboration, and visibility of work.
  • Excellent problem-solving skills and a proactive approach to identifying and resolving issues.
  • Self-management and accountability, with the ability to work independently, manage time, and take ownership of tasks and deadlines.
  • Technical autonomy and tool proficiency with confidence in using Git, CI/CD, remote collaboration tools (Slack, Zoom, GitHub, etc.), and solving problems without direct supervision.

What's in it for me?

Some of our benefits and perks vary depending on location and employment type, but we are proud to provide employees with the following:

  • Competitive compensation package and Flexible Remuneration Options: Meals, Transport, Nursery/Childcare…
  • Customized workstation (macOS, Windows, or Linux; any distro is welcome)
  • Private Health Insurance
  • 6-hour workday on Fridays and every day during August
  • PTO: Holidays, Personal Time, Sick Time, Parental leave.
  • All-remote company with a bright HQ centrally located in Madrid, and offices in Boston (USA) and Brno (Czech Republic)
  • Healthy work-life balance: we respect the right to digital disconnection and promote harmony between employees' personal and professional lives
  • Flexible hiring options: Full-Time/Part-Time, Employee (Spain/USA) / Contractor (other locations)
  • We are building an awesome, engineering-first culture and your opinion matters: thrive in the high-energy environment of a young company where openness, collaboration, risk-taking, and continuous growth are valued
  • Be exposed to a broad technology ecosystem. We encourage learning and researching new technologies and methods as part of your everyday duties
