AI Solution Architect
Contern, Luxembourg, Luxembourg
Gcore
We provide powerful solutions that will help your business grow globally. Try our superior performance for free
Company Description
Have you ever wondered why your favorite apps, social media content, and video games load in the blink of an eye? It's likely because of Gcore behind the scenes!
Join a team that collaborates with industry giants like Intel, Dell, NVIDIA, Graphcore, and Equinix to accelerate AI training, provide cutting-edge cloud services, and optimize content delivery.
If you are passionate about transforming the internet and contributing to cutting-edge innovations, come join us at Gcore!
We are over 550 professionals and currently looking for an ...
Job Description
The Role
As an AI Solution Architect at Gcore, you will serve as a trusted advisor to our AI-focused customers. You'll collaborate closely with clients to design and deploy large-scale GPU clusters, containerized training pipelines, and production inference systems. Your expertise in automation, infrastructure as code, and orchestration will ensure seamless, repeatable deployments across hundreds to thousands of GPUs.
Your Responsibilities
- Architect & Deploy: Design end-to-end GPU cluster architectures (on-premises and cloud) using Ansible, Terraform, Kubernetes, and Slurm.
- Customer Engagement: Lead technical deep-dives, conduct workshops, and present solutions to stakeholders at all levels.
- Automation & IaC: Build and maintain Infrastructure as Code modules to automate provisioning, scaling, and monitoring of GPU resources.
- Documentation & Enablement: Produce whitepapers, runbooks, and training materials; host webinars and training sessions.
- Feedback Loop: Partner with Gcore's engineering and product teams to relay customer insights and drive product enhancements.
Qualifications
What We're Looking For
- Experience: 3+ years in Cloud or GPU AI Infrastructure DevOps.
- Infrastructure Skills: Proven track record deploying GPU clusters at scale, including multi-node, multi-GPU setups.
- Automation Expertise: Hands-on experience with Ansible or similar configuration management tools, and with Terraform for Infrastructure as Code (IaC).
- Orchestration & Scheduling: Strong familiarity with Kubernetes (K8s) and Slurm.
- Programming: Proficient in Python / Go.
- ML Proficiency: Solid understanding of ML ecosystems—models, tooling, and production deployment patterns.
- Communication: Excellent verbal and written skills; ability to translate complex technical concepts for diverse audiences.
Nice-to-Haves
- Experience deploying high-availability inference infrastructure for production AI workloads.
- MLOps Pipelines: Experience implementing and optimizing distributed training and inference pipelines with MLflow, REST APIs, and popular frameworks (PyTorch, TensorFlow, JAX).
- Demonstrated ability to transition ML pipelines from proof-of-concept to robust, scalable production systems.
- Familiarity with GitOps workflows, Docker, Helm charts, and CI/CD for ML.
- Knowledge of Hugging Face transformers, Scikit-learn, and experiment tracking best practices.
Additional Information
What We Offer:
We value our employees and offer a benefits package designed to support your health, well-being, and professional growth throughout your journey at Gcore:
- Competitive salary
- Flexible working hours
- Remote, hybrid, or office work options depending on your role
- Work from anywhere in the world for up to 45 days per year
- Private medical insurance for you and your family*
- 5 additional vacation days*
- Additional fully paid sick leave days*
- Allowance for significant life events and birthdays
- Language classes
- Modern office space with free snacks, drinks, and entertainment options*
- Team sports activities*
*Please be aware that benefits marked with an asterisk may vary depending on your country.
About the Company
Gcore is an international cloud and edge leader in providing first-class web performance, content delivery, and security. Headquartered in Luxembourg, with offices around the world, the company provides its solutions to global leaders in numerous industries.
Millions of people worldwide use apps and play games based on our infrastructure and services: we are trusted by World of Tanks, Albion Online, Avast, Photon, Unity, Sandbox Interactive, and others.
Equal Opportunity Employer
We provide equal opportunity to all applicants without regard to race, color, religion, sex, sexual orientation, age, gender identity, gender expression, national origin, disability, or any other legally protected characteristics.