Machine Learning Engineer – GPU Acceleration & Distributed Training
Amsterdam, Netherlands
Are you passionate about how technology can make a real impact in cancer care? Join us at kaiko.ai in building a state-of-the-art Data & AI platform, enabling large-scale training of multi-modal foundation models, and transforming the clinical workflow to deliver better patient outcomes.
Our culture
At Kaiko, we have an open, creative and non-hierarchical work atmosphere that offers continuous learning and direct impact in return for accountability and team spirit.
We offer flexibility, for instance through remote working, alongside the expectation that you manage and deliver your own goals; our team's ownership, passion and shared commitment to improving health outcomes through data set us apart.
Working at the intersection of healthcare and data, we recognize the implications for wellbeing and trust, and we approach our work with the utmost sensitivity. Data privacy, compliance and security are core to everything we do. Our open, creative environment gives talented people room to explore new ideas, and we reward this with an attractive package and opportunities for further personal development.
About the Role
As a Machine Learning Engineer specializing in GPU acceleration and distributed training, you will focus on improving the efficiency of handling very long sequence lengths in Transformers, State Space Models (SSMs) and other architectures using CUDA, Triton and PyTorch. Additionally, you will scale training processes across multi-node distributed systems to ensure robust and efficient model development. You will work closely with our ML Research teams to build and maintain high-performance training pipelines.
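To give a concrete flavour of this kernel-level work, below is a minimal sketch of a fused elementwise Triton kernel that cuts redundant global-memory traffic; the kernel, function names and block size are illustrative assumptions, not code from kaiko's platform.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def fused_scale_add_kernel(x_ptr, y_ptr, out_ptr, scale, n_elements,
                           BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide chunk of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    # One fused pass: avoids materializing x * scale as a separate tensor.
    tl.store(out_ptr + offsets, x * scale + y, mask=mask)

def fused_scale_add(x: torch.Tensor, y: torch.Tensor, scale: float) -> torch.Tensor:
    # Hypothetical host-side wrapper; assumes contiguous CUDA tensors.
    out = torch.empty_like(x)
    n_elements = x.numel()
    grid = (triton.cdiv(n_elements, 1024),)
    fused_scale_add_kernel[grid](x, y, out, scale, n_elements, BLOCK_SIZE=1024)
    return out
```

Fusing even simple elementwise chains like this reduces memory round trips, which is often the dominant cost at very long sequence lengths.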
How you’ll contribute
- Efficiency Optimization: Leverage CUDA, Triton and PyTorch to improve the efficiency of Transformers, SSMs and other architectures at very long sequence lengths.
- Distributed Training: Scale custom machine learning training pipelines efficiently across multi-node GPU clusters (see the sketch after this list).
- Collaboration: Work with ML Researchers and Engineering teams to integrate optimized training solutions into the development lifecycle.
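To illustrate the distributed-training bullet above, here is a minimal sketch of a PyTorch DistributedDataParallel training loop as launched by torchrun; the model, data and hyperparameters are placeholders, not kaiko's actual pipeline.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun populates RANK, WORLD_SIZE and LOCAL_RANK in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and optimizer; a real pipeline would build the
    # foundation model and stream sharded data here.
    model = torch.nn.Linear(4096, 4096).to(f"cuda:{local_rank}")
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):  # toy loop with synthetic data
        x = torch.randn(8, 4096, device=f"cuda:{local_rank}")
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()  # DDP all-reduces gradients across every rank here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

A launch along the lines of `torchrun --nnodes=2 --nproc_per_node=8 --rdzv_backend=c10d --rdzv_endpoint=<host>:29500 train.py` on each node would run this across two machines; sharded approaches such as FSDP follow the same process-group pattern.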
What you'll bring
- Master's degree in Computer Science, Engineering, or a related field; a Ph.D. is a plus.
- Proficient in Python with extensive experience in PyTorch.
- Deep expertise with CUDA and/or Triton for optimizing GPU performance, specifically for large-scale sequence processing.
- Proven experience scaling machine learning training workloads to multi-node distributed GPU environments.
- Strong understanding of Transformers, State Space Models (SSMs) and other common architectures, and of their optimization.
- Skilled in performance tuning and profiling of both software and hardware in machine learning contexts (a minimal profiling sketch follows this list).
- Ability to diagnose and resolve complex technical challenges related to GPU acceleration and distributed training.
- Excellent communication skills and ability to work effectively within a multidisciplinary team.
- Capable of managing multiple projects simultaneously and adapting to evolving priorities in a fast-paced environment.
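As a small example of the profiling workflow referenced in the list above, the sketch below uses torch.profiler to rank CUDA kernels by GPU time; the encoder layer and tensor shapes are illustrative assumptions.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Hypothetical stand-in model: one Transformer encoder layer on a long sequence.
model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True).cuda()
x = torch.randn(4, 2048, 512, device="cuda")  # (batch, seq_len, d_model)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with torch.no_grad():
        model(x)

# Rank kernels by total GPU time to find the next optimization target.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```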
Nice to Have
- Experience with containerization technologies, such as Docker or Kubernetes.
- Experience with cloud computing platforms, such as Azure, AWS or GCP.
Additional Information
This position is full-time and requires residency in either the Netherlands or Switzerland, a valid work permit, and proximity to our offices in Amsterdam or Zürich. Because the role involves handling sensitive data, a Certificate of Conduct will be required when the employment contract is finalized.