Senior ML Engineer - Distributed training and performance

Zürich, Switzerland

Kaiko

Kaiko’s data framework for cancer research provides hospitals and research institutes with data insights, AI support for medical doctors, and the latest developments in machine-learning research.



About kaiko 

In cancer care, treatment decisions can take many days—but patients don’t have that time. One of the reasons for delays? Cancer patients' data is scattered across many places: doctor’s notes, medical imagery, genomics data. At kaiko, we are developing AI foundation models to bring this data together and integrate it into clinical workflows, enabling doctors to make faster, more effective treatment decisions. 

We also collaborate closely with the leading Dutch cancer research institute (NKI) on multiple AI research projects and a joint clinical validation initiative. In 2025, we plan on expanding our partnerships to even more hospitals. 

We have raised significant long-term funding and have offices in Zurich and Amsterdam. Over the past year, our team has nearly doubled in size, now comprising 80+ people from 25 countries, and we were recently recognized as a Rising Innovator at the GenAI conference. 

About the role 

As a Senior ML Engineer specializing in Distributed Training & Performance, you will be a critical force multiplier for our growing team of ML Researchers. Your mission is to optimize the backbone that trains our multimodal models, streamlining their efforts and empowering them to focus on scientific discovery. You will be responsible both for scaling 10–100B+ parameter models across hundreds of GPUs and for diving deep into the code to eliminate performance bottlenecks, creating tools and frameworks that enable our research team to self-serve. The role is based in Zurich.

Some areas of responsibility 

  • Distributed Training & Scaling: Research, design, and develop state-of-the-art distributed training strategies for large-scale models using tensor, pipeline, and data parallelism (e.g., FSDP, DeepSpeed, Megatron-LM). 
  • Performance Engineering & Profiling: Proactively use profiling tools (NVIDIA Nsight, PyTorch Profiler) to interpret traces, diagnose bottlenecks, and optimize everything from NCCL collectives and CUDA kernels to GPU memory, GPUDirect Storage, and high-bandwidth fabrics (NVLink, InfiniBand). 
  • Kernel-Level Optimizations: Develop and optimize custom CUDA or Triton kernels. 
  • Fault-Tolerant Orchestration: Design and manage robust, fault-tolerant training jobs at scale using orchestration frameworks like Ray Train, Kubernetes, or SLURM. 
  • Mentorship & Collaboration: Mentor ML researchers and engineers, turning model requirements into scalable training pipelines and evangelizing best practices in writing high-performance, production-grade code. 
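To make the orchestration side of this work concrete, a minimal sketch of a fault-tolerant multi-node launch under SLURM with `torchrun` is shown below. Node counts, partition name, script name, and config file are illustrative assumptions, not a description of kaiko's actual setup:

```shell
#!/bin/bash
#SBATCH --job-name=fsdp-pretrain        # hypothetical job name
#SBATCH --nodes=16                      # assumed node count
#SBATCH --gpus-per-node=8
#SBATCH --ntasks-per-node=1             # one torchrun launcher per node
#SBATCH --partition=gpu                 # partition name is an assumption

# Rendezvous endpoint: first node in the allocation
head_node=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)

# torchrun spawns one worker per GPU and handles rank assignment;
# --max-restarts provides basic fault tolerance on worker failure.
srun torchrun \
  --nnodes="$SLURM_NNODES" \
  --nproc-per-node=8 \
  --max-restarts=3 \
  --rdzv-backend=c10d \
  --rdzv-endpoint="${head_node}:29500" \
  train.py --config pretrain.yaml       # train.py is a placeholder
```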

About you 

  • Expert-level Python and C++/CUDA, with deep knowledge of PyTorch internals. 
  • End-to-end experience training ≥10B-parameter models at scale, preferably multimodal. 
  • Proven ability to design experiments, benchmark collectives, interpret profiling traces, and resolve complex performance issues in GPU-bound workloads. 
  • Strong systems intuition: PCIe, NVLink, HBM, NUMA, CUDA Graphs, InfiniBand, and mixed-precision training (BF16/FP8). 
  • Proven ability to translate complex systems-level constraints (e.g., memory bandwidth, interconnect topology) into actionable guidance and robust tooling for a large, PhD-level research team. 

We are excited to gather a broad range of perspectives in our team, as we believe it will help us build better products to support a broader set of people. If you’re excited about us but don’t fit every single qualification, we still encourage you to apply: we’ve had incredible team members join us who didn’t check every box! 

Why kaiko  
At kaiko, we believe the best ideas come from collaboration, ownership and ambition. We’ve built a team of international experts where your work has direct impact. Here’s what we value: 

  • Ownership: You’ll have the autonomy to set your own goals, make critical decisions, and see the direct impact of your work. 
  • Collaboration: You’ll approach disagreement with curiosity, build on common ground, and create solutions together. 
  • Ambition: You’ll be surrounded by people who set high standards for themselves and others, who see obstacles as opportunities, and who are relentless in their work to create better outcomes for patients. 

  
In addition, we offer: 

  • An attractive and competitive salary, a good pension plan and 25 vacation days per year. 
  • Great offsites and team events to strengthen the team and celebrate successes together. 
  • A EUR 1000 learning and development budget to help you grow. 
  • Autonomy to do your work the way that works best for you, whether you have a kid or prefer early mornings. 
  • An annual commuting subsidy. 
