Member of Technical Staff - ML Performance

New York

Modal

Bring your own code, and run CPU, GPU, and data-intensive compute at scale. The serverless platform for AI and data teams.

About Us:

Modal is building the serverless compute platform to support the next generation of AI companies. To deliver the developer experience we wanted, we went deep and built our own infrastructure, including our own custom file system, container runtime, scheduler, container image builder, and much more.

We're a small team based in New York, Stockholm, and San Francisco. In just one year, we've reached eight-figure revenue, tripled our headcount, scaled to support thousands of GPUs, and raised over $32M in funding.

Working at Modal means joining one of the fastest-growing AI infrastructure organizations at an early stage, with many opportunities to grow within the company. Our team includes creators of popular open-source projects (e.g. Seaborn, Luigi), academic researchers, international olympiad medalists, and engineering and product leaders with decades of experience.

The Role:

We are looking for strong engineers with experience making ML systems performant at scale. If you are interested in contributing to open-source projects and to Modal's container runtime to push language and diffusion models toward higher throughput and lower latency, we'd love to hear from you!

Requirements:

  • 5+ years of experience writing high-quality, high-performance code.

  • Experience working with PyTorch, high-level ML frameworks, and inference engines (e.g. vLLM or TensorRT).

  • Familiarity with NVIDIA GPU architecture and CUDA.

  • Experience with ML performance engineering (tell us a story about boosting GPU performance: debugging SM occupancy issues, rewriting an algorithm to be compute-bound, eliminating host overhead, etc.).

  • Nice-to-have: familiarity with low-level operating system foundations (Linux kernel, file systems, containers, etc.).

  • Ability to work in person in our New York, San Francisco, or Stockholm office.


Perks/benefits: Startup environment

Region: North America
Country: United States