Model Optimization Engineer

San Francisco Office (HQ)


At World Labs, our mission is to revolutionize artificial intelligence by developing Large World Models, taking AI beyond language and 2D visuals into the realm of complex 3D environments, both virtual and real. We're the team that's envisioning a future where AI doesn't just process information but truly understands and interacts with the world around us.

We're looking for the overachievers, the visionaries, and the relentless innovators who aren't satisfied with the status quo. You know that person who's always dreaming up the next big breakthrough? That's us. And we want you to be part of it.

ML Inference Engineer / GPU Performance Engineer

We're seeking an experienced engineer to bridge the gap between our research team's state-of-the-art models and production-ready inference systems. You'll take PyTorch research code and transform it into highly optimized, low-latency inference solutions.

Key Responsibilities:

  • Optimize neural network models for inference through quantization, pruning, and architectural modifications while maintaining accuracy

  • Profile and benchmark model performance to identify computational bottlenecks

  • Implement optimizations using torch.compile, custom CUDA kernels, and specialized inference frameworks

  • Deploy multi-GPU inference solutions with efficient model parallelism and serving architectures

  • Collaborate with research teams to ensure optimization techniques integrate smoothly with model development workflows
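To give a flavor of the first bullet: dynamic INT8 quantization in stock PyTorch can be done with the built-in `torch.ao.quantization` API. The sketch below is illustrative only (the helper name and toy model are ours, not World Labs code) and shows quantizing linear layers while checking that outputs stay close to the FP32 baseline:

```python
import torch

# Hypothetical helper sketching dynamic INT8 quantization with PyTorch's
# built-in torch.ao.quantization API; names here are illustrative.
def quantize_for_inference(model: torch.nn.Module) -> torch.nn.Module:
    model.eval()  # freeze dropout / batchnorm for inference
    # Store nn.Linear weights as INT8; activations are quantized on the fly
    return torch.ao.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

# Toy stand-in for a real research model
fp32_model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU())
int8_model = quantize_for_inference(fp32_model)

x = torch.randn(1, 128)
with torch.inference_mode():
    ref = fp32_model(x)   # FP32 reference output
    out = int8_model(x)   # quantized output, should track the reference
```

In practice the "maintaining accuracy" part of the job is exactly the gap between `ref` and `out`: measuring it on real evaluation data and deciding which layers can tolerate lower precision.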

Required Skills:

  • 3+ years optimizing deep learning models for production inference

  • Expert-level PyTorch and CUDA programming experience

  • Hands-on experience with model quantization (INT8/FP16) and inference frameworks (TensorRT, ONNX Runtime)

  • Proficiency in GPU profiling tools and performance analysis

  • Experience with multi-GPU inference and model serving at scale

  • Strong understanding of transformer architectures and modern ML model optimization techniques
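The profiling proficiency asked for above usually starts with PyTorch's own operator-level profiler before reaching for Nsight or similar tools. A minimal sketch (toy model, CPU-only so it runs anywhere; on a GPU box you would add `ProfilerActivity.CUDA`):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Toy model standing in for a real network; the point is the profiling flow.
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU())
x = torch.randn(8, 512)

# Record per-operator timings for one inference pass
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with torch.inference_mode():
        model(x)

# Sort by total time to surface the heaviest operators (the bottlenecks)
report = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(report)
```

The resulting table ranks operators (e.g. `aten::addmm` for the matmul inside `Linear`) by time, which is the starting point for deciding where quantization, fusion, or a custom kernel will pay off.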

Preferred:

  • Custom CUDA kernel development experience

  • Experience with Triton, vLLM, or similar high-performance serving frameworks

  • Background in both research and production ML environments

Who You Are:

  • Fearless Innovator: We need people who thrive on challenges and aren't afraid to tackle the impossible.

  • Resilient Builder: Impacting Large World Models isn't a sprint; it's a marathon with hurdles. We're looking for builders who can weather the storms of groundbreaking research and come out stronger.

  • Mission-Driven Mindset: Everything we do is in service of creating the best spatially intelligent AI systems, and using them to empower people.

  • Collaborative Spirit: We're building something bigger than any one person. We need team players who can harness the power of collective intelligence.

We're hiring the brightest minds from around the globe to bring diverse perspectives to our cutting-edge work. If you're ready to work on technology that will reshape how machines perceive and interact with the world, then World Labs is your launchpad.

Join us, and let's make history together.
