Model Optimization Engineer

San Francisco Office (HQ)


At World Labs, our mission is to revolutionize artificial intelligence by developing Large World Models, taking AI beyond language and 2D visuals into the realm of complex 3D environments, both virtual and real. We're the team that's envisioning a future where AI doesn't just process information but truly understands and interacts with the world around us.

We're looking for the overachievers, the visionaries, and the relentless innovators who aren't satisfied with the status quo. You know that person who's always dreaming up the next big breakthrough? That's us. And we want you to be part of it.

ML Inference Engineer / GPU Performance Engineer

We're seeking an experienced engineer to bridge the gap between our research team's state-of-the-art models and production-ready inference systems. You'll take PyTorch research code and transform it into highly optimized, low-latency inference solutions.

Key Responsibilities:

  • Optimize neural network models for inference through quantization, pruning, and architectural modifications while maintaining accuracy

  • Profile and benchmark model performance to identify computational bottlenecks

  • Implement optimizations using torch.compile, custom CUDA kernels, and specialized inference frameworks

  • Deploy multi-GPU inference solutions with efficient model parallelism and serving architectures

  • Collaborate with research teams to ensure optimization techniques integrate smoothly with model development workflows
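To give a flavor of the first bullet: dynamic INT8 quantization in stock PyTorch can be done with the built-in `torch.ao.quantization` API. The sketch below is illustrative only (the helper name and toy model are ours, not World Labs code) and shows quantizing linear layers while checking that outputs stay close to the FP32 baseline:

```python
import torch

# Hypothetical helper sketching dynamic INT8 quantization with PyTorch's
# built-in torch.ao.quantization API; names here are illustrative.
def quantize_for_inference(model: torch.nn.Module) -> torch.nn.Module:
    model.eval()  # freeze dropout / batchnorm for inference
    # Store nn.Linear weights as INT8; activations are quantized on the fly
    return torch.ao.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

# Toy stand-in for a real research model
fp32_model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU())
int8_model = quantize_for_inference(fp32_model)

x = torch.randn(1, 128)
with torch.inference_mode():
    ref = fp32_model(x)   # FP32 reference output
    out = int8_model(x)   # quantized output, should track the reference
```

In practice the "maintaining accuracy" part of the job is exactly the gap between `ref` and `out`: measuring it on real evaluation data and deciding which layers can tolerate lower precision.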

Required Skills:

  • 3+ years optimizing deep learning models for production inference

  • Expert-level PyTorch and CUDA programming experience

  • Hands-on experience with model quantization (INT8/FP16) and inference frameworks (TensorRT, ONNX Runtime)

  • Proficiency in GPU profiling tools and performance analysis

  • Experience with multi-GPU inference and model serving at scale

  • Strong understanding of transformer architectures and modern ML model optimization techniques
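The profiling proficiency asked for above usually starts with PyTorch's own operator-level profiler before reaching for Nsight or similar tools. A minimal sketch (toy model, CPU-only so it runs anywhere; on a GPU box you would add `ProfilerActivity.CUDA`):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Toy model standing in for a real network; the point is the profiling flow.
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU())
x = torch.randn(8, 512)

# Record per-operator timings for one inference pass
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with torch.inference_mode():
        model(x)

# Sort by total time to surface the heaviest operators (the bottlenecks)
report = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(report)
```

The resulting table ranks operators (e.g. `aten::addmm` for the matmul inside `Linear`) by time, which is the starting point for deciding where quantization, fusion, or a custom kernel will pay off.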

Preferred:

  • Custom CUDA kernel development experience

  • Experience with Triton, vLLM, or similar high-performance serving frameworks

  • Background in both research and production ML environments

Who You Are:

  • Fearless Innovator: We need people who thrive on challenges and aren't afraid to tackle the impossible.

  • Resilient Builder: Impacting Large World Models isn't a sprint; it's a marathon with hurdles. We're looking for builders who can weather the storms of groundbreaking research and come out stronger.

  • Mission-Driven Mindset: Everything we do is in service of creating the best spatially intelligent AI systems, and using them to empower people.

  • Collaborative Spirit: We're building something bigger than any one person. We need team players who can harness the power of collective intelligence.

We're hiring the brightest minds from around the globe to bring diverse perspectives to our cutting-edge work. If you're ready to work on technology that will reshape how machines perceive and interact with the world, then World Labs is your launchpad.

Join us, and let's make history together.
