AI Inference Engineer

Copenhagen, DK

Teton

About Us

At Teton, we are redefining the role of healthcare workers through cutting-edge AI technology. Amid a global nursing shortage, our solutions offer vital support to overburdened health systems. We distinguish ourselves by focusing relentlessly on product excellence and user experience, rapidly deploying solutions that make a real difference.

At this stage of our company, we require a physical presence in our office in Copenhagen, Denmark. We believe this enables the fastest and most efficient iteration cycles as we build an impactful product that users love.

The Job

We're looking for a highly specialized AI Inference Engineer who thrives on optimizing AI models for real-world deployment at scale. You'll be the technical force behind making our healthcare AI systems blazingly fast, efficient, and production-ready. This role demands deep technical expertise in model optimization, CUDA programming, and cutting-edge inference frameworks.

You will be responsible for:

  • Model Optimization & Quantization: Implementing advanced quantization techniques, pruning, and distillation to maximize inference speed while maintaining accuracy

  • CUDA & Low-Level Optimization: Writing and optimizing CUDA kernels, leveraging TensorRT, and pushing the boundaries of GPU utilization

  • DeepStream Integration: Building robust inference pipelines using NVIDIA DeepStream, JetPack, and edge deployment frameworks

  • Transformer Optimization: Specializing in transformer model inference optimization, including attention mechanisms, KV-cache optimization, and memory management

  • Infrastructure Scaling: Designing and implementing scalable inference infrastructure that can handle healthcare's demanding real-time requirements

  • Performance Engineering: Profiling, benchmarking, and continuously improving model serving latency and throughput

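To give candidates a feel for the quantization work above: the core of affine (asymmetric) int8 quantization is mapping a float range onto [-128, 127] via a scale and zero point. The sketch below is a minimal, illustrative pure-Python version (function names are our own for this example; production systems use calibrated, per-channel schemes in frameworks like TensorRT or PyTorch):

```python
def quantize_int8(values):
    """Affine int8 quantization: q = clamp(round(x / scale) + zero_point).

    Illustrative only -- real pipelines calibrate scale/zero_point from
    activation statistics, often per channel.
    """
    qmin, qmax = -128, 127
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against constant inputs
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map int8 codes back to floats; round-trip error is at most ~scale/2."""
    return [(qi - zero_point) * scale for qi in q]
```

The trade-off this role lives in: each dequantized value lands within about half a scale step of the original, so the narrower the observed range, the cheaper the accuracy cost of int8 arithmetic.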
What You Bring

  • Deep AI Optimization Expertise: 3+ years of hands-on experience optimizing deep learning models for production inference

  • CUDA Mastery: Strong proficiency in CUDA programming, kernel optimization, and GPU memory management

  • Inference Frameworks: Extensive experience with TensorRT, DeepStream, Triton Inference Server, or similar high-performance serving frameworks

  • Transformer Specialization: Deep understanding of transformer architectures and their optimization challenges (attention mechanisms, memory patterns, sequence handling)

  • Systems Programming: Proficiency in Python, C++, and PyTorch with a focus on performance-critical code

  • Edge Deployment: Experience with NVIDIA JetPack, edge computing, and resource-constrained environments

  • Performance Mindset: Obsessed with benchmarking, profiling, and squeezing every ounce of performance from hardware

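A taste of the KV-cache optimization mentioned under Transformer Specialization: during autoregressive decoding, cached key/value projections let each step compute only the newest token's projection instead of re-projecting the whole sequence. The toy class below (names invented for this sketch; real implementations use batched tensors, paging, and fused kernels) shows the idea with plain lists:

```python
import math


class ToyKVCache:
    """Toy single-head KV cache: each decode step appends one key/value
    pair, then attends over everything cached so far. Illustrative only."""

    def __init__(self):
        self.keys = []
        self.values = []

    def attend(self, q, k, v):
        # Cache the new token's projections -- the savings come from never
        # recomputing projections for earlier positions.
        self.keys.append(k)
        self.values.append(v)
        # Scaled dot-product scores against every cached key.
        scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(len(q))
                  for key in self.keys]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of cached values.
        return [sum(w * val[d] for w, val in zip(weights, self.values))
                for d in range(len(v))]
```

With the cache, decoding a sequence of length n costs O(n) projections overall instead of O(n²); the memory-management challenges in this role come from those cached tensors growing with sequence length and batch size.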
Bonus Points

  • Experience with custom CUDA kernel development

  • Knowledge of mixed-precision training and inference

  • Familiarity with distributed inference and model parallelism

  • Experience with healthcare or safety-critical AI applications

  • Contributions to open-source inference optimization projects

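On the mixed-precision bonus point: the essence is that fp16 keeps only ~11 bits of mantissa, so values round to the nearest representable half-precision number. Python's `struct` module supports the IEEE 754 half-precision `'e'` format (Python 3.6+), which makes the effect easy to see without any ML framework (a stdlib sketch, not how inference stacks actually cast tensors):

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision.

    Demonstrates the rounding that mixed-precision inference trades
    for speed and memory; real stacks cast whole tensors on the GPU.
    """
    return struct.unpack('<e', struct.pack('<e', x))[0]
```

For example, `to_fp16(0.5)` is exact (0.5 is representable), while `to_fp16(0.1)` lands slightly off 0.1 -- the kind of error that quantization-aware evaluation has to keep within clinical tolerance.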
What We Offer

  • Participation in our warrant program (stock options)

  • Work with state-of-the-art AI optimization technology in a pioneering field

  • Access to cutting-edge hardware and compute resources

  • A vibrant, learning-focused work environment with fellow optimization enthusiasts

  • Direct impact on healthcare delivery through performance-critical AI systems

Join Our Team

We're looking for engineers who get excited about shaving milliseconds off inference time and making AI models run faster than anyone thought possible. If you're passionate about the intersection of AI, systems programming, and real-world impact, come help us transform healthcare through optimized AI inference.

Ready to push the boundaries of what's possible with AI optimization? Join us in Copenhagen and be part of our mission to revolutionize healthcare.

