Member of Technical Staff - Edge AI Inference Engineer
Boston
Liquid AI
Liquid AI, an MIT spin-off, is a foundation model company headquartered in Boston, Massachusetts. Our mission is to build capable and efficient general-purpose AI systems at every scale.
Our goal at Liquid is to build the most capable AI systems to solve problems at every scale, so that users can build, access, and control their own AI solutions, and so that AI is integrated meaningfully, reliably, and efficiently at every enterprise. Long term, Liquid will create and deploy frontier-AI-powered solutions that are available to everyone.
What this role actually is:
As we prepare to deploy our models across a range of edge device types, including CPUs, embedded GPUs, and NPUs, we are seeking an expert to optimize an inference stack tailored to each platform. We're looking for someone who can take our models, dive deep into the task, and return with a highly optimized inference stack, leveraging existing frameworks such as llama.cpp, ExecuTorch, and TensorRT to deliver high throughput and low latency.
The ideal candidate is a highly skilled engineer with extensive experience in inference on embedded hardware and a deep understanding of CPU, NPU, and GPU architectures. They should be self-motivated, capable of working independently, and driven by a passion for optimizing performance across diverse edge hardware platforms.
Proficiency in building and enhancing edge inference stacks is essential. Additionally, experience with mobile development and expertise in cache-aware algorithms will be highly valued.
Requirements
- Strong ML experience: proficiency in Python and PyTorch, sufficient to interface with the ML team at a deeply technical level.
- Hardware awareness: a solid understanding of modern hardware architecture, including cache hierarchies and memory access patterns, and their impact on performance.
- Coding proficiency: expertise in Python, C++, or Rust for AI-driven real-time embedded systems.
- Optimization of low-level primitives: ownership of optimizing core primitives to ensure efficient model execution.
- Self-direction and ownership: the ability to independently take a PyTorch model and its inference requirements and deliver a fully optimized edge inference stack with minimal guidance.