Machine Learning Applications Engineer (GPU-Accelerated)

San Francisco HQ


Alembic

Uncover marketing success with Alembic's AI-driven analytics. Predict revenue outcomes, optimize media spend, and gain actionable insights in real time.



About Alembic

Alembic is pioneering a revolution in marketing measurement: proving the true ROI of marketing activities. The Alembic Marketing Intelligence Platform applies sophisticated algorithms and AI models to finally solve this long-standing attribution problem. When you join the Alembic team, you’ll help build the tools that give a growing list of Fortune 500 companies unprecedented visibility into how marketing drives revenue, so they can make more confident, data-driven decisions.

About the Role

We’re looking for a Machine Learning Applications Engineer with GPU, Python, and C++ expertise to help productionize cutting-edge causal AI models. You’ll work closely with ML scientists to turn experimental research code into optimized, scalable, and well-structured software that powers Alembic’s real-time analytics and inference systems.

This is a hands-on, performance-focused role where you’ll operate at the intersection of applied ML, systems engineering, and high-performance computing.

Key Responsibilities

  • Translate early-stage ML research and prototypes into reliable, testable, and performant software components

  • Use CUDA, Triton, and Numba to optimize GPU-accelerated workloads for inference and preprocessing

  • Contribute to core libraries and performance-critical routines using modern C++ in hybrid Python/C++ environments

  • Develop modular, reusable infrastructure that supports deployment of ML workloads at scale

  • Collaborate with data scientists and engineers to optimize data structures, memory usage, and execution paths

  • Build interfaces and APIs to integrate ML components into Alembic’s broader platform

  • Implement logging, profiling, and observability tools to track performance and model behavior

Must-Have Qualifications

  • 4–7 years of software engineering experience, including substantial time in Python and C++

  • Hands-on experience with GPU programming, including CUDA, Triton, Numba, or related frameworks

  • Strong familiarity with the Python data stack (Pandas, NumPy, PyArrow) and low-level performance tuning

  • Experience writing high-performance, memory-efficient code in C++

  • Demonstrated ability to work cross-functionally with researchers, platform engineers, and product teams

  • Comfort transforming research-grade ML code into maintainable, production-grade software

Nice-to-Have

  • Experience with hybrid Python/C++ or Python/CUDA extension development (e.g., Pybind11, Cython, custom ops)

  • Familiarity with ML serving or inference tools (e.g., TorchServe, ONNX Runtime, Triton Inference Server)

  • Exposure to structured data modeling, causal inference, or large-scale statistical computation

  • Background in distributed systems or parallel processing

What You’ll Get

  • A pivotal role building GPU-accelerated software at the heart of a real-world AI product

  • Collaboration with an elite team of ML scientists, engineers, and product leaders

  • The opportunity to shape performance-critical infrastructure powering enterprise decision-making

  • A culture rooted in technical rigor, curiosity, and product impact


Tags: APIs Causal inference CUDA Distributed Systems Engineering GPU Machine Learning NumPy ONNX Pandas Python Research Statistics

Perks/benefits: Team events

Region: North America
Country: United States
