Machine Learning Applications Engineer (GPU-Accelerated)

San Francisco HQ


Alembic

Uncover marketing success with Alembic's AI-driven analytics. Predict revenue outcomes, optimize media spend, and gain actionable insights in real time.



About Alembic

Alembic is pioneering a revolution in marketing measurement: proving the true ROI of marketing activities. The Alembic Marketing Intelligence Platform applies sophisticated algorithms and AI models to finally solve this long-standing attribution problem. When you join the Alembic team, you’ll help build the tools that give a growing list of Fortune 500 companies unprecedented visibility into how marketing drives revenue, so they can make more confident, data-driven decisions.

About the Role

We’re looking for a Machine Learning Applications Engineer with GPU, Python, and C++ expertise to help productionize cutting-edge causal AI models. You’ll work closely with ML scientists to turn experimental research code into optimized, scalable, and well-structured software that powers Alembic’s real-time analytics and inference systems.

This is a hands-on, performance-focused role where you’ll operate at the intersection of applied ML, systems engineering, and high-performance computing.

Key Responsibilities

  • Translate early-stage ML research and prototypes into reliable, testable, and performant software components

  • Use CUDA, Triton, and Numba to optimize GPU-accelerated workloads for inference and preprocessing

  • Contribute to core libraries and performance-critical routines using modern C++ in hybrid Python/C++ environments

  • Develop modular, reusable infrastructure that supports deployment of ML workloads at scale

  • Collaborate with data scientists and engineers to optimize data structures, memory usage, and execution paths

  • Build interfaces and APIs to integrate ML components into Alembic’s broader platform

  • Implement logging, profiling, and observability tools to track performance and model behavior

Must-Have Qualifications

  • 4–7 years of software engineering experience, including substantial time in Python and C++

  • Hands-on experience with GPU programming, including CUDA, Triton, Numba, or related frameworks

  • Strong familiarity with the Python data stack (Pandas, NumPy, PyArrow) and low-level performance tuning

  • Experience writing high-performance, memory-efficient code in C++

  • Demonstrated ability to work cross-functionally with researchers, platform engineers, and product teams

  • Comfort transforming research-grade ML code into maintainable, production-grade software

Nice-to-Have

  • Experience with hybrid Python/C++ or Python/CUDA extension development (e.g., Pybind11, Cython, custom ops)

  • Familiarity with ML serving or inference tools (e.g., TorchServe, ONNX Runtime, Triton Inference Server)

  • Exposure to structured data modeling, causal inference, or large-scale statistical computation

  • Background in distributed systems or parallel processing

What You’ll Get

  • A pivotal role building GPU-accelerated software at the heart of a real-world AI product

  • Collaboration with an elite team of ML scientists, engineers, and product leaders

  • The opportunity to shape performance-critical infrastructure powering enterprise decision-making

  • A culture rooted in technical rigor, curiosity, and product impact


Tags: APIs Causal inference CUDA Distributed Systems Engineering GPU Machine Learning NumPy ONNX Pandas Python Research Statistics

Perks/benefits: Team events

Region: North America
Country: United States
