Senior Machine Learning Engineer
Palo Alto, CA
Full-Time | Senior-level / Expert | USD 220K–300K
IntelliPro Group Inc.
Senior Machine Learning Engineer (Remote, US)
Compensation: $220K–$300K + Equity
Department: Applied Research
Location: Remote (US-based) | Full-Time
We’re seeking a Senior Machine Learning Engineer to help optimize the performance of state-of-the-art foundation models across a diverse range of hardware environments. If you're passionate about performance tuning, systems-level thinking, and scaling ML workloads beyond NVIDIA/CUDA constraints, this is your chance to shape the frontier of AI infrastructure.
What You’ll Be Doing:
Design and maintain abstractions that scale model performance efficiently across heterogeneous hardware platforms—not just CUDA/NVIDIA.
Profile and optimize memory usage, latency, and throughput in PyTorch; build or integrate low-level solutions (e.g., Triton kernels) as needed (see the brief profiling sketch after this list).
Benchmark our model and system performance to guide product decisions around cost, throughput, and deployment tradeoffs.
Collaborate with hardware and systems partners to uncover bottlenecks and push for performance improvements in future iterations.
Work hand-in-hand with research and engineering teams to ensure systems are planned and built with efficiency in mind from the start.
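To make the profiling responsibility above concrete, here is a minimal sketch of the kind of latency and memory measurement involved, assuming a PyTorch 2.x environment; the encoder layer, input shapes, and output file name are hypothetical stand-ins, not part of this posting:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Hypothetical model and input, for illustration only.
model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True).eval()
x = torch.randn(8, 128, 512)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()
    activities.append(ProfilerActivity.CUDA)

# Capture operator-level timing and memory usage over a few forward passes.
with profile(activities=activities, profile_memory=True, record_shapes=True) as prof:
    with torch.no_grad():
        for _ in range(10):
            model(x)

# Summarize the hottest ops; the trace can also be opened in a trace viewer.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
prof.export_chrome_trace("trace.json")
```

From a summary like this, the next step is typically to attack the dominant ops, for example by fusing them into a custom Triton kernel or adjusting batching and memory layout.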
Qualifications:
Deep experience profiling and optimizing PyTorch code for performance (memory, latency, throughput).
Familiarity with tools like torch.compile, PyTorch/XLA (torch_xla), the PyTorch profiler, and memory or trace viewers (see the sketch after this list).
Experience building performance-portable abstractions and optimizing ML pipelines for a variety of hardware/software stacks.
Strong understanding of transformer models and modern attention mechanisms.
Hands-on work with parallel inference strategies (tensor parallelism, pipeline parallelism, etc.).
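As a small illustration of the tooling named above, a hedged sketch of using torch.compile to reduce per-step overhead; the MLP and benchmark loop below are hypothetical stand-ins, assuming PyTorch 2.x:

```python
import time
import torch

# Hypothetical stand-in module; any PyTorch model compiles the same way.
mlp = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).eval()

x = torch.randn(64, 1024)

# torch.compile traces the module and emits fused kernels (Triton on GPU,
# C++/OpenMP on CPU); the first call pays a one-time compilation cost.
compiled = torch.compile(mlp)

def bench(fn, iters=50):
    with torch.no_grad():
        fn(x)  # warm-up; triggers compilation for the compiled variant
        start = time.perf_counter()
        for _ in range(iters):
            fn(x)
        return (time.perf_counter() - start) / iters

print(f"eager:    {bench(mlp) * 1e3:.2f} ms/iter")
print(f"compiled: {bench(compiled) * 1e3:.2f} ms/iter")
```

The same measure-then-optimize loop extends to multi-device settings, where tensor or pipeline parallelism is layered on top.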
Bonus Points For:
Proficiency with Triton or CUDA, especially writing custom kernels and fusions for hot code paths.
Experience writing high-performance parallel C++, particularly in a machine learning context (e.g., data loading, inference).
Previous work building efficient ML demos or inference environments (Gradio, Docker, etc.).
Experience deploying models on non-NVIDIA hardware platforms.
Why This Role Matters:
You’ll be building the technical backbone that lets cutting-edge multimodal AI models run smoothly and efficiently on hardware around the world. Your work will directly influence how our models scale and how accessible they are in cost, performance, and reach.
Compensation & Benefits:
Base Salary: $220,000 – $300,000 / year (based on experience & location)
Equity: Generous stock options
Benefits: Full health coverage, flexible PTO, home office support, and more
Join a lean, expert team building next-gen AI from the ground up. If you thrive at the intersection of ML, systems, and performance—and love solving deep efficiency challenges—we want to hear from you.