ML Engineer (Inference)

Palo Alto, CA

About Us:

PlayAI is at the forefront of generative voice and conversational LLMs. With our Speech Synthesis and Voice Cloning models, we are building state-of-the-art conversational AI products.

We are building a platform and infrastructure for Conversational AI Voice Agents so that every business, developer, or tinkerer can easily build talking, human-like AI agents and use them to serve their customers. This will unlock massive value for the world and delight the people who interact with these agents.

We joined Y Combinator last year in the W23 batch. Since then, we have raised $20M in seed funding and seen significant growth in users and revenue (20x over the last two years).

What are we looking for?

We are looking for Machine Learning Engineers who are passionate about solving challenging problems in multimodal (voice, text, etc.) foundation model inference and enabling revolutionary experiences in human-AI interaction. By joining our team, you will have the opportunity to be a founding engineer and play a pivotal role in shaping the future of Conversational AI. If you're keen on pushing the boundaries of AI and making a significant impact, this role is for you.

Responsibilities:

  • Designing, building, and optimizing multimodal foundation model inference frameworks.

  • Inventing and implementing novel algorithms and features for inference with streaming multimodal models.

  • Co-designing next-generation multimodal foundation model architectures with the training team to hit the Pareto frontier of quality and efficiency.

Qualifications:

  • A growth mindset and a passion for solving challenging problems.

  • Previous academic or work experience in:

    • LLM inference frameworks (TensorRT-LLM, vLLM, SGLang, etc.)

    • LLM Inference algorithms (quantization, sparse attention, speculative sampling, etc.)

    • LLM architecture and training (parallelization, MoE, etc.) is a plus

    • Low level implementations (Flash Attention, GEMM, etc.) is a plus

    • General machine learning (diffusion, GAN, etc.) is a plus

  • Experience with PyTorch, Python, and C++; mastery of CUDA is a plus.

  • Master's degree in a related technical field or Bachelor's degree from a top-tier university with relevant work experience (internships, full-time roles, or equivalent). Recent graduates and current students are encouraged to apply.

What We Offer:

  • Challenging problems to solve

  • Autonomous working environment

  • Competitive compensation

  • Flexible work hours

  • Health, dental, and vision insurance

  • Commuter benefits

  • Flexible PTO + holidays

Final offer amounts are determined by multiple factors, including experience.
