ML Engineer (Inference)

Palo Alto, CA

About Us:

PlayAI is at the forefront of generative voice and conversational LLMs. With our Speech Synthesis and Voice Cloning models, we are building state-of-the-art conversational AI products.

We are building a platform and infrastructure for Conversational AI Voice Agents so that every business, developer, or tinkerer can easily build talking, human-like AI agents and use them to serve their customers. This will unlock massive value for the world and delight the people who interact with these agents.

We joined Y Combinator last year in the W23 batch. Since then, we have raised $20M in seed funding and seen significant growth in users and revenue (20x over the last two years).

What are we looking for?

We are looking for Machine Learning Engineers who are passionate about solving challenging problems in multimodal (voice, text, etc.) foundation model inference and enabling revolutionary experiences in human-AI interaction. By joining our team, you will have the opportunity to be a founding engineer and play a pivotal role in shaping the future of Conversational AI. If you're keen on pushing the boundaries of AI and making a significant impact, this role is for you.

Responsibilities:

  • Designing, building, and optimizing multimodal foundation model inference frameworks.

  • Inventing and implementing novel algorithms and features for inference with streaming multimodal models.

  • Co-designing next-generation multimodal foundation model architectures with the training team to hit the Pareto frontier of quality and efficiency.

Qualifications:

  • A growth mindset and a passion for solving challenging problems.

  • Previous academic or work experience in:

    • LLM inference frameworks (TensorRT-LLM, vLLM, SGLang, etc.)

    • LLM Inference algorithms (quantization, sparse attention, speculative sampling, etc.)

    • LLM architecture and training (parallelization, MoE, etc.) is a plus

    • Low level implementations (Flash Attention, GEMM, etc.) is a plus

    • General machine learning (diffusion, GAN, etc.) is a plus

  • Experience with PyTorch, Python, and C++; mastery of CUDA is a plus.

  • Master's degree in a related technical field or Bachelor's degree from a top-tier university with relevant work experience (internships, full-time roles, or equivalent). Recent graduates and current students are encouraged to apply.

What We Offer:

  • Challenging problems to solve

  • Autonomous working environment

  • Competitive compensation

  • Flexible work hours

  • Health, dental, and vision insurance

  • Commuter benefits

  • Flexible PTO + holidays

Final offer amounts are determined by multiple factors, including experience.
