ML Engineer (Inference)
Palo Alto, CA
Full Time, Entry-level / Junior, USD 150K - 220K
About Us:
PlayAI is at the forefront of generative voice and conversational LLMs. With our Speech Synthesis and Voice Cloning models, we are building state-of-the-art conversational AI products.
We are building a platform and infrastructure for Conversational AI Voice Agents so that every business, developer, or tinkerer can easily build human-like talking AI agents and use them to serve their customers. This will unlock massive value for the world and bring a lot of happiness to the people using these delightful agents.
We joined YC last year as part of the W23 batch. Since then, we have raised $20M in seed funding and seen significant growth in users and revenue (20x over the last two years).
What are we looking for?
We are in search of Machine Learning Engineers who are passionate about solving challenging problems in multimodal (voice, text, etc.) foundation model inference and enabling revolutionary experiences in human-AI interaction. By joining our team, you have the opportunity to be a founding engineer and play a pivotal role in shaping the future of Conversational AI. If you're keen on pushing AI boundaries and making a significant impact, this role is for you.
Responsibilities:
Designing, building and optimizing multimodal foundation model inference frameworks.
Inventing and implementing novel algorithms and features for streaming multimodal model inference.
Co-designing next-generation multimodal foundation model architectures with the training team to hit the Pareto frontier of quality and efficiency.
Qualifications:
Demonstrates a growth mindset and a passion for solving challenging problems.
Possesses previous academic or work experience in:
LLM inference frameworks (TensorRT-LLM, vLLM, SGLang, etc.)
LLM inference algorithms (quantization, sparse attention, speculative sampling, etc.)
LLM architecture and training (parallelization, MoE, etc.), a plus
Low-level implementations (Flash Attention, GEMM, etc.), a plus
General machine learning (diffusion, GAN, etc.), a plus
Experience with PyTorch, Python and C++. Mastery of CUDA is a plus.
Master's degree in a related technical field or Bachelor's degree from a top-tier university with relevant work experience (internships, full-time roles, or equivalent). Recent graduates and current students are encouraged to apply.
What We Offer:
Challenging problems to solve
Autonomous working environment
Competitive compensation
Flexible work hours
Health, dental, and vision insurance
Commuter benefits
Flexible PTO + holidays
Final offer amounts are determined by multiple factors, including experience, and may vary from the amounts listed above.
Tags: Architecture Conversational AI CUDA LLMs Machine Learning Model inference Python PyTorch Speech synthesis Streaming TensorRT vLLM
Perks/benefits: Career development Competitive pay Flex hours Flex vacation Health care