Research Scientist – Speech and Audio Understanding (Large Models & Multimodal Systems)

US-Washington-Bellevue, United States

Full Time Senior-level / Expert USD 141K - 265K

Tencent

Founded in November 1998, Tencent is an Internet-based platform company using technology to enrich the lives of Internet users and assist the digital upgrade of enterprises. Our mission is "Value for Users, Tech for Good" (用户为本，科技向善).

Posted 1 month ago

Business Unit

What the Role Entails

Job Responsibilities:

We are building large-scale, native multimodal model systems that jointly support vision, audio, and text to enable comprehensive perception and understanding of the physical world. You will join the core research team focused on speech and audio, contributing to the following key research areas:
Develop general-purpose, end-to-end large speech models covering multilingual automatic speech recognition (ASR), speech translation, speech synthesis, paralinguistic understanding, and general audio understanding.
Advance research on speech representation learning and encoder/decoder architectures to build unified acoustic representations for multi-task and multimodal applications.
Explore representation alignment and fusion mechanisms between audio/speech and other modalities in large multimodal models, enabling joint modeling with image and text.
Build and maintain high-quality multimodal speech datasets, including automatic annotation and data synthesis technologies.

Who We Look For

Ph.D. in Computer Science, Electrical Engineering, Artificial Intelligence, Linguistics, or a related field; or Master’s degree with several years of relevant experience.
Solid understanding of speech and audio signal processing, acoustic modeling, language modeling, and large model architectures.
Proficient in one or more core speech system development pipelines such as ASR, TTS, or speech translation; experience with multilingual, multitask, or end-to-end systems is a plus.
Candidates with in-depth research or practical experience in the following areas are strongly preferred:
Speech representation pretraining (e.g., HuBERT, Wav2Vec, Whisper)
Multimodal alignment and cross-modal modeling (e.g., audio-visual-text)
Experience driving state-of-the-art (SOTA) performance on audio understanding tasks with large models
Proficient in deep learning frameworks such as PyTorch or TensorFlow; experience with large-scale training and distributed systems is a plus.
Familiar with Transformer-based architectures and their applications in speech and multimodal training/inference.

Location State(s)

US-Washington-Bellevue

The expected base pay range for this position in the location(s) listed above is $141,480.00 to $265,200.00 per year. Actual pay may vary depending on job-related knowledge, skills, and experience. Employees hired for this position may be eligible for a sign-on payment, relocation package, and restricted stock units, which will be evaluated on a case-by-case basis. Subject to the terms and conditions of the plans in effect, hired applicants are also eligible for medical, dental, vision, life, and disability benefits, and participation in the Company’s 401(k) plan. Employees are also eligible for 15 to 25 days of vacation per year (depending on tenure), up to 13 days of holidays throughout the calendar year, and up to 10 days of paid sick leave per year. Your benefits may be adjusted to reflect your location, employment status, duration of employment with the company, and position level. Benefits may also be pro-rated for those who start working during the calendar year.

Equal Employment Opportunity at Tencent

As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.

Categories: Data Science Jobs Research Jobs

Tags: Architecture ASR Computer Science Deep Learning Distributed Systems Engineering Linguistics NLP Pipelines PyTorch Research Speech synthesis TensorFlow

Perks/benefits: Career development Equity / stock options Health care Medical leave Relocation support Startup environment

Region: North America

Country: United States
