Research Scientist - Acoustic and Multi-Modal Scene Understanding
Cambridge, UK
Meta
Giving people the power to build community and bring the world closer together. We are developing all the technologies needed to enable breakthrough Smartglasses, AR glasses and VR headsets, including optics and displays, computer vision, audio, graphics, brain-computer interfaces, haptic interaction, eye/hand/face/body tracking, perception science, and true telepresence. Some of those will advance much faster than others, but they all need to happen to enable AR and VR that are so compelling that they become an integral part of our lives.
The Audio team within RL Research is looking for an experienced Research Scientist with an in-depth understanding of real-time, efficient signal processing and machine learning on audio and multi-modal signals to join our team. You will do core and applied research into technologies that improve listeners' hearing abilities under challenging listening conditions using wearable computing, alongside a team of dedicated researchers, developers, and engineers. You will operate at the intersection of egocentric perception, acoustics, computer vision, and signal processing algorithms, with hardware and software co-design.
Research Scientist - Acoustic and Multi-Modal Scene Understanding Responsibilities
- Design innovative solutions for challenging multi-modal egocentric recognition problems with resource constraints
- Communicate research results internally and externally in the form of technical reports and scientific publications
- Work consistently under your own initiative, implementing state-of-the-art models and techniques in PyTorch, TensorFlow, or other frameworks, and seek feedback and input where appropriate
- Identify, motivate, and execute on reasonable medium-to-large hypotheses (each with many tasks) for model improvements, drawing on data analysis and domain knowledge, and communicate learnings effectively.
- Design, perform, and analyze online and offline experiments with specific and well thought-out hypotheses in mind.
- Generate reliable, correct training data with great attention to detail.
- Identify and debug common issues in training machine learning models, such as overfitting/underfitting, leakage, and offline/online inconsistency
- Be aware of common systems considerations and modeling issues, and factor these into modeling choices.
- Design acoustic or audio-visual models with a small computational footprint for mobile devices and wearables such as smart glasses.
Minimum Qualifications
- Currently holds a PhD or a postdoctoral position in Deep Learning, Machine Learning, Computer Vision, Computer Science, Computer Engineering, Statistics, or a related field.
- 4+ years of experience developing and implementing signal processing and deep learning algorithms for acoustic and multi-modal detection, recognition, and/or tracking problems.
- 4+ years of experience with scientific programming languages such as Python or C++.
- 3+ years of experience with research-oriented software engineering, including fluency with machine learning frameworks (e.g., PyTorch, TensorFlow, Scikit-learn, Pandas) and scientific computing libraries (e.g., the SciPy ecosystem).
- Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment.
Preferred Qualifications
- Demonstrated experience implementing and evaluating end-to-end prototype learning systems
- Ability to independently resolve most online and offline issues that affect hypothesis testing
- Understanding of the model architectures used and their consequences for the different hypotheses tested, along with a good applied understanding of computer vision, even if not fully up to date with the state of the art.
- Experience in communicating effectively with a broad range of stakeholders and collaborators at different levels
- Experience with audio-visual learning, computer vision, source localization and tracking, audio and visual SLAM systems, egocentric multimodal learning, etc.
- Experience building low-complexity models for acoustic and multi-modal problems, aimed at low-power mobile devices and wearables
- Experience integrating models under development onto real-time mobile platforms with different levels of compute (on-sensor computation, system-on-chip, low-power island, etc.)
- Experience with acoustic localization or visual multi-object tracking problems
- Proven track record of achieving significant results and innovation as demonstrated by first-authored publications and patents.
Equal Employment Opportunity and Affirmative Action
Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics. You may view our Equal Employment Opportunity notice here.
Meta is committed to providing reasonable support (called accommodations) in our recruiting processes for candidates with disabilities, long term conditions, mental health conditions or sincerely held religious beliefs, or who are neurodivergent or require pregnancy-related support. If you need support, please reach out to accommodations-ext@meta.com.