Research Scientist - Synthetic Data
London
Applications have closed
Encord
Discover how the world's leading AI teams use Encord to accelerate the development of the next generation of AI applications.
At Encord, we're building the AI infrastructure of the future. One of the biggest challenges AI companies face today is data quality. The success of any AI application relies heavily on the quality of its training data, yet for most teams, this crucial step is both the most costly and the most time-consuming. We’re here to change that.
As former computer scientists, physicists, and quants, we’ve experienced firsthand how a lack of tools to prepare quality training data impedes progress in building AI. We believe AI is at a stage similar to the early days of computing or the internet: the potential is clear, but the surrounding tools and processes are still catching up. That's why we started Encord.
We are a talented and ambitious team of 60, working at the cutting edge of computer vision and deep learning. Backed by $30M in Series B funding from top investors like CRV and Y Combinator, we’re one of the fastest-growing companies in our space. Our platform is consistently rated the best by our customers, and we have big plans ahead. We’re looking for a Research Scientist to help our customers get the right data faster, easier, and cheaper.
The Role
As a Research Scientist focusing on synthetic data generation at Encord, you'll play a critical role in helping customers expand their datasets with ease. Although you'll start narrowly, with a single domain in mind, you'll progressively work across a variety of industries and domains, such as healthcare, geospatial, sports analytics, and surveillance, ensuring that customers can efficiently harness synthetic data to improve their AI models. Example tasks range from building easily adaptable diffusion models that generate new data with particular properties to developing novel ways of conditioning generative models to produce data with specific properties, all with the aim of solving customers’ data problems.
You'll follow the latest research and push state-of-the-art technologies forward to empower customers on their data journeys. This role offers a strong growth opportunity, with the potential to lead a team of scientists over time in our efforts to provide high-quality synthetic data.
What you will be doing:
- Building, fine-tuning, and experimenting with deep learning-based approaches to (conditional) synthetic data generation, such as Stable Diffusion and GANs.
- Developing scalable and novel ways to condition data generation on information from our data development platform.
- Following the latest machine learning research to identify and apply new methods that improve outcomes.
- Working on cutting-edge generative models, starting with text-to-image models and potentially expanding into other modalities such as video and audio.
- Ensuring our customers have the world’s best platform for expanding datasets synthetically.
Skills for the job:
- A PhD or similarly strong academic background in machine learning, with 2+ years of hands-on experience in synthetic data generation for images or video (e.g., Stable Diffusion, GANs, normalizing flows).
- Proficiency with frameworks and libraries such as PyTorch, TensorFlow, JAX, Pandas, and OpenCV.
- A quick learner with a structured, organized approach to problem-solving.
- Excellent communication skills with an ability to uncover use cases and solve problems efficiently.
- Ambitious and self-motivated, with a proven track record of top performance in academic or professional settings.
Bonus skills:
- Experience working with datasets on the order of millions of items.
- Experience with parameter-efficient fine-tuning (PEFT) techniques such as QLoRA.
- Familiarity with cloud-based model training and inference.
At Encord, you’ll have the unique opportunity to be part of a fast-growing startup with a clear mission and vision. You’ll work on real-world AI use cases across a variety of industry verticals and get hands-on experience with cutting-edge computer vision and deep learning technologies. This is a role where you'll grow quickly, take ownership of projects, and help shape the future of our company.
Tags: Computer Vision Data quality Deep Learning Diffusion models GANs Generative modeling JAX Machine Learning ML infrastructure Model training OpenCV Pandas PhD PyTorch Research Stable Diffusion TensorFlow
Perks/benefits: Career development Competitive pay Equity / stock options Salary bonus Startup environment