Staff Data Engineer - Research/Machine Learning

New York City

Applications have closed

Character.AI

Meet AIs that feel alive. Chat with anyone, anywhere, anytime. Experience the power of super-intelligent chat bots that hear you, understand you, and remember you.

Posted 1 month ago

About us

Character’s mission is to empower everyone with AGI. Our vision is to enable people with our technology so that they can use Character.AI any moment of any day.

Character.AI is one of the world’s leading personal AI platforms. Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character.AI is a full-stack AI company with a globally scaled direct-to-consumer platform. As of 2023 that platform was #2 in the space in user engagement. Character.AI is uniquely centered around people, letting users personalize their experience by interacting with AI “Characters.” The company achieved unicorn status in 2023 and was named Google Play’s AI App of the Year.

Noam co-invented the key tech powering LLMs and was recently named to TIME100’s Most Influential People in AI list. TIME called him “one of the most important and impactful people of the space’s past, present, and future.” Daniel created and led LaMDA, the breakthrough conversational tech project currently powering Bard.

To learn more, please visit beta.character.ai.

About the role

You would be a great fit for this role if you are an experienced engineer who will be instrumental in building the world's best LLMs by collecting and refining the essential training data that powers them. In pursuit of the best language models, your responsibility is twofold:

First, you will identify and collect data at the scale required to feed our largest models. This involves managing a diverse set of sources, including structured and unstructured content in text and multimedia formats. Your engineering expertise is crucial in crafting the infrastructure and tools necessary to efficiently collect and manage petabytes of data.
Second, you will experiment with various methods of extracting a balanced and comprehensive training dataset from the raw data. You will leverage your expertise in data to build datasets reflecting a hypothesis, train models, and evaluate experimental results. Through this experimentation, you will create the training datasets for our largest models.

These are critical steps in the construction of AI. With petabytes of data and numerous design decisions, each step requires careful attention. Expertise in AI is not necessary, but enthusiasm for the space and a track record of adapting to new domains are important.

Who we’re looking for

Required Experience:

5+ years of production software engineering experience
Experience building large-scale data processing pipelines, with tools like PySpark, Beam, or Flink
Familiarity with Machine Learning and NLP and willingness to learn more on the job
Track record of adapting to new domains and a desire to use data to improve products

Additional Desired Experience:

ML experience as an ML engineer, Data Scientist, or another similar role
Experience with cloud platforms like AWS or Azure, or tools such as Kubernetes and Terraform
Passion for conversational AI or large language models

You will be a good fit if you are proactive and have a “get things done” mindset. Given our current pace of growth and load on our systems, most people have had a significant impact during their first week at the company.

Character is an equal opportunity employer and does not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status. We value diversity and encourage applicants from a range of backgrounds to apply.

Tags: AGI AWS Azure Bard Conversational AI Engineering Flink Kubernetes LLMs Machine Learning NLP Pipelines PySpark Research Terraform

Perks/benefits: Career development

Region: North America

Country: United States

Categories: Engineering Jobs Leadership Jobs Machine Learning Jobs Research Jobs
