Member of Technical Staff - Foundational Model Data
San Francisco
Liquid AI
We build capable and efficient general-purpose AI systems at every scale. Liquid Foundation Models (LFMs) are a new generation of generative AI models that achieve state-of-the-art performance at every scale, while maintaining a smaller memory...
Liquid AI, an MIT spin-off, is a foundation model company headquartered in Boston, Massachusetts. Our mission is to build capable and efficient general-purpose AI systems at every scale.
Our goal at Liquid is to build the most capable AI systems to solve problems at every scale, so that users can build, access, and control their own AI solutions. The aim is to ensure that AI is integrated meaningfully, reliably, and efficiently into every enterprise. In the long term, Liquid will create and deploy frontier-AI-powered solutions that are available to everyone.
We are seeking a highly skilled Member of Technical Staff, Foundation Model Data to play a critical role in our foundation model development process. This role focuses on consolidating, gathering, and generating high-quality text data for pretraining, midtraining, SFT, and preference optimization.
Key Responsibilities
- Create and maintain data cleaning, filtering, and selection pipelines that can handle >100TB of data.
- Monitor the release of public datasets on Hugging Face and other platforms.
- Create crawlers to gather datasets from the web where public data is lacking.
- Write and maintain synthetic data generation pipelines.
- Run ablations to assess new datasets and judging pipelines.
Required Qualifications
- Experience Level: B.S. + 5 years of experience, M.S. + 3 years of experience, or Ph.D. + 1 year of experience.
- Dataset Engineering: Expertise in data curation, cleaning, augmentation, and synthetic data generation techniques.
- Machine Learning Expertise: Ability to write and debug models in popular ML frameworks, and experience working with LLMs.
- Software Development: Strong programming skills in Python, with an emphasis on writing clean, maintainable, and scalable code.
Preferred Qualifications
- M.S. or Ph.D. in Computer Science, Electrical Engineering, Math, or a related field.
- Experience fine-tuning or customizing LLMs.
- First-author publications in top ML conferences (e.g. NeurIPS, ICML, ICLR).
- Contributions to popular open-source projects.