Computational Linguist (Gen AI Evaluation)
United States
Sigma
Create smarter AI with better training data. Sigma.AI provides the highest-quality data annotation and data collection at scale, custom-fit to your machine learning needs.
🌟 Join Sigma.AI – Shaping the Future of Artificial Intelligence 🌍
🔹 What is Sigma?
Sigma is a leading global technology company specializing in data collection and annotation for Artificial Intelligence. With over 30 years of experience, offices in Spain, the US, and the UK, and operations in more than 200 languages, we support top multinational clients in developing cutting-edge AI solutions.
About the Job
We’re looking for a versatile Computational Linguist to join our R&D team focused on evaluating and supporting Generative AI systems. This role combines linguistic expertise, data analysis, and hands-on experimentation with large language models. You’ll help design annotation workflows, create and refine guidelines and internal documentation, prototype task-specific evaluation metrics, configure annotation tools, and analyze annotator and model performance using real-world data. You’ll also contribute to papers and articles as needed.
This is a hybrid linguistics and data science role, ideal for someone who can move between qualitative language analysis and quantitative evaluation. You'll work cross-functionally with researchers and annotators to design creative, rigorous, and scalable evaluation processes for LLM-driven workflows.
Required Qualifications
- Master’s degree (or equivalent experience) in Computational Linguistics, NLP, Linguistics, or a related field
- 2+ years of experience in NLP or AI projects (industry or research)
- Experience using and fine-tuning transformer-based language models (e.g., BERT, GPT)
- Proficiency in Python programming
- Comfortable with Linux environments and Bash scripting
- Experience working with public datasets (e.g., from Hugging Face or Kaggle)
- Familiarity with LLM behavior, prompt-based evaluation, and generative model outputs
- Comfortable with structured data formats (JSONL, CSV), Jupyter notebooks, and pandas-based analysis
- Fluent in English
Preferred Qualifications
- Strong understanding of current trends and techniques in generative AI
- Experience with annotation tools (e.g., Label Studio, Prodigy) and quality metrics for human data
- Experience creating and curating bespoke datasets
- Familiarity with evaluation challenges in creative or subjective NLP tasks
- Proficient with Python NLP and data science libraries: pandas, numpy, scikit-learn, NLTK
- Experience with generative AI SDKs and frameworks (e.g., OpenAI, Google, Anthropic, LangChain)
- Understanding of linguistic typology, multilingual NLP, or sociolinguistic variation
- Experience working in WSL environments
- Experience collaborating with annotation teams and QA processes
Salary: US$80K–90K
* Salary range is an estimate based on our AI, ML, Data Science Salary Index 💰