Research Scientist, LLM Evaluation & Post-Training
USD 150K-160K Senior-level Full Time
Tasks
- Analyze model behavior and failure patterns
- Build evaluation and post-training pipelines with ML teams
- Create benchmark datasets and evaluation reports
- Define and execute LLM evaluation research agenda
- Design experiments for post-training outcomes
- Develop evaluation frameworks and benchmarks
- Implement checks for scoring reliability and measurement validity
- Recommend evaluation improvements and redesigns
- Partner with customers to review evaluation methodologies
- Publish research findings and contribute to open-source
- Run human and automated evaluation studies
Perks/Benefits
- N/A
Skills/Tech-stack
Benchmarking | Context evaluation | DPO | Data Processing | Error Analysis | Experimental Design | GRPO | Hugging Face | Hugging Face Transformers | Human evaluation | JAX | LLM Evaluation | Language Processing | Long Context | Long Context Evaluation | Machine Learning | Metric Design | Model Alignment | Multimodal evaluation | Natural Language | Natural Language Processing | PPO | PyTorch | Python | RAG | RLAIF | RLHF | Reinforcement Learning | Robustness Testing | SFT | Significance Testing | Statistical Analysis | Stress Testing | TensorFlow | Uncertainty Quantification | Vector Databases