AI Evaluation Scientist
Tasks
- Analyze model behavior and performance drift
- Build automated evaluation scripts, tests, and pipelines
- Design human-in-the-loop evaluation workflows
- Develop benchmark datasets and challenge sets
- Document evaluation processes, criteria, and results
- Implement AI evaluation frameworks
- Integrate evaluation results into evaluation reports
- Perform error analysis and behavioral audits
- Support responsible AI compliance documentation and risk assessments
Perks/Benefits
- N/A
Skills/Tech-stack
AI Governance | Agile | Dataset creation | Embeddings | Evaluation metrics | Experimental Design | Hugging Face | Human-in-the-loop | LangChain | Language Models | Language Processing | Large Language Models | Machine Learning | Natural Language | Natural Language Processing | Prompt evaluation | PyTorch | Python | RAG | Ragas | Retrieval-Augmented Generation | Scikit-learn | Statistical Testing | Test harnesses
Education
Bachelor of Engineering | Bachelor of Science | Master of Science