AI Evaluation Scientist
Tasks
- Analyze model behavior and performance drift
- Build automated evaluation scripts, tests, and pipelines
- Design human-in-the-loop evaluation workflows
- Develop benchmark datasets and challenge sets
- Document evaluation processes, criteria, and results
- Implement AI evaluation frameworks
- Integrate evaluation results into evaluation reports
- Perform error analysis and behavioral audits
- Support responsible AI compliance documentation and risk assessments
Perks/Benefits
- N/A
Skills/Tech-stack
AI Governance | Agile | Dataset creation | Embeddings | Evaluation metrics | Experimental Design | Hugging Face | Human-in-the-loop | LangChain | Language Models | Language Processing | Large Language Models | Machine Learning | Natural Language | Natural Language Processing | Prompt evaluation | PyTorch | Python | RAG | Ragas | Retrieval-Augmented Generation | Scikit-learn | Statistical Testing | Test harnesses
Education
Bachelor of Engineering | Bachelor of Science | Master of Science