AI Evaluation Scientist
Tasks
- Analyze model behavior
- Assess AI model outputs
- Build evaluation scripts
- Collaborate with data scientists
- Contribute to evaluation framework development
- Design evaluation processes
- Develop benchmark datasets
- Develop test harnesses
- Document evaluation results
- Perform error analysis
- Support responsible deployment
Skills/Tech-stack
AI evaluation frameworks | Data Analysis | Evaluation metrics | Hugging Face | LangChain | Model behavior analysis | Natural Language Processing | PyTorch | Python | Scikit-learn | Statistical Testing | Test Design