AI Evaluation Scientist
Tasks
- Analyze model behavior and performance drift
- Build automated evaluation scripts, tests, and pipelines
- Design human-in-the-loop evaluation workflows
- Develop benchmark datasets and challenge sets
- Document evaluation processes, criteria, and results
- Implement AI evaluation frameworks
- Integrate evaluation results into evaluation reports
- Perform error analysis and behavioral audits
- Support responsible AI compliance documentation and risk assessments
Perks/Benefits
- N/A
Skills/Tech-stack
AI Governance | Agile | Dataset creation | Embeddings | Evaluation metrics | Experimental Design | Hugging Face | Human-in-the-loop | LangChain | Language Models | Large Language Models | Machine Learning | Natural Language Processing | Prompt evaluation | PyTorch | Python | RAG | Ragas | Retrieval-Augmented Generation | Scikit-learn | Statistical Testing | Test harnesses
Education
Bachelor of Engineering | Bachelor of Science | Master of Science