AI Evaluation Scientist
Tasks
- Analyze model behavior and performance drift
- Build automated evaluation scripts, tests, and pipelines
- Design human-in-the-loop evaluation workflows
- Develop benchmark datasets and challenge sets
- Document evaluation processes, criteria, and results
- Implement AI evaluation frameworks
- Integrate evaluation results into evaluation reports
- Perform error analysis and behavioral audits
- Support responsible AI compliance, documentation, and risk assessments
Perks/Benefits
- N/A
Skills/Tech-stack
AI Governance | Agile | Dataset creation | Embeddings | Evaluation metrics | Experimental Design | Hugging Face | Human-in-the-loop | LangChain | Language Models | Language Processing | Large Language Models | Machine Learning | Natural Language | Natural Language Processing | Prompt evaluation | PyTorch | Python | RAG | Ragas | Retrieval-Augmented Generation | Scikit-learn | Statistical Testing | Test harnesses
Education
Bachelor of Engineering | Bachelor of Science | Master of Science