Research Engineer – Benchmarking, Evals & Failure Analysis

San Francisco, CA; Onsite

USD 130K-500K Senior-level Full Time

@ R...

Found 1mo ago

Tasks

Align evaluation systems with training and product goals
Analyze data quality and performance trends
Build evaluation pipelines
Conduct failure analysis
Create rubrics and evaluators
Design benchmarking systems
Develop scoring dashboards and reporting
Run LLM evaluations and experiments

Education

N/A

Related jobs

AI/ML Engineer USD 77K-176K

API Design | API Integration | Automated testing | Azure | C#

Mid-level Full Time

King George, VA, United States

3h ago
AI & Data Solutions Architect USD 150K-200K

API Integration | AWS | Apache Spark | Azure | CI/CD

Remote work | Travel required

Senior-level Full Time

Seattle, United States

6h ago
AI/ML Project Manager USD 155K-175K

Agile | Amazon Web Services | Artificial Intelligence | Azure | Data Engineering

Senior-level Full Time

US-NY-New York

7h ago
Senior Software Engineer, AI Platform Engineering USD 160K-240K

AWS | Amazon SageMaker | Containerization | Docker | EC2

401k matching | Dental insurance | Life insurance | Medical insurance | Paid Holidays

Senior-level Full Time

New York

7h ago
AI/ML Strategist, Subject Matter Expert USD 116K-194K

API Security | Access Controls | Algorithmic bias | Artificial Intelligence | Compliance

Senior-level Full Time

USA-VA-Arlington

7h ago
Junior AI/ML Strategist USD 63K-105K

AI | Cloud Computing | Data Analysis | Data Pipelines | Documentation

Entry-level Full Time

USA-VA-Arlington

7h ago
Software Engineer, Databases (Technical Leadership) USD 160K-293K

AI | Automation | Consensus Protocols | Data Integrity | Database Internals

Senior-level Full Time

Bellevue, WA | Menlo Park, CA

8h ago
Software Engineer III, AI/ML GenAI, Google Cloud Data Management USD 147K-211K

C++ | Data Preparation | Data Processing | Debugging | GenAI

Senior-level Full Time

Mountain View, CA, USA

8h ago
Staff Software Engineer, AI/ML Recommendations, Rankings, Predictions, YouTube USD 207K-301K

Data Processing | Data Structures | Debugging | Distributed Systems | Embedding

Senior-level Full Time

Mountain View, CA, USA; San Bruno, …

8h ago
Software Engineer III, AI/ML Recommendations, Rankings, Predictions, YouTube USD 147K-211K

C++ | Data Processing | Debugging | Embedding | Information Retrieval

Senior-level Full Time

Mountain View, CA, USA; San Bruno, …

8h ago
Senior Software Engineer, AI/ML Recommendations, Rankings, Predictions, YouTube USD 174K-253K

Algorithms | C++ | Data Processing | Data Structures | Debugging

Senior-level Full Time

Mountain View, CA, USA; San Bruno, …

8h ago
AI Builder Intern USD 74K-111K

API Integration | Anthropic API | Autogen | CrewAI | JavaScript

Commuter stipend | Comprehensive health dental and vision | Generous PTO | Learning and development stipend | Retirement benefits

Entry-level Internship

San Francisco, CA; New York, NY

14h ago
Member of Technical Staff (ML Engineer, Recommendations & User Modeling) USD 220K-405K

A/B Testing | Engagement modeling | Feature Engineering

Senior-level Full Time

San Francisco

15h ago
Member of Technical Staff (AI Software Engineer, Agents) USD 220K-405K

AI Evaluation | Browser technologies | CDP | Code Quality | Context engineering

Senior-level Full Time

San Francisco

16h ago
Senior Software Engineer, Perception Future Sensing Platforms USD 213K-263K

ADAS | Autonomous Vehicles | C++ | Camera | Data Processing

Company benefits program | Company bonus | Equity incentive plan | Hybrid work schedule

Senior-level Full Time

Mountain View, CA, USA; San Francisco, …

18h ago
Senior Embedded Engineer USD 165K-218K

ADC | AXI | AXI-Stream | Bootloader | DDR

Equity grants | Health insurance | Paid time off | Recovery support

Senior-level Full Time

Hudson, New Hampshire, United States

18h ago
Forward Deployed Engineer USD 120K-158K

Angular | Code Reviews | Customer enablement | Documentation | Git

Mid-level Full Time

Atlanta, Georgia, United States; Chicago, Illinois, …

19h ago
Forward Deployed Engineer USD 120K-158K

Angular | Git | JavaScript | Node.js | Python

Mid-level Full Time

Redwood City, California, United States

19h ago
Software Technical Account Manager - Wichita, Kansas USD 93K-150K

AI-powered applications | Analytics | Cause analysis | Cloud Computing | Customer adoption

401k employer match | Discretionary paid time off | Emotional and mental wellness support | Fitness programs | Learning and development programs

Mid-level Full Time

Kansas-Remote R

19h ago
Software Engineer, Propulsion Simulation & Data Analysis USD 125K-175K

.NET | Angular | C# | C++ | Combustion Engineering

401k retirement plan | Dental insurance | Employee stock purchase plan | Health insurance | Paid Holidays

Senior-level Full Time

Hawthorne, CA

19h ago
Sr. Software Engineer, Propulsion Simulation & Data Analysis (Raptor) USD 165K-230K

.NET | Angular | C# | C++ | CI/CD

401k retirement plan | Company stock options | Dental insurance | Employee stock purchase plan | Life insurance

Senior-level Full Time

Hawthorne, CA

19h ago
Senior Data Engineer USD 165K-216K

Access Control | Data Governance | Data Lineage | Data Modeling | Data Quality

Senior-level Full Time

Houston, TX, 77040, USA

19h ago
Senior Software Engineer (Pipeline team) USD 185K-259K

A/B Testing | AWS Bedrock | AWS Lambda | AWS S3

Senior-level Full Time

United States - Remote R

19h ago
Senior GenAI Software Engineer (North America) USD 165K-230K

A/B Testing | Debugging | Evaluation

Equity | Health, dental, and vision benefits | In person team gatherings quarterly | Remote-first work | Wellness stipends

Senior-level Full Time

United States R

20h ago
Senior Software Engineer, AI Developer Experience USD 202K-230K

API Integration | Agentic Workflows | Artificial Intelligence | Code review | Command Line

Career coaching and support | In-office culinary options | Inclusive family building benefits | Long term savings or retirement plans | Mental health wellness and fitness benefits

Senior-level Full Time

New York City R

21h ago