Research Engineer – Benchmarking, Evals & Failure Analysis
Tasks
- Align evaluation systems with training and product goals
- Analyze data quality and performance trends
- Build evaluation pipelines
- Conduct failure analysis
- Create rubrics and evaluators
- Design benchmarking systems
- Develop scoring dashboards and reporting
- Run LLM evaluations and experiments
Skills/Tech-stack
API Integration | Algorithms | Benchmarking | Cloud infrastructure | Dashboards | Data Structures | Deep learning | Evaluation Pipelines | Experiment tracking | Failure analysis | LLM Evaluation | Language Models | Large Language Models | Machine Learning | NoSQL | Python | SQL | Scoring systems
Education
N/A