Research Engineer – Benchmarking, Evals & Failure Analysis
Tasks
- Align evaluation systems with training and product goals
- Analyze data quality and performance trends
- Build evaluation pipelines
- Conduct failure analysis
- Create rubrics and evaluators
- Design benchmarking systems
- Develop scoring dashboards and reporting
- Run LLM evaluations and experiments
Skills/Tech-stack
API Integration | Algorithms | Benchmarking | Cloud infrastructure | Dashboards | Data Structures | Deep learning | Evaluation Pipelines | Experiment tracking | Failure analysis | LLM Evaluation | Language Models | Large Language Models | Machine Learning | NoSQL | Python | SQL | Scoring systems
Education
N/A