AI Evaluation & Reliability Engineer (Agents & LLM Systems)
Center, Center District, IL
USD 163K-240K (estimate) | Senior-level | Full Time
Tasks
- Build LLM-as-a-judge pipelines for correctness and output quality
- Build data-driven evaluation pipelines using synthetic and real-world datasets
- Collaborate with AI platform and database teams to improve agent data interaction quality
- Define metrics, benchmarks, scorecards, and methodologies for agent reliability
- Design and implement evaluation frameworks for AI agents and multi-agent systems
- Develop agent-based evaluation systems for scalable testing
- Identify and analyze failure modes, edge cases, and non-deterministic behaviors
- Improve agent robustness, consistency, and reliability in production
Perks/Benefits
- N/A
Skills/Tech-stack
A/B Testing | Agent systems | Data Pipelines | Evaluation Frameworks | LLM-as-a-Judge | Language Models | Large Language Models | Multi-Agent Systems | Non-Determinism | Prompt engineering | Python | Ranking Systems | Real-world datasets | Statistical Evaluation | Synthetic data | Testing Frameworks
Education
N/A