Senior AI Quality Engineer (LLM Evaluation & Automation)
Tasks
- Build eval harness
- Create regression packs
- Define scorecard metrics
- Develop adversarial testing plan
- Develop drift testing plan
- Fail builds on quality regression
- Generate candidate eval cases with AI
- Integrate evals into CI
- Maintain exception tasks
- Maintain golden tasks
- Partner with product and tech lead on thresholds
- Set release gate thresholds
Perks/Benefits
Skills/Tech-stack
Automation | Benchmarking | CI/CD | Case design | LLM Evaluation | Language Models | Large Language Models | Machine Learning | Probabilistic Modeling | Regression testing | Scripting | Test Case | Test Case Design
Education
N/A