Technical Lead - Agent & Model Evaluation
Tasks
- Build adversarial evaluation harnesses
- Build automated evaluation pipelines
- Create automated RL and fine-tuning pipelines
- Create benchmark suites for agent capabilities
- Define metrics for agent evaluation
- Design the evaluation framework
- Develop failure analysis systems
- Develop replay systems for agent improvement
- Establish quality gates and release criteria
- Implement A/B testing and deployment infrastructure
Skills/Tech-stack
A/B Testing | Automation | Benchmarking | Deep Learning | Evaluation Systems | ML Pipelines | Machine Learning | Reinforcement Learning | Statistical Analysis | Systems Engineering