Applied AI / Evaluation Engineer
Tasks
- Build AI evaluation harnesses
- Build AI regression testing systems
- Build golden reference datasets
- Capture human review signals
- Create automated quality gates
- Define evaluation dimensions
- Detect AI drift
- Develop human-in-the-loop tooling
- Implement LLM-as-judge pipelines
- Implement agent observability tracing
- Implement continuous evaluation in CI/CD
- Instrument AI telemetry
- Monitor latency and cost
- Normalize and associate review signals with interactions
- Produce evaluation reports and quality metrics
- Track AI regression metrics
- Validate judge rationales
Skills/Tech-stack
Adversarial Testing | Agent Performance Monitoring | Bias detection | CI/CD | Drift Detection | End-to-end tracing | Evaluation Harness | Golden datasets | Human-in-the-loop | LLM-as-judge | Machine Learning | Machine learning evaluation | NLP evaluation | Natural Language Processing | Observability | Python | Quality gates | Regression testing | Statistics | Telemetry