Machine Learning Eval Engineer
Tasks
- Build customer-specific benchmarks for real-world workflows
- Build internal tooling for model visualization and evaluation analysis
- Build workflows to identify failure modes across datasets
- Design and build scalable evaluation systems for production LLM applications
- Design evaluation infrastructure for production AI systems
- Design statistical evaluation methodologies
- Develop benchmarks, metrics, and automated evaluation pipelines
- Own evaluation systems from design through production deployment
- Partner with ML engineers to prioritize model improvements using evaluation insights
- Prototype evaluation techniques using LLM-as-a-judge
- Work with unstructured documents, including PDFs, spreadsheets, and OCR outputs
Perks/Benefits
- Direct collaboration with founding teams
- Rapid career growth opportunities
- Remote work
- Visa sponsorship available
Skills/Tech-stack
AWS S3 | Benchmarking | Debugging | Document AI | Document processing | Evaluation Pipeline | Experimentation | Flask | LLM Evaluation | LLM-as-a-Judge | Language Models | Large Language Models | Learning Metrics | Machine Learning | Machine learning metrics | OCR | Precision | Prompt engineering | Python | Recall | Statistical Evaluation | TypeScript | Unstructured Data | Vision Language Models | Vision-language
Education
Bachelor of Engineering | Bachelor of Science | Master of Science | PhD