LLM Engineer (LLM Evaluation)
Tasks
- Automate model evaluation workflows
- Build benchmark datasets
- Build end-to-end evaluation workflows
- Define evaluation metrics
- Design LLM evaluation benchmarks
- Design quality validation workflows
- Detect model regressions automatically
- Establish evaluation protocols
- Improve model quality based on evaluation results
- Integrate evaluations with ML pipelines
- Maintain reproducible evaluation environments
Perks/Benefits
- N/A
Skills/Tech-stack
Argo Workflows | Asynchronous programming | Benchmarking | Data Monitoring | Datadog | Deep learning | Distributed inference | Evaluation automation | GPU Computing | Kubernetes | Language Models | Language Processing | Large Language Models | MLflow | Machine Learning | Machine Learning Pipeline | Natural Language | Natural Language Processing | Prometheus | Python | Regression Detection | Reproducibility
Education
N/A