LLM Engineer (Reinforcement Learning)
Tasks
- Design a self-refine training structure
- Develop foundation models integrated with external knowledge and APIs
- Enhance generation accuracy and stability
- Improve LLM training efficiency
- Optimize direct alignment training with PPO, GRPO, and DPO
- Prevent reward hacking
- Train models that select external tools based on instruction types
Perks/Benefits
- N/A
Skills/Tech-stack
DDP | Deep learning | Direct Preference Optimization | Distributed Training | Docker | Fine Tuning | GPU Computing | Horovod | Kubernetes | Language Processing | Natural Language | Natural Language Processing | Parameter efficient fine-tuning | Policy Optimization | Preference optimization | Proximal Policy Optimization | PyTorch | Python | Reinforcement Learning | Slurm | Supervised Fine Tuning