Large Model Application Algorithm Research Scientist – International Content Security Algorithm Research

Singapore, Singapore


TikTok Content Security Algorithm Research Team:
The International Content Safety Algorithm Research Team is dedicated to maintaining a safe and trustworthy environment for users of ByteDance's international products. We develop and iterate on machine learning models and information systems to identify risks earlier, respond to incidents faster, and monitor potential threats more effectively. The team also leads the development of foundational large models for products. In the R&D process, we tackle key challenges such as data compliance, model reasoning capability, and multilingual performance optimization. Our goal is to build secure, compliant, and high-performance models that empower various business scenarios across the platform, including content moderation, search, and recommendation.

Project Background:
In recent years, Large Language Models (LLMs) have achieved remarkable progress across various domains of natural language processing (NLP) and artificial intelligence. These models have demonstrated impressive capabilities in tasks such as language generation, question answering, and text translation. Reasoning, however, remains a key area for improvement. Current approaches to enhancing reasoning ability often rely on large amounts of Supervised Fine-Tuning (SFT) data, but acquiring such high-quality data is expensive and poses a significant barrier to scalable model development and deployment.

To address this, OpenAI's o1 series of models has made progress by increasing the length of the Chain-of-Thought (CoT) reasoning process. While this technique has proven effective, how to scale it efficiently at test time remains an open question. Recent research has explored alternative methods such as Process Reward Models (PRMs), Reinforcement Learning (RL), and Monte Carlo Tree Search (MCTS) to improve reasoning, but these approaches still fall short of the general reasoning performance of the o1 series. Notably, the recent DeepSeek R1 paper suggests that pure RL methods can enable LLMs to autonomously develop reasoning skills without relying on expensive SFT data, revealing the substantial potential of RL for advancing LLM capabilities.
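As a concrete illustration of the pure-RL idea, the DeepSeek R1 report describes simple rule-based rewards (answer accuracy plus an output-format check) in place of a learned reward model on verifiable tasks. The sketch below is an assumption-laden simplification, not this team's actual pipeline: the `<think>`/`<answer>` tag convention and the additive reward combination are illustrative choices.

```python
import re

def format_reward(completion: str) -> float:
    """1.0 if the completion wraps reasoning in <think> tags and the final
    answer in <answer> tags (an R1-style format reward), else 0.0."""
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, completion, flags=re.DOTALL) else 0.0

def accuracy_reward(completion: str, gold: str) -> float:
    """1.0 if the text inside <answer> matches the reference answer exactly.
    Only works for verifiable tasks (math, code) with a known gold answer."""
    m = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    if m is None:
        return 0.0
    return 1.0 if m.group(1).strip() == gold.strip() else 0.0

def total_reward(completion: str, gold: str) -> float:
    # Simple additive combination; real systems weight and normalize terms.
    return format_reward(completion) + accuracy_reward(completion, gold)
```

Because these rewards are computed by rules rather than a neural model, they cannot be gamed by reward-model exploitation — one reason this recipe has so far been limited to tasks with automatically checkable answers.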

Project Challenges:
1. Design of Reward Models: In the RL process, designing an effective reward model is crucial. It must accurately reflect the quality of the reasoning process and guide the model to iteratively improve its reasoning ability. This involves not only setting appropriate evaluation criteria across different tasks, but also ensuring that the reward model adapts dynamically during training to match the model's evolving performance.
2. Stability of the Training Process: In the absence of high-quality SFT data, keeping RL training stable becomes a major challenge. RL involves extensive exploration and trial-and-error, which can lead to unstable training or even performance degradation. Developing robust training strategies is essential to ensure that training is reliable and effective.
3. Expanding from Mathematics and Code Tasks to Natural Language Tasks: Current RL reasoning methods are applied primarily to mathematics and code tasks, where CoT data is more abundant and answers can often be verified automatically. Natural language tasks, however, are more open-ended and complex. Extending successful RL strategies to natural language processing requires in-depth research and innovation in both data design and RL methodology to achieve cross-task general reasoning capabilities.
4. Improving Reasoning Efficiency: Improving reasoning efficiency while maintaining high reasoning quality is another critical challenge, as efficiency directly impacts the model's practicality and cost-effectiveness in real-world applications. Approaches worth exploring include knowledge distillation (transferring knowledge from a large model to a smaller one) to reduce computational resource consumption, and using Long Chain-of-Thought (Long-CoT) models to improve Short-CoT models, balancing reasoning accuracy against computational cost.
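The knowledge-distillation approach mentioned in challenge 4 can be sketched concretely. The classic soft-label formulation (Hinton et al.) trains the student to match the teacher's temperature-softened output distribution via a KL-divergence loss. The pure-Python sketch below over a single logit vector is illustrative only; the temperature value and class count are arbitrary assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature softens the distribution,
    exposing the teacher's relative preferences among non-top classes."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): divergence of the student distribution q from teacher p."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Soft-label distillation term: KL between temperature-softened teacher
    and student distributions, scaled by T^2 to keep gradient magnitudes
    comparable across temperatures (as in Hinton et al.)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p, q)
```

In practice this term is combined with a standard cross-entropy loss on hard labels, and for reasoning models the "teacher" signal is often Long-CoT outputs used to supervise a Short-CoT student rather than raw logits.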




