Large Model Application Algorithm Research Scientist - International Content Security Algorithm Research
Singapore, Singapore
TikTok Content Security Algorithm Research Team:
The International Content Safety Algorithm Research Team is dedicated to maintaining a safe and trustworthy environment for users of ByteDance's international products. We develop and iterate on machine learning models and information systems to identify risks earlier, respond to incidents faster, and monitor potential threats more effectively. The team also leads the development of foundational large models for products. In the R&D process, we tackle key challenges such as data compliance, model reasoning capability, and multilingual performance optimization. Our goal is to build secure, compliant, and high-performance models that empower various business scenarios across the platform, including content moderation, search, and recommendation.
Project Background:
In recent years, Large Language Models (LLMs) have achieved remarkable progress across various domains of natural language processing (NLP) and artificial intelligence. These models have demonstrated impressive capabilities in tasks such as language generation, question answering, and text translation. However, reasoning remains a key area for further improvement. Current approaches to enhancing reasoning abilities often rely on large amounts of Supervised Fine-Tuning (SFT) data, and acquiring such high-quality SFT data is expensive, posing a significant barrier to scalable model development and deployment.
To address this, OpenAI's o1 series of models has made progress by increasing the length of the Chain-of-Thought (CoT) reasoning process. While this technique has proven effective, how to scale it efficiently at test time remains an open question. Recent research has explored alternative methods such as Process Reward Models (PRMs), Reinforcement Learning (RL), and Monte Carlo Tree Search (MCTS) to improve reasoning. However, these approaches still fall short of the general reasoning performance achieved by OpenAI's o1 series. Notably, the recent DeepSeek R1 paper suggests that pure RL methods can enable LLMs to autonomously develop reasoning skills without relying on expensive SFT data, revealing the substantial potential of RL in advancing LLM capabilities.
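To make the pure-RL idea concrete, here is a minimal sketch of a rule-based, verifiable reward of the kind such methods use in place of a learned reward model. The \boxed{...} answer convention, the function name, and the 0.1 format bonus are illustrative assumptions, not any specific paper's implementation:

```python
import re

def verifiable_reward(response: str, reference_answer: str) -> float:
    """Rule-based reward for a math-style task: no learned reward
    model, just an answer check plus a small format bonus."""
    # Assumed convention: the model wraps its final answer in
    # \boxed{...}, as in many math benchmarks.
    match = re.search(r"\\boxed\{([^}]*)\}", response)
    if match is None:
        return 0.0  # unparseable output earns nothing
    answer = match.group(1).strip()
    # Exact string match for brevity; real systems use more tolerant
    # numeric or symbolic comparison.
    correctness = 1.0 if answer == reference_answer.strip() else 0.0
    format_bonus = 0.1  # illustrative value, not a tuned constant
    return format_bonus + correctness
```

A policy-optimization algorithm such as PPO or GRPO would then train the LLM against this scalar signal; because correctness is checked programmatically, no SFT data or human preference labels are needed for tasks with checkable answers.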
Project Challenges:
1. Design of Reward Models: In the RL process, designing an effective reward model is crucial. It must accurately reflect the quality of the reasoning process and guide the model to iteratively improve its reasoning ability. This involves not only setting appropriate evaluation criteria across different tasks, but also ensuring that the reward model adapts dynamically during training to match the evolving model performance.
2. Stability of the Training Process: In the absence of high-quality SFT data, ensuring stable RL training becomes a major challenge. RL often involves extensive exploration and trial-and-error, which can lead to unstable training or even performance degradation. Developing robust training strategies is essential to keep training reliable and effective; a sketch of one common stabilizer appears after this list.
3. Expanding from Mathematics and Code Tasks to Natural Language Tasks: Current RL reasoning methods are applied primarily to mathematics and code tasks, where CoT data is more abundant. Natural language tasks, by contrast, are more open-ended and complex. Extending successful RL strategies to them requires in-depth research and innovation in both data design and RL methodology to enable cross-task general reasoning capabilities.
4. Improving Reasoning Efficiency: Improving reasoning efficiency while maintaining high reasoning quality is another critical challenge, since efficiency directly impacts the model's practicality and cost-effectiveness in real-world applications. Approaches worth exploring include knowledge distillation (transferring knowledge from complex models to smaller models) to reduce computational resource consumption, and using Long Chain-of-Thought (Long-CoT) outputs to improve Short-CoT models, balancing reasoning accuracy with computational efficiency. A sketch of the distillation route also appears after this list.
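On the stability concern in point 2, one widely used stabilizer is a PPO-style clipped policy loss with a KL penalty toward a frozen reference model. The sketch below is illustrative only; the function name, tensor layout, and coefficient values are assumptions, not tuned settings:

```python
import torch

def kl_regularized_policy_loss(logprobs, old_logprobs, ref_logprobs,
                               advantages, clip_eps=0.2, kl_coef=0.05):
    """PPO-style clipped objective plus a KL penalty toward a frozen
    reference model -- one common recipe for keeping RL fine-tuning of
    an LLM from drifting into degenerate outputs. Inputs are per-token
    log-probabilities and advantages as 1-D tensors."""
    ratio = torch.exp(logprobs - old_logprobs)           # importance ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()  # pessimistic bound
    # Approximate KL(policy || reference) under samples from the policy.
    kl_penalty = (logprobs - ref_logprobs).mean()
    return policy_loss + kl_coef * kl_penalty
```

Clipping bounds how far a single update can move the policy, while the KL term anchors it to the reference model; together they trade some exploration for the stability the posting highlights.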
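And as a loose illustration of the distillation route in point 4, here is a sketch of building a Short-CoT training set from a Long-CoT teacher. The three callables and the data layout are assumed interfaces for illustration, not an established API:

```python
from dataclasses import dataclass

@dataclass
class DistillExample:
    prompt: str
    target: str  # compressed rationale plus final answer

def build_distillation_set(problems, teacher_generate, compress, verify):
    """Build an SFT dataset for a Short-CoT student from a Long-CoT
    teacher. The three callables are assumed interfaces:
      teacher_generate(prompt) -> long chain-of-thought string
      compress(cot)            -> shortened rationale keeping key steps
      verify(prompt, cot)      -> True if the final answer checks out
    """
    dataset = []
    for prompt in problems:
        long_cot = teacher_generate(prompt)
        if not verify(prompt, long_cot):
            continue  # keep only traces with verifiably correct answers
        dataset.append(DistillExample(prompt=prompt,
                                      target=compress(long_cot)))
    return dataset
```

Fine-tuning a smaller student on such a dataset trades some of the teacher's accuracy for much cheaper inference, which is the efficiency balance the challenge describes.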