AI Security Engineer-Soaring Star Talent Program
Singapore
ByteDance
ByteDance is a technology company operating a range of content platforms that inform, educate, entertain and inspire people across languages, cultures and geographies.
Responsibilities
Team Introduction:
The Privacy Innovation Lab studies the latest technologies and theories in data privacy and security. It provides technology consulting that gives our business valuable perspectives on industry trends and innovative technical solutions. In data security, the lab has a long-term vision that includes digital sovereignty and the protection of personal data in large-scale models. As privacy compliance regulations grow stricter and the concept of multi-polar digital sovereignty emerges, our team draws on practical knowledge from both academia and industry. By introducing state-of-the-art technologies and theories, we provide comprehensive and efficient data privacy and security safeguards for Internet services with large user bases and vast amounts of data, supporting continuous business innovation.
Project Introduction:
Generative AI models produce new content by learning from large amounts of training data, which may contain substantial sensitive personal information. If the training data or the training process is not sufficiently protected, the generated content may leak private information from the training data. How to use the powerful capabilities of generative AI models without revealing personal information has become a key issue that urgently needs to be solved. Designing generative AI that ensures privacy protection while maintaining generation quality and model performance is becoming a cutting-edge research direction in this field.
Topic Challenge:
1. Privacy leakage risk: The training of generative AI models relies on large amounts of data, especially in natural language processing and image generation. During training, the model may memorize specific pieces of the training data and later reproduce them in its output. For example, language models similar to GPT may inadvertently generate text containing personal identity information, addresses, or other sensitive data from the training data. Ensuring that the generative model does not leak this information has become a major challenge in privacy protection.
2. Data perturbation and model quality: To prevent privacy leakage, commonly used privacy protection techniques (such as Differential Privacy) usually require perturbing the training data or injecting noise during training. However, such perturbation may reduce the generative model's ability to model the data accurately, which lowers the quality of the generated content. This matters especially in generation tasks, where the quality of the model directly determines the practicality and creativity of the output. Maintaining high-quality results while protecting privacy is therefore an urgent problem.
3. "Memory" and "reuse" problems of models: Generative AI models establish generation rules by learning from large amounts of data, but they may also remember details of that data during training. In some cases this manifests as "memory leakage": the model's output may inadvertently reproduce specific segments of the training set, especially with small-sample or high-sensitivity datasets. How to prevent generative AI models from remembering and reusing specific personal information, so that they learn only the "rules" or "features" of the data, is an important issue that must be considered when designing privacy protection mechanisms.
4. Compliance and cross-border data flow: Different countries and regions have different legal provisions on privacy protection, such as GDPR and CCPA, which place strict requirements on how personal data is handled and transferred. For cross-border data flows, ensuring compliance with the data privacy regulations of each region during generative AI training, especially where sensitive data is involved, has become a complex legal and technical challenge. In addition, generative models may involve user data from multiple data sources and multiple countries. How to balance privacy protection and compliance in these environments is also a concern.
5. Transparency and interpretability of generated content: Although generative AI models have impressive generation capabilities, they often lack sufficient transparency, making it difficult for users to understand the reasons behind the generated results. In the context of privacy protection, making generative models more interpretable, so that users can understand how the model generates specific content and whether that content involves private information, is key to enhancing user trust. This challenge is not only a technical issue but also an ethical and social one.
Qualifications
1. Hold a doctoral degree; majors in artificial intelligence, computer science, mathematics, or related fields are preferred;
2. Have a solid foundation and coding ability in generative AI; candidates who have published papers in top journals and conferences such as ICLR/NeurIPS/ICML are preferred;
3. Familiar with industry trends in large models, with curiosity, the ability to learn quickly, and strong hands-on ability with new knowledge;
4. Good communication and collaboration skills, able to explore new technologies with the team and promote technological progress.
Job Information
About Us
Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Lemon8, CapCut and Pico as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.
Why Join ByteDance
Inspiring creativity is at the core of ByteDance's mission. Our innovative products are built to help people authentically express themselves, discover and connect – and our global, diverse teams make that possible. Together, we create value for our communities, inspire creativity and enrich life - a mission we work towards every day.
As ByteDancers, we strive to do great things with great people. We lead with curiosity, humility, and a desire to make impact in a rapidly growing tech company. By constantly iterating and fostering an "Always Day 1" mindset, we achieve meaningful breakthroughs for ourselves, our Company, and our users. When we create and grow together, the possibilities are limitless. Join us.
Diversity & Inclusion
ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.