Research Scientist, LLM/Data-centric

Crimson House Singapore

Rakuten Asia Pte Ltd

Job Description:

Situated in the heart of Singapore's Central Business District, Rakuten Asia Pte. Ltd. is Rakuten's Asia Regional headquarters. Established in August 2012 as part of Rakuten's global expansion strategy, Rakuten Asia comprises various businesses that provide essential value-added services to Rakuten's global ecosystem. Through advertisement product development, product strategy, and data management, among others, Rakuten Asia is strengthening Rakuten Group's core competencies to take the lead in an increasingly digitalized world.

Rakuten Group, Inc. is a global leader in internet services that empower individuals, communities, businesses, and society. Founded in Tokyo in 1997 as an online marketplace, Rakuten has expanded to offer services in e-commerce, fintech, digital content, and communications to approximately 1.7 billion members around the world. The Rakuten Group has nearly 32,000 employees and operations in 30 countries and regions. For more information, visit https://global.rakuten.com/corp/

We are seeking a Research Scientist (LLM/Data-centric) to join our Large Language Model (LLM) Research team. This role focuses on driving our data excellence agenda, designing and running human-centric experiments, and advancing the rigor of data collection and evaluation protocols. The ideal candidate will combine strong technical skills with a passion for impactful, systematic research on data quality, evaluation methodologies, and the design of human-in-the-loop processes at scale.

Key Responsibilities:

  • Lead and own the data strategy and experimentation agenda, driving improvements in data collection, annotation, curation, and evaluation that enhance LLM research and development.
  • Design and execute human-in-the-loop experiments, including crafting experimental protocols, task definitions, user studies, and managing feedback cycles with human annotators and evaluators.
  • Establish annotation guidelines, QA processes, and workflows, ensuring consistency, reliability, and quality in human data labeling and evaluation efforts.
  • Contribute actively to research and engineering codebases, collaborating with LLM scientists and engineers to integrate data and evaluation workflows.
  • Drive innovation in data-centric research, including exploration of new evaluation methodologies, dataset interventions, and measurement frameworks for generative AI capabilities.
  • Coordinate and manage external annotation vendors, crowdsourcing platforms, and internal annotation teams, ensuring efficient project execution and cost control.
  • Analyze, interpret, and communicate experimental results and data insights clearly to technical and non-technical stakeholders.
  • Contribute to best practices on data collection, evaluation protocols, and data-driven research methods.

Mandatory Qualifications:

  • 2+ years of experience in research, data science, or human-computer interaction roles in industry, academia, or research institutions.
  • Advanced degree (Master’s or Ph.D.) in Computer Science, Data Science, Human-Computer Interaction, Cognitive Science, Behavioral Science, or a related technical discipline.
  • Proven track record of contributing production-quality code, data pipelines, and experimental tools that support data-centric research.
  • Proven experience in designing, running, and analyzing human-in-the-loop experiments, user studies, or human data labeling projects.
  • Strong knowledge of data quality, annotation standards, and evaluation methodologies for machine learning systems.

Desired Qualifications:

  • Experience working with LLMs, NLP, or generative AI models, particularly in evaluation, prompting, and alignment contexts.
  • Familiarity with crowdsourcing platforms, annotation tools, and quality assurance workflows, including vendor management.
  • Track record of publishing at leading conferences (e.g., CHI, ACL, NeurIPS, ICML, SIGIR, or equivalent).
  • Proficiency in Python, data analysis tools, and statistical methods for experimental design and evaluation.
  • Experience working in cross-functional, globally distributed research teams.
  • Ability to speak, write, and communicate in Japanese is a plus.

Rakuten is an equal opportunities employer and welcomes applications regardless of sex, marital status, ethnic origin, sexual orientation, religious belief, or age.

Region: Asia/Pacific
Country: Singapore