Data Engineer, Product Master & Data Delivery Group - Catalog Management Department, AI & Data Division
Rakuten Crimson House, Japan
Rakuten
Job Description:
Business Overview
Rakuten Group, Inc., a global technology leader based in Japan, provides innovative solutions that enrich the lives of millions worldwide. Our Technology division pioneers the use of cutting-edge technologies to deliver exceptional experiences across our services.
Department Overview
The importance of catalog data as master data is increasing: it serves as the hub and data platform that connects the product and corporate data handled across Rakuten services. By managing abundant, accurate catalog data and deploying it as master data across Rakuten services, the department provides the core data that accelerates new service development.
The Catalog Management Department is committed to building data creation processes, collecting and expanding catalog data, and improving quality in order to continue providing services that meet and exceed the expectations of Rakuten's customers.
Under the mission of creating the best catalog data and improving the user experience, we bring together Rakuten's design, technology, and operations to deliver high-quality catalog data that sits at the core of our business strategy. This contributes to Rakuten's vision of "Empowering people and society through innovation".
Position:
Position Details
The primary objective is to design and develop a comprehensive catalog and knowledge management platform that enables robust, cross-functional management of master datasets. This platform will ingest, normalize, and consolidate data from a wide array of internal and external sources into a centralized master dataset, which can then be delivered to support diverse use cases across multiple services. In addition to platform development, the role encompasses implementing targeted enhancements to ensure data solutions are precisely aligned with client requirements.
A critical challenge lies in the continuous improvement and maintenance of data quality and coverage within core master datasets - such as product masters, artist masters, and their integration. This responsibility is addressed through ongoing development efforts that leverage innovative strategies, including AI technologies and the incorporation of public data, to enhance data integrity and comprehensiveness.
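To make the matching challenge concrete, below is a minimal, illustrative sketch of rule-based record matching in Java. The ProductRecord shape, the normalization rules, and the 0.8 threshold are all assumptions for illustration, not Rakuten's actual schema or logic.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy matcher for consolidating product records from different sources. */
public class ProductMatcher {

    /** Hypothetical, minimal record shape; real source schemas will differ. */
    record ProductRecord(String sourceId, String title, String brand) {}

    /** Normalize free text: lowercase, strip punctuation, collapse whitespace. */
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}\\s]", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    /** Jaccard similarity between the token sets of two strings. */
    static double jaccard(String a, String b) {
        Set<String> ta = new HashSet<>(Arrays.asList(normalize(a).split(" ")));
        Set<String> tb = new HashSet<>(Arrays.asList(normalize(b).split(" ")));
        Set<String> inter = new HashSet<>(ta);
        inter.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    /** Match when normalized brands agree and titles are similar enough. */
    static boolean isSameProduct(ProductRecord a, ProductRecord b) {
        return normalize(a.brand()).equals(normalize(b.brand()))
                && jaccard(a.title(), b.title()) >= 0.8; // threshold is illustrative
    }

    public static void main(String[] args) {
        ProductRecord r1 = new ProductRecord("shopA:123", "Wireless Mouse M185 (Black)", "Logitech");
        ProductRecord r2 = new ProductRecord("shopB:987", "Logitech wireless mouse m185 black", "LOGITECH");
        System.out.println(isSameProduct(r1, r2)); // prints: true
    }
}
```

At catalog scale, a blocking step would restrict which pairs are compared, and learned or AI-based matchers of the kind mentioned above could complement such rules.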
Responsibilities
- Development and operation of data pipelines for data linkage, normalization, and provision.
- Development of the record-matching logic needed to aggregate master data.
- Development of data provision according to client needs: product information provision APIs, SQL for aggregating large data volumes in BigQuery (see the sketch after this list), web application development, etc.
- Consistent ownership of maintenance and operations, from organizing data requirements and system design through to release, as required for product development.
- Design and operation of the entire system, including its infrastructure and network layers.
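As a hedged example of the BigQuery work listed above, the following sketch runs an aggregation query with the google-cloud-bigquery Java client. The project, dataset, and table names are placeholders, and credentials are assumed to come from application-default authentication.

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

/** Illustrative aggregation over a hypothetical product master table. */
public class CatalogAggregation {
    public static void main(String[] args) throws InterruptedException {
        // Uses application-default credentials and the default project.
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // Placeholder table: `my-project.catalog.product_master` is not a real name.
        String sql = "SELECT brand, COUNT(*) AS product_count "
                + "FROM `my-project.catalog.product_master` "
                + "GROUP BY brand ORDER BY product_count DESC LIMIT 10";

        QueryJobConfiguration config =
                QueryJobConfiguration.newBuilder(sql).setUseLegacySql(false).build();

        // Runs the query synchronously and iterates over the result rows.
        TableResult result = bigquery.query(config);
        result.iterateAll().forEach(row ->
                System.out.printf("%s: %d%n",
                        row.get("brand").getStringValue(),
                        row.get("product_count").getLongValue()));
    }
}
```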
Tech Stack
- Programming languages: Java, TypeScript, Python, Shell Script.
- Platform: Hadoop, Google Cloud and internal private cloud in some cases.
- Databases: BigQuery, Bigtable, HBase, MySQL, Hive.
- Web Applications: Node.js, Spring Boot.
- Monitoring: Cloud Monitoring.
- Framework: Spring, Spring Boot.
- Others: Cloud Dataflow (ETL), Docker (container technology), GitHub Actions (CI/CD), Kubernetes & Cloud Run (deployment environments).
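To ground the Cloud Dataflow (ETL) entry, here is a minimal Apache Beam pipeline in Java in the spirit of the linkage and normalization pipelines described above. The GCS paths are placeholders; running on Dataflow would add --runner=DataflowRunner plus project and region options.

```java
import java.util.Locale;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Filter;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

/** Sketch of a read -> clean -> normalize -> write batch pipeline. */
public class NormalizePipeline {
    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);

        p.apply("ReadRaw", TextIO.read().from("gs://example-bucket/raw/*.csv"))
         // Drop blank lines before normalization.
         .apply("DropEmpty", Filter.by((String line) -> !line.isBlank()))
         // Trivial stand-in for real normalization rules.
         .apply("Normalize", MapElements.into(TypeDescriptors.strings())
                 .via((String line) -> line.trim().toLowerCase(Locale.ROOT)))
         .apply("WriteClean", TextIO.write().to("gs://example-bucket/clean/part"));

        p.run().waitUntilFinish();
    }
}
```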
Mandatory Qualifications:
- Fluent in both Japanese and English, with excellent communication skills in a diverse, multinational environment.
- At least 3 years of experience in Java programming.
- Minimum of 2 years' experience working with cloud platforms such as GCP or AWS.
- At least 1 year of experience operating within Linux environments.
- Minimum 3 years of professional experience in Data Engineering.
- Proven track record in designing and building production-grade data pipelines.
- Experience developing full-stack applications.
- Strong expertise in API design and development.
- In-depth knowledge of data pipeline architecture and ETL processes.
Desired Qualifications:
- Understanding of AI/GenAI concepts and their data requirements is a plus.
- Experience building data pipelines to support AI/ML models is a plus.
- Experience with data assets, data governance, data storage, and data sharing.
- Background in performance optimization.
- Experience as a mentor, tech lead, or leader of an engineering team.
- Strong sense of ownership and the ability to own a project end-to-end, from design to deployment.
- Experience in building and driving adoption of new products.
- Strong adaptability and commitment to continuous learning, keeping up with the latest advancements in technology.
#engineer #DataEngineer #applicationsengineer #technologyservicediv
Languages:
English (Overall - 4 - Fluent), Japanese (Overall - 4 - Fluent)
Perks/benefits: Career development