Machine Learning Engineer Intern (FeatureStore) - 2025 Summer (BS/MS)
San Jose, California, United States
Team Introduction:
The TikTok Data Ecosystem Team plays a critical role in supporting TikTok’s personalized recommendation system, which serves over 1 billion users. We are responsible for building scalable, reliable, and high-performance infrastructure for storing and serving machine learning features — especially user behavior sequences and contextual embeddings used in large-scale recommendation and pretraining models.
Our work sits at the intersection of systems and machine learning: ensuring training-serving consistency, low-latency access to temporal features, and scalable ingestion pipelines across online and offline environments.
We explore and integrate with various underlying storage engines, including RocksDB, HBase, and time-series databases, depending on the access pattern, feature type, and serving latency required by ML models.
Responsibilities:
- Build and optimize the core infrastructure of TikTok’s feature store, powering both training data pipelines and real-time inference systems.
- Design efficient storage strategies for user behavior sequences, long-range contextual features, and sparse embeddings — ensuring freshness, consistency, and high availability.
- Work with underlying storage engines such as RocksDB, HBase, and time-series databases to support feature retention, versioning, compaction, and fast lookup.
- Collaborate with recommendation algorithm teams to design schemas and access patterns tailored to evolving model needs.
- Integrate online and offline data pipelines to reduce training-serving skew and support continuous training and A/B testing scenarios.
- Investigate techniques such as temporal sampling, embedding quantization, caching, and hybrid tiered storage to improve cost-efficiency and latency.
Categories: Engineering Jobs, Machine Learning Jobs
Tags: A/B testing, Data pipelines, HBase, Machine Learning, ML models, Pipelines, Testing
Region: North America
Country: United States