Machine Learning Engineer Intern (Training Pre-processing) - 2025 Summer (PhD)

San Jose, California, United States


Team Introduction:
The TikTok Flink Ecosystem Team plays a critical role in delivering real-time computing capabilities to power TikTok’s massive-scale recommendation, search, and advertising systems. The team builds the infrastructure for stream processing at exabyte scale, enabling ultra-low-latency, high-reliability, and cost-efficient real-time data transformations.

We are deeply involved in developing and optimizing Apache Flink and surrounding components like connectors, state backends, and runtime execution models to meet TikTok’s rapidly evolving data needs at EB-level throughput and scale.

We also collaborate closely with ML infrastructure teams to bridge real-time stream processing and machine learning. This includes integrating Velox to accelerate model training, building multimodal data pipelines, and utilizing frameworks like Ray to orchestrate large-scale distributed ML workflows.


Responsibilities:
- Design and develop core Flink operators, connectors, or runtime modules to support TikTok’s exabyte-scale real-time processing needs.
- Build and maintain low-latency, high-throughput streaming pipelines powering online learning, recommendation, and ranking systems.
- Collaborate with ML engineers to design end-to-end real-time ML pipelines, enabling efficient feature generation, training data streaming, and online inference.
- Leverage Velox for compute-optimized ML data transformation and training acceleration on multimodal datasets (e.g., video, audio, and text).
- Use Ray to coordinate distributed machine learning workflows and integrate real-time feature pipelines with ML model training/inference.
- Optimize Flink job performance, diagnose bottlenecks, and deliver scalable solutions across EB-scale streaming workloads.

Tags: Data pipelines Flink Machine Learning ML infrastructure Model training PhD Pipelines Streaming

Region: North America
Country: United States
