Senior Data Pipeline Engineer
Tasks
- Build data cleaning, labeling, and quality inspection tools
- Build data visualization capabilities
- Build a distributed data processing system
- Build distributed storage and table management
- Collaborate with model teams to deliver data solutions
- Design data models
- Design the core data pipeline workflow
- Design high-throughput, low-latency pipelines
- Develop data lake ingestion workflows
- Develop data mining toolchain
- Enable data lineage tracking
- Expose data as data services
- Implement data version control
- Implement log tagging and data collection
- Manage data synchronization and data standardization
- Manage memory and I/O performance
- Manage metadata and fast search
- Optimize end-to-end performance of data collection, cleaning, and transformation
- Perform offline data processing
- Perform real-time data processing
- Provide unified data access across teams
- Resolve large-scale data transmission bottlenecks
- Support algorithm teams with error case localization
- Troubleshoot distributed system performance issues
Perks/Benefits
- N/A
Skills/Tech-stack
Apache Iceberg | Caching | Columnar Storage | Data Lake | Data Lakehouse | Data Lineage | Data Mining | Data Modeling | Data Quality | Data Services | Data Standardization | Data Versioning | Data Visualization | Data Cleaning | Data Labeling | Distributed Computing | Distributed Systems | Distributed Messaging | Docker | ETL | Git | Go | I/O Optimization | Java | Kafka | Kubernetes | Log Collection | Memory Management | Metadata Management | MongoDB | MySQL | NoSQL | Offline Processing | Partitioning | Performance Optimization | PostgreSQL | Pulsar | Python | Query Engines | RabbitMQ | Real-time Processing | Redis | Relational Databases | Snapshots | Stream Processing