Senior Data Pipeline Engineer
Tasks
- Build data cleaning, labeling, and quality inspection tools
- Build data visualization capabilities
- Build distributed data processing system
- Build distributed storage and table management
- Collaborate with model teams to deliver data solutions
- Design data models
- Design data pipeline core workflow
- Design high-throughput, low-latency pipelines
- Develop data lake ingest workflow
- Develop data mining toolchain
- Enable data lineage tracking
- Expose data as data services
- Implement data version control
- Implement log tagging and data collection
- Manage data synchronization and data standardization
- Manage memory and I/O performance
- Manage metadata and fast search
- Optimize end-to-end data collection, cleaning, and transformation performance
- Perform offline data processing
- Perform real-time data processing
- Provide unified data access across teams
- Resolve large-scale data transmission bottlenecks
- Support algorithm teams with error case localization
- Troubleshoot distributed system performance issues
Perks/Benefits
- N/A
Skills/Tech-stack
Apache Iceberg | Caching | Columnar Storage | Data Lake | Data Lakehouse | Data Lineage | Data Mining | Data Modeling | Data Quality | Data Services | Data Standardization | Data Versioning | Data Visualization | Data Cleaning | Data Labeling | Distributed Computing | Distributed Systems | Distributed Messaging | Docker | ETL | Git | Go | I/O | I/O Optimization | Java | Kafka | Kubernetes | Log Collection | Memory Management | Metadata Management | MongoDB | MySQL | NoSQL | Offline Processing | Partitioning | Performance Optimization | PostgreSQL | Pulsar | Python | Query Engines | RabbitMQ | Real Time | Real-time Processing | Redis | Relational Databases | Snapshots | Stream Processing | Time Processing
Education