Senior Data Pipeline Engineer
Tasks
- Build a platform for data lake ingestion and training data management
- Build offline and real-time data processing systems
- Collaborate with machine learning teams to deliver data for model iteration
- Design and build end-to-end data pipelines
- Design data models and metadata management
- Develop toolchains for data cleaning, labeling, quality checks, and data mining
- Develop distributed data storage and query solutions
- Implement data collection, synchronization, cleaning, and standardization
- Implement data version control and lineage tracking
- Optimize large-scale data transmission, memory, and I/O performance
- Provide production and research data support
Perks/Benefits
- N/A
Skills/Tech-stack
Apache Iceberg | Batch Processing | Caching | Columnar Storage | Data Lake | Data Mining | Data Modeling | Data Quality | Data Standardization | Data Cleaning | Data Labeling | Distributed Systems | Docker | ETL | GitHub | Go | Java | Kafka | Kubernetes | Lance | Log Collection | Metadata Management | MongoDB | MySQL | PostgreSQL | Pulsar | Python | Query Engines | RabbitMQ | Real-time Processing | Redis | Snapshot Isolation | Stream Processing | Table Partitioning
Education
Bachelor of Engineering | Bachelor of Science | Master of Science