AI GPU Arch Perf Optimization Intern
CHN - Minhang, China
CNY 38K-50K (estimate) | Entry-level | Full Time | Internship
Tasks
- Analyze and optimize GPU compute kernels for AI and numerical workloads
- Analyze compute, memory, and pipeline performance
- Build performance models and performance profiles
- Perform GPU performance profiling and bottleneck analysis
- Provide workload and kernel insights for GPU architecture design
- Reproduce AI inference and training workloads for GPU validation
Skills/Tech-stack
CUDA | Computer Systems | GPU Kernels | GPU Programming | Memory systems | OpenCL | Parallel Computing | Performance Profiling | Performance optimization | Python | SYCL | Triton