AI GPU Arch Perf Optimization Intern
CHN - Minhang, China
CNY 38K-50K (estimate) | Entry-level | Full Time | Internship
Tasks
- Analyze and optimize GPU compute kernels for AI workloads
- Build performance profiles and performance models
- Identify compute, memory, and pipeline bottlenecks
- Perform GPU performance profiling and analysis
- Provide workload and kernel insights for GPU architecture design
- Reproduce AI inference and training workloads for GPU validation
Skills/Tech-stack
CUDA | Computer Architecture | GPU Kernels | Memory Systems | OpenCL | Parallel Computing | Performance Profiling | Performance Optimization | Python | SYCL | Triton