Developer Technology Engineer - AI
Tasks
- Build and optimize parallel algorithms and data structures on GPUs
- Collaborate with architecture and research teams to improve developer efficiency
- Develop and contribute to GPU and large language model frameworks and open source projects
- Improve distributed training and inference communication libraries
- Optimize GPU kernels and operators
- Optimize collective communication and data transfer strategies
- Optimize training and inference for large language models
- Tune instructions and optimize compilers
Perks/Benefits
- N/A
Skills/Tech-stack
C# | C++ | cuBLAS | CUDA | cuDNN | CUTLASS | Direct memory access | Distributed Systems | FlashAttention | FlashInfer | Fortran | InfiniBand | Linear Algebra | Megatron | Memory access | NVIDIA NCCL | NVSHMEM | Numerical Methods | Parallel Programming | Python | Remote Direct Memory Access | RoCE | TensorRT | TensorRT-LLM