Software Engineer, AI and DL Kernel Libraries
Tasks
- Analyze performance profiles and optimize workloads
- Build deep learning library abstractions
- Build just-in-time compilation and code generation
- Collaborate with compilers and GPU architecture teams
- Contribute to open source inference ecosystems
- Design, implement, and optimize GPU kernels
- Develop production AI inference software
- Integrate and improve LLM inference runtimes
Perks/Benefits
- N/A
Skills/Tech-stack
API Design | Apache TVM | C# | C++ | CUDA | CUDA C++ | cuDNN | Code generation | Code optimization | GPU Performance | GPU performance modeling | JAX | Just-in-time compilation | Linear Algebra | MLIR | ONNX | Performance Modeling | Performance Profiling | PyTorch | Python | System Architecture | TensorFlow | TensorRT | Triton
Education
Bachelor of Engineering | Bachelor of Science | Master of Science