Deep Learning Performance Architect, CUTLASS DSL
Tasks
- Build and advance MLIR dialects
- Collaborate with research and software teams
- Design and develop GPU kernel DSL
- Develop code generation flows
- Implement MLIR lowering passes
- Improve kernel compilation speed
- Integrate optimizations into products
- Maintain performance parity with CUTLASS C++
- Optimize GPU kernel compilation
Perks/Benefits
- N/A
Skills/Tech-stack
C++ | CUDA | Code generation | Compiler design | Domain-specific language | Intermediate representation | LLVM | MLIR | Optimization | Performance Analysis | Python