MaaS Architect
Tasks
- Apply prefix caching and paged attention
- Build LLM inference gateway
- Configure priority queues and multi-tenant isolation
- Create unified inference load abstraction
- Deploy quantized models
- Design MaaS platform architecture
- Design token-level metering
- Develop GPU elastic scaling strategy
- Enable continuous batching and speculative decoding
- Ensure model service SLAs for latency and throughput
- Implement KV-cache-aware scheduling
- Implement model routing and traffic scheduling
- Package inference services with observability
- Select and optimize inference engine frameworks
Perks/Benefits
- N/A
Skills/Tech-stack
Attention | Batching | C++ | CUDA | Continuous batching | GPU scheduling | Go | KV cache | Kubernetes | LLM serving | Multi-tenancy | NCCL | PagedAttention | Prefix caching | Python | Quantization | Ray Serve | SGLang | Speculative decoding | TensorRT-LLM | Triton | vLLM
Education
Bachelor of Engineering | Bachelor of Science | Master of Science | PhD