Associate Director, Software Engineering (Model Hosting/Inference Optimisation)
Shenzhen, Guangdong, China
CNY 240K-360K (estimate) | Mid-level | Full Time
Tasks
- Apply hardware-specific optimizations
- Build fine-tuning pipelines
- Design model hosting platforms for LLMs, embeddings, and STT/TTS
- Ensure production reliability, scalability, security, and high availability
- Integrate inference frameworks
- Monitor inference health and performance
- Optimize inference for latency, throughput, and cost
- Troubleshoot deployment bottlenecks
- Validate fine-tuned models and integrate them into the hosting stack
Perks/Benefits
- N/A
Skills/Tech-stack
AWQ | AWS | Accelerate | Azure | Batching | CUDA | Distributed Training | Docker | FP8 | Fine Tuning | GCP | GPTQ | Hugging Face | Hugging Face Transformers | Hyperparameter Tuning | INT4 | Inference Optimization | KV cache | Kubernetes | LLM | LoRA | Operator optimization | Python | QLoRA | Quantization | SGLang | TensorRT-LLM | vLLM
Education