Associate Director, Software Engineering (Model Hosting/Inference Optimisation)
Shenzhen, Guangdong, China
CNY 240K-360K (estimate) | Mid-level | Full Time
Tasks
- Apply hardware-specific optimizations
- Build fine-tuning pipelines
- Design model hosting platforms for LLMs, embeddings, and STT/TTS
- Ensure production reliability, scalability, security, and high availability
- Integrate inference frameworks
- Monitor inference health and performance
- Optimize inference for latency, throughput, and cost
- Troubleshoot deployment bottlenecks
- Validate fine-tuned models and integrate them into the hosting stack
Perks/Benefits
- N/A
Skills/Tech-stack
AWQ | AWS | Accelerate | Azure | Batching | CUDA | Distributed Training | Docker | FP8 | Fine-Tuning | GCP | GPTQ | Hugging Face | Hugging Face Transformers | Hyperparameter Tuning | INT4 | Inference Optimization | KV cache | Kubernetes | LLM | LoRA | Operator optimization | Python | QLoRA | Quantization | SGLang | TensorRT-LLM | vLLM