aijobs.net

AI Infrastructure Engineer, Large-Scale Cluster Focus

Shanghai

CNY 240K-360K (estimate) Senior-level Full Time

Found 1d ago
Skills/Tech-stack

Automation tools | CUDA | cuDNN | Ceph | containerd | DCGM | DDP | DeepSpeed | Docker | FSDP | GPFS | Golang | Golang Automation Tools | Grafana | InfiniBand | Kubeflow | Kubernetes | Kubernetes Operator | Language model training | Large Language Model | Large language model training | Linux | Lustre | Megatron | Megatron-LM | MinIO | Model Training | NCCL | NVIDIA A100 | NVIDIA GPU | NVIDIA H100 | NVLink | NVSwitch | Pipeline parallelism | Prometheus | PyTorch | Python | RDMA | RoCE | TensorFlow | Volcano

Education

Bachelor of Engineering | Bachelor of Science

Roles

AI | AI Infrastructure Engineer | Engineer | Infrastructure Engineer | Platform | Platform Engineer

Regions

Asia/Pacific

Countries

China

States

Shanghai, CN

Cities

Shanghai, Shanghai, CN

