Find jobs in AI/ML, Data Science and Big Data
12 results for GPU Kernels (Skill/Tech stack)
-
AI Research Engineer (Kernel & Inference Optimization)
USD 200K-332K
Diffusion Models | Distributed Inference Systems | Distributed inference | Expert parallelism | Flash Attention
English support | Remote work
Senior-level Full Time
Remote job
19h ago
-
AI Research Engineer (Kernel & Inference Optimization)
USD 200K-332K
Compute Shaders | Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing
Remote work
Senior-level Full Time
Remote job
19h ago
-
AI Research Engineer (Kernel & Inference Optimization)
USD 200K-332K
Computer Vision | Deep learning | Diffusion Models | Distributed inference | Edge Computing
Remote work
Senior-level Full Time
Remote job
19h ago
-
AI Research Engineer (Kernel & Inference Optimization)
USD 200K-332K
Compute Shaders | Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing
Remote work
Senior-level Full Time
Remote job
19h ago
-
AI Research Engineer (Kernel & Inference Optimization)
USD 200K-332K
Compute Shaders | Custom Compute Shaders | Data Pipelines | Diffusion Models | Distributed Inference Systems
Remote work
Senior-level Full Time
Remote job
19h ago
-
Senior Deep Learning Systems Engineer, Datacenters
USD 184K-356K
C# | C++ | CPU architecture | CUDA | Compilers
Equity | Health insurance | Hybrid work | Paid time off
Senior-level Full Time
US, CA, Santa Clara, United States
11d ago
-
AI GPU Arch Perf Optimization Intern
CNY 38K-50K
CUDA | Computer Systems | GPU Kernels | GPU Programming | Memory systems
On-site work
Entry-level Full Time Internship
CHN - Minhang, China
18d ago
-
AI Software Engineer Intern
CNY 38K-50K
CUDA | Compiler optimization | Continuous batching | Distributed Systems | Dynamic batching
On-site work
Entry-level Full Time Internship
CHN - Minhang, China
19d ago
-
Senior Solutions Architect, Generative AI
USD 184K-356K
Attention kernels | C++ | CUDA | CUDA CUTLASS | CUDNN
Comprehensive benefits package | Occasional travel for customer visits and conferences | Remote work option
Senior-level Full Time
US, CA, Santa Clara, United States
21d ago
-
Senior Software Engineer, LLM Performance
USD 180K-339K
C++ | CUDA | Cutlass | FlashAttention | FlashInfer
Senior-level Full Time
SF Bay Area (Hybrid)
1mo ago
-
Software Engineer, Inference – AMD GPU Enablement
USD 295K-555K
CUDA | Collective communication | Distributed Systems | GPU Kernels | HIP
Mid-level Full Time
San Francisco
1mo ago
-
ML Systems Engineer, ML Acceleration
SGD 120K-155K
CUDA | Data loading | Distributed Training | GPU Kernels | Kernel Fusion
Senior-level Full Time
Singapore, Central, Singapore
1mo ago