Find jobs in AI/ML, Data Science and Big Data
46 results for Tensor Parallelism (Skill/Tech stack)
-
Staff Software Engineer, AI Runtime · USD 190K-265K · CUDA | Checkpointing | Data parallelism | DeepSpeed | Distributed Systems · Senior-level Full Time · Mountain View, California; San Francisco, California · 1d ago
-
Senior Software Engineer, AI Runtime · USD 160K-225K · Algorithms | Checkpointing | Collective communication | Data Structures | Data parallelism · Senior-level Full Time · Mountain View, California; San Francisco, California · 1d ago
-
AI Performance Optimization Engineer · USD 100K-150K · Benchmarking | C++ | Communication Primitives | Continuous batching | Distributed Training · Career growth potential | Remote work · Mid-level Full Time · United States - Remote · R · 2d ago
-
AI Performance Optimization Engineer · USD 100K-150K · C++ | CUDA | Continuous batching | Cutlass | Deep learning · Career growth | Health benefits | Remote work · Mid-level Full Time · United States - Remote · R · 2d ago
-
AI Performance Optimization Engineer · USD 100K-150K · C++ | Compiler optimization | Continuous batching | Distributed Training | FSDP · Mid-level Full Time · United States - Remote · R · 2d ago
-
AI Performance Optimization Engineer · USD 100K-150K · Attention Mechanisms | Benchmarking | C++ | Continuous batching | Data pipeline · Career growth | Remote work · Mid-level Full Time · United States - Remote · R · 5d ago
-
AI Performance Optimization Engineer · USD 100K-150K · Benchmarking | C++ | Compiler optimization | Continuous batching | Cutlass · Benefits | Full-time employment | Remote work · Mid-level Full Time · United States - Remote · R · 5d ago
-
AI Performance Optimization Engineer · USD 100K-150K · Benchmarking | C++ | CUDA | Continuous batching | Distributed Training · Mid-level Full Time · United States - Remote · R · 6d ago
-
AI Performance Optimization Engineer · USD 100K-150K · Access Optimization | Attention Mechanisms | Benchmarking | C++ | Communication Primitives · Mid-level Full Time · United States - Remote · R · 6d ago
-
Staff Compiler Engineer - PyTorch + Kernel DSLPLATE · USD 163K-253K · Autotuning | Collective Primitives | Cost Based Compilation | Custom ISA | Cutlass · 401k | Adoption support stipend | Charitable giving match | Fertility care stipend | Flexible work environment · Senior-level Full Time · San Jose, California, United States · 7d ago
-
Mid-level Full Time · Beijing · R · 10d ago
-
Embodied World Model Inference Infra Engineer - XiaomiRobotics · CNY 240K-480K · CFG Parallelism | Diffusion Models | Expert parallelism | FP8 | Machine Learning · Senior-level Full Time · Beijing · 10d ago
-
Senior AI Engineer · USD 209K-275K · A/B | A/B Testing | Autoscaling | B testing | Bash · Four days in office | Hybrid work arrangement | Telecommuting one day per week · Senior-level Full Time · San Jose (CA), United States · 20d ago
-
Engineering Manager, Model Inference · USD 220K-270K · APIs | Attention Mechanism | Batching | Distributed Systems | Docker · 401k matching | Commuter benefits | Flexible PTO | Flexible spending accounts | Generous time off · Mid-level Full Time · SF Office · 20d ago
-
Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism · 100 percent remote · Senior-level Full Time · Remote job · R · 21d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Computer Vision | Diffusion Models | Edge Computing | Expert parallelism | Flash Attention · Remote work · Senior-level Full Time · Remote job · R · 21d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism · English communication support | Remote work · Senior-level Full Time · Remote job · R · 21d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing | Expert parallelism · Remote work · Senior-level Full Time · Remote job · R · 21d ago
-
Diffusion Models | Distributed Inference Systems | Distributed inference | Expert parallelism | Flash Attention · 100 percent remote | Worldwide remote · Senior-level Full Time · Remote job · R · 21d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Compute Shaders | Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing · Remote work · Senior-level Full Time · Remote job · R · 23d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Computer Vision | Deep learning | Diffusion Models | Distributed inference | Edge Computing · Remote work · Senior-level Full Time · Remote job · R · 23d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Compute Shaders | Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing · Remote work · Senior-level Full Time · Remote job · R · 23d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing | Expert parallelism · Career growth | Collaborative research environment | English communication support | Remote work opportunity · Senior-level Full Time · Remote job · R · 23d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Compute Shaders | Custom Compute Shaders | Data Pipelines | Diffusion Models | Distributed Inference Systems · Remote work · Senior-level Full Time · Remote job · R · 23d ago
-
Software Engineer, Inference - Multi Modal · USD 295K-555K · Distributed Systems | GPU | High Throughput | Inference | Language Models · Entry-level Full Time · San Francisco · 24d ago
-
Activation checkpointing | Attention Mechanisms | CUDA | Collective operations | Data parallelism · Senior-level Full Time · Mountain View, California; San Francisco, California · 26d ago
-
Senior Software Engineer, CUDA Deep Learning Systems · USD 184K-356K · C++ | CUDA | CUDA kernel | CUDA kernel optimization | Computer Architecture · Equity | Health benefits | Paid time off · Senior-level Full Time · US, CA, Santa Clara, United States · 27d ago
-
Senior Deep Learning Frameworks CUDA Software Engineer · USD 184K-356K · AI compilers | C++ | CUDA | Distributed machine learning | HPC communication · Senior-level Full Time · US, CA, Santa Clara, United States · 27d ago
-
API Integration | Agent systems | Benchmarking | Computer Vision | Data Drift · Bare Metal GPU Cluster Access | Company bike leasing | Company events | Employer Subsidy | Flexible work hours · Senior-level Full Time · Heidelberg · 28d ago
-
Software Engineering Manager, LLM Training · USD 170K-277K · CUDA | Containerization | Context Parallelism | Data I/O | Data parallelism · Entry-level Full Time · Mountain View, CA, United States · 28d ago
-
AWQ | Audio codecs | Audio streaming | Autoscaling | Chunked prefill · 401k matching | Annual offsites | Dental coverage | Employer-paid training | Healthcare benefits · Mid-level Full Time · San Francisco, CA · 1mo ago
-
Forward Deployed Engineer (Inference & Post-Training) · USD 270K-300K · DPO | GRPO | KV cache | LoRA | Pipeline parallelism · Equity | Health insurance | Remote work flexibility · Senior-level Full Time · San Francisco · 1mo ago
-
Research Engineer, Training & Inference · USD 200K-450K · C++ | CUDA | Cutlass | Distributed Training | FSDP · 401k matching | Employer-paid health insurance | Health Savings Account (HSA) | Unlimited PTO · Entry-level Full Time · Palo Alto · 1mo ago
-
Intern Researcher – AI Foundation Model Training · CAD 58K-104K · AI Agent | AI agent systems | Agent systems | Architecture Search | Computational Graph Optimization · Entry-level Internship · Markham, Ontario, Canada · 1mo ago
-
C++ | CUDA | CUDA profiling | Collective communication | Communication Compute Overlap · Senior-level Full Time · Israel, Tel Aviv · R · 1mo ago
-
Principal High-Performance LLM Training Engineer · USD 272K-431K · Activation checkpointing | Benchmarking | CUDA | Communication and Computation Overlap | Compilers · Benefits | Equity · Senior-level Full Time · US, CA, Santa Clara, United States · 1mo ago
-
3D Parallelism | C++ | CUDA | Data parallelism | DeepSpeed · Entry-level Full Time · Hong Kong · 1mo ago
-
3D Parallelism | C++ | CUDA | Data parallelism | DeepSpeed · Entry-level Full Time · Singapore · 1mo ago
-
C++ | CUDA | Data parallelism | DeepSpeed | Infiniband · Entry-level Full Time · China · 1mo ago
-
C++ | CUDA | Data parallelism | DeepSpeed | Infiniband · Entry-level Full Time · Boston, USA · 1mo ago
-
3D Parallelism | C++ | CUDA | Data parallelism | DeepSpeed · Entry-level Full Time · Seattle, USA · 1mo ago
-
3D Parallelism | C++ | CUDA | Data parallelism | DeepSpeed · Entry-level Full Time · Oregon, USA · 1mo ago
-
C++ | CUDA | Data parallelism | DeepSpeed | Infiniband · Entry-level Full Time · San Francisco Bay Area, USA · 1mo ago
-
Senior Software Engineer, RL Post-Training Frameworks · USD 184K-356K · Actor Based Programming | C# | C++ | Consistency models | DPO · Comprehensive benefits | Equity · Senior-level Full Time · US, CA, Santa Clara, United States · 1mo ago
-
Senior DL Software Engineer, Model Optimization and Edge Deployment - Autonomous Vehicles · USD 184K-356K · C++ | CUDA | Cutlass | Efficient Attention | GPU Architecture · Senior-level Full Time · US, CA, Santa Clara, United States · 1mo ago
-
Principal Deep Learning Communication Architect · USD 272K-431K · 3D Parallelism | CUDA | Context Parallelism | Data parallelism | DeepSpeed · Senior-level Full Time · US, CA, Santa Clara, United States · 1mo ago