Find jobs in AI/ML, Data Science and Big Data
45 results for Tensor Parallelism (Skill/Tech stack)
-
Engineering Manager, Model Inference
USD 220K-270K
APIs | Attention Mechanism | Batching | Distributed Systems | Docker
401k matching | Commuter benefits | Flexible PTO | Flexible spending accounts | Generous time off
Mid-level Full Time
SF Office
13h ago
-
AI Performance Optimization Engineer
USD 100K-150K
C++ | CPU Profiling | Continuous batching | Cutlass | Deep Learning Profiling
Benefits | Career growth potential | Remote work
Mid-level Full Time
United States - Remote R
1d ago
-
Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism
100 percent remote
Senior-level Full Time
Remote job R
1d ago
-
AI Research Engineer (Kernel & Inference Optimization)
USD 201K-332K
Computer Vision | Diffusion Models | Edge Computing | Expert parallelism | Flash Attention
Remote work
Senior-level Full Time
Remote job R
1d ago
-
AI Research Engineer (Kernel & Inference Optimization)
USD 201K-332K
Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism
English communication support | Remote work
Senior-level Full Time
Remote job R
1d ago
-
AI Research Engineer (Kernel & Inference Optimization)
USD 201K-332K
Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing | Expert parallelism
Remote work
Senior-level Full Time
Remote job R
1d ago
-
Diffusion Models | Distributed Inference Systems | Distributed inference | Expert parallelism | Flash Attention
100 percent remote | Worldwide remote
Senior-level Full Time
Remote job R
1d ago
-
Embodied World Model Inference Infra Engineer - XiaomiRobotics
CNY 240K-480K
CFG Parallelization | Diffusion Models | Expert parallelism | FP8 Quantization | Inference Optimization
Senior-level Full Time
Beijing
2d ago
-
AI Research Engineer (Kernel & Inference Optimization)
USD 200K-332K
Diffusion Models | Distributed Inference Systems | Distributed inference | Expert parallelism | Flash Attention
English support | Remote work
Senior-level Full Time
Remote job R
3d ago
-
AI Research Engineer (Kernel & Inference Optimization)
USD 200K-332K
Compute Shaders | Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing
Remote work
Senior-level Full Time
Remote job R
3d ago
-
AI Research Engineer (Kernel & Inference Optimization)
USD 200K-332K
Computer Vision | Deep learning | Diffusion Models | Distributed inference | Edge Computing
Remote work
Senior-level Full Time
Remote job R
3d ago
-
AI Research Engineer (Kernel & Inference Optimization)
USD 200K-332K
Compute Shaders | Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing
Remote work
Senior-level Full Time
Remote job R
3d ago
-
AI Research Engineer (Kernel & Inference Optimization)
USD 200K-332K
Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing | Expert parallelism
Career growth | Collaborative research environment | English communication support | Remote work opportunity
Senior-level Full Time
Remote job R
3d ago
-
AI Research Engineer (Kernel & Inference Optimization)
USD 200K-332K
Compute Shaders | Custom Compute Shaders | Data Pipelines | Diffusion Models | Distributed Inference Systems
Remote work
Senior-level Full Time
Remote job R
3d ago
-
Software Engineer, Inference - Multi Modal
USD 295K-555K
Distributed Systems | GPU | High Throughput | Inference | Language Models
Entry-level Full Time
San Francisco
4d ago
-
AI Performance Optimization Engineer
USD 136K-258K
C++ | Cache optimization | Continuous batching | Cutlass | Deep learning
Mid-level Full Time
United States - Remote R
4d ago
-
AI Performance Optimization Engineer
USD 136K-258K
Access patterns | Benchmarking | C++ | Cache optimization | Compiler optimization
Full-time W2 employment | Health benefits | Remote work
Mid-level Full Time
United States - Remote R
4d ago
-
AI Performance Optimization Engineer
USD 159K-264K
C++ | Continuous batching | Cutlass | Deep learning | DeepSpeed
Remote work
Mid-level Full Time
United States - Remote R
4d ago
-
AI Performance Optimization Engineer
USD 136K-258K
Benchmarking | C++ | Compiler optimization | Continuous batching | Debugging
Mid-level Full Time
United States - Remote R
4d ago
-
AI Performance Optimization Engineer
USD 136K-258K
Access Optimization | Attention Optimization | Benchmarking | C++ | Compiler optimization
Mid-level Full Time
United States - Remote R
5d ago
-
Activation checkpointing | Attention Mechanisms | CUDA | Collective operations | Data parallelism
Senior-level Full Time
Mountain View, California; San Francisco, California
6d ago
-
Senior Software Engineer, CUDA Deep Learning Systems
USD 184K-356K
C++ | CUDA | CUDA kernel | CUDA kernel optimization | Computer Architecture
Equity | Health benefits | Paid time off
Senior-level Full Time
US, CA, Santa Clara, United States
7d ago
-
Senior Deep Learning Frameworks CUDA Software Engineer
USD 184K-356K
AI compilers | C++ | CUDA | Distributed machine learning | HPC communication
Senior-level Full Time
US, CA, Santa Clara, United States
7d ago
-
API Integration | Agent systems | Benchmarking | Computer Vision | Data Drift
Bare Metal GPU Cluster Access | Company bike leasing | Company events | Employer Subsidy | Flexible work hours
Senior-level Full Time
Heidelberg
8d ago
-
Software Engineering Manager, LLM Training
USD 170K-277K
CUDA | Containerization | Context Parallelism | Data I/O | Data parallelism
Entry-level Full Time
Mountain View, CA, United States
8d ago
-
AWQ | Audio codecs | Audio streaming | Autoscaling | Chunked prefill
401k matching | Annual offsites | Dental coverage | Employer-paid training | Healthcare benefits
Mid-level Full Time
San Francisco, CA
13d ago
-
Forward Deployed Engineer (Inference & Post-Training)
USD 270K-300K
DPO | GRPO | KV cache | LoRA | Pipeline parallelism
Equity | Health insurance | Remote work flexibility
Senior-level Full Time
San Francisco
13d ago
-
Research Engineer, Training & Inference
USD 200K-450K
C++ | CUDA | Cutlass | Distributed Training | FSDP
401k matching | Employer-paid health insurance | Health Savings Account (HSA) | Unlimited PTO
Entry-level Full Time
Palo Alto
13d ago
-
Intern Researcher – AI Foundation Model Training
CAD 58K-104K
AI Agent | AI agent systems | Agent systems | Architecture Search | Computational Graph Optimization
Entry-level Internship
Markham, Ontario, Canada
22d ago
-
C++ | CUDA | CUDA profiling | Collective communication | Communication Compute Overlap
Senior-level Full Time
Israel, Tel Aviv R
23d ago
-
Principal High-Performance LLM Training Engineer
USD 272K-431K
Activation checkpointing | Benchmarking | CUDA | Communication and Computation Overlap | Compilers
Benefits | Equity
Senior-level Full Time
US, CA, Santa Clara, United States
23d ago
-
3D Parallelism | C++ | CUDA | Data parallelism | DeepSpeed
Entry-level Full Time
Hong Kong
26d ago
-
3D Parallelism | C++ | CUDA | Data parallelism | DeepSpeed
Entry-level Full Time
Singapore
26d ago
-
C++ | CUDA | Data parallelism | DeepSpeed | Infiniband
Entry-level Full Time
China
26d ago
-
C++ | CUDA | Data parallelism | DeepSpeed | Infiniband
Entry-level Full Time
Boston, USA
26d ago
-
3D Parallelism | C++ | CUDA | Data parallelism | DeepSpeed
Entry-level Full Time
Seattle, USA
26d ago
-
3D Parallelism | C++ | CUDA | Data parallelism | DeepSpeed
Entry-level Full Time
Oregon, USA
26d ago
-
C++ | CUDA | Data parallelism | DeepSpeed | Infiniband
Entry-level Full Time
San Francisco Bay Area, USA
26d ago
-
Senior Software Engineer, RL Post-Training Frameworks
USD 184K-356K
Actor Based Programming | C# | C++ | Consistency models | DPO
Comprehensive benefits | Equity
Senior-level Full Time
US, CA, Santa Clara, United States
28d ago
-
Senior DL Software Engineer, Model Optimization and Edge Deployment - Autonomous Vehicles
USD 184K-356K
C++ | CUDA | Cutlass | Efficient Attention | GPU Architecture
Senior-level Full Time
US, CA, Santa Clara, United States
30d ago
-
Senior Software Engineer, Machine Learning, Core ML
USD 174K-252K
C++ | Compiler optimization | Data Processing | Data parallelism | Debugging
Senior-level Full Time
Mountain View, CA, USA
1mo ago
-
Principal Deep Learning Communication Architect
USD 272K-431K
3D Parallelism | CUDA | Context Parallelism | Data parallelism | DeepSpeed
Senior-level Full Time
US, CA, Santa Clara, United States
1mo ago
-
Research Engineer, Infrastructure
USD 255K-400K
C++ | Checkpointing | Compute efficiency | Data Pipelines | Data parallelism
Senior-level Full Time
San Francisco Bay Area
1mo ago
-
Software Engineer, Inference – AMD GPU Enablement
USD 295K-555K
CUDA | Collective communication | Distributed Systems | GPU Kernels | HIP
Mid-level Full Time
San Francisco
1mo ago
-
Senior Engineering Manager, AI Runtime
USD 228K-297K
Checkpointing | Cluster Lifecycle Management | Cluster lifecycle | DeepSpeed | Distributed Training
Senior-level Full Time
Mountain View, California; San Francisco, California
1mo ago