Find jobs in AI/ML, Data Science and Big Data
55 results for Pipeline parallelism (Skill/Tech stack)
-
LLM Fine-Tuning Engineer · USD 100K-150K · Adapter-Tuning | Attention Optimization | Benchmarking | Cluster operations | DPO · Mid-level · Full Time · United States - Remote R · 1d ago
-
Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism · 100 percent remote · Senior-level · Full Time · Remote job R · 1d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Computer Vision | Diffusion Models | Edge Computing | Expert parallelism | Flash Attention · Remote work · Senior-level · Full Time · Remote job R · 1d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism · English communication support | Remote work · Senior-level · Full Time · Remote job R · 1d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 201K-332K · Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing | Expert parallelism · Remote work · Senior-level · Full Time · Remote job R · 1d ago
-
Diffusion Models | Distributed Inference Systems | Distributed inference | Expert parallelism | Flash Attention · 100 percent remote | Worldwide remote · Senior-level · Full Time · Remote job R · 1d ago
-
LLM Fine-Tuning Engineer · USD 150K-270K · Adapter-Tuning | DPO | Dataset curation | Distributed Training | Evaluation methodology · Career growth | Mentorship | Remote work · Mid-level · Full Time · United States - Remote R · 3d ago
-
AI Performance Optimization Engineer · USD 136K-258K · C++ | Continuous batching | Deep learning | Distributed Systems | FSDP · Mid-level · Full Time · United States - Remote R · 3d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Diffusion Models | Distributed Inference Systems | Distributed inference | Expert parallelism | Flash Attention · English support | Remote work · Senior-level · Full Time · Remote job R · 3d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Compute Shaders | Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing · Remote work · Senior-level · Full Time · Remote job R · 3d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Computer Vision | Deep learning | Diffusion Models | Distributed inference | Edge Computing · Remote work · Senior-level · Full Time · Remote job R · 3d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Compute Shaders | Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing · Remote work · Senior-level · Full Time · Remote job R · 3d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Diffusion Models | Distributed Inference Systems | Distributed inference | Edge Computing | Expert parallelism · Career growth | Collaborative research environment | English communication support | Remote work opportunity · Senior-level · Full Time · Remote job R · 3d ago
-
AI Research Engineer (Kernel & Inference Optimization) · USD 200K-332K · Compute Shaders | Custom Compute Shaders | Data Pipelines | Diffusion Models | Distributed Inference Systems · Remote work · Senior-level · Full Time · Remote job R · 3d ago
-
LLM Fine-Tuning Engineer · USD 150K-270K · Benchmarking | Direct Preference Optimization | Distributed Training | Efficient Attention | FSDP · Mid-level · Full Time · United States - Remote R · 4d ago
-
LLM Fine-Tuning Engineer · USD 150K-270K · DPO | Dataset curation | Distributed Training | Efficient Attention | Evaluation methodology · Career growth | Mentorship | Remote work · Mid-level · Full Time · United States - Remote R · 4d ago
-
LLM Fine-Tuning Engineer · USD 150K-270K · Cluster operations | DPO | Efficient Attention | Evaluation methodology | FSDP · Career growth | Employee benefits | Remote work · Mid-level · Full Time · United States - Remote R · 4d ago
-
LLM Fine-Tuning Engineer · USD 150K-270K · Adapter based Fine Tuning | Benchmarking | Cluster operations | DPO | Distributed Training · Career growth | Health benefits | Mentorship | Remote work · Mid-level · Full Time · United States - Remote R · 4d ago
-
AI Performance Optimization Engineer · USD 136K-258K · C++ | Cache optimization | Continuous batching | Cutlass | Deep learning · Mid-level · Full Time · United States - Remote R · 4d ago
-
AI Performance Optimization Engineer · USD 136K-258K · Access patterns | Benchmarking | C++ | Cache optimization | Compiler optimization · Full-time W2 employment | Health benefits | Remote work · Mid-level · Full Time · United States - Remote R · 4d ago
-
AI Performance Optimization Engineer · USD 159K-264K · C++ | Continuous batching | Cutlass | Deep learning | DeepSpeed · Remote work · Mid-level · Full Time · United States - Remote R · 4d ago
-
AI Performance Optimization Engineer · USD 136K-258K · Benchmarking | C++ | Compiler optimization | Continuous batching | Debugging · Mid-level · Full Time · United States - Remote R · 4d ago
-
LLM Fine-Tuning Engineer · USD 150K-270K · Adapter-Tuning | DPO | Dataset curation | Efficient Attention | Evaluation · Health insurance | Paid time off | Remote work · Mid-level · Full Time · United States - Remote R · 5d ago
-
AI Performance Optimization Engineer · USD 136K-258K · Access Optimization | Attention Optimization | Benchmarking | C++ | Compiler optimization · Mid-level · Full Time · United States - Remote R · 5d ago
-
Activation checkpointing | Attention Mechanisms | CUDA | Collective operations | Data parallelism · Senior-level · Full Time · Mountain View, California; San Francisco, California · 6d ago
-
Senior Software Engineer, CUDA Deep Learning Systems · USD 184K-356K · C++ | CUDA | CUDA kernel | CUDA kernel optimization | Computer Architecture · Equity | Health benefits | Paid time off · Senior-level · Full Time · US, CA, Santa Clara, United States · 7d ago
-
Senior Deep Learning Frameworks CUDA Software Engineer · USD 184K-356K · AI compilers | C++ | CUDA | Distributed machine learning | HPC communication · Senior-level · Full Time · US, CA, Santa Clara, United States · 7d ago
-
Large Model Training Acceleration Engineer · USD 187K-387K · Benchmarking | Data parallelism | Deep learning | Distributed Training | Distributed inference · Mid-level · Full Time · San Jose, California, United States · 7d ago
-
Technical Specialist-Data Engg · INR 1500K-2200K · Ab Initio | Ab Initio GDE | Agile | Co Op | Component Parallelism · Mid-level · Full Time · INDIA - HYDERABAD - BIRLASOFT OFFICE, … · 8d ago
-
Software Engineering Manager, LLM Training · USD 170K-277K · CUDA | Containerization | Context Parallelism | Data I/O | Data parallelism · Entry-level · Full Time · Mountain View, CA, United States · 8d ago
-
Forward Deployed Engineer (Inference & Post-Training) · USD 270K-300K · DPO | GRPO | KV cache | LoRA | Pipeline parallelism · Equity | Health insurance | Remote work flexibility · Senior-level · Full Time · San Francisco · 13d ago
-
C++ | CUDA | CUDA profiling | Collective communication | Communication Compute Overlap · Senior-level · Full Time · Israel, Tel Aviv R · 23d ago
-
Principal High-Performance LLM Training Engineer · USD 272K-431K · Activation checkpointing | Benchmarking | CUDA | Communication and Computation Overlap | Compilers · Benefits | Equity · Senior-level · Full Time · US, CA, Santa Clara, United States · 23d ago
-
3D Parallelism | C++ | CUDA | Data parallelism | DeepSpeed · Entry-level · Full Time · Hong Kong · 26d ago
-
3D Parallelism | C++ | CUDA | Data parallelism | DeepSpeed · Entry-level · Full Time · Singapore · 26d ago
-
C++ | CUDA | Data parallelism | DeepSpeed | Infiniband · Entry-level · Full Time · China · 26d ago
-
C++ | CUDA | Data parallelism | DeepSpeed | Infiniband · Entry-level · Full Time · Boston, USA · 26d ago
-
3D Parallelism | C++ | CUDA | Data parallelism | DeepSpeed · Entry-level · Full Time · Seattle, USA · 26d ago
-
3D Parallelism | C++ | CUDA | Data parallelism | DeepSpeed · Entry-level · Full Time · Oregon, USA · 26d ago
-
C++ | CUDA | Data parallelism | DeepSpeed | Infiniband · Entry-level · Full Time · San Francisco Bay Area, USA · 26d ago
-
Senior Software Engineer, RL Post-Training Frameworks · USD 184K-356K · Actor Based Programming | C# | C++ | Consistency models | DPO · Comprehensive benefits | Equity · Senior-level · Full Time · US, CA, Santa Clara, United States · 28d ago
-
Senior Machine Learning Engineer · USD 170K-240K · AWS | Azure | Debugging | Distributed Computing | FSDP · Senior-level · Full Time · GM Automation - Sunnyvale - GM … · 1mo ago
-
Data parallelism | Deep learning | Distributed Training | Model Acceleration | Model Benchmarking · Senior-level · Full Time · San Jose, California, United States · 1mo ago
-
Computational optimization | Data parallelism | Deep learning | Distributed Training | Generative AI · Mid-level · Full Time · San Jose, California, United States · 1mo ago
-
Communication optimization | Data parallelism | Deep learning | Distributed Training | Generative AI · Senior-level · Full Time · Seattle, Washington, United States · 1mo ago
-
Benchmarking | CUDA | Data parallelism | Distributed Training | Model Parallelism · Senior-level · Full Time · San Jose, California, United States · 1mo ago
-
Benchmarking | CUDA | Communication optimization | Data parallelism | Deep learning · Mid-level · Full Time · Seattle, Washington, United States · 1mo ago
-
Data parallelism | Deep learning | Distributed Training | GPU Acceleration | Model Benchmarking · Mid-level · Full Time · San Jose, California, United States · 1mo ago
-
Senior Software Engineer, Machine Learning, Core ML · USD 174K-252K · C++ | Compiler optimization | Data Processing | Data parallelism | Debugging · Senior-level · Full Time · Mountain View, CA, USA · 1mo ago
-
Principal Deep Learning Communication Architect · USD 272K-431K · 3D Parallelism | CUDA | Context Parallelism | Data parallelism | DeepSpeed · Senior-level · Full Time · US, CA, Santa Clara, United States · 1mo ago