Principal Deep Learning Communication Architect
Santa Clara, CA, United States
USD 272K-431K Senior-level Full Time
Tasks
- Co-design communication primitives with application developers
- Collaborate on hardware-software co-design for networking
- Define technical roadmap for communication libraries
- Design communication primitives and collective algorithms
- Develop analytical models and simulators for system behavior
- Ensure communication libraries evolve to support large language models
- Lead development and scaling for distributed deep learning
- Optimize communication for heterogeneous interconnects
Perks/Benefits
- N/A
Skills/Tech-stack
3D parallelism | CUDA | Context parallelism | Data parallelism | DeepSpeed | Expert parallelism | InfiniBand | JAX | MPI | Megatron Core | NCCL | NVSHMEM | Pipeline parallelism | PyTorch | PyTorch distributed | RDMA | RoCE | SGLang | Tensor parallelism | TensorRT-LLM | UCC | UCX | vLLM | XLA | ZeRO