Principal Deep Learning Communication Architect
Santa Clara, CA, United States
USD 272K-431K | Senior-level | Full Time
Tasks
- Co-design communication primitives with application developers
- Collaborate on hardware-software co-design for networking
- Define technical roadmap for communication libraries
- Design communication primitives and collective algorithms
- Develop analytical models and simulators for system behavior
- Ensure communication libraries evolve to support large language models
- Lead development and scaling for distributed deep learning
- Optimize communication for heterogeneous interconnects
Perks/Benefits
- N/A
Skills/Tech-stack
3D Parallelism | CUDA | Context Parallelism | Data Parallelism | DeepSpeed | Expert Parallelism | InfiniBand | JAX | MPI | Megatron Core | NCCL | NVSHMEM | Pipeline Parallelism | PyTorch | PyTorch Distributed | RDMA | RoCE | SGLang | Tensor Parallelism | TensorRT-LLM | UCC | UCX | vLLM | XLA | ZeRO