Software Engineer, SystemML - AI Networking
Tasks
- Analyze NCCL performance on RoCE and InfiniBand
- Build distributed ML performance tuners and benchmarks
- Develop an AI framework and trainer for large-scale distributed deep learning models
- Enable multi-GPU, multi-node distributed training
- Improve distributed ML reliability and performance for GenAI and LLM workloads
- Serve as tech lead for NCCL collective communication library development
Perks/Benefits
- N/A
Skills/Tech-stack
C# | C++ | CUDA | Data-parallel | Distributed Data Parallel | Distributed data | Fully Sharded Data Parallel | High Performance | High-Performance Computing | InfiniBand | NCCL | Parallel Computing | Performance Computing | Pipeline Parallel | PyTorch | Python | RoCE | Tensor Parallel