AI/HPC System Performance Engineer
Menlo Park, CA
USD 163K-225K (estimate) · Senior-level Full Time · Found 1d ago
Tasks
- Assess trade-offs and make pragmatic decisions
- Benchmark, monitor, and troubleshoot communication performance
- Define technical strategy and multi-year roadmap
- Develop solutions for large scale training systems
- Ensure performance and availability of network systems
- Guide AI network architecture and topologies
- Lead multidisciplinary teams
Perks/Benefits
- N/A
Skills/Tech-stack
AI Training | AI Training Workloads | C++ | Communication libraries | Congestion Control | Distributed applications | Host Networking | Host Networking Protocols | Message Passing Interface | Message passing | NCCL | Network Infrastructure | Network Performance | Network Performance Tuning | Networking protocols | Performance Benchmarking | Performance Tuning | PyTorch | RDMA | TensorFlow | Training workloads | Troubleshooting | UCX
Related jobs
- AI/ML Engineer (TS/SCI Poly) · USD 107K-179K · Mid-level Full Time · Arlington/Rosslyn, Virginia, United States · 1d ago
  Skills: Dashboard Development | Data Pipelines | Data Visualization | ELT workflows | ETL/ELT
  Perks: Broad benefits | Inclusive culture | Professional development
- Software Engineer, Machine Learning · USD 219K-240K · Mid-level Full Time · New York, NY · 1d ago
  Skills: Algorithms | Availability | C++ | Code editors | Consistency
- Software Engineer, AI Native · USD 173K-247K · Senior-level Full Time · Menlo Park, CA · 1d ago
  Skills: AI Automation | AI Safety | AI orchestration | AI/ML | AI/ML techniques
- Senior-level Full Time · Menlo Park, CA | Seattle, WA … · 1d ago
- Senior-level Full Time · Menlo Park, CA · 1d ago
  Skills: Code review | Data Filtering | Data Generation | Data Pipelines | Distributed Systems
- Mid-level Full Time · Sunnyvale, CA, USA · 1d ago
  Skills: Application development | C++ | Data Analysis | Distributed Computing | Large Software Systems
  Perks: Benefits | Bonus | Equity
- Senior-level Full Time · New York, NY, USA · 1d ago
  Skills: AI Agents | Algorithms | Automation | C++ | Data Structures
  Perks: Benefits | Bonus | Equity
- Senior Software Engineer, AI/ML GenAI, Google Workspace · USD 166K-244K · Senior-level Full Time · New York, NY, USA · 1d ago
  Skills: C++ | Computer Vision | Data Processing | Debugging | Distributed Computing
  Perks: Benefits | Bonus | Equity
- Senior-level Full Time · Remote-USA R · 1d ago
  Skills: A/B | A/B Testing | Airflow | Algorithms | B testing
  Perks: Benefits | Bonus | Employee travel credits | Equity | Inclusive culture
- Senior Principal AI Engineer · USD 140K-210K · Senior-level Full Time · Chantilly/Herndon, VA · 1d ago
  Skills: AWS | Azure | Collaboration | Communication | Data Preprocessing
- Data Engineer · USD 110K-149K · Senior-level Full Time · Fort Meade, MD · 1d ago
  Skills: APIs | AWS | Agile methodologies | Azure | CI/CD
  Perks: Comprehensive benefits | Supportive culture
- Senior Engineer, Datacenter Server Lifecycle · USD 320K-405K · Senior-level Full Time · San Francisco, CA | Seattle, WA · 1d ago
  Skills: AWS | Asset tracking | Failure analysis | Firmware upgrades | Fleet Management
  Perks: Flexible hours | Generous vacation | Office collaboration space | Parental leave
- Technical Consultant- Enterprise Data Engineer · USD 82K-138K · Mid-level Full Time · Vienna, Virginia, United States · 2d ago
  Skills: ArcGIS | Backup and Recovery | Data Management | Data loading | Database Design
  Perks: 401k | Dental | Health benefits | Life insurance | Paid Holidays
- System Engineer- Enterprise Data Engineer · USD 117K-197K · Senior-level Full Time · Vienna, Virginia, United States · 2d ago
  Skills: Automation | Cloud Database | Cloud database solutions | Coordinate systems | Data Architecture
  Perks: Dental insurance | Health benefits | Life insurance | Paid Holidays | Paid leave
- System Engineer- Enterprise Data Engineer · USD 117K-197K · Senior-level Full Time · St. Louis, MO - Globe · 2d ago
  Skills: AWS RDS | ArcGIS Enterprise | Automation | Azure SQL | Backup and Recovery
  Perks: 401k | Dental | Health and welfare benefits | Life insurance | Medical
- Senior-level Full Time · Cincinnati, Ohio, United States R · 2d ago
  Skills: ABAC | Anomaly Detection | Audit Logging | Cloud Orchestration | Data Modeling
  Perks: Broad benefits | Impactful work | Inclusive culture | Professional development
- Mid-level Full Time · San Jose, California, United States · 2d ago
  Skills: Data Collaboration | Data Documentation | Data Infrastructure | Data Infrastructure Management | Data Modeling
- Mid-level Full Time · Mountain View, CA, USA; San Francisco, … · 2d ago
  Skills: Agile Development | Apache Beam | Data Analysis | Data Science | JAX
  Perks: Benefits | Bonus | Equity
- Senior Software Engineer, Core, Marketing Engineering · USD 166K-244K · Senior-level Full Time · Austin, TX, USA · 2d ago
  Skills: AI Agents | C++ | Data Processing | Distributed Systems | Full Stack
  Perks: Benefits | Bonus | Equity
- Senior-level Full Time · Sunnyvale, CA, USA; Kirkland, WA, USA · 2d ago
  Skills: Accessible Technologies | Algorithms | C# | C++ | Code Health
  Perks: Benefits | Bonus | Equity
- Mid-level Full Time · Sunnyvale, CA, USA · 2d ago
  Skills: C++ | Continuous Deployment | Debugging | Distributed Computing | Integration Testing
  Perks: Benefits | Bonus | Equity
- Software Engineer, ML Fabric Deployment Acceleration · USD 141K-202K · Mid-level Full Time · New York, NY, USA · 2d ago
  Skills: Automation | C++ | Distributed Computing | Integration & Test | Integration test frameworks
  Perks: Benefits | Bonus | Equity
- Entry-level Full Time · Sunnyvale, CA, USA · 2d ago
  Skills: C++ | Concurrency Control | Data Storage | Data Structures | Database Internals
  Perks: Benefits | Bonus | Equity
- Artificial Intelligence Engineer · USD 98K-165K · Entry-level Full Time · Houston, TX, US · 2d ago
  Skills: Clinical Workflows | Cloud Platforms | Docker | Electronic Health Records | Health Records
- Artificial Intelligence Engineer · USD 98K-165K · Entry-level Full Time · Houston, TX, US · 2d ago
  Skills: Cloud Platforms | Data Analysis | Data Pipelines | Data Processing | Docker