AI/HPC System Performance Engineer
Tasks
- Assess trade-offs and make pragmatic decisions
- Benchmark, monitor, and troubleshoot communication performance
- Define technical strategy and multi-year roadmap
- Develop solutions for large-scale training systems
- Ensure performance and availability of network systems
- Guide AI network architecture and topologies
- Lead multidisciplinary teams
Perks/Benefits
- N/A
Skills/Tech-stack
AI Training | AI Training Workloads | C++ | Communication libraries | Congestion Control | Distributed applications | Host Networking | Host Networking Protocols | Message Passing Interface | Message passing | NCCL | Network Infrastructure | Network Performance | Network Performance Tuning | Networking protocols | Performance Benchmarking | Performance Tuning | PyTorch | RDMA | TensorFlow | Training workloads | Troubleshooting | UCX