AI Hardware Systems Engineer, Annapurna Labs, Trainium Machine Learning Fleet Operations
Tasks
- Build data infrastructure and analyze trends
- Collaborate with hardware and software teams to deploy fixes
- Debug GPU and server hardware issues
- Develop automation software for fleet operations
- Develop tools for incident response and data visualization
- Implement system level testing across lifecycle
- Manage software deployments and debug deployment issues
- Monitor fleet dashboards and triage emergent issues
- Root cause hardware failures using data
- Run large scale experiments on hardware fleets
Perks/Benefits
- N/A
Skills/Tech-stack
Automation | Bash | Data Analysis | Data Infrastructure | GPU debugging | Machine Learning | Python | Reliability Engineering | Scalability | Server Hardware | Software Deployment | System Testing
Education
N/A