AI Hardware Systems Engineer, Annapurna Labs, Trainium Machine Learning Fleet Operations
Tasks
- Build data infrastructure and analyze trends
- Collaborate with hardware and software teams to deploy fixes
- Debug GPU and server hardware issues
- Develop automation software for fleet operations
- Develop tools for incident response and data visualization
- Implement system-level testing across the lifecycle
- Manage software deployments and debug deployment issues
- Monitor fleet dashboards and triage emergent issues
- Root-cause hardware failures using data
- Run large-scale experiments on hardware fleets
Perks/Benefits
- N/A
Skills/Tech-stack
Automation | Bash | Data Analysis | Data Infrastructure | GPU debugging | Machine Learning | Python | Reliability Engineering | Scalability | Server Hardware | Software Deployment | System Testing
Education
N/A