Software Engineer, ML Systems & Training Architecture
Tasks
- Debug ML training failures
- Diagnose GPU cluster networking issues
- Improve maintainability and usability
- Improve training framework reliability
- Perform code reviews and raise the code quality bar
- Review and improve training framework code
- Unblock broken training jobs
Skills/Tech-stack
Code review | Debugging | Distributed Systems | Docker | GPU Computing | Kubernetes | Linux | Machine Learning | Networking | Python
Education
N/A