Lead Infrastructure and Reliability Engineer (Systems & Scale)
Palo Alto, CA
USD 200K-250K (estimate) | Senior-level | Full Time | Found 4d ago
Tasks
- Architect GPU environments
- Build scalable reliability mechanisms
- Design scheduling and resource management
- Hire and develop engineering team
- Improve utilization and performance
- Resolve hardware and software failures
- Scale training and inference infrastructure
- Shape architecture through research and product collaboration
- Translate reliability constraints into platform strategy
Skills/Tech-stack
Automation | Code Development | Debugging hardware to orchestration | Distributed Systems | GPU clusters | Kubernetes | Linux | Open source infrastructure | Systems behavior under contention and scale