Member of Technical Staff, Inference & RL Systems
Tasks
- Automate fault detection and recovery
- Build and maintain distributed RL and post-training infrastructure
- Collaborate with research teams on execution systems
- Design and scale inference serving systems
- Improve reliability of rollout, evaluation, and reward pipelines
- Improve throughput and latency for long-context workloads
- Optimize KV-cache management and batching
- Profile and eliminate performance bottlenecks
Perks/Benefits
- 401(k) with matching
- Equity
- Health insurance
- Relocation stipend
- Unlimited paid time off
- Visa sponsorship
Skills/Tech-stack
Distributed Systems | GPU | Inference Serving | Memory Management | Model Execution | Performance Profiling | Performance Debugging | RL Infrastructure | Scaling Systems | System Optimization
Education
N/A