Distinguished Engineer - Inference Serving Network and Storage
USD 206K-326K (estimate) | Senior-level | Full Time
Tasks
- Automate network and storage management frameworks
- Define networking architecture for inference serving
- Define storage architecture for model artifacts and checkpoints
- Design KV cache storage tiering and restore mechanisms
- Design management network architecture
- Develop infrastructure for disaggregated prefill/decode and continuous batching
- Establish monitoring and observability for network and storage
- Establish performance models, benchmarks, and tuning methodologies
- Guide parallelism scaling strategies for tensor, pipeline, and expert parallelism
- Identify technical risks and resolve architecture trade-offs
- Implement session state, telemetry, and log storage
- Implement traffic isolation and segmentation
- Influence roadmap, standards, and implementation choices
- Lead a small multi-functional technical team
- Optimize the inter-partition latency path
- Optimize networking and storage for AI and HPC workloads
- Plan backup and disaster recovery
- Set QoS and transport tuning strategy
Skills/Tech-stack
Artifact management | Automation | Backup | Backup and Restore | Benchmarking | Caching | Checkpointing | Congestion Control | Disaster Recovery | Distributed Storage | Distributed Systems | Flow Control | High Performance | High Performance Transport | High Throughput | High-Performance Computing | High-Throughput Systems | KV cache | Logging | Low Latency | Low-Latency Systems | Mixture of Experts | Model Artifact Management | Monitoring | Networking | Observability | Performance Computing | Pipeline parallelism | Provisioning | Quality of Service | Replication | Resource prioritization | Segmentation | Session State | Session State Management | State management | Storage Architecture | Tail Latency | Telemetry | Tensor Parallelism | Throughput Stability | Traffic isolation | Tuning