Staff Engineer
Tasks
- Design scheduling architectures for Kubernetes clusters
- Configure topology-aware pod placement
- Coordinate multi-component inference deployments
- Enable gang scheduling for distributed training
- Implement checkpoint/restore for fault recovery
- Implement in-place pod resizing with VPA
- Implement multi-level autoscaling and startup ordering
- Optimize GPU utilization for multi-tenant workloads
- Optimize admission webhooks
- Orchestrate disaggregated AI inference pipelines
- Orchestrate model weight distribution using OCI image volumes
- Secure isolation environments for untrusted AI code
- Tune etcd and reduce API server load
Perks/Benefits
- Conference reimbursement
- Education reimbursement
- Employee assistance program
- Employee stock purchase program
- Equity compensation
- Flexible time off
- Hybrid work
- LinkedIn Learning access
- Local employee meetups
- Training reimbursement
Skills/Tech-stack
AMD GPU | Apache YuniKorn | Autoscaling | Bin packing | CRIU | containerd | Custom Resource Definitions | Dominant Resource Fairness (DRF) | Dynamic Resource Allocation | etcd | Firecracker | gVisor | Gang Scheduling | Inference Server | KAI Scheduler | KV cache | Kata Containers | Kubernetes | Kueue | Load-aware scheduling | MPI | NUMA | NVIDIA CUDA | NVIDIA GPU | NVIDIA Grove | NVIDIA Triton Inference Server | NVIDIA cuda-checkpoint | NVLink | Namespaces | OCI Image Volumes | Observability | PCIe | PyTorch | Resource allocation | Rootless Containers | Run.ai | runc | SGLang | Security contexts | Spread scheduling | Time Per Output Token | Time To First Token | vLLM | Volcano
Education
N/A