Staff Engineer
Tasks
- Design scheduling architectures for Kubernetes clusters
- Configure topology-aware pod placement
- Coordinate multicomponent inference deployments
- Enable gang scheduling for distributed training
- Implement checkpoint/restore for fault recovery
- Implement in-place pod resizing with VPA
- Implement multilevel autoscaling and startup ordering
- Optimize GPU utilization for multi-tenant workloads
- Optimize admission webhooks
- Orchestrate disaggregated AI inference pipelines
- Orchestrate model weight distribution using OCI image volumes
- Secure isolation environments for untrusted AI code
- Tune etcd and reduce API server load
Perks/Benefits
- Conference reimbursement
- Education reimbursement
- Employee assistance program
- Employee stock purchase program
- Equity compensation
- Flexible time off
- Hybrid work
- LinkedIn Learning access
- Local employee meetups
- Training reimbursement
Skills/Tech-stack
AMD GPU | Apache YuniKorn | Autoscaling | Bin packing | CRIU | containerd | Custom Resource Definitions | Dominant Resource Fairness (DRF) | Dynamic Resource Allocation | etcd | Firecracker | gVisor | Gang Scheduling | Inference Server | KAI Scheduler | KV cache | Kata Containers | Kubernetes | Kueue | Load-aware scheduling | MPI | NUMA | NVIDIA CUDA | NVIDIA GPU | NVIDIA Grove | NVIDIA Triton Inference Server | NVIDIA cuda-checkpoint | NVLink | Namespaces | OCI Image Volumes | Observability | PCIe | PyTorch | Resource allocation | Rootless Containers | Run:ai | runc | SGLang | Security contexts | Spread scheduling | Time Per Output Token | Time To First Token | vLLM | Volcano
Education
N/A