Principal Machine Learning Engineer, Distributed vLLM Inference
Tasks
- Build Go or Rust system components
- Contribute inference optimization algorithms
- Design KV cache aware routing
- Develop distributed inference infrastructure
- Enhance inference stack stability
- Implement scoring algorithms
- Improve fault tolerance
- Integrate with the vLLM project
- Maintain Kubernetes inference components
- Manage distributed inference workloads
- Mentor engineers
- Optimize memory utilization
- Participate in technical design discussions
- Provide code reviews
Skills/Tech-stack
API Gateway | C++ | Cilium | Distributed tracing | Envoy | GPU Performance | GPU Performance Benchmarking | gRPC | Go | HTTP/2 | InfiniBand | Istio | KV cache | Kubernetes | Layer 7 | Layer 7 Networking | NVIDIA Nsight | NVIDIA Nsight Systems | OpenTelemetry | Performance Benchmarking | Python | RDMA | Reverse Proxy | RoCE | Rust | UCX