Find jobs in AI/ML, Data Science and Big Data
76 results for KV cache (Skill/Tech stack)
-
Senior-level Full Time · Shanghai, Beijing · 7h ago
-
AI Performance Optimization Engineer USD 100K-150K
Benchmarking | C++ | Cache optimization | Compiler optimization | Continuous batching
Mid-level Full Time · United States - Remote R · 4d ago
-
Application Software Engineer, Inference USD 135K-185K
Agent Orchestration | Agent SDK | Auto Scaling | Batch scheduling | C++
401k plan | Employee stock purchase plan | Long-term incentives | Medical, dental & vision coverage | Onsite Palo Alto
Entry-level Full Time · Palo Alto, CA · 4d ago
-
Sr GenAI Infra Specialist SA, AWS WWSO Startup USD 153K-228K
AWS Inferentia | AWS Trainium | Amazon Web Services | Batching | CUDA
Senior-level Full Time · New York, New York, USA · 5d ago
-
AI Performance Optimization Engineer USD 100K-150K
C++ | CUDA | Continuous batching | Deep learning | Distributed Training
Mid-level Full Time · United States - Remote R · 5d ago
-
AI Performance Optimization Engineer USD 100K-150K
Benchmarking | C++ | CUDA | Communication Primitives | Continuous batching
Mid-level Full Time · United States - Remote R · 5d ago
-
AI Performance Optimization Engineer USD 100K-150K
Benchmarking | C++ | Continuous batching | Cutlass | DeepSpeed
Remote work
Mid-level Full Time · United States - Remote R · 5d ago
-
AI Performance Optimization Engineer USD 100K-150K
C++ | CUDA | Continuous batching | DeepSpeed | Distributed Training
Benefits | Career growth | Mentorship | Remote work
Mid-level Full Time · United States - Remote R · 5d ago
-
AI Engineer EUR 60K-80K
AWQ | AWS | Agent SDK | CI/CD | CUDA
Career growth opportunities | Permanent employment | Remote work option
Mid-level Full Time · Remote - Paris, France R · 6d ago
-
AI Performance Optimization Engineer USD 100K-150K
Attention Mechanisms | Benchmarking | C++ | Compiler optimization | Continuous batching
Benefits | Career growth | Mentorship | Remote work
Mid-level Full Time · United States - Remote R · 7d ago
-
AI Performance Optimization Engineer USD 100K-150K
Benchmarking | C++ | CUDA | Compiler optimization | Continuous batching
Mid-level Full Time · United States - Remote R · 7d ago
-
Senior Machine Learning Engineer USD 188K-282K
Adversarial Training | Calibration monitoring | Continuous batching | DPO | Deep learning
Senior-level Full Time · Palo Alto, CA · 7d ago
-
Sr. Software Engineer, Inference PLN 321K-470K
Autoscaling | BF16 | C++ | CI/CD | CUDA
Critical illness cover | Employee assistance programme | Family dental insurance | Family medical insurance | Life assurance
Senior-level Full Time · Warsaw, Poland · 7d ago
-
LLM Inference Frameworks and Optimization Engineer USD 160K-230K
C++ | CUDA | CUDA graph | Cluster scheduling | Compiler
Equity | Health insurance
Mid-level Full Time · San Francisco, Singapore, Amsterdam · 7d ago
-
C++ | CUDA | CUDA kernels | Concurrency | Distributed Systems
Senior-level Full Time · Pittsburgh, PA or Remote R · 7d ago
-
AWQ | AWS | Accelerate | Benchmarking | CUDA
Senior-level Full Time · Guangzhou, Guangdong, China · 7d ago
-
Attention Mechanism | CI/CD | Custom Kernels | Deep learning | Distributed Training
Career development | Collaborative culture | Continuous learning | Flexible work environment | High autonomy
Senior-level Full Time · Germany · 7d ago
-
Inference Optimization Engineer (local / edge runtime) USD 170K-315K
Batching | C++ | CUDA | KV cache | Linux
Health benefits | Hybrid work model | Retirement benefits | Vacation
Mid-level Full Time · USA - CA - Santa Clara, … · 8d ago
-
Senior Inference Engineer, AIConfigurator for Dynamo USD 184K-356K
Batching | Distributed Systems | Expert parallelism | GPU Computing | High Performance
Equity | Health benefits | Hybrid work
Senior-level Full Time · US, CA, Santa Clara, United States · 11d ago
-
Staff Software Engineer, Inference PLN 369K-542K
Autoscaling | BF16 | Benchmarking | C++ | CUDA
Critical illness cover | Employee assistance programme | Family dental insurance | Family medical insurance | Generous pension contribution
Senior-level Full Time · Warsaw, Poland · 11d ago
-
Senior-level Full Time · Markham, Ontario, Canada · 11d ago
-
[SX/EIT-MM] Senior AI/Agent Engineer USD 150K-197K
A/B | A/B Testing | Async Programming | Attention Mechanism | Audio Processing
Senior-level Full Time · Thành phố Hồ Chí Minh, Hồ … · 11d ago
-
AI Software Engineer USD 151K-332K
C++ | CUDA | Computer Vision | Continuous batching | FP8
Hybrid work | In-person work | Remote work | Work-life balance
Mid-level Full Time · Seattle (WA), United States · 12d ago
-
AI Engineer USD 100K-135K
AWQ | AWS | AWS EC2 | Agent Frameworks | CI/CD
401k match | Health insurance | Learning and development stipend | Paid parental leave | Paid time off
Mid-level Full Time · Remote USA - In Tandem R · 12d ago
-
Senior Site Reliability Engineer (Noida, BLR, India) INR 3000K-5000K
CI/CD | Cast AI | FinOps | GCP | GPU Profiling
Senior-level Full Time · Noida · 14d ago
-
Senior Deep Learning Solution Architect CNY 240K-480K
Accelerated computing | Computer Systems | Data Structures | Deep learning | Distributed Training
Senior-level Full Time · China, Beijing · 15d ago
-
Attention Mechanisms | C++ | Decoder Only | Decoder-only Transformer | GPU parallelism
Comprehensive benefits
Senior-level Full Time · New York, New York, United States … · 15d ago
-
AWQ | AWS | Batching | CPU architecture | CUDA
Senior-level Full Time · Guangzhou, Guangdong, China · 17d ago
-
Senior-level Full Time · Gurugram, Haryana, India · 18d ago
-
Agent systems | Artificial Intelligence | Deep learning | Inference Optimization | KV cache
Entry-level Full Time · Seoul · 18d ago
-
Artificial Intelligence | Attention Mechanisms | Benchmarking | C++ | GEMM
Entry-level Full Time Internship · China, Beijing · 19d ago
-
AI Architect Lead (Hybrid-within BankUnited's footprint) USD 140K-200K
AI Foundry | AWS | AWS Bedrock | Artificial Intelligence | Autogen
Hybrid work
Senior-level Full Time · Miami Lakes, FL, United States R · 19d ago
-
Senior Machine Learning Engineer (Inference Platform) USD 175K-225K
AWS | Alerting | CI/CD | Continuous batching | Data Processing
Senior-level Full Time · Remote - USA R · 19d ago
-
Attention | Batching | C++ | CUDA | CUDA kernels
Benefits | Equity
Senior-level Full Time · US, CA, Santa Clara, United States · 20d ago
-
Engineering Manager, Inference Benchmarking - AI Perf USD 224K-356K
DCGM | Distributed Systems | GPU Telemetry | GPU observability | Helm
Senior-level Full Time · US, CA, Santa Clara, United States · 26d ago
-
Product Manager - AI Inference & Model Serving USD 165K-275K
AI Inference | Artificial Intelligence | Autoscaling | Cache Management | Continuous batching
Conference attendance | Professional development | Stock options | Training | Workstation provided
Mid-level Full Time · Austin, TX, United States · 26d ago
-
Data Curation | Deep learning | DeepSpeed | Direct Preference Optimization | Evaluation
Senior-level Full Time · Singapore, Singapore · 28d ago
-
Senior MLOps Engineer - LLMs EUR 56K-76K
A/B | A/B Testing | Argo | Async API | Authentication
Autonomy | Hybrid work model | Professional growth and learning
Senior-level Full Time · Netherlands - Amsterdam · 1mo ago
-
Staff Machine Learning Engineer, ML Infrastructure USD 183K-269K
AWS EKS | Amazon IAM | Amazon S3 | Autoscaling | Batching
Employee resource groups | Free home security system | Free professional monitoring | Hybrid work model
Senior-level Full Time · Boston, MA · 1mo ago
-
Senior-level Full Time · Milpitas, CA, United States · 1mo ago
-
AI/ML ASIC Architect USD 163K-249K
ARM | ASIC architecture | AXI interconnect | Area Optimization | Attention Mechanisms
Senior-level Full Time · Milpitas, CA, United States · 1mo ago
-
Machine Learning Engineer, Distributed vLLM USD 136K-287K
API Gateway | Cilium | Distributed Systems | Envoy | GPU Profiling
Paid parental leave | Paid time off | Retirement 401k match | Tuition reimbursement
Mid-level Full Time · Boston, United States R · 1mo ago
-
Product Manager - AI Inference & Model Serving USD 160K-275K
AI Inference | Autoscaling | Cache Management | Cold Start | Cold Start Optimization
Conference attendance | Professional development and training | Stock options | Workstation provided
Mid-level Full Time · Austin, TX, United States · 1mo ago
-
AWQ | AWS | Accelerate | Azure | Batching
Mid-level Full Time · Shenzhen, Guangdong, China R · 1mo ago
-
Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism
100 percent remote
Senior-level Full Time · Remote job R · 1mo ago
-
AI Research Engineer (Kernel & Inference Optimization) USD 201K-332K
Computer Vision | Diffusion Models | Edge Computing | Expert parallelism | Flash Attention
Remote work
Senior-level Full Time · Remote job R · 1mo ago
-
AI Research Engineer (Kernel & Inference Optimization) USD 201K-332K
Compute Shaders | Diffusion Models | Distributed inference | Edge Computing | Expert parallelism
English communication support | Remote work
Senior-level Full Time · Remote job R · 1mo ago
-
Diffusion Models | Distributed Inference Systems | Distributed inference | Expert parallelism | Flash Attention
100 percent remote | Worldwide remote
Senior-level Full Time · Remote job R · 1mo ago
-
Senior Applied ML Engineer (Speech & Audio) USD 140K-200K
Activity Detection | Audio codecs | Audio preprocessing | Automatic Speech Recognition | Conformer
Accommodation allowance | Career Development Programs | Career growth opportunities | Coffee | Daily Drinks
Senior-level Full Time · Egypt - Remote R · 1mo ago
-
Senior Applied ML Engineer (Speech & Audio) USD 140K-200K
Activity Detection | Audio Processing | Audio codecs | Automatic Speech Recognition | CUDA
Accommodation allowance | Career growth opportunities | Coffee and hot drinks | Company events and parties | Daily breakfast
Senior-level Full Time · Cairo, Cairo Governorate, Egypt · 1mo ago