Research Engineer - LLM Infra training - Seed Infra
Seattle, Washington, United States
USD 232K-427K | Mid-level | Full Time
Tasks
- Analyze performance bottlenecks and propose optimization methods
- Conduct research and development on large-scale LLM training infrastructure
- Design and optimize distributed training strategies for LLMs
- Investigate system reliability and resilience techniques
- Manage GPU memory during training
- Optimize network and scheduling for training workloads
- Translate research ideas into scalable production AI infrastructure
Perks/Benefits
- N/A
Skills/Tech-stack
Checkpointing | Data-Driven Optimization | Deep Learning | Distributed Training | Fault Tolerance | GPU Memory Management | Language Models | Large Language Models | Memory Management | Network Optimization | Parallel Computing | Performance Optimization | Reinforcement Learning | Scheduling | System Reliability | Throughput Optimization
Education
N/A