Research Engineer - LLM Training Infrastructure - Seed Infra
San Jose, California, United States
USD 244K-450K Mid-level Full Time
Tasks
- Analyze performance bottlenecks
- Design distributed training strategies
- Design fault-tolerance and failure-diagnosis mechanisms
- Enhance training reliability and resilience
- Implement fast checkpointing
- Improve computation and communication efficiency
- Manage GPU memory
- Optimize network and scheduling
- Optimize parallelism schemes
- Propose data-driven optimization methods
- Research LLM training infrastructure
- Scale throughput on GPU clusters
- Translate research into production AI infrastructure
Perks/Benefits
- N/A
Skills/Tech-stack
Checkpointing | Data-Driven Optimization | Distributed Training | Fault Tolerance | GPU Memory Management | Large Language Models | Memory Management | Network Optimization | Performance Optimization | Reinforcement Learning | Scheduling
Education
N/A