Research Engineer - LLM Training Infrastructure - Seed Infra
San Jose, California, United States
USD 244K-450K | Mid-level | Full Time
Tasks
- Analyze performance bottlenecks
- Design distributed training strategies
- Design fault-tolerance and failure-diagnosis mechanisms
- Enhance training reliability and resilience
- Implement fast checkpointing
- Improve computation and communication efficiency
- Manage GPU memory
- Optimize network and scheduling
- Optimize parallelism schemes
- Propose data-driven optimization methods
- Research LLM training infrastructure
- Scale throughput on GPU clusters
- Translate research into production AI infrastructure
Perks/Benefits
- N/A
Skills/Tech-stack
Checkpointing | Data-Driven Optimization | Distributed Training | Fault Tolerance | GPU Memory Management | Large Language Models | Memory Management | Network Optimization | Performance Optimization | Reinforcement Learning | Scheduling
Education
N/A