Research Engineer - LLM Infra training - Seed Infra
Tasks
- Analyze performance bottlenecks and propose optimization methods
- Conduct research and development on large-scale LLM training infrastructure
- Design and optimize distributed training strategies for LLMs
- Investigate system reliability and resilience techniques
- Manage GPU memory during training
- Optimize networking and scheduling for training workloads
- Translate research ideas into scalable production AI infrastructure
Perks/Benefits
- N/A
Skills/Tech-stack
Checkpointing | Data-Driven Optimization | Data-driven | Deep learning | Distributed Training | Fault Tolerance | GPU memory | GPU memory management | Language Models | Large Language Models | Memory Management | Network Optimization | Parallel Computing | Performance optimization | Reinforcement Learning | Scheduling | System Reliability | Throughput Optimization
Education
N/A