Research Engineer - LLM Infra training - Seed Infra
Seattle, Washington, United States
USD 232K-427K Mid-level Full Time
Tasks
- Analyze performance bottlenecks and propose optimization methods
- Conduct research and development on large-scale LLM training infrastructure
- Design and optimize distributed training strategies for LLMs
- Investigate system reliability and resilience techniques
- Manage GPU memory during training
- Optimize networking and scheduling for training workloads
- Translate research ideas into scalable production AI infrastructure
Perks/Benefits
- N/A
Skills/Tech-stack
Checkpointing | Data-Driven Optimization | Data-Driven | Deep Learning | Distributed Training | Fault Tolerance | GPU Memory | GPU Memory Management | Language Models | Large Language Models | Memory Management | Network Optimization | Parallel Computing | Performance Optimization | Reinforcement Learning | Scheduling | System Reliability | Throughput Optimization
Education
N/A