Research Engineer - LLM Infra training - Seed Infra
Seattle, Washington, United States
USD 232K-427K | Mid-level | Full Time
Tasks
- Analyze performance bottlenecks and propose optimization methods
- Conduct research and development on large-scale LLM training infrastructure
- Design and optimize distributed training strategies for LLMs
- Investigate system reliability and resilience techniques
- Manage GPU memory during training
- Optimize network and scheduling for training workloads
- Translate research ideas into scalable production AI infrastructure
Perks/Benefits
- N/A
Skills/Tech-stack
Checkpointing | Data-Driven Optimization | Data-driven | Deep learning | Distributed Training | Fault Tolerance | GPU memory | GPU memory management | Language Models | Large Language Models | Memory Management | Network Optimization | Parallel Computing | Performance optimization | Reinforcement Learning | Scheduling | System Reliability | Throughput Optimization
Education
N/A