Research Scientist / Engineer – Training Infrastructure
Palo Alto, CA | Remote - International | London, UK
USD 200K-300K (estimate) | Senior-level | Full Time
Tasks
- Build monitoring and debugging tools
- Design distributed training systems
- Implement parallelization techniques
- Optimize training stability and resource utilization
Perks/Benefits
- N/A
Skills/Tech-stack
CUDA | Containerization | Distributed Systems | GPU clusters | Linux | MPI | NCCL | Networking | Orchestration | PyTorch | Scripting
Education
Bachelor of Engineering | Bachelor of Science | Master of Science | PhD
Roles
Engineer | Research Engineer | Research Scientist | Scientist