Research Infrastructure Engineer, Training Systems
Tasks
- Build large-scale model training infrastructure
- Debug Python, PyTorch, distributed GPU, networking, and storage issues
- Design training workflow APIs and interfaces
- Improve reliability across training and data pipelines
- Write tests, benchmarks, and diagnostics for training workloads
Perks/Benefits
- N/A
Skills/Tech-stack
API Design | Benchmarking | Debugging | Distributed Systems | GPU Computing | Networking | Performance optimization | PyTorch | Python | Storage | Testing
Education
N/A
Related jobs
- GenAI Engineer (USD 93K-163K). Skills: AWS Bedrock | Agentic Workflows | C++ | CI/CD | Cohere. Benefits: Health and wellness benefits | Mentorship | Professional development. Entry-level, Full Time. Location: Arlington/Rosslyn, Virginia, United States. Posted 1h ago.
- Senior GenAI Engineer (USD 102K-171K). Skills: API Development | AWS Bedrock | Agentic Workflows | CI/CD | Cohere. Senior-level, Full Time. Location: Arlington/Rosslyn, Virginia, United States. Posted 1h ago.
- Skills: Computer Vision | Data Pipelines | Language Models | Language Processing | Large Language Models. Senior-level, Full Time. Location: Bellevue, WA | Menlo Park, CA. Posted 2h ago.
- Senior Software Engineer, Cloud Databases (USD 174K-252K). Skills: Analytical processing | Benchmarking | C++ | Cloud Databases | Cloud platform. Senior-level, Full Time. Location: Kirkland, WA, USA. Posted 2h ago.
- Skills: C++ | Clustering | Data Pipelines | Data Processing | Debugging. Senior-level, Full Time. Location: Mountain View, CA, USA. Posted 2h ago.
- Technical Lead, AI/ML Storage (USD 207K-300K). Skills: AI/ML | AI/ML frameworks | Artificial Intelligence | Benchmarking | Cloud ML. Benefits: Health insurance | Paid time off | Professional development | Retirement benefits. Senior-level, Full Time. Location: Seattle, WA, USA. Posted 2h ago.
- Skills: Artificial Intelligence | Computer Vision | Computer vision models | Data Processing | Data Storage. Senior-level, Full Time. Location: Sunnyvale, CA, USA. Posted 2h ago.
- Data Engineer, Global Business and Operations (USD 130K-187K). Skills: BigQuery | Data Governance | Data Marts | Data Modeling | Data Pipelines. Mid-level, Full Time. Location: New York, NY, USA. Posted 2h ago.
- Sr. Machine Learning Engineer (USD 91K-177K). Skills: Algorithms | Anomaly Detection | Apache Airflow | Data Analysis | Deep learning. Benefits: 401k plan | Employee recognition | Employee stock purchase plan | Health insurance | Paid time off. Senior-level, Full Time. Location: Irvine, CA, US. Posted 4h ago.
- Staff Software Engineer - Core Ingest (USD 191K-224K). Skills: Agile Development | Apache Kafka | Distributed Systems | Docker | Fault Tolerance. Benefits: Health insurance | Paid time off | Remote work options. Senior-level, Full Time. Location: United States, Remote. Posted 9h ago.
- Staff Software Engineer - Data Query (USD 191K-224K). Skills: Agile | Automated testing | Big Data | C++ | Data Structures. Senior-level, Full Time. Location: United States, Remote. Posted 9h ago.
- Skills: ArcGIS Pro | Arcpy | Bokeh | Dash | GDAL. Benefits: 401k | Dental insurance | Health insurance | Vision insurance. Senior-level, Full Time. Location: Fayetteville, North Carolina, United States. Posted 12h ago.
- Skills: ArcGIS Pro | Arcpy | Bokeh | Dash | GDAL. Benefits: 401k | Dental insurance | Health insurance | TS/SCI clearance | Vision insurance. Senior-level, Full Time. Location: Sneads Ferry, North Carolina, United States. Posted 12h ago.
- Data Engineer - Mid-Level (USD 130K-160K). Skills: Airflow | Automated Deployment | Automated testing | CI/CD | Control workflows. Benefits: 401k matching | Dental insurance | Health insurance | Lunch and snacks provided | Maternity & paternity leave. Mid-level, Full Time. Location: El Segundo, California, United States. Posted 12h ago.
- Staff Engineer, Machine Learning (USD 196K-269K). Skills: Camera | Computer Vision | Convolutional Neural Networks | DETR | Deep Neural Networks. Benefits: 401k employer match | Dental insurance | Life insurance | Long-term disability | Medical insurance. Senior-level, Full Time. Location: Mountain View, CA. Posted 12h ago.
- Software Engineer – Surgical Robot Manufacturing (USD 127K-192K). Skills: Automated testing | Control Systems | HTML | JSON | JavaScript. Mid-level, Full Time. Location: Sunnyvale, CA, United States. Posted 12h ago.
- Senior-level, Full Time. Location: Remote - United States. Posted 13h ago.
- Senior Manufacturing Analytics Engineer (USD 115K-140K). Skills: Chemometrics | Data Preparation | Descriptive Analytics | Feature Engineering | Machine Learning. Benefits: Comprehensive benefits | Medical benefits | Sick leave | Travel up to 15 percent. Senior-level, Full Time. Location: Wayzata, Minnesota, United States, 55391. Posted 13h ago.
- Senior Software Engineer - Experiment Platform (USD 159K-235K). Skills: A/B Testing | Data Pipelines | Data Quality. Benefits: 401k plan | Basic life insurance | Dental insurance | Flexible time off | Long-Term Disability coverage. Senior-level, Full Time. Location: Seattle, Washington, United States. Posted 13h ago.
- Sr. Applied AI Engineer (USD 160K-200K). Skills: APIs | Cloud Computing | Generative AI | Machine Learning | Python. Benefits: Home-office equipment | Hybrid work model | Self-development budget | Top of market equity and cash compensation package. Senior-level, Full Time. Location: New York Office. Posted 13h ago.
- Applied AI Engineer (USD 110K-160K). Skills: Generative AI | Machine Learning | Python | REST APIs | SQL. Benefits: Equipment provided | Hybrid work | Mentorship | Self-development budget. Mid-level, Full Time. Location: New York Office. Posted 14h ago.
- Machine Learning Engineer, Foundation Model (USD 129K-247K). Skills: Auto-regressive models | C++ | Data Pipelines | Deep learning | Diffusion Models. Senior-level, Full Time. Location: San Jose. Posted 14h ago.
- Senior Embedded Engineer - Connectivity (USD 180K-300K). Skills: C# | C++ | Communication Protocols | Data-driven | Data-driven methods. Benefits: Commuter benefits | Fertility stipend | Flexible PTO | Health and wellness benefits | Healthy lunches provided daily. Senior-level, Full Time. Location: San Mateo, CA, United States. Posted 14h ago.
- Senior Data Engineer (Health Research Team) (USD 104K-199K). Skills: AWS EC2 | AWS S3 | Apache Spark | Azure | Data Governance. Benefits: 401k plan | Employee Assistance Program (EAP) | Family building benefits | Flexible Spending Accounts (FSA) | Flexible work arrangements. Senior-level, Full Time. Location: Chicago, Illinois, United States. Posted 15h ago.
- Skills: AWS | Azure | CI/CD | Cloud platform | Data Pipelines. Long-term contract. Senior-level, Contract, Full Time. Location: Dallas, TX, United States. Posted 15h ago.