Principal AI and ML Infra Software Engineer, GPU Clusters
USD 272K-431K | Senior-level | Full Time
Tasks
- Collaborate with AI and ML research teams to define infrastructure requirements
- Define and measure AI researcher efficiency metrics
- Develop the AI and ML infrastructure ecosystem with cross-functional teams
- Identify researcher efficiency bottlenecks and improve workflows
- Integrate new AI and ML frameworks and strategies
- Optimize infrastructure performance for high availability, scalability, and resource utilization
Perks/Benefits
- N/A
Skills/Tech-stack
AWS | Amazon EFA | Azure | Bash | BeeGFS | Containerization | DDP | Data Processing | Distributed Training | Docker | Enroot | FSDP | GCP | GPFS | GPU | Go | HPC | High-Performance Computing | Inference | InfiniBand | JAX | Kubernetes | Lustre | Model Training | NeMo | PyTorch | Python | RoCE | Slurm | Storage
Roles
AI | AI and ML Infrastructure Software Engineer | AI and Machine Learning Infrastructure Engineer | Engineer | Infrastructure Engineer | Infrastructure Software Engineer | ML Infrastructure Software Engineer | Machine Learning Infrastructure Engineer | Principal | Principal AI and ML Infrastructure Software Engineer | Software Engineer