Director of AI Infrastructure
Tasks
- Decide when to burst to cloud versus invest in on-prem capacity
- Develop storage roadmap for throughput and durability
- Direct strategy for the Beaker orchestration platform
- Manage GPU compute budget and resource economics
- Optimize job scheduling for hybrid cloud workloads
- Oversee on-prem GPU cluster availability and performance
- Partner with hardware vendors to meet infrastructure demands
- Serve as the technical bridge to research teams
Perks/Benefits
- 401(k) plan
- Annual bonuses
- Commuting support
- Employee assistance program
- Fitness and wellbeing support
- Health savings account
- Long-term incentive plan
- Medical/Dental/Vision
- Paid holidays
- Paid sick leave
- Paid vacation
- Personal days
Skills/Tech-stack
AWS | Beaker | Ceph | containerd | Distributed Systems | Docker | GCP | Go | High-Performance Computing (HPC) | Hybrid Cloud | InfiniBand | kube-scheduler | Kubernetes | Linux | Lustre | NCCL | NVIDIA GPU | Python | Resource Management | Slurm | Weka
Education
Bachelor of Engineering | Bachelor of Science | Master of Science