AI Operations & Infrastructure Engineer
Fort Meade, MD, United States, 20755
USD 184K-333K (estimate) Senior-level Full Time
Tasks
- Configure and administer logical and physical resources
- Configure and manage network topologies and out-of-band management
- Configure and optimize AI networking infrastructure
- Deploy and manage data processing units
- Diagnose and resolve networking issues
- Ensure efficient power and cooling for AI infrastructure
- Ensure secure, efficient, and scalable operation of AI infrastructure
- Implement and manage containerization technologies
- Implement workload management and scheduling
- Install and configure GPU drivers and software
- Lead deployment and validation of AI servers and systems
- Manage and maintain AI computing platforms
- Manage storage solutions for AI data
- Monitor and manage AI cluster health and resource utilization
- Monitor, document, and report cluster health and job performance
- Oversee AI software stack and tools
- Perform firmware upgrades and hardware validation
- Provide technical support for AI infrastructure teams
- Replace faulty components and optimize systems
- Troubleshoot hardware, software, storage, and performance faults
Perks/Benefits
- N/A
Skills/Tech-stack
Base Command Manager | Command Manager | Data Processing | Data Processing Unit (DPU) | Docker | Ethernet | InfiniBand | Kubernetes | NVIDIA Base Command Manager | NVIDIA GPU | Network Protocols | NVIDIA Base Command | Run | Slurm | Storage Administration
Education
N/A