HPC - AI/ML Platform Engineer
Tasks
- Collaborate with application and research teams
- Design and implement GPU Kubernetes clusters
- Develop automation for cluster provisioning
- Implement monitoring and observability
- Maintain platform reliability and scalability
- Perform configuration management for platform operations
- Produce technical documentation and runbooks
- Support GPU Kubernetes clusters and infrastructure
- Troubleshoot compute, networking, and container infrastructure
- Tune performance across GPU and compute platforms
Perks/Benefits
- Dental insurance
- Employee resource groups
- Flexible family care
- Health insurance
- Paid holidays
- Paid time off
- Parental leave
- Prescription drug coverage
- Subsidized back-up child care
- Tuition assistance
- Vehicle discount
Skills/Tech-stack
Ansible | Bash | CI/CD | GPU scheduling | Grafana | InfiniBand | Kubernetes | Linux | Monitoring | Observability | OpenShift | Prometheus | Python | RDMA | Terraform
Roles
Engineer | GPU Platform Engineer | Platform | Platform Engineer