Member of Technical Staff - ML Infrastructure Engineer
USD 180K-300K | Senior-level | Full Time
Tasks
- Design and deploy ML inference clusters
- Design and deploy ML training clusters
- Ensure cloud security best practices
- Implement custom autoscaling for ML workloads
- Implement infrastructure as code for resource provisioning
- Maintain Kubernetes clusters
- Maintain Slurm clusters
- Manage network cloud file systems
- Manage object storage for ML workloads
- Monitor and optimize observability for ML infrastructure
- Optimize CI/CD pipelines for ML workflows
- Provide developer-friendly ML operations tooling
Skills/Tech-stack
AWS | Amazon S3 | Ansible | Argo CD | Autoscaling | Azure | Best practices | CI/CD | CircleCI | Cloud platform | Container Security | Data Versioning | Distributed Training | Experiment tracking | File System | GPU infrastructure | GPU infrastructure management | GitHub Actions | Google Cloud Platform | Grafana | High-Performance Computing | Infrastructure Management | Infrastructure as Code | Kubernetes | Loki | MLOps | Monitoring | Network File System | Object storage | Observability | Prometheus | Security best practices | Slurm | Storage Optimization | Terraform | Vulnerability scanning