DevOps Engineer, GPUaaS
Tasks
- Collaborate across teams to streamline workflows
- Conduct GPU cluster benchmarks and track advances in GPU technology
- Design, deploy, and support GPU clusters for AI and ML workloads
- Design, implement, and manage CI/CD pipelines for AI models and GPU-accelerated applications
- Identify bottlenecks and improve development and operational processes for the AI and HPC GPU cloud
- Implement security best practices for multi-tenant GPUaaS
- Improve infrastructure provisioning, management, and monitoring through automation
- Manage and automate provisioning of GPU resources on-premises and in the cloud
- Monitor cluster usage, health, performance, and availability
- Optimize system parameters for AI workload performance
- Participate in rotational or scheduled shift work
- Provide technical support and guidance to users
- Set up monitoring and logging for GPU resources
- Solve problems in high-performance distributed computing
- Troubleshoot system-level compute resource issues
Perks/Benefits
- Flexible work arrangements
- Health and wellness benefits
- Internal mobility opportunities
- Training and development programs
Skills/Tech-stack
Ansible | Automation | Bash | CI/CD | CUDA | CentOS | Containers | Docker | GPU Acceleration | GPU Architecture | GPU drivers | IaaS | InfiniBand | Jenkins | Kubernetes | Linux | Logging | MPI | Monitoring | NCCL | NVIDIA DCGM | NVIDIA GPUs | Networking | PaaS | Prometheus | PyTorch | Python | RDMA | Rocky Linux | Security | Slurm | TensorFlow | Terraform | Ubuntu | Zabbix