Senior Storage Production Engineer - DGX Cloud
Tasks
- Automate storage operations and maintenance
- Build and maintain storage monitoring, logging, and alerting
- Conduct incident response and blameless root cause analysis
- Design and implement large-scale storage clusters
- Ensure data security and compliance using encryption, access controls, and auditing
- Improve the storage service lifecycle, from design through operation and continuous optimization
- Optimize storage architectures for low-latency AI/ML workloads
- Optimize storage efficiency using compression, deduplication, tiering, and intelligent placement
- Participate in on-call rotation
- Scale storage systems using policy-based tiering and dynamic data migration
- Monitor availability, latency, and system health
- Support storage clusters for scalability, availability, and data integrity
Skills/Tech-stack
AI/ML | Access Control | Algorithms | Ansible | Auditing | Backup | Bash | Block Storage | C# | C++ | CI/CD | Capacity Planning | Chef | Clustered File Systems | Containers | Continuous Delivery | Data Structures | Disaster Recovery | Disaster Recovery Testing | Distributed Storage | Distributed Object Storage | Elasticsearch | Encryption | Erasure Coding | Fibre Channel | File Storage | File Systems | Git | Go | Grafana | High-Performance Computing | iSCSI | InfluxDB | Infrastructure as Code | Java | Kubernetes | Linux | Logging | NFS | NVMe over Fabrics | Node.js | Object Storage | Parallel File Systems | Performance Tuning | Predictive Analytics | Prometheus | Puppet | Python | RDMA | Replication | S3 | SMB | Software Design | Terraform | Virtualization