Sr. System Engineer

Chungho, Taiwan, TW

Supermicro

The premier provider of advanced Server Building Block Solutions® for 5G/Edge, Data Center, Cloud, Enterprise, Big Data, HPC and Embedded markets worldwide.

Job Req ID: 25817

About Supermicro:

Supermicro® is a top-tier provider of advanced server, storage, and networking solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, Hyperscale, HPC, and IoT/Embedded customers worldwide. We are the #5 fastest-growing company among the Silicon Valley Top 50 technology firms. Our unprecedented global expansion has given us the opportunity to offer a large number of new positions to the technology community. We seek talented, passionate, and committed engineers, technologists, and business leaders to join us.

Job Summary:

As a Sr. System Engineer, you are the go-to person for rolling out and maintaining business-critical applications and services for Supermicro. You are also responsible for resolving escalated service issues, coaching other engineers to resolutions, and engineering and implementing complex projects. You work independently, show the leadership to drive technical development, and have excellent communication skills.

Essential Duties and Responsibilities:

  • Conduct performance testing and benchmarking for servers, GPUs, and HPC environments.
  • Analyze results to identify bottlenecks and optimize system performance for AI/ML workloads.
  • Design and configure high-speed network topologies (InfiniBand, Ethernet) for AI clusters.
  • Configure network components to ensure optimal performance.
  • Write Python scripts to automate testing, monitoring, and system optimization.
  • Understand AI/ML frameworks (e.g., PyTorch, TensorFlow) and deployment requirements for LLMs.
  • Monitor network health and server performance, proactively identifying and resolving issues.

Qualifications:

  • Minimum 5 years of relevant experience in performance testing, system optimization, and HPC environments.
  • Proficiency in Linux system administration, including cluster setup and management.
  • Hands-on experience with Kubernetes (K8S) for container orchestration in AI/ML workloads.
  • Familiarity with CUDA and GPU configurations for AI/ML performance optimization.
  • In-depth knowledge of high-speed networking (e.g., InfiniBand, Ethernet) and related technologies.
  • Understanding of AI/ML frameworks such as PyTorch, TensorFlow, and deployment requirements for large language models (LLMs).
  • Ability to conduct performance testing and benchmarking for servers, GPUs, and HPC systems.
  • Capability to design, configure, and troubleshoot network topologies and components.
  • Problem-Solving and Monitoring:

EEO Statement

Supermicro is an Equal Opportunity Employer and embraces diversity in our employee population. It is the policy of Supermicro to provide equal opportunity to all qualified applicants and employees without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status or special disabled veteran, marital status, pregnancy, genetic information, or any other legally protected status.

Category: Engineering Jobs

Tags: Big Data CUDA Engineering GPU Hadoop HPC InfiniBand Kubernetes Linux LLMs Machine Learning Python PyTorch TensorFlow Testing

Region: Asia/Pacific
Country: Taiwan
