Senior Systems Engineer – HPC

Sofia City Province, Sofia, Bulgaria

Prime Holding is now part of Wiser!

Wiser Technology is a leading software development company. Our team of 600 engineers across Europe excels in web and mobile software development, video streaming, defense, machine learning, automotive, e-commerce, and AI. Guided by a passion for innovation, we use top-tier technologies and expertise to drive progress.

We are seeking a proactive, skilled, and passionate Senior Systems Engineer - HPC.


WHAT YOU WILL DO:

  • Oversee the design, deployment, and optimization of the HPC infrastructure, including hardware, platform, software, networking, and storage components.
  • Take part in preparing and reviewing HLD and LLD documents, scopes of work, RFIs, RFPs, and RFQs.
  • Lead efforts to maximize the efficiency and performance of HPC systems, ensuring optimal resource utilization and minimal downtime.
  • Collaborate closely with product and architecture teams to understand and implement customer computational needs and requirements. Provide tailored technical solutions that align with the company’s strategic goals.
  • Develop and implement automation solutions and tools for deployment and management.
  • Set up monitoring, logging, and alerting systems.
  • Act as L3 support for complex technical issues: perform root cause analysis and implement solutions to ensure the reliability and availability of HPC systems.
  • Maintain comprehensive documentation of HPC configurations, procedures, and best practices to facilitate knowledge sharing and future reference.
  • Ensure the security and compliance of the HPC infrastructure by implementing necessary safeguards and adhering to company standards and regulations.
  • Collaborate with HPC vendors and suppliers for hardware and software procurement, support, and delivery.
  • Assist in budget planning and management for HPC-related expenditures, ensuring cost-effective solutions.
  • Stay at the forefront of HPC technology trends, evaluating and recommending new technologies and practices to enhance HPC capabilities.

WHAT YOU WILL NEED: 

  • Bachelor’s degree in Information Technology, Computer Science, or a related field.
  • Minimum 7 years of hands-on experience in High-Performance Computing (HPC) systems administration and infrastructure management.
  • Advanced knowledge and expertise in configuring, optimizing, and maintaining complex HPC environments, including hardware, software, and storage systems.
  • Proficiency in parallel computing principles, distributed computing, and cluster management.
  • Comprehensive knowledge and hands-on experience in the system administration of Linux environments.
  • Experience with job schedulers, resource managers, and workflow orchestration tools commonly used in HPC environments (e.g., Slurm, LSF or PBS, Kubernetes).
  • Advanced knowledge of data center network design and related technologies (OSI model, TCP/IP stack, routing, VLAN/VXLAN, etc.).
  • Competence in network design and configuration of switches/routers, including InfiniBand and RoCE.
  • Experience with large-scale data storage solutions, particularly Ceph, NFS, and Lustre.
  • Proficiency in one or more parallel libraries or programming models, such as MPI, OpenMP, oneAPI, and CUDA.
  • Competence in configuration management and infrastructure-as-code tools such as Ansible, Puppet, and Terraform, and their integration with Git.
  • Strong scripting and automation skills (e.g., Python, Bash) for system administration tasks.
  • Excellent problem-solving skills and the ability to troubleshoot complex HPC issues effectively.
  • In-depth knowledge of performance tuning and optimization techniques for HPC systems.
  • Familiarity with containerization and orchestration (Docker, Kubernetes).
  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Nagios, Zabbix, Ganglia, ELK).
  • Effective communication and collaboration skills to work with cross-functional teams.


WHAT’S IN IT FOR YOU? 

Culture & Development:

  • Friendly Environment: We take pride in our culture and love spending time together.
  • Team Spirit: Be part of a supportive team that uplifts each other.
  • Mentorship and coaching: Our colleagues are experts in their field, and you can expect to have a solid team to rely on.
  • Personalized Development Program: We realize that one size doesn’t fit all, so you’ll receive an individual development plan tailored to your career aspirations.

Social Benefits:

  • Work Flexibility: Embrace flexible working hours and choose from remote, hybrid, or onsite work models.
  • Multiple Office Locations: In Sofia, Plovdiv, Stara Zagora, and Nis, you can choose where you would like to work.
  • A Suite of Perks: Enjoy food vouchers, additional health insurance, sports cards, and more.
  • Community and Connections: Engage in exciting social events and team initiatives.

Empowerment: At Wiser, every role is instrumental. You will have the power to make a difference!
Ready to advance your career with a tech leader passionately driven by innovation?
Join Wiser - Become Wiser!

Category: Engineering Jobs

Tags: Ansible Architecture Computer Science CUDA Docker E-commerce ELK Git Grafana HPC InfiniBand Kubernetes Linux Machine Learning OpenMP Puppet Python Security Streaming Terraform

Perks/benefits: Career development Flex hours Health care Team events

Region: Europe
Country: Bulgaria
