Senior High-Performance Computing Systems Engineer (Remote)
Plovdiv Province, Plovdiv, Bulgaria
Prime Holding
Wiser Technology is a leading software development company. Our team of 600 engineers across Europe excels in web and mobile software development, video streaming, defense, machine learning, automotive, e-commerce, and AI. We leverage top-tier technologies and expertise driven by a passion for innovation to drive progress.
We are seeking a proactive, skilled, and passionate Senior Systems Engineer - HPC.
WHAT YOU WILL DO:
- Oversee the design, deployment, and optimization of the HPC infrastructure, including hardware, platform, software, networking, and storage components.
- Participate in preparing and reviewing HLD and LLD documents, scopes of work, RFIs, RFPs, and RFQs.
- Lead efforts to maximize the efficiency and performance of HPC systems, ensuring optimal resource utilization and minimal downtime.
- Collaborate closely with product and architecture teams to understand and implement customer computational needs and requirements. Provide tailored technical solutions that align with the company’s strategic goals.
- Develop and implement automation solutions and tools for deployment and management.
- Set up monitoring, logging, and alerting systems.
- Act as L3 support for complex technical issues: perform root cause analysis and implement solutions to ensure the reliability and availability of HPC systems.
- Maintain comprehensive documentation of HPC configurations, procedures, and best practices to facilitate knowledge sharing and future reference.
- Ensure the security and compliance of the HPC infrastructure by implementing necessary safeguards and adhering to company standards and regulations.
- Collaborate with HPC vendors and suppliers for hardware and software procurement, support, and delivery.
- Assist in budget planning and management for HPC-related expenditures, ensuring cost-effective solutions.
- Stay at the forefront of HPC technology trends, evaluating and recommending new technologies and practices to enhance HPC capabilities.
WHAT YOU WILL NEED:
- Bachelor’s degree in Information Technology, Computer Science, or a relevant field.
- Minimum 7 years of hands-on experience in High-Performance Computing (HPC) systems administration and infrastructure management.
- Advanced knowledge and expertise in configuring, optimizing, and maintaining complex HPC environments, including hardware, software, and storage systems.
- Proficiency in parallel computing principles, distributed computing, and cluster management.
- Comprehensive knowledge and hands-on experience in the system administration of Linux environments.
- Experience with job schedulers, resource managers, and workflow orchestration tools commonly used in HPC environments (Slurm, LSF, PBS, Kubernetes).
- Advanced knowledge of data center network design and related technologies (OSI model, TCP/IP stack, routing, VLAN/VXLAN, etc.).
- Competence in network design and configuration of switches/routers, including InfiniBand and RoCE.
- Experience with large-scale data storage solutions, particularly Ceph, NFS, and Lustre.
- Proficiency in one or more parallel libraries or programming models such as MPI, OpenMP, oneAPI, and CUDA.
- Competence in configuration management and infrastructure-as-code tools such as Ansible, Puppet, and Terraform, and their integration with Git.
- Strong scripting and automation skills (e.g., Python, Bash) for system administration tasks.
- Excellent problem-solving skills and the ability to troubleshoot complex HPC issues effectively.
- In-depth knowledge of performance tuning and optimization techniques for HPC systems.
- Familiarity with containerization and orchestration (Docker, Kubernetes).
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Nagios, Zabbix, Ganglia, ELK).
- Effective communication and collaboration skills to work with cross-functional teams.
WHAT’S IN IT FOR YOU?
Culture & Development:
- Friendly Environment: We take pride in our culture and love spending time together.
- Team Spirit: Be part of a supportive team that uplifts each other.
- Mentorship and coaching: Our colleagues are experts in their field, and you can expect to have a solid team to rely on.
- Personalized Development Program: We realize that one size doesn’t fit all, so you'll receive an individual development plan tailored to your career aspirations.
Social Benefits:
- Work Flexibility: Embrace flexible working hours and choose from remote, hybrid, or onsite work models.
- Multiple Office Locations: In Sofia, Plovdiv, Stara Zagora, and Nis, you can choose where you would like to work.
- A Suite of Perks: Enjoy food vouchers, additional health insurance, sports cards, and more.
- Community and Connections: Engage in exciting social events and team initiatives.
Empowerment: At Wiser, every role is instrumental. You will have the power to make a difference!
Ready to advance your career with a tech leader passionately driven by innovation?
Join Wiser - Become Wiser!
Tags: Ansible Architecture Computer Science CUDA Docker E-commerce ELK Git Grafana HPC InfiniBand Kubernetes Linux Machine Learning OpenMP Puppet Python Security Streaming Terraform
Perks/benefits: Career development Flex hours Health care Team events