AI Solutions Engineer, AI/HPC

Raanana, Israel, IL

Description

Location: Israel

Ra'anana - Hybrid work

DriveNets is a leader in high-scale disaggregated networking solutions. Founded in 2015, DriveNets modernizes the way service providers, cloud providers and hyperscalers build networks. DriveNets’ Network Cloud open disaggregated architecture supports the largest network in the world, carrying more than half of AT&T’s backbone traffic. Having raised $587 million in three funding rounds, DriveNets is disrupting the networking market from high-scale architecture to AI platforms, and is bringing the most talented people on board. We are seeking people who want to make an impact on the world’s leading communication networks and who are experienced in networking architecture or AI infrastructure solutions.

The Role

As a Solutions Engineer, you will play a pivotal role in designing, deploying, and optimizing DriveNets’ Network Cloud AI infrastructure solutions. This individual contributor role requires a blend of technical expertise, leadership, and hands-on experience to implement cutting-edge solutions for our customers. You will collaborate with sales engineering teams, customers, and cross-functional teams (including Product Management, Solution Architects, Engineering, and Marketing) to define technical requirements, articulate solution value, and ensure successful on-site deployment.

Key responsibilities include guiding customers through the design and deployment process, aligning technical solutions with business needs, and providing critical feedback to improve DriveNets’ product offerings. This position demands strong technical acumen, exceptional communication skills, and the ability to lead complex, high-impact projects in dynamic environments.

Responsibilities:

  • Build robust AI/HPC infrastructure for new and existing customers.
  • Build and support NVIDIA- and AMD-based platforms in a hands-on technical capacity.
  • Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, training stability, real-time monitoring, logging, and alerting.
  • Administer Linux systems, ranging from powerful GPU-enabled servers to general-purpose compute systems.
  • Design and plan rack layouts and network topologies to support customer requirements.
  • Design and evaluate automation scripts for network operations and for configuring server and switch fabrics.
  • Perform data center upgrades and ensure smooth deployment of DriveNets solutions.
  • Install and configure DriveNets products, ensuring optimal performance and customer satisfaction.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Engage in and improve the whole lifecycle of services from inception and design through deployment, operation, and refinement.
  • Provide feedback to internal teams by opening bugs, documenting workarounds, and suggesting improvements.
  • Engage with sales teams and customers to ensure success with major opportunities and deployments.
  • Introduce new products to DriveNets’ sales and support teams and to DriveNets’ customers.
  • Deliver technical training sessions and TOIs for support/sales engineers, partners, and customers.
  • Collaborate on product definition through customer requirement gathering and roadmap planning.

Requirements

  • 5+ years of experience deploying and administering AI/HPC clusters or general-purpose compute systems.
  • 5+ years of hands-on Linux experience (e.g., RHEL, CentOS, Ubuntu) and production infrastructure support (e.g., networking, storage, monitoring, compute, installation, configuration, maintenance, upgrade, retirement).
  • Proficiency in cloud, virtualization, and container technologies.
  • Deep understanding of operating systems, computer networks, and high-performance applications.
  • Hands-on experience with Bash, Python, and configuration management tools (e.g., Ansible).
  • Established record of leading technical initiatives and delivering results.
  • Ability to write extensive technical content (white papers, technical briefs, test reports, etc.) for external audiences with a balance of technical accuracy, strategy, and clear messaging.
  • Ability to travel domestically and internationally up to 20% of the time.

Ways to stand out from the crowd:

  • Familiarity with AI-relevant data center infrastructure and networking technologies such as InfiniBand, RoCEv2, lossless Ethernet technologies (PFC, ECN, etc.), accelerated computing, GPUs, and DPUs.
  • Familiarity with GPU resource schedulers and workload managers (Slurm, Kubernetes, etc.).
  • Familiarity with monitoring tools (e.g., Prometheus, Grafana, ELK Stack) and telemetry protocols (gRPC, gNMI, OTLP, etc.).
  • Understanding of data center operations fundamentals in networking, cooling, and power.
  • Proven experience with one or more Tier-1 clouds (AWS, Azure, GCP, or OCI) or emerging neoclouds, and with cloud-native architectures and software.
  • Expertise with parallel filesystems (e.g., Lustre, GPFS, BeeGFS, WekaIO) and high-speed interconnects (InfiniBand, Omni-Path, Ethernet).
  • Understanding of AI workload requirements and how they interact with other parts of the system, such as networking, storage, and deep learning frameworks.
  • Knowledge of AI/ML frameworks (e.g., TensorFlow, PyTorch) and associated tooling.

DriveNets offers a competitive salary, benefits, and opportunities for career growth.

If your experience is close but doesn’t fulfill all requirements, please apply. DriveNets is on a mission to build a special company made up of individuals with different backgrounds, perspectives, and experiences.

DriveNets is an equal opportunity employer. We do not discriminate based on race, religion, color, national origin, sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.

More About DriveNets

Based in Israel, with locations in Romania, the US, and Japan as well as extended teams, DriveNets operates in more than 10 countries. Recognized by industry analysts and through numerous industry awards, DriveNets is building market momentum and enabling faster service innovation from the network core to the edge. Visit our website:

https://drivenets.com/company/
