Senior ML Platform Engineer

Tel Aviv-Jaffa, Tel Aviv District, IL

Description

Dream is a pioneering AI cybersecurity company delivering revolutionary defense through artificial intelligence. Our proprietary AI platform creates a unified security system safeguarding assets against existing and emerging generative cyber threats. Dream's advanced AI automates discovery, calculates risks, performs real-time threat detection, and plans an automated response. With a core focus on the "unknowns," our AI transforms data into clear threat narratives and actionable defense strategies.

Dream's AI cybersecurity platform represents a paradigm shift in cyber defense, employing a novel, multi-layered approach across all organizational networks in real time. At the core of our solution is Dream's proprietary Cyber Language Model, a groundbreaking innovation that provides real-time, contextualized intelligence for comprehensive, actionable insights into any cyber-related query or threat scenario.

We are seeking an experienced Machine Learning Engineer to join our Platform and DevOps Engineering group. In this critical role, you will be instrumental in building and maintaining a high-availability, scalable model inference infrastructure that supports advanced machine learning models, including Large Language Models (LLMs) and anomaly detection systems.

The ideal candidate is passionate about developing scalable, robust machine learning infrastructure and has a keen interest in applying advanced AI models to drive significant business impact. The job involves creating and maintaining high-performance model inference systems that support both batch and real-time AI processing in cloud and on-premises environments.

Responsibilities

  • Design and implement scalable, high-availability machine learning inference architectures.
  • Develop robust systems that efficiently manage the deployment and operation of complex models like LLMs and anomaly detectors.
  • Utilize AWS technologies such as SageMaker, Lambda, SQS, and Redshift to optimize the performance and scalability of our machine learning infrastructure.
  • Collaborate with data scientists and AI researchers to ensure seamless integration and optimal performance of machine learning models.
  • Build and maintain monitoring systems to ensure the stability and efficiency of the machine learning infrastructure.
  • Troubleshoot and resolve issues related to model performance, infrastructure bottlenecks, and system failures.
  • Maintain strong communication skills and collaborate effectively within a dynamic team environment.

Skills

  • Proven ability to work effectively in a team setting.
  • At least 4 years of experience in machine learning engineering or a related field.
  • Strong experience in building and maintaining scalable machine learning infrastructure.
  • Proficient with AWS services, particularly SageMaker, Lambda, SQS, and Redshift.
  • Deep understanding of machine learning operations (MLOps) and best practices for model deployment.
  • Expertise in Python and familiarity with other scripting languages such as Bash.
  • Solid experience with big data technologies and data management tools.
  • Knowledge of continuous integration and continuous deployment (CI/CD) practices.


Advantages:

  • Experience with real-time machine learning model deployment.
  • Familiarity with cybersecurity applications of machine learning.
  • Advanced skills in performance optimization for high-throughput systems.


Tech Stack:

AWS (SageMaker, Lambda, SQS, Redshift), DeepSpeed, TensorFlow, PyTorch, Scikit-learn, Airflow, Python, Docker, Kubernetes, Jenkins, Terraform, Ansible, GitHub, and more.

Region: Middle East
Country: Israel
