Senior Solutions Engineer – Big Data & Data Infrastructure
Tel Aviv-Yafo, Tel Aviv District, IL
VAST Data
The world’s first data computing platform designed to serve as the foundation for AI-automated discovery.
Description
This is a great opportunity to be part of one of the fastest-growing infrastructure companies in history, an organization at the center of the hurricane created by the revolution in artificial intelligence.
"VAST's data management vision is the future of the market."- Forbes
VAST Data is the data platform company for the AI era. We are building the enterprise software infrastructure to capture, catalog, refine, enrich, and protect massive datasets and make them available for real-time data analysis and AI training and inference. Designed from the ground up to make AI simple to deploy and manage, VAST takes the cost and complexity out of deploying enterprise and AI infrastructure across data center, edge, and cloud.
Our success has been built through intense innovation, a customer-first mentality and a team of fearless VASTronauts who leverage their skills & experiences to make real market impact. This is an opportunity to be a key contributor at a pivotal time in our company’s growth and at a pivotal point in computing history.
We are seeking an experienced Solutions Data Engineer who possesses both technical depth and strong interpersonal skills to partner with internal and external teams to develop scalable, flexible, and cutting-edge solutions. Solutions Engineers collaborate with operations and business development to help craft solutions that address customer business problems.
A Solutions Engineer works to balance the various aspects of a project, from safety to design. Additionally, a Solutions Engineer researches advanced technologies and industry best practices and seeks out cost-effective solutions.
Job Description:
We’re looking for a Solutions Engineer with deep experience in Big Data technologies, real-time data pipelines, and scalable infrastructure—someone who’s been delivering critical systems under pressure, and knows what it takes to bring complex data architectures to life. This isn’t just about checking boxes on tech stacks—it’s about solving real-world data problems, collaborating with smart people, and building robust, future-proof solutions.
In this role, you’ll partner closely with engineering, product, and customers to design and deliver high-impact systems that move, transform, and serve data at scale. You’ll help customers architect pipelines that are not only performant and cost-efficient but also easy to operate and evolve.
We want someone who’s comfortable switching hats between low-level debugging, high-level architecture, and communicating clearly with stakeholders of all technical levels.
Key Responsibilities:
- Build distributed data pipelines using technologies like Kafka, Spark (batch & streaming), Python, Trino, Airflow, and S3-compatible data lakes—designed for scale, modularity, and seamless integration across real-time and batch workloads (see the streaming sketch after this list).
- Design, deploy, and troubleshoot hybrid cloud/on-prem environments using Terraform, Docker, Kubernetes, and CI/CD automation tools.
- Implement event-driven and serverless workflows with precise control over latency, throughput, and fault tolerance trade-offs (illustrated in the consumer sketch after this list).
- Create technical guides, architecture docs, and demo pipelines to support onboarding, evangelize best practices, and accelerate adoption across engineering, product, and customer-facing teams.
- Integrate data validation, observability tools, and governance directly into the pipeline lifecycle.
- Own the end-to-end platform lifecycle: ingestion → transformation → storage (Parquet/ORC on S3) → compute layer (Trino/Spark); the Trino sketch after this list shows the compute end of that chain.
- Benchmark and tune storage backends (S3/NFS/SMB) and compute layers for throughput, latency, and scalability using production datasets.
- Work cross-functionally with R&D to push performance limits across interactive, streaming, and ML-ready analytics workloads.
- Operate and debug object store–backed data lake infrastructure, enabling schema-on-read access, high-throughput ingestion, advanced searching strategies, and performance tuning for large-scale workloads.
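To make the pipeline responsibility concrete, here is a minimal sketch of the kind of job involved: a Spark Structured Streaming application that reads JSON events from Kafka and lands them as partitioned Parquet on an S3-compatible data lake. The topic, endpoint, bucket, and event schema are hypothetical placeholders, not a prescribed VAST configuration.

```python
# Minimal sketch: Kafka -> Spark Structured Streaming -> Parquet on an S3-compatible store.
# Requires the spark-sql-kafka connector on the classpath; all names below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col, to_date
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("events-to-lake").getOrCreate()

# Example event schema (an assumption; adjust to the real payload).
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("payload", StringType()),
])

# Read a continuous stream of events from Kafka.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")            # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka values arrive as bytes; parse the JSON payload into typed columns
# and derive a date column to partition by.
events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .withColumn("event_date", to_date(col("event_time")))
)

# Write Parquet partitioned by day; checkpointing makes the file sink exactly-once.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3a://data-lake/events/")                    # hypothetical bucket
    .option("checkpointLocation", "s3a://data-lake/_checkpoints/events/")
    .partitionBy("event_date")
    .trigger(processingTime="1 minute")
    .start()
)

query.awaitTermination()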
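On the compute end of that lifecycle, the sketch below queries the same Parquet data through Trino from Python using the `trino` client package. It assumes a catalog (here called `lake`) has already been configured against the object store; host, catalog, schema, and table names are placeholders.

```python
# Minimal sketch: schema-on-read access to the Parquet lake through Trino.
# Host, catalog, schema, and table names are hypothetical placeholders.
import trino

conn = trino.dbapi.connect(
    host="trino-coordinator",
    port=8080,
    user="solutions-engineer",
    catalog="lake",        # e.g. a Hive- or Iceberg-style catalog backed by the bucket
    schema="analytics",
)

cur = conn.cursor()

# Interactive analytics directly on the files written by the streaming job;
# event_date and user_id match the columns in the streaming sketch above.
cur.execute(
    """
    SELECT event_date,
           count(*)                AS events,
           count(DISTINCT user_id) AS users
    FROM events
    WHERE event_date >= DATE '2024-01-01'
    GROUP BY event_date
    ORDER BY event_date
    """
)

for row in cur.fetchall():
    print(row)
```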
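The event-driven bullet is largely about trade-offs, so the sketch below shows where those knobs live in a plain Python Kafka consumer (using the kafka-python package; topic and group names are hypothetical): larger poll batches favor throughput, a short fetch wait favors latency, and manual offset commits trade a little speed for at-least-once delivery.

```python
# Minimal sketch: a Kafka consumer with the latency / throughput / fault-tolerance
# knobs called out explicitly. Uses the kafka-python package; names are placeholders.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events",                        # hypothetical topic
    bootstrap_servers="broker:9092",
    group_id="enrichment-service",   # hypothetical consumer group
    enable_auto_commit=False,        # fault tolerance: commit only after processing (at-least-once)
    max_poll_records=500,            # throughput: larger batches per poll...
    fetch_max_wait_ms=100,           # latency: ...but don't wait long for them to fill
    auto_offset_reset="earliest",
)

def process(record):
    # Placeholder for the real transformation or side effect.
    print(record.topic, record.partition, record.offset, record.value)

for record in consumer:
    process(record)
    # Commit after successful processing; a crash before this line means the
    # record is re-delivered rather than lost.
    consumer.commit()
```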
Required Skills & Experience:
- 2–4 years in software, solutions, or infrastructure engineering, including 2–4 years focused on building and maintaining large-scale data pipelines and storage/database solutions.
- Proficiency in Trino, Spark (Structured Streaming & batch) and solid working knowledge of Apache Kafka.
- Coding background in Python (must-have); familiarity with Bash and scripting tools is a plus.
- Deep understanding of data storage architectures, including SQL and NoSQL databases and HDFS.
- Solid grasp of DevOps practices, including containerization (Docker), orchestration (Kubernetes), and infrastructure provisioning (Terraform).
- Experience with distributed systems, stream processing, and event-driven architecture.
- Hands-on familiarity with benchmarking and performance profiling for storage systems, databases, and analytics engines (a minimal throughput sketch follows this list).
- Excellent communication skills—you’ll be expected to explain your thinking clearly, guide customer conversations, and collaborate across engineering and product teams.
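As a rough illustration of the benchmarking expectation above, a throwaway script along these lines (boto3 against any S3-compatible endpoint; the endpoint, bucket, object count, and object size are arbitrary placeholders) gives a first read on PUT/GET throughput before reaching for dedicated tools. It is single-threaded, so it measures a floor rather than the system's ceiling.

```python
# Minimal sketch: rough PUT/GET throughput against an S3-compatible endpoint.
# Endpoint, bucket, object count, and object size are arbitrary placeholders.
import os
import time
import boto3

s3 = boto3.client("s3", endpoint_url="http://s3.example.local:9000")  # hypothetical endpoint

BUCKET = "bench"                  # assumed to already exist
OBJECT_SIZE = 8 * 1024 * 1024     # 8 MiB objects
NUM_OBJECTS = 64
payload = os.urandom(OBJECT_SIZE)

# Write phase: upload NUM_OBJECTS objects and time the total wall clock.
start = time.perf_counter()
for i in range(NUM_OBJECTS):
    s3.put_object(Bucket=BUCKET, Key=f"bench/obj-{i}", Body=payload)
elapsed = time.perf_counter() - start
print(f"PUT: {NUM_OBJECTS * OBJECT_SIZE / elapsed / 1e6:.1f} MB/s")

# Read phase: fetch the same objects back and drain the response bodies.
start = time.perf_counter()
for i in range(NUM_OBJECTS):
    s3.get_object(Bucket=BUCKET, Key=f"bench/obj-{i}")["Body"].read()
elapsed = time.perf_counter() - start
print(f"GET: {NUM_OBJECTS * OBJECT_SIZE / elapsed / 1e6:.1f} MB/s")
```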