Software Engineering - Data Engineer

Menlo Park, CA

Voltai

Foundation model for electronics

About Voltai: Voltai is an AI company building frontier foundation models and agentic systems for semiconductor and electronics design. We are one of the fastest-growing startups (by valuation and revenue bookings) and a leader in AI for electronics design. Backed by the world’s top VC firm, we work with some of the world’s largest semiconductor and electronics companies. The founding team consists of Olympiad medalists, ICPC world champions, and university professors. Our business team also has a proven track record, having scaled revenue from $0 to $400M in just four years at their previous company.

Key Responsibilities:
  • Collect, parse, and structure diverse data types—including text, images, tables, circuit diagrams, simulations, and signal data—into standardized formats suitable for machine learning applications
  • Design and maintain scalable data pipelines that efficiently handle data ingestion, transformation, and integration into ML workflows, ensuring high throughput and reliability
  • Optimize data storage solutions to balance performance, scalability, and cost-effectiveness, facilitating rapid access and processing of large datasets
  • Collaborate with cross-functional teams, including ML and infra engineers, to curate high-quality training and evaluation datasets aligned with Voltai's product offerings
  • Implement robust data validation and quality assurance processes to ensure the integrity and usability of datasets across various applications

Required Skillsets:
  • Programming Languages: Proficiency in Python, with experience in compiled languages such as Go or Rust
  • Data Parsing and Extraction: Expertise in parsing and extracting data from various formats and modalities, including PDFs, HTML, images, and binary files, utilizing tools like BeautifulSoup, pdfminer.six, and custom parsers
  • Data Pipeline Frameworks: Experience with modern data pipeline frameworks such as Apache Airflow, Prefect, Dagster, or Apache Beam, enabling efficient orchestration of complex data workflows
  • Data Processing Tools: Familiarity with tools like Apache Spark, Apache Flink, or similar platforms for large-scale data processing and transformation
  • Database Systems: Strong knowledge of relational and non-relational databases, including PostgreSQL, Supabase, and other scalable storage solutions
  • Cloud Platforms: In-depth experience with cloud services, particularly AWS, including S3, EC2, Lambda, and related services for deploying and managing data infrastructure
  • Web Crawling and Agentic Crawling: Proficiency in building and managing web crawlers using frameworks like Scrapy, Firecrawl, or Crawl4AI, with an understanding of agentic crawling techniques to automate data extraction tasks
  • Data Quality and Governance: Commitment to maintaining high data quality standards, with experience in implementing data validation, cleansing, and governance practices

Bonus Points:
  • A strong background in hardware/electronics, gained through professional, academic, or personal projects
  • Experience constructing datasets for large-scale ML models, specifically LLMs
  • Contributions to open-source initiatives
  • Experience thriving in a fast-paced, hyper-growth startup environment

Our Benefits:
  • Unlimited PTO: We trust you to manage your time and know when you need a break. Recharge when you need it, no questions asked.
  • Comprehensive Health Coverage: Your health matters. We offer top-tier medical and dental insurance to keep you and your loved ones covered.
  • Commitment to Your Growth: At Voltai, we’re dedicated to your continuous learning and development. Whether it’s through challenging projects or opportunities for professional advancement, we invest in your journey to becoming a leader in your field.
  • Visa Sponsorship: We support international talent and offer visa sponsorship to help you join our team, no matter where you are in the world.

Category: Engineering Jobs

Tags: Airflow AWS Dagster Data pipelines Data quality EC2 Engineering Flink Lambda LLMs Machine Learning ML models Open Source Pipelines PostgreSQL Python RDBMS Rust Spark

Perks/benefits: Career development Health care Startup environment Unlimited paid time off

Region: North America
Country: United States
