Data Platform Engineer (US)
Sunnyvale, CA
Full Time · Mid-level / Intermediate · USD 150K - 220K
Onehouse
The data lakehouse for open storage, continuous pipelines, and automatic optimizations across table formats, engines, and clouds. An automated data platform across Hudi, Delta, and Iceberg. We are a team of self-driven, inspired, and seasoned builders who have created large-scale data systems and globally distributed platforms that sit at the heart of some of the largest enterprises out there, including Uber, Snowflake, AWS, LinkedIn, Confluent, and many more. Riding off a fresh $35M Series B backed by Craft, Greylock, and Addition Ventures, we're now at $68M total funding and looking for rising talent to grow with us and become future leaders of the team. Come help us build the world's best fully managed and self-optimizing data lake platform!
The Community You Will Join
When you join Onehouse, you're joining a team of passionate professionals tackling the deeply technical challenges of building a two-sided engineering product. Our engineering team serves as the bridge between the worlds of open source and enterprise: contributing directly to and growing Apache Hudi (already used at scale by global enterprises like Uber, Amazon, and ByteDance) while concurrently defining a new industry category, the transactional data lake. The Data Infrastructure team is the heartbeat of all of this. We live and breathe databases, building cornerstone infrastructure by working under Hudi's hood to solve incredibly complex optimization and systems problems.
A Typical Day:
- Be the thought leader for all things data engineering within the company: schemas, frameworks, and data models.
- Implement new sources and connectors to seamlessly ingest data streams.
- Build scalable job management on Kubernetes to ingest, store, manage, and optimize petabytes of data on cloud storage.
- Optimize Spark or Flink applications to flexibly run in batch or streaming modes based on user needs, trading off latency against throughput.
- Tune clusters for resource efficiency and reliability, keeping costs low while still meeting SLAs.
What You Bring to the Table:
- 3+ years of experience in building and operating data pipelines in Apache Spark or Apache Flink.
- 2+ years of experience with workflow orchestration tools such as Apache Airflow or Dagster.
- Proficient in Java, along with Maven, Gradle, and other build and packaging tools.
- Adept at writing efficient SQL queries and troubleshooting query plans.
- Experience managing large-scale data on cloud storage.
- Great problem-solving skills and an eye for detail; can debug failed jobs and queries in minutes.
- Operational excellence in monitoring, deploying, and testing job workflows.
- Open-minded, collaborative, self-starter, fast-mover.
- Nice to haves (but not required):
- Hands-on experience with Kubernetes (k8s) and its related toolchain in cloud environments.
- Experience operating and optimizing terabyte-scale data pipelines.
- Deep understanding of Spark, Flink, Presto, Hive, and Parquet internals.
- Hands-on experience with open source projects like Hadoop, Hive, Delta Lake, Hudi, Nifi, Drill, Pulsar, Druid, Pinot, etc.
- Operational experience with stream processing pipelines using Apache Flink or Kafka Streams.
House Values
One Team
Optimize for the company, your team, and yourself, in that order. We may fight long and hard in the trenches, but we take care of our co-workers with empathy. We give more than we take to build the one house that everyone dreams of being part of.
Tough & Persevering
We are building our company in a very large, fast-growing, but highly competitive space. Life will get tough sometimes. We take hardships in stride, stay positive, focus all our energy on the path forward, and develop a champion's mindset to overcome the odds. Always day one!
Keep Making It Better Always
Rome was not built in a day. If we get 1% better each day for one year, we'll end up thirty-seven times better. This means being organized, communicating promptly, taking even small tasks seriously, tracking all small ideas, and paying it forward.
Think Big, Act Fast
We have tremendous scope for innovation, but we will still be judged by impact over time. Big, bold ideas still need to be strategized against priorities, broken down, set in rapid motion, measured, refined, and repeated. Great execution is what separates promising companies from proven unicorns.
Be Customer Obsessed
Everyone has the responsibility to drive towards the best experience for the customer, be they an OSS user or a paying customer. If something is broken, own it, say something, do something; never ignore it. Be the change that you want to see in the company.
Pay Range Transparency
Onehouse is committed to fair and equitable compensation practices. Our job titles may span more than one career level. The pay range(s) for this role is listed above and represents the base salary range for non-commissionable roles or on-target earnings for commissionable roles. Actual compensation packages are dependent upon several factors that are unique to each candidate, including but not limited to: job-related skills, depth of transferable experience, relevant certifications and training, business needs, market demands, and specific work location. Based on the factors above, Onehouse utilizes the full width of the range; the base pay range is subject to change and may be modified in the future. The total compensation package for this position will also include eligibility for equity options and the benefits listed above.
Tags: Airflow AWS Dagster Data pipelines Engineering Flink Hadoop Java Kafka Kubernetes Machine Learning Maven NiFi Open Source Parquet Pipelines Pulsar Snowflake Spark SQL Streaming Testing
Perks/benefits: Career development Competitive pay Equity / stock options Flex vacation Health care Home office stipend Unlimited paid time off