Data Infrastructure Engineer (India)
Bangalore
Onehouse
The data lakehouse for open storage, continuous pipelines, and automatic optimizations across table formats, engines, and clouds. Automated data platform across Hudi, Delta, and Iceberg.
About Onehouse
Onehouse is a mission-driven company dedicated to freeing data from data platform lock-in. We deliver the industry’s most interoperable data lakehouse through a cloud-native managed service built on Apache Hudi. Onehouse enables organizations to ingest data at scale with minute-level freshness, store it centrally, and make it available to any downstream query engine and use case (from traditional analytics to real-time AI / ML).
We are a team of self-driven, inspired, and seasoned builders who have created large-scale data systems and globally distributed platforms that sit at the heart of some of the largest enterprises, including Uber, Snowflake, AWS, LinkedIn, Confluent, and many more. Riding off a fresh $35M Series B backed by Craft, Greylock and Addition Ventures, we're now at $68M total funding and looking for rising talent to grow with us and become future leaders of the team. Come help us build the world's best fully managed and self-optimizing data lake platform!
The Community You Will Join
When you join Onehouse, you're joining a team of passionate professionals tackling the deeply technical challenges of building a two-sided engineering product. Our engineering team serves as the bridge between the worlds of open source and enterprise: contributing directly to and growing Apache Hudi (already used at scale by global enterprises like Uber, Amazon, ByteDance, etc.) and concurrently defining a new industry category - the transactional data lake. The Data Infrastructure team is the grounding heartbeat of all of this. We live and breathe databases, building cornerstone infrastructure by working under Hudi's hood to solve incredibly complex optimization and systems problems.
House Values
One Team
Optimize for the company, your team, self - in that order. We may fight long and hard in the trenches, but we take care of our co-workers with empathy. We give more than we take to build the one house that everyone dreams of being part of.
Tough & Persevering
We are building our company in a very large, fast-growing but highly competitive space. Life will get tough sometimes. We take hardships in stride, stay positive, focus all energy on the path forward, and develop a champion's mindset to overcome the odds. Always day one!
Keep Making It Better Always
Rome was not built in a day. If we can get 1% better each day for one year, we'll end up thirty-seven times better. This means being organized, communicating promptly, taking even small tasks seriously, tracking all small ideas, and paying it forward.
Think Big, Act Fast
We have tremendous scope for innovation, but we will still be judged by impact over time. Big, bold ideas still need to be strategized against priorities, broken down, set in rapid motion, measured, refined, and repeated. Great execution is what separates promising companies from proven unicorns.
Be Customer Obsessed
Everyone has the responsibility to drive towards the best experience for the customer, be it an OSS user or a paid customer. If something is broken, own it, say something, do something; never ignore it. Be the change that you want to see in the company.
The Impact You Will Drive:
- As a foundational member of the Data Infrastructure team, you will productionize the next generation of our data tech stack by building the software and data features that actually process all of the data we ingest.
- Accelerate our open source <> enterprise flywheel by working on the guts of Apache Hudi's transactional engine and optimizing it for diverse Onehouse customer workloads.
- Act as an SME to deepen our teams' expertise on database internals, query engines, storage and/or stream processing.
A Typical Day:
- Design new concurrency control and transactional capabilities that maximize throughput for competing writers.
- Design and implement new indexing schemes, specifically optimized for incremental data processing and analytical query performance.
- Design systems that help scale and streamline metadata and data access from different query/compute engines.
- Solve hard optimization problems to improve the efficiency (increase performance and lower cost) of distributed data processing algorithms over a Kubernetes cluster.
- Leverage data from existing systems to find inefficiencies, and quickly build and validate prototypes.
- Collaborate with other engineers to implement, deploy, and safely roll out the optimized solutions in production.
What You Bring to the Table:
- Strong object-oriented design and coding skills (Java and/or C/C++, preferably on a UNIX or Linux platform).
- Experience with the inner workings of distributed (multi-tiered) systems, algorithms, and relational databases.
- An ability to embrace ambiguous or undefined problems, think abstractly, and articulate technical challenges and solutions.
- An ability to prioritize across feature development and tech debt with urgency and speed.
- An ability to solve complex programming/optimization problems.
- An ability to quickly prototype optimization solutions and analyze large/complex data.
- Robust and clear communication skills.
Nice to haves (but not required):
- Experience working with database systems, query engines, or Spark codebases.
- Experience in optimization mathematics (linear programming, nonlinear optimization).
- Publications on optimizing large-scale data systems in top-tier distributed systems conferences.
- PhD with 2+ years of industry experience in solving and delivering high-impact optimization projects.
Category:
Engineering Jobs
Tags: AWS Engineering Java Kubernetes Linux Machine Learning Mathematics Open Source PhD RDBMS Snowflake Spark
Perks/benefits: Career development Competitive pay Conferences Equity / stock options Flex vacation Gear Home office stipend Unlimited paid time off
Region:
Asia/Pacific
Country:
India