Senior Python Engineer, DataHub Ingestion Framework
Palo Alto, California, United States
Full Time | Senior-level / Expert | USD 225K - 300K
DataHub, built by Acryl Data, is an AI & Data Context Platform adopted by over 3,000 enterprises, including Apple, CVS Health, Netflix, and Visa. Developed jointly with a thriving open-source community of 13,000+ members, DataHub's metadata graph provides in-depth context on AI and data assets with best-in-class scalability and extensibility.
The company's enterprise SaaS offering, DataHub Cloud, delivers a fully managed solution with AI-powered discovery, observability, and governance capabilities. Organizations rely on DataHub solutions to accelerate time-to-value from their data investments, ensure AI system reliability, and implement unified governance, enabling AI & data to work together and bring order to data chaos.
The Challenge
As AI and data products become business-critical, enterprises face a metadata crisis:
- No unified way to track the complex data supply chain feeding AI systems
- Engineering teams struggling with data discovery, lineage, and governance
- Organizations needing machine-scale metadata management, not just human-browsable catalogs
Why This Matters
This is where infrastructure meets impact. The metadata layer you'll build will directly power the next generation of AI systems at massive scale. Your code will determine how safely and effectively thousands of organizations deploy AI, affecting millions of users worldwide.
The Role
We're looking for an exceptional Python engineer to lead development of DataHub's ingestion framework: the core system that connects diverse data systems and powers our metadata collection capabilities.
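For a sense of the framework's surface area, here is a minimal sketch of emitting a metadata aspect through DataHub's open-source Python SDK. The server URL, dataset name, and properties are illustrative assumptions rather than details of this role, and the exact API surface can vary by SDK version.

```python
# Minimal sketch: pushing a metadata aspect to DataHub via its Python SDK.
# The server URL, platform, dataset name, and properties below are
# illustrative assumptions, not values taken from this posting.
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

# Point the emitter at a DataHub instance (hypothetical local deployment).
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# Identify the dataset by URN and attach a properties aspect to it.
dataset_urn = make_dataset_urn(
    platform="postgres", name="analytics.public.orders", env="PROD"
)
mcp = MetadataChangeProposalWrapper(
    entityUrn=dataset_urn,
    aspect=DatasetPropertiesClass(
        description="Orders fact table (example).",
        customProperties={"owner_team": "data-platform"},
    ),
)

# Send the change proposal to the DataHub metadata service.
emitter.emit(mcp)
```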
You'll Build
- Scalable, fault-tolerant ingestion systems for enterprise-scale metadata
- Clean, intuitive APIs for our connector ecosystem
- Event-driven architectures for real-time metadata processing
- Schema mapping between diverse systems and DataHub's unified model
- Versioning systems for AI assets (training data, model weights, embeddings)
You Have
- 4+ years building production-grade distributed systems
- Advanced Python expertise with a focus on API design
- Experience with high-scale data processing or integration frameworks
- Strong systems knowledge and distributed architecture experience
- A track record of solving complex technical challenges
Bonus Points
- Experience with DataHub or similar metadata/ETL frameworks (Airflow, Airbyte, dbt)
- Open-source contributions
- Early-stage startup experience
Location and Compensation
Bay Area (hybrid, 3 days in Palo Alto office)
Salary Range: $225,000 to $300,000
How We Work
Remote first. We're a fully distributed company, and our interaction culture deliberately mixes meetings with written communication. We're writing-heavy because writing forces clarity of thought, and we keep plenty of synchronous time to give space for collaborative ideation.
Benefits
- Competitive salary
- Equity
- Medical, dental, and vision insurance (99% coverage for employees, 65% coverage for dependents; USA-based employees)
- Carrot Fertility Program (USA-based employees)
- Remote friendly
- Work-from-home budget and monthly co-working space budget