Software Engineer, Discovery - Data
New York
Full Time Senior-level / Expert USD 120K - 180K
Harmonic
Harmonic's data engine keeps 20M+ companies & 150M+ professional profiles fresh, so you can always be in the loop when a company just raised a round, just hired a CTO, or just crossed the 1M follower mark on Twitter.
About us
Harmonic is the startup discovery engine.
It pains us to see great startup opportunities consistently go undiscovered. So, we dedicated ourselves to mapping out the startup landscape and building the tools that ensure the most promising founders get found and funded.
The world's largest and most prolific venture capital firms (as well as the up-and-comers you haven't heard of yet) rely on us to find and invest in the next Google, Airbnb, Uber, Stripe, and Anduril. We play a crucial part in ensuring hundreds of billions of dollars get routed efficiently and that the innovations the world would most benefit from materialize.
We are on pace to double over the next twelve months and already power thousands of investors' workflows. Backed by $30M from investors like Craft, Floodgate, and Sozo Ventures, we want to power the entire investment workflow from discovery to term sheet. If you resonate with our values and want to fundamentally evolve how venture capital markets work, come join us.
About Discovery
The Discovery team sits at the core of Harmonic's product and data engine. We ingest data from hundreds of fragmented and unstructured sources - ranging from APIs and raw HTML to legal filings and documents - and turn them into structured insights on startups, people, and investors.
We're responsible not just for the quality, coverage, and freshness of this data, but also for how it's served and surfaced to users across Harmonic.
That means building the pipelines that reconcile data at massive scale and powering the full-stack search experience that helps investors discover breakout companies before anyone else. From advanced LLM-based extraction to scalable search indexing and responsive APIs, our work directly fuels Harmonic's most critical product surfaces - from grid-based search to AI research copilots.
To learn more about the team:
- Explore Working with Sang and Working with Miguel
- Check out your teammates Jimmy, Akshaya, Apoorva, Gavin, TJ, and Joe.
- Explore the Team Page. We're a mix of ex-founders and seasoned engineers from top engineering institutions like Google, LinkedIn, Microsoft and Meta.
The role
In this role, you will:
- Build and maintain Python-based data pipelines that ingest, clean, and structure data from a wide range of sources.
- Collaborate with data and infrastructure teams to implement scalable systems that power Harmonic's startup, person, and investment data models.
- Contribute to data quality initiatives by writing validation logic, tracking metrics, and helping improve coverage, freshness, and accuracy.
- Help bring structured data to life in the product through integrations with databases, search indexes, and GraphQL APIs.
- Learn how to think about system design, data modeling, and search architecture.
Background we're looking for:
- 2-4 years of experience as a software or data engineer.
- Experience working with data pipelines, backend systems, or APIs.
- Curiosity and product intuition around what makes data useful, high-quality, and impactful for end users.
- Willingness to dive deep into messy, unstructured data and help shape it into meaningful outputs.
- Interest in growing into someone who can think holistically across data modeling, system design, and user experience.
Experience we'd be particularly excited about:
- Experience working on ETL pipelines, data ingestion systems, or backend services where data quality mattered.
- Familiarity with modern data tooling (dbt, Elasticsearch, GraphQL) or interest in learning them quickly.
- Exposure to LLMs, agentic tools, or AI-based data extraction, even in side projects or academic settings.
- Excitement about startups, venture capital, or company intelligence data.
Pay:
$120K-$180K salary + equity, depending on level
Our stack
The Process
Here's our interview process:
- Recruiter Screening: 20-30 mins
- Take Home Exam
- Technical Screening: 1 hr
- Two behavioral interviews with team: 45 mins each
- References
Benefits
- Top-of-the-line health, dental and vision insurance, with 100% premium covered
- 401k matching
- Free lunch in office
- Monthly team dinner (we have a lot of foodies) for each office
- Commuter benefits
Tags: APIs Architecture Data pipelines Data quality dbt Elasticsearch Engineering ETL GraphQL LLMs Pipelines Python Research Unstructured data
Perks/benefits: Career development Equity / stock options Health care Startup environment