Software Engineer, ML - Document Processing
San Francisco
Full Time | Senior-level / Expert | USD 180K - 280K
Harvey
Professional Class AI – Harvey is the platform built to meet the standards of the world’s leading professional service firms. Harvey is a secure AI platform for legal and professional services that augments productivity and automates complex workflows. Harvey uses algorithms with reasoning-adept LLMs that have been customized and developed by our expert team of lawyers, engineers and research scientists. We’ve found product-market fit and are scaling our team very quickly. Some reasons to join Harvey are:
Exceptional product-market fit: We have partnered with the largest law firms and professional service providers in the world, including Paul Weiss, A&O Shearman, Ashurst, O'Melveny & Myers, PwC, KKR, and many others.
Strategic investors: Raised over $500 million from strategic investors including Sequoia, Google Ventures, Kleiner Perkins, and OpenAI.
World-class team: Harvey is hiring the best talent from DeepMind, Google Brain, Stripe, FAIR, Tesla Autopilot, Glean, Superhuman, Figma, and more.
Partnerships: Our engineers and researchers work directly with OpenAI to build the future of generative AI and redefine professional services.
Performance: 4x ARR in 2024.
Competitive compensation.
Harvey has found strong product-market fit within the legal space, and we are significantly expanding the scale and capabilities of our offering as we grow. Many use cases in legal involve asking questions or extracting information from a collection of documents (either in-house documents, client documents, or publicly available data), so ingesting and processing documents for use in our AI systems is a critical component of our product. As we work with the biggest firms on their most complex projects, we envision building systems that seamlessly store, index, and process hundreds of millions of documents, and retrieve the right information in a fraction of a second.
In this role, you will build at the boundary of what is possible in document understanding and incorporate new advancements in OCR, semantic chunking, and vector storage into Harvey’s core system.
The ideal candidate for this role has strong backend fundamentals (distributed systems, data processing) and experience in building production systems that require experimentation. We’re looking for someone who is hands-on and execution-focused in their approach to experimentation - you get things done and can navigate trade-offs between precision, cost, and speed.
What You’ll Do
Design and build a robust evaluation system for document understanding. Build and extend our large set of complex documents, like handwritten text from decades-old governing law or large Excel files containing the complex calculations of a corporate merger. Establish reliable baseline labels by working with legal domain experts or leveraging synthetic labeling.
Iterate on representation schemes for different data types: what’s the best way to represent a spreadsheet cell in a retrieval database? How should models treat strike-throughs?
Benchmark and implement modern advancements across various modalities, like vision and audio models, into the Harvey stack.
Improve the scalability, observability, and fault tolerance of our document processing service.
3+ YoE (post-BS/MS) in an engineering or research role.
Demonstrated experience working cross-functionally with other engineering teams: you’ll need to prioritize investment in processing quality based on our product needs.
Experience using a data-driven approach to guide engineering decisions, such as evaluating recommendation engines or choosing between LLM providers.
Experience with search infrastructure or vector databases is a plus.
Track record of shipping reliable products and a strong attention to detail.
Grit - experience working at early-stage startups is a plus.
Harvey is an equal opportunity employer and does not discriminate on the basis of race, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition, or any other basis protected by law.
We are in the early innings of a generational company. Joining early at a hypergrowth startup has proven to lead to exponential growth in responsibility, access, and ability. Apply here today!
Tags: Distributed Systems Engineering Excel Generative AI LLMs Machine Learning OCR OpenAI Research
Perks/benefits: Competitive pay Startup environment