Senior Software Engineer – Inference

Bellevue, WA


Generative AI is transforming the way we interact with information, but it also poses an existential challenge to content creators. By scraping proprietary material without compensation, generative AI undermines the key revenue streams, such as ad revenue, subscriptions, and licensing agreements, that sustain high-quality, original content creation.

ProRata was founded in 2024 by Bill Gross at Idealab Studio with a mission to ensure that generative AI platforms fairly credit and compensate content owners for their contributions. Our groundbreaking technology enables generative AI platforms to attribute sources and share revenue on a per-use basis, protecting creators, fostering sustainable journalism, and promoting the integrity of AI-generated content.


Role:

We’re looking for a Senior Software Engineer to join our Inference Team, where you’ll lead the design and development of our Retrieval-Augmented Generation (RAG) infrastructure. In this role, you will work closely with ML engineers, research scientists, and product teams to power both web search and API-based experiences for millions of users with fast, accurate, and context-aware responses. 

You will architect scalable systems that combine LLMs and vector retrieval, optimizing for relevance, recall, latency, and cost. This is a high-impact role focused on AI/ML inference and retrieval performance, with significant ownership of both technical decision-making and long-term architecture.

  

Responsibilities:

  • Design, build, and scale a production-grade inference stack for RAG-based applications. 

  • Develop efficient retrieval pipelines using OpenSearch or similar vector databases, with a focus on high recall and response relevance. 

  • Optimize performance and latency for both real-time and batch queries. 

  • Identify and address bottlenecks in the inference stack to improve response times and system efficiency. 

  • Ensure high reliability, observability, and monitoring of deployed systems. 

  • Collaborate with cross-functional teams to integrate LLMs and retrieval components into user-facing applications. 

  • Evaluate and integrate modern RAG frameworks and tools to accelerate development. 

  • Guide architectural decisions, mentor team members, and uphold engineering excellence. 
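To make the "dense retrieval" responsibility above concrete for candidates, here is a minimal, dependency-free sketch of the core ranking step: scoring document embeddings against a query embedding by cosine similarity and returning the top-k matches. This is purely illustrative; the document IDs and 3-dimensional vectors stand in for real embedding-model output, and a production system would use an approximate nearest-neighbor index (e.g., in OpenSearch) rather than a brute-force scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, corpus, k=2):
    """Rank (doc_id, embedding) pairs by similarity to the query; return top-k IDs."""
    scored = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy corpus: 3-dimensional embeddings standing in for real model output.
corpus = [
    ("doc_a", [1.0, 0.0, 0.0]),
    ("doc_b", [0.7, 0.7, 0.0]),
    ("doc_c", [0.0, 0.0, 1.0]),
]
print(top_k([1.0, 0.1, 0.0], corpus))  # doc_a is closest, then doc_b
```

In the real stack, the brute-force scan is replaced by an approximate k-NN query against the vector index, and the returned passages are fed to the LLM as grounding context.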

  

Qualifications:

  • Bachelor’s degree in Computer Science or related field, or equivalent practical experience. 

  • 8+ years of experience in software engineering, with a focus on AI/ML systems or distributed systems. 

  • Hands-on experience building and deploying retrieval-augmented generation (RAG) systems. 

  • Deep knowledge of OpenSearch, Elasticsearch, or similar search engines. 

  • Strong coding skills in Python and/or other backend languages (e.g., Rust, Java). 

  • Experience with vector search, embedding pipelines, and dense retrieval techniques. 

  • Proven ability to optimize inference stacks for latency, reliability, and scalability. 

  • Excellent problem-solving, analytical, and debugging skills. 

  • Strong sense of ownership, ability to work independently, and a self-starter mindset in fast-paced environments. 

  • Passion for building impactful technology aligned with our mission. 

  

Preferred Qualifications:  

  • Experience with frameworks like LlamaIndex or LangChain. 

  • Familiarity with vector databases and libraries such as Pinecone, Qdrant, or FAISS. 

  • Exposure to LLM fine-tuning, semantic search, embeddings, and prompt engineering. 

  • Previous work on systems handling millions of users or queries per day. 

  • Familiarity with cloud infrastructure (AWS, GCP, or Azure) and containerization tools (Docker, Kubernetes). 



Work Environment: 

Location: This is an on-site position based at our Bellevue, WA (or Pasadena, CA) office; employees are expected to work on-site during regular business hours. 

  

Compensation: 

Compensation for this position is competitive and commensurate with experience. The estimated salary range for this role is 160,000 to 200,000 USD. 

What We Offer:

  • Opportunity to work at the forefront of AI technology

  • Collaborative and innovative work environment

  • Competitive salary and benefits package

  • Professional development and growth opportunities

  • Chance to make a significant impact on the company's success


Equal Employment Opportunity: 

  • ProRata is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All employment decisions are made based on qualifications, merit, and business needs. 

  

California Specific Notices: 

  • At-Will Employment: Employment at ProRata is at-will. This means that either the employee or the employer may terminate employment at any time, with or without cause or prior notice. 

  • Salary Disclosure: In compliance with California law, salary information is provided to ensure transparency and fairness. 

  • California Consumer Privacy Act (CCPA): ProRata complies with the CCPA. Personal information collected during the recruitment process will be used for employment purposes only. 
