GenAI Eval Software Engineer

AMER

Applications have closed

Databricks

Databricks offers a unified platform for data, analytics and AI. Build better AI with a data-centric approach. Simplify ETL, data warehousing, governance and AI on the Data Intelligence Platform.

Posted 1 month ago

RDQ426R88

At Databricks, we are passionate about enabling data teams to solve the world's toughest problems — from making the next mode of transportation a reality to accelerating the development of medical breakthroughs. We do this by building and running the world's best data and AI infrastructure platform so our customers can use deep data insights to improve their business. Founded by engineers — and customer obsessed — we leap at every opportunity to tackle technical challenges, from designing next-gen UI/UX for interfacing with data to scaling our services and infrastructure across millions of virtual machines. And we're only getting started.

The Applied AI team at Databricks sits at the forefront of advancing GenAI-powered products. In recent years, we have worked with product teams to launch Databricks Assistant and AI/BI Genie, and have made significant strides in using LLMs for these products. Hundreds of thousands of Databricks users rely on these products every day. We are tackling challenging problems such as code suggestion, error detection and correction, text-to-SQL generation, automatic pipeline generation, knowledge QA, and many others.

The impact you will have:

As we continue to expand our GenAI capabilities, we're looking for a passionate Software Engineer (L4) to join our team building the GenAI Evaluation System — a critical foundation that ensures the quality, reliability, and performance of AI-driven products at scale.

In this role, you'll design and develop systems to evaluate Large Language Models (LLMs) and GenAI applications, enabling rapid experimentation, robust benchmarking, and continuous improvement across our AI products. You'll collaborate with ML engineers, product teams, and researchers to define what "good" looks like for GenAI applications, driving impact across Databricks products.

Design, build, and maintain scalable infrastructure for evaluating LLMs and GenAI-powered features.
Develop automated testing, benchmarking, and monitoring frameworks to measure model quality and reliability.
Collaborate closely with ML engineers, product managers, and researchers to define evaluation metrics and methodologies.
Enable rapid iteration by building tools that support A/B testing, human-in-the-loop evaluations, and dataset management.
Contribute to the evolution of best practices for GenAI evaluation across diverse use cases (e.g., Text-to-SQL, code assistance, conversational AI).

What we look for:

Bachelor's degree in Computer Science, Engineering, or related field (or equivalent practical experience).
2+ years of industry experience in software engineering, preferably in infrastructure, platforms, or ML tooling.
Strong coding skills in languages such as Python, Scala, or Java.
Experience building scalable backend systems, distributed systems, or developer platforms.
Familiarity with SQL, machine learning workflows, LLMs, or AI evaluation concepts is a plus.
Strong problem-solving skills and a collaborative mindset.

Why Join Us?

Work at the forefront of GenAI, shaping how AI quality is defined and delivered at scale.
Collaborate with world-class engineers and ML experts in a fast-paced, innovative environment.
Contribute to impactful AI products used by enterprises worldwide.
Be part of a company built on open source, transparency, and a strong engineering culture.

Pay Range Transparency

Databricks is committed to fair and equitable compensation practices. The pay range(s) for this role are listed below and represent the base salary range for non-commissionable roles or on-target earnings for commissionable roles. Actual compensation packages are based on several factors that are unique to each candidate, including but not limited to job-related skills, depth of experience, relevant certifications and training, and specific work location. Based on the factors above, Databricks utilizes the full width of the range. The total compensation package for this position may also include eligibility for an annual performance bonus, equity, and the benefits listed above. For more information regarding which range your location is in, visit our page here.

Zone 1 Pay Range: $142,200–$204,600 USD

About Databricks

Databricks is the data and AI company. More than 10,000 organizations worldwide — including Comcast, Condé Nast, Grammarly, and over 50% of the Fortune 500 — rely on the Databricks Data Intelligence Platform to unify and democratize data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe and was founded by the original creators of Lakehouse, Apache Spark™, Delta Lake and MLflow. To learn more, follow Databricks on Twitter, LinkedIn and Facebook.

Benefits

At Databricks, we strive to provide comprehensive benefits and perks that meet the needs of all of our employees. For specific details on the benefits offered in your region, please visit https://www.mybenefitsnow.com/databricks.

Our Commitment to Diversity and Inclusion

At Databricks, we are committed to fostering a diverse and inclusive culture where everyone can excel. We take great care to ensure that our hiring practices are inclusive and meet equal employment opportunity standards. Individuals looking for employment at Databricks are considered without regard to age, color, disability, ethnicity, family or marital status, gender identity or expression, language, national origin, physical and mental ability, political affiliation, race, religion, sexual orientation, socio-economic status, veteran status, and other protected characteristics.

Compliance

If access to export-controlled technology or source code is required for performance of job duties, it is within Employer's discretion whether to apply for a U.S. government license for such positions, and Employer may decline to proceed with an applicant on this basis alone.

Categories: Deep Learning Jobs Engineering Jobs Generative AI Jobs

Tags: A/B testing Computer Science Conversational AI Databricks Distributed Systems Engineering Excel Generative AI Java LLMs Machine Learning MLFlow ML infrastructure Open Source Python Scala Spark SQL Testing UX