AI Evaluations Research Scientist

Washington, DC (DC Metro Area), United States

Full Time, Mid-level / Intermediate, Clearance required, USD 115K - 246K

RAND Corporation

RAND is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND focuses on the issues that matter most such as health, education, national security, international affairs, the environment, and...

Posted 1 month ago

Job Type: Regular

Overview

RAND’s Technology and Security Policy Center (TASP) is seeking mission-driven AI Evaluations Research Scientists to develop and execute research projects and engineering efforts within our AI Capability Evaluations (ACE) team.

RAND's reputation for excellence is built on our commitment to high-quality, rigorous analysis and objectivity. TASP is at the forefront of research and implementation regarding the impact of high-consequence, dual-use technologies—such as artificial intelligence and biotechnology—on global competition and security. Our research has been used by the White House, government departments, the EU and UK governments, and industry leaders, among others. Our alumni have gone on to important roles at the NSC, Commerce, DOD, Congress, Google DeepMind, OpenAI, EU AI Office, UK AISI, other key think tanks, and founding mission-driven tech initiatives.

ACE develops and conducts evaluations of national-security-relevant capabilities of frontier AI systems, with a current focus on the intersection of large language models (LLMs) and AI agents with biological risk. We’re hiring people with research science and/or research engineering skills to play a key role in work that assists public policymakers at all levels in strengthening national security and mitigating catastrophic risks enabled by AI systems. They will work on complex problems at the intersection of AI and national security where technical details matter and will contribute to multidisciplinary project teams that include biosecurity experts, machine learning engineers, and policy researchers.

This position is initially structured as a focused 1-year appointment to create the urgency needed to drive ambitious change in this rapidly evolving field. Every day of your tenure will count toward that goal. The appointment may be renewed for up to a total of 3 years, with options for longer-term employment at RAND thereafter. Full-time and part-time (at least 20 hours per week) schedules will be considered, but with a strong preference for full-time.

Responsibilities

Given the breadth of valuable work our team could do, there is some ability to align responsibilities with an individual’s skills, interests, and career goals, including in terms of the balance of research scientist- versus research engineer-style responsibilities. Responsibilities may include but are not limited to:

Contribute to developing concrete threat models for high-consequence AI risks, working with internal and external partners

Design and execute rigorous, objective evaluations of AI capabilities relevant to key bottlenecks within those threat models

Develop and maintain the technical infrastructure required to support this research, working with relevant internal and external IT stakeholders

Develop and maintain code for fundamental evaluation components that can be used across research efforts (e.g. prompting, automated grading, statistical analysis)

Keep up to date with the latest advances in AI evaluation engineering and the science of evaluations to continually improve the rigor and efficiency of our evaluations

Contribute to setting strategic and research priorities, with an emphasis on the policy impact of evaluations

Communicate research results to policymakers and other key stakeholders at all levels through written products and oral presentations

A successful candidate could grow into leading a team and/or mentoring more junior staff.

Qualifications
All research positions at RAND require excellent analytic skills; the ability to communicate clearly and effectively in English, both orally and in writing; the ability to work effectively as a member of a multi-disciplinary team; and a strong commitment to RAND's core values of quality and objectivity.

Other required qualifications:

Strong interest in understanding and addressing potential national security risks related to autonomy or high-consequence misuse of LLMs and AI agents, and in AI capability evaluations as a route to impact

Proficiency in Python

Familiarity with technical aspects of AI systems and related technologies, such as machine learning, computational infrastructure, or information security

Preferred but not required:

Experience with evaluations and evaluation frameworks for LLMs and AI agents (e.g. Inspect)

Experience with LLM elicitation techniques (e.g. fine-tuning, retrieval augmented generation, tool-use integration, agent scaffolding)

Experience working on ML model development/deployment or working at/with leading AI companies

Experience with cloud computing, in particular Azure and AWS, including government cloud environments

Familiarity with common LLM frameworks (e.g. LangChain, LlamaIndex)

Aptitude for project management and/or mentorship

Strong communication skills, both written and verbal, tailored to technical and non-technical audiences, or the ability to rapidly develop them

Experience in government, intelligence community, other relevant decision-making offices, or policy analysis roles

Education Requirements
RAND is hiring for this role at associate, specialist, and expert levels of experience. The minimum education requirement at the associate level is one of the following:

A PhD in a relevant field. This can include Artificial Intelligence, Machine Learning, Computer Science, Cybersecurity, Electrical Engineering, Physics, Mathematics, Engineering and Public Policy, Security Studies, or similar.

A Master’s degree in the fields listed above with at least 3 years of relevant professional experience.

A Bachelor’s degree in the fields listed above with at least 5 years of relevant professional experience.

Security Clearance
Ability to obtain and maintain a U.S. security clearance, including U.S. citizenship, is preferred but not required.

Location

We are actively hiring for this position in Washington, DC; San Francisco, CA; Boston, MA; Santa Monica, CA; and Pittsburgh, PA. San Francisco and especially DC are preferred. We offer a hybrid work arrangement, combining work from home and on-site options. Fully remote work will also be considered.

Term

This position is a 1-year term appointment with a possibility of renewal for up to 3 years total, alongside options for longer term employment.

Application

Applications must include:

A detailed resume highlighting relevant academic and professional experience.

A writing sample demonstrating analytical and communication skills. This sample may be a recent, previously written paper or report (e.g., journal article, master’s thesis or paper written for coursework, prior employment, or internship). Applicants whose study and work experience (e.g., model development) has not involved producing written products that are shareable may submit a short, written summary (i.e., less than one page) of one or more recent products they have developed.

A code sample.

A cover letter which contains only responses to each of the following prompts:

1) Summarize in <200 words your career goals and why you are interested in this role.

2) Describe in <300 words one research direction or engineering infrastructure project you may want to pursue in this role. For a research direction: Describe what questions you would try to answer, what methods you would use, how many months of work would be required from you and/or colleagues, and what outcomes this research might help achieve (e.g., what important policy decisions it might inform). For an infrastructure project: You may make guesses about our goals and existing infrastructure, and propose a way you might help improve that, noting how you would implement that, how many months of work may be required from you and/or colleagues, and why this might be useful. This is just an assessment step and does not mean you would definitely work on this if hired.

Salary Range: $115,400 - $246,600

Visiting Technical Associate = $115,400 - $167,300

Visiting Technical Specialist = $137,000 - $209,000

Visiting Technical Expert = $157,800 - $246,600

RAND considers a variety of factors when formulating an offer, including the specific role responsibilities; a candidate’s work experience, education/training, skills, expertise; and internal equity. In addition, RAND provides strong benefits including health insurance coverage, life and disability insurance, a savings plan, paid time-off, and more.

Equal Opportunity Employer
