AI Evaluations Research Scientist
Washington, DC (DC Metro Area), United States
Full Time Mid-level / Intermediate Clearance required USD 115K - 246K
RAND Corporation
RAND is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND focuses on the issues that matter most such as health, education, national security, international affairs, the environment, and...
Job Type: Regular
Overview
RAND’s Technology and Security Policy Center (TASP) is seeking mission-driven AI Evaluations Research Scientists to develop and execute research projects and engineering efforts within our AI Capability Evaluations (ACE) team.
RAND's reputation for excellence is built on our commitment to high-quality, rigorous analysis and objectivity. TASP is at the forefront of research and implementation regarding the impact of high-consequence, dual-use technologies—such as artificial intelligence and biotechnology—on global competition and security. Our research has been used by the White House, government departments, the EU and UK governments, and industry leaders, among others. Our alumni have gone on to important roles at the NSC, Commerce, DOD, Congress, Google DeepMind, OpenAI, EU AI Office, UK AISI, other key think tanks, and founding mission-driven tech initiatives.
ACE develops and conducts evaluations of national-security-relevant capabilities of frontier AI systems, with a current focus on the intersection of large language models (LLMs) and AI agents with biological risk. We’re hiring people with research science and/or research engineering skills to play a key role in work that assists public policymakers at all levels in strengthening national security and mitigating catastrophic risks enabled by AI systems. They will work on complex problems at the intersection of AI and national security where technical details matter and will contribute to multidisciplinary project teams that include biosecurity experts, machine learning engineers, and policy researchers.
This position is initially structured as a focused 1-year appointment to create the urgency needed to drive ambitious change in this rapidly evolving field. Every day of your tenure will count toward that goal. The appointment may be renewed for up to a total of 3 years, with options for longer-term employment at RAND thereafter. Full-time and part-time (at least 20 hours per week) schedules will be considered, but with a strong preference for full-time.
Responsibilities
Given the breadth of valuable work our team could do, there is some ability to align responsibilities with an individual’s skills, interests, and career goals, including in terms of the balance of research scientist- versus research engineer-style responsibilities. Responsibilities may include but are not limited to:
Contribute to developing concrete threat models for high-consequence AI risks, working with internal and external partners
Design and execute rigorous, objective evaluations of AI capabilities relevant to key bottlenecks within those threat models
Develop and maintain the technical infrastructure required to support this research, working with relevant internal and external IT stakeholders
Develop and maintain code for fundamental evaluation components that can be used across research efforts (e.g. prompting, automated grading, statistical analysis)
Keep up to date with the latest advances in AI evaluation engineering and the science of evaluations to continually improve the rigor and efficiency of our evaluations
Contribute to setting strategic and research priorities, with an emphasis on the policy impact of evaluations
Communicate research results to policymakers and other key stakeholders at all levels through written products and oral presentations
A successful candidate could grow into leading a team and/or mentoring more junior staff.
Qualifications
All research positions at RAND require excellent analytic skills; the ability to communicate clearly and effectively in English, both orally and in writing; the ability to work effectively as a member of a multi-disciplinary team; and a strong commitment to RAND's core values of quality and objectivity.
Other required qualifications:
Strong interest in understanding and addressing potential national security risks related to autonomy or high-consequence misuse of LLMs and AI agents, and in AI capability evaluations as a route to impact
Proficiency in Python
Familiarity with technical aspects of AI systems and related technologies, such as machine learning, computational infrastructure, or information security
Preferred but not required:
Experience with evaluations and evaluation frameworks for LLMs and AI agents (e.g. Inspect)
Experience with LLM elicitation techniques (e.g. fine-tuning, retrieval augmented generation, tool-use integration, agent scaffolding)
Experience working on ML model development/deployment or working at/with leading AI companies
Experience with cloud computing, in particular Azure and AWS, including government cloud environments
Familiarity with common LLM frameworks (e.g. LangChain, LlamaIndex)
Aptitude for project management and/or mentorship
Strong communication skills, both written and verbal, tailored to technical and non-technical audiences, or the ability to rapidly develop them
Experience in government, intelligence community, other relevant decision-making offices, or policy analysis roles
Education Requirements
RAND is hiring for this role at associate, specialist, and expert levels of experience. Minimum education requirements at the associate level include:
A PhD in a relevant field. This can include Artificial Intelligence, Machine Learning, Computer Science, Cybersecurity, Electrical Engineering, Physics, Mathematics, Engineering and Public Policy, Security Studies, or similar.
OR
A Master’s degree in the fields listed above with at least 3 years of relevant professional experience.
OR
A Bachelor’s degree in the fields listed above with at least 5 years of relevant professional experience.
Security Clearance
Ability to obtain and maintain a U.S. security clearance, including having U.S. citizenship, is preferred but not required.
Location
We are actively hiring for this position in Washington, DC; San Francisco, CA; Boston, MA; Santa Monica, CA; and Pittsburgh, PA. San Francisco and especially DC are preferred. We offer a hybrid work arrangement, combining work from home and on-site options. Fully remote work will also be considered.
Term
This position is a 1-year term appointment with the possibility of renewal for up to 3 years total, alongside options for longer-term employment.
Application
Applications must include:
A detailed resume highlighting relevant academic and professional experience.
A writing sample demonstrating analytical and communication skills. This sample may be a recent, previously written paper or report (e.g., journal article, master’s thesis or paper written for coursework, prior employment, or internship). Applicants whose study and work experience (e.g., model development) has not involved producing written products that are shareable may submit a short, written summary (i.e., less than one page) of one or more recent products they have developed.
A code sample.
A cover letter which contains only responses to each of the following prompts:
1) Summarize in <200 words your career goals and why you are interested in this role.
2) Describe in <300 words one research direction or engineering infrastructure project you may want to pursue in this role. For a research direction: Describe what questions you would try to answer, what methods you would use, how many months of work would be required from you and/or colleagues, and what outcomes this research might help achieve (e.g., what important policy decisions it might inform). For an infrastructure project: You may make guesses about our goals and existing infrastructure, and propose a way you might help improve that, noting how you would implement that, how many months of work may be required from you and/or colleagues, and why this might be useful. This is just an assessment step and does not mean you would definitely work on this if hired.
Salary Range: $115,400 - $246,600
Visiting Technical Associate = $115,400 - $167,300
Visiting Technical Specialist = $137,000 - $209,000
Visiting Technical Expert = $157,800 - $246,600
RAND considers a variety of factors when formulating an offer, including the specific role responsibilities; a candidate’s work experience, education/training, skills, expertise; and internal equity. In addition, RAND provides strong benefits including health insurance coverage, life and disability insurance, a savings plan, paid time-off, and more.
Equal Opportunity Employer