Senior Research Software Engineer

Nashville, TN, United States


Vanderbilt University

Vanderbilt is a private research university in Nashville, Tennessee. It offers 70 undergraduate majors and a full range of graduate and professional degrees across 10 schools and colleges.


The Senior Research Software Engineer at the Advanced Computing Center for Research and Education (ACCRE) serves as a key individual contributor on the research computing team, directly supporting Vanderbilt faculty, staff, and students in high-impact scientific research. This role combines technical leadership, software engineering, AI/ML application support, and consultation to create optimized, scalable, and secure computing solutions that advance discovery.

This position is responsible for onboarding labs to ACCRE’s HPC environment, troubleshooting complex software issues, and helping develop end-to-end pipelines for GPU-accelerated and data-intensive research. The engineer also plays a central role in collaborative software development, training, and contributing to grant proposals and institutional research computing strategy.

ACCRE is Vanderbilt's high-performance research computing center, serving hundreds of researchers across disciplines. It provides access to a 15,000+ core Linux cluster, GPU platforms, high-speed networking, and over 24 PB of distributed, fault-tolerant storage. ACCRE supports computing as the “third pillar” of research alongside theory and experimentation.

Duties and Responsibilities

Research Collaboration and Onboarding

  • Serve as a liaison between ACCRE and researchers across departments, facilitating access to advanced computing tools.
  • Lead onboarding workshops and develop training materials to orient labs to ACCRE, including troubleshooting workflows and resolving software issues.
  • Consult with users to assess and translate research needs into computational workflows and support plans.

AI/ML and Deep Learning Support

  • Provide technical expertise in deep learning, machine learning, and GPU-accelerated workloads using frameworks and toolkits such as PyTorch, TensorFlow, and CUDA.
  • Design, optimize, and deploy AI pipelines for performance and scalability on ACCRE and cloud platforms.
  • Provide insights on model training, data preprocessing, hardware use, and optimization strategies.

Software Engineering and Pipeline Development

  • Develop modular and scalable research software applications, libraries, and reproducible workflows.
  • Implement and maintain CI/CD pipelines, testing frameworks, and automated deployment systems.
  • Contribute to or lead the development of open-source software in support of scientific research.

Training, Documentation, and Outreach

  • Author comprehensive user documentation, web-based tutorials, and workshop content.
  • Teach workshops or seminars on research software engineering, cluster usage, version control, and data workflows.
  • Create community knowledge bases and share success stories internally and externally.

Strategic Research Computing Support

  • Support faculty with technical sections of grant proposals (e.g., boilerplate text, system descriptions, data management plans).
  • Collect, analyze, and report metrics on ACCRE usage, research impact, and project outcomes.
  • Lead or participate in institutional initiatives to expand or improve research computing offerings.
  • Perform other duties as assigned.

Qualifications

  • Master’s degree in computer science, engineering, computational science, or a related field is required.
  • Minimum 8 years of experience in research computing, software engineering, or a comparable role is required.
  • Strong programming proficiency (e.g., Python, C/C++, or R) and experience with software development best practices.
  • Strong proficiency in applied machine learning and familiarity with frameworks and toolkits such as PyTorch, TensorFlow, and CUDA for model training and inference.
  • Advanced knowledge of high-performance computing systems, distributed computing, and job schedulers (e.g., Slurm).
  • Experience with version control (Git), unit testing, and CI/CD pipelines.
  • Excellent communication skills and ability to explain technical concepts to a range of audiences.
  • Demonstrated ability to collaborate on interdisciplinary research projects.
  • Familiarity with secure research environments and compliance with federal data standards.
  • Experience developing or maintaining research software in a collaborative, open-source environment.
  • Experience with infrastructure-as-code tools and containerization (e.g., Docker, Singularity).

At Vanderbilt University, our work, regardless of title or role, is in service to an important and noble mission in which every member of our community serves in advancing knowledge and transforming lives on a daily basis. Located in Nashville, Tennessee, on a 330+ acre campus and arboretum dating back to 1873, Vanderbilt is proud to have been named as one of “America’s Best Large Employers” as well as a top employer in Tennessee and the Nashville metropolitan area by Forbes for several years running. We welcome those who are interested in learning and growing professionally with an employer that strives to create, foster and sustain opportunities as an employer of choice.

We understand you have a choice when choosing where to work and pursue a career. We understand you are unique and have a story. We want to hear it. We encourage you to apply today so that you might become a part of our story.

Vanderbilt University is an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran, or any other characteristic protected by law.

Tags: CI/CD Computer Science CUDA Data management Deep Learning Docker Engineering Git GPU HPC Linux Machine Learning Model training Open Source Pipelines Python PyTorch R Research TensorFlow Testing

Perks/benefits: Career development

Region: North America
Country: United States
