Associate Director, Sr Principal Systems Engineer
Princeton LVL - NJ, United States
Bristol Myers Squibb
Bristol Myers Squibb is a global biopharmaceutical company committed to discovering, developing and delivering innovative medicines to patients with serious diseases.
Working with Us
Challenging. Meaningful. Life-changing. Those aren’t words that are usually associated with a job. But working at Bristol Myers Squibb is anything but usual. Here, uniquely interesting work happens every day, in every department. From optimizing a production line to the latest breakthroughs in cell therapy, this is work that transforms the lives of patients, and the careers of those who do it. You’ll get the chance to grow and thrive through opportunities uncommon in scale and scope, alongside high-achieving teams rich in diversity. Take your career farther than you thought possible.
Bristol Myers Squibb recognizes the importance of balance and flexibility in our work environment. We offer a wide variety of competitive benefits, services and programs that provide our employees with the resources to pursue their goals, both at work and in their personal lives. Read more: careers.bms.com/working-with-us.
Summary:
Bristol Myers Squibb is looking for an experienced Sr Principal Systems Engineer in HPC/AI infrastructure to work with our technology teams and various stakeholders to design, manage, and support cutting-edge HPC/AI infrastructure platforms to serve our community of researchers and scientists, who are using Machine Learning, Deep Learning, and High-Performance Computing every day to make groundbreaking discoveries.
Collaborating with cross-functional teams within BMS, the systems engineer will work with our teams to define and execute our HPC/AI roadmap for both on-premises datacenters and the cloud, provide guidance and technical expertise to senior research leaders and scientists, and build out standards and best-practice design principles to guide BMS’ future roadmap.
Key areas of the role require strong knowledge and expertise in:
Software/Hardware Optimization, such as performance tuning for bespoke hardware, code refactoring, accelerated ML toolkits and libraries such as CUDA, and continuous integration of code and ML models.
Development Tools and Environment, such as Git, Linux and Python package management, PyTorch Lightning, containers, and Kubernetes.
Job/Scheduler Orchestration and Integration, including automating and integrating machine learning jobs with major resource schedulers such as Slurm, Grid Engine, AWS Batch, and AWS ParallelCluster to maximize throughput, performance, utilization, efficiency, and cost-effectiveness for ML/AI training and prediction.
Datacenter/Colocation Operations, such as physical installation, networking or bespoke network fabrics, and an understanding of power and cooling (strongly preferred).
Vendor Outreach, including the ability to partner with leading vendors to explore, experiment with, and pilot proof-of-concept studies that help bring in or deliver leading-edge, differentiating capabilities for BMS Research.
Requirements:
Strong experience working with and supporting HPC users, including scientists, data scientists, and/or developers
Strong working experience with container runtimes and container orchestration platforms, including Kubernetes, Docker, and/or Singularity
Strong operational, architecture, and troubleshooting experience with cluster managers and schedulers, ideally Slurm, though experience with other HPC schedulers is acceptable
Linux systems management and configuration management in an HPC environment
Expert troubleshooting skills with open source frameworks and libraries
Experience working with the NVIDIA software ecosystem and GPU-powered systems for Machine Learning and Deep Learning workloads (preferred)
Experience working with Deep Learning frameworks, libraries, and pipelines, either directly as a user or in support of researchers and/or data science users (preferred)
Experience working with parallel file systems for data storage strategies for large clusters (preferred)
Working knowledge of GPU profiling techniques (preferred)
If you come across a role that intrigues you but doesn’t perfectly line up with your resume, we encourage you to apply anyway. You could be one step away from work that will transform your life and career.
Uniquely Interesting Work, Life-changing Careers
With a single vision as inspiring as “Transforming patients’ lives through science™”, every BMS employee plays an integral role in work that goes far beyond ordinary. Each of us is empowered to apply our individual talents and unique perspectives in an inclusive culture, promoting diversity in clinical trials, while our shared values of passion, innovation, urgency, accountability, inclusion and integrity bring out the highest potential of each of our colleagues.
On-site Protocol
BMS has a diverse occupancy structure that determines where an employee is required to conduct their work. This structure includes site-essential, site-by-design, field-based and remote-by-design jobs. The occupancy type that you are assigned is determined by the nature and responsibilities of your role:
Site-essential roles require 100% of shifts onsite at your assigned facility. Site-by-design roles may be eligible for a hybrid work model with at least 50% onsite at your assigned facility. For these roles, onsite presence is considered an essential job function and is critical to collaboration, innovation, productivity, and a positive Company culture. For field-based and remote-by-design roles the ability to physically travel to visit customers, patients or business partners and to attend meetings on behalf of BMS as directed is an essential job function.
BMS is dedicated to ensuring that people with disabilities can excel through a transparent recruitment process, reasonable workplace accommodations/adjustments and ongoing support in their roles. Applicants can request a reasonable workplace accommodation/adjustment prior to accepting a job offer. If you require reasonable accommodations/adjustments in completing this application, or in any part of the recruitment process, direct your inquiries to adastaffingsupport@bms.com. Visit careers.bms.com/eeo-accessibility to access our complete Equal Employment Opportunity statement.
BMS cares about your well-being and the well-being of our staff, customers, patients, and communities. As a result, the Company strongly recommends that all employees be fully vaccinated for Covid-19 and keep up to date with Covid-19 boosters.
BMS will consider for employment qualified applicants with arrest and conviction records, pursuant to applicable laws in your area.
If you live in or expect to work from Los Angeles County if hired for this position, please visit this page for important additional information: https://careers.bms.com/california-residents/
Any data processed in connection with role applications will be treated in accordance with applicable data privacy policies and regulations.
Tags: Architecture AWS CUDA Deep Learning Docker Excel Git GPU HPC Kubernetes Linux Machine Learning ML infrastructure ML models Open Source Pipelines Privacy Python PyTorch Research
Perks/benefits: Career development