Software Engineering Manager, Performance
Mountain View, California, US
Full Time Mid-level / Intermediate USD 189K - 300K
DeepMind
Snapshot
We are seeking an experienced engineer and manager to lead a small team (3-6 people initially) of talented performance engineers and to have hands-on impact on our projects. The team’s mission is to help improve the performance of state-of-the-art ML models on hardware accelerators. You will lead a team dedicated to developing and optimizing the most critical parts of our models at scale, for use throughout Alphabet. This involves working across the stack, from ML frameworks to compilers, in collaboration with AI researchers, to deliver advanced features with maximum efficiency.
About us
Artificial Intelligence could be one of humanity’s most useful inventions. At Google DeepMind, we’re a team of scientists, engineers, machine learning experts and more, working together to advance the state of the art in artificial intelligence. We use our technologies for widespread public benefit and scientific discovery, and collaborate with others on critical challenges, ensuring safety and ethics are the highest priority.
The role
The rising use of large language models (LLMs) demands efficient, performant solutions for training and serving models. As a member of the performance team, you will lead a team of 3-6 engineers who help us improve the efficiency and performance of the latest models on Google’s fleet of hardware accelerators throughout the entire LLM research, training and deployment lifecycle.
This involves:
- Leading and coaching a team of talented engineers whose mission is to improve the performance of LLMs on hardware accelerators by optimizing at all levels, including developing custom kernels when necessary
- Establishing and developing cooperation with other relevant teams, such as AI researchers and the compiler and framework teams
- Managing, growing and developing the team
This role offers an opportunity to work on a wide range of problems and to gain a deep understanding of cutting-edge ML models and of how to run them efficiently on the most advanced hardware accelerators.
You’ll join an inspiring and collaborative environment, where you’ll work alongside experienced software engineers and research scientists from a diverse set of backgrounds. You’ll work closely with AI researchers, delivering solutions that enable them to advance the state of the art in AI by using hardware accelerators efficiently.
Key responsibilities
Your main responsibilities will be:
- Lead and manage a performance engineering team focused on enhancing machine learning model performance and efficiency on hardware accelerators.
- Develop team skills and provide coaching across the software stack, from high-level JAX abstractions to low-level kernel writing, work that requires in-depth knowledge of hardware accelerators and proficiency with performance debugging tools.
- Collaborate closely with the frameworks, compilers and tools teams to improve efficiency, establishing and maintaining cooperation with the relevant technical leads.
- Coach and develop team members, offering guidance, support, and professional growth opportunities. Expand the team as needed to meet requirements.
About you
You're an engineer with a strong interest in contributing to the advancement of AI. You're excited by the challenge of optimizing performance and enjoy collaborating effectively within a team and across organizations.
To succeed as a Software Engineering Manager at Google DeepMind, we look for the following skills and experience:
- Experience leading teams of engineers, preferably in areas related to performance optimization
- Interpersonal skills, such as leading technical discussions effectively with team members and across organizations
- Excellent knowledge of either C++ or Python
- Solid understanding of algorithm and data-structure design
- An interest in Google DeepMind's mission
In addition, we are looking for experience with at least two of the following:
- Programming hardware accelerators (GPUs, TPUs, etc.) via ML frameworks (e.g. JAX, PyTorch) or low-level programming models (e.g. CUDA, OpenCL)
- Profiling software to find performance bottlenecks
- Leveraging compiler infrastructure to improve performance on hardware
- Optimizing distributed ML systems
- Training and using large ML models
- Interest in AI and basic knowledge of AI algorithms and models (e.g. Transformers)
The US base salary range for this full-time position is between $189,000 and $300,000, plus bonus, equity and benefits. Your recruiter can share more about the specific salary range for your targeted location during the hiring process.
At Google DeepMind, we value diversity of experience, knowledge, backgrounds and perspectives and harness these qualities to create extraordinary impact. We are committed to equal employment opportunity regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy, or related condition (including breastfeeding) or any other basis as protected by applicable law. If you have a disability or additional need that requires accommodation, please do not hesitate to let us know.
Note: In the event your application is successful and an offer of employment is made to you, any offer of employment will be conditional on the results of a background check, performed by a third party acting on our behalf. For more information on how we handle your data, please see our Applicant and Candidate Privacy Policy.