[PHD] SDE - Systems, Runtime, and ML Infrastructure (AWS Custom Silicon), Annapurna Labs

Seattle, Washington, USA

Amazon.com


At AWS, we're pioneering the future of cloud computing and AI acceleration through innovative hardware-software co-design. Our teams within Annapurna Labs and AWS AI are creating the foundation for next-generation cloud infrastructure that powers thousands of customers worldwide, from cutting-edge startups to global enterprises.

We operate at an unprecedented scale, designing custom silicon chips, advanced networking solutions, and ML accelerators that were unimaginable just a few years ago.

Our work spans from the lowest levels of hardware abstraction to high-performance distributed training systems, creating unique opportunities for early-career engineers to make significant impact across multiple domains.


Key job responsibilities
- Develop and optimize software for custom hardware and ML infrastructure
- Collaborate with hardware teams to understand and leverage chip architecture
- Implement and improve networking, runtime, and system-level software
- Assist in building and maintaining tools for profiling, monitoring, and debugging ML workloads
- Contribute to the development of open-source ML frameworks and infrastructure projects
- Participate in code reviews and implement best practices for software development
- Learn and apply new technologies to solve complex engineering challenges

About the team
During the application process, candidates will be routed to specific teams based on their interests and our current needs:

- The Elastic Network Adapter (ENA) team revolutionizes EC2 core networking, enabling enhanced networking capabilities across AWS's most critical compute instances. Here, you'll work with networking protocols and high-performance drivers that power millions of cloud workloads.

- Our AWS Neuron SDK team develops the complete software stack for custom ML accelerators (Inferentia and Trainium), democratizing access to AI infrastructure. This team bridges the gap between popular ML frameworks and custom hardware.

- The Machine Learning Server Software team maintains and optimizes the world's most advanced ML servers, focusing on system-level software that ensures peak performance of AI workloads. While we don't work directly on ML algorithms, we build the critical infrastructure that makes ML possible at scale.

- The SoC Hardware Abstraction Layer (HAL) team works at the intersection of hardware and software, developing the crucial middleware that manages our custom silicon chips. This team ensures our innovative hardware designs translate into reliable, high-performance solutions.

Basic Qualifications


- To qualify, applicants should have earned (or expect to earn) a PhD degree between December 2022 and September 2025
- Research in systems, computer architecture, networking, or related areas, demonstrated through publications, internships, or projects
- Skilled in C/C++ and Python, with expertise in implementing complex algorithms and data structures
- Understanding of computer architecture, operating systems, and low-level system optimizations, preferably with hands-on experience in Linux environments


Preferred Qualifications

- Research contributions or expertise in systems for ML, compilers, distributed computing, or hardware acceleration, demonstrated through publications or open-source projects

- Experience with modern ML infrastructure, including frameworks (PyTorch/TensorFlow), compilers (XLA, MLIR), or hardware accelerators

- Technical leadership through open-source contributions, research collaborations, or mentoring experience

Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status.

Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company’s reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $129,300/year in our lowest geographic market up to $223,600/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit https://www.aboutamazon.com/workplace/employee-benefits. This position will remain posted until filled. Applicants should apply via our internal or external career site.


Tags: Architecture AWS EC2 Engineering Linux Machine Learning ML infrastructure Open Source PhD Python PyTorch Research TensorFlow

Perks/benefits: Career development Equity / stock options

Region: North America
Country: United States
