Principal Systems Performance Engineer

Hyderabad - Phoenix Aquila, India

Micron Technology

Explore Micron Technology, a leader in semiconductors with a broad range of performance-enhancing memory and storage solutions.

Our vision is to transform how the world uses information to enrich life for all.

Micron Technology is a world leader in innovating memory and storage solutions that accelerate the transformation of information into intelligence, inspiring the world to learn, communicate and advance faster than ever.

Principal / Senior Systems Performance Engineer

Micron Data Center and Client Workload Engineering in Hyderabad, India, is seeking a senior/principal engineer to join our team. We build, performance-tune, and test data center and client solutions using innovative DRAM and emerging memory hardware. Understanding key data center workloads is a Micron imperative: it drives improvements to current products across the deep memory hierarchy (HBM, DDR, LPDDR, MRDIMM, GDDR) and underpins a total value proposition for customers built on applying several Micron products efficiently in concert.

In particular, with the proliferation of generative AI, there is an urgent need to better understand how large language model training and inference are affected by data center memory hierarchy and GPU characteristics. To this end, the successful candidate will primarily contribute to the HBM program in the data center by analyzing how AI/ML workloads perform on the latest MU-HBM, NVIDIA Blackwell GPU, and Grace-Blackwell systems, conducting competitive analysis, showcasing the benefits that workloads see from MU-HBM's capacity, bandwidth, and thermals, contributing to marketing collateral, and extracting AI/ML workload traces to help optimize future HBM designs.
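
As a rough illustration of the kind of analysis involved, the sketch below samples basic GPU telemetry (memory in use, utilization, temperature, power) through NVIDIA's NVML Python bindings (pynvml). It is a minimal sketch under illustrative assumptions: the device index, sample count, and one-second cadence are not specifics of the role.

    # Minimal GPU telemetry sampler using pynvml (NVML bindings).
    # Device index 0 and the sampling cadence are illustrative assumptions.
    import time
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # first GPU on the system

    for _ in range(10):                             # ten one-second samples
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0   # mW -> W
        print(f"mem_used={mem.used / 2**30:.1f} GiB  "
              f"gpu_util={util.gpu}%  mem_util={util.memory}%  "
              f"temp={temp} C  power={power_w:.0f} W")
        time.sleep(1)

    pynvml.nvmlShutdown()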

Job Responsibilities:

These include but are not limited to the following:

  • Analysis and characterization of data center workloads across several areas of AI/ML: GenAI, LLMs, SLMs, recommendation models, multi-modal models, etc.
  • Profiling of AI training and inference models in generative AI, computer vision, and recommendation on GPU systems, with detailed telemetry of the various subsystems in the form of capacity, bandwidth, latency, power, and thermals, and their impact on the ML models.
  • Performance benchmarking of HBM using both microbenchmarks and data center applications and benchmarks (a bandwidth-measurement sketch follows this list).
  • Overlaying deep learning models on multi-GPU-based (or clustered) system architectures to understand their interplay.
  • Understand key considerations for ML models, such as transformer architectures, precision, quantization, distillation, attention span & KV cache, MoE, etc.
  • Build workload memory access traces from AI models and HPC applications
  • Study system balance ratios for DRAM to HBM in terms of capacity and bandwidth to understand and model TCO
  • Study memory/core, byte/FLOP, and memory bandwidth/core/FLOP requirements for a variety of workloads to influence future products (a byte/FLOP sketch follows this list)
  • Study data movement between the CPU, GPU, and the associated memory subsystems (DDR, HBM) in heterogeneous system architectures over connectivity such as PCIe, NVLink, and Infinity Fabric to understand the data-movement bottlenecks for different workloads (a transfer-bandwidth sketch follows this list)
  • Develop an automated testing framework through scripting
  • Customer engagements and conference presentations to showcase findings and develop whitepapers
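
The bandwidth-measurement sketch referenced above, assuming PyTorch on a CUDA-capable GPU; the ~1 GiB buffer and iteration counts are arbitrary illustrative choices, not a prescribed methodology.

    # Hedged sketch: device-memory (HBM) copy bandwidth with PyTorch.
    # Buffer size and iteration counts are illustrative assumptions.
    import torch

    n = 256 * 1024 * 1024                            # 256M float32 ~= 1 GiB
    src = torch.empty(n, dtype=torch.float32, device="cuda")
    dst = torch.empty_like(src)

    for _ in range(3):                               # warm-up
        dst.copy_(src)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    iters = 20
    start.record()
    for _ in range(iters):
        dst.copy_(src)                               # device-to-device copy
    end.record()
    torch.cuda.synchronize()

    seconds = start.elapsed_time(end) / 1000.0       # elapsed_time is in ms
    bytes_moved = 2 * src.numel() * src.element_size() * iters   # read + write
    print(f"achieved device-memory bandwidth: {bytes_moved / seconds / 1e9:.1f} GB/s")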
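
For the balance-ratio and byte/FLOP items, a back-of-envelope sketch along these lines compares a GEMM's arithmetic intensity with an assumed machine balance point. The peak compute and bandwidth numbers are placeholders, not vendor specifications.

    # Hedged sketch: GEMM arithmetic intensity (FLOP per byte) versus an
    # assumed machine balance point. Peak figures below are placeholders.
    def gemm_intensity(m, n, k, bytes_per_elem=2):   # FP16/BF16 operands
        flops = 2 * m * n * k                        # multiply-accumulates
        bytes_moved = (m * k + k * n + m * n) * bytes_per_elem
        return flops / bytes_moved

    peak_flops = 1.0e15        # assumed peak compute, FLOP/s (placeholder)
    peak_bw = 4.0e12           # assumed peak HBM bandwidth, B/s (placeholder)
    machine_balance = peak_flops / peak_bw           # FLOP needed per byte moved

    ai = gemm_intensity(4096, 4096, 4096)
    print(f"arithmetic intensity = {ai:.0f} FLOP/B, balance = {machine_balance:.0f} FLOP/B")
    print("likely compute-bound" if ai > machine_balance else "likely bandwidth-bound")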
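
For the data-movement study, the transfer-bandwidth sketch below times host-to-device copies from pinned memory over whatever link connects CPU and GPU (PCIe or NVLink-C2C), again assuming PyTorch; sizes and counts are illustrative.

    # Hedged sketch: host-to-device copy bandwidth from pinned host memory.
    import torch

    n = 256 * 1024 * 1024                            # ~1 GiB of float32
    host = torch.empty(n, dtype=torch.float32, pin_memory=True)
    dev = torch.empty(n, dtype=torch.float32, device="cuda")

    dev.copy_(host, non_blocking=True)               # warm-up transfer
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    iters = 10
    start.record()
    for _ in range(iters):
        dev.copy_(host, non_blocking=True)           # H2D over PCIe/NVLink
    end.record()
    torch.cuda.synchronize()

    gb = host.numel() * host.element_size() * iters / 1e9
    print(f"host-to-device bandwidth: {gb / (start.elapsed_time(end) / 1000.0):.1f} GB/s")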

Preferred Qualifications:

  • Strong background in AI/ML training and inference models and experience with one or more of these frameworks: PyTorch, TensorFlow, DeepSpeed, Megatron, TensorRT
  • Strong computer systems foundations
  • Strong foundation in GPU and CPU processor architecture
  • Familiarity with and knowledge of server system memory (DRAM)
  • Strong experience with benchmarking and performance analysis
  • Strong software development skills using leading scripting and programming languages and technologies (Python, CUDA, ROCm, C, C++)
  • Familiarity with PCIe and NVLink connectivity
  • Modeling and simulation experience, for example emulating pre-silicon behavior
  • Hands-on HW systems experience
  • Stay abreast of the state of the art in deep learning and its optimizations
  • Familiarity with system level automation tools and processes
  • Excellent oral communication skills
  • Excellent written and presentation skills to detail the findings

Education:

Bachelor's degree or higher in Computer Science or a related field, with 12+ years of experience.

About Micron Technology, Inc.

We are an industry leader in innovative memory and storage solutions transforming how the world uses information to enrich life for all. With a relentless focus on our customers, technology leadership, and manufacturing and operational excellence, Micron delivers a rich portfolio of high-performance DRAM, NAND, and NOR memory and storage products through our Micron® and Crucial® brands. Every day, the innovations that our people create fuel the data economy, enabling advances in artificial intelligence and 5G applications that unleash opportunities — from the data center to the intelligent edge and across the client and mobile user experience.

To learn more, please visit micron.com/careers

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.

To request assistance with the application process and/or for reasonable accommodations, please contact hrsupport_india@micron.com

Micron prohibits the use of child labor and complies with all applicable laws, rules, regulations, and other international and industry labor standards.

Micron does not charge candidates any recruitment fees or unlawfully collect any other payment from candidates as consideration for their employment with Micron.
