MCHPC'22: Workshop on Memory Centric High Performance Computing

Time/Date: 1:30PM - 5:00PM, U.S. Central Standard Time, Sunday, November 13, 2022

Location: Kay Bailey Hutchison Convention Center, Dallas

Program 1:30PM - 5:10PM, U.S. Central Standard Time, Sunday, November 13, 2022

Join the workshop virtually at Slido here

Program also available at SC22 Website

1:30pm - 1:32pm -- Welcome

Ronald Brightwell, Yonghong Yan, Maya Gokhale, Ivy B. Peng

1:32pm - 2:30pm -- Keynote by Dr. Dan Ernst

Follow the Data: Memory-Centric Designs for Modern Datacenters

Session Chair: Ronald Brightwell

Join Presentation Here

Brief Bio: Dr. Daniel Ernst is currently a Principal Architect in Microsoft's Azure Hardware Architecture (AHA) team, which is responsible for long-range technology pathfinding for future Azure Cloud systems. Within AHA, Dan leads the team responsible for future memory systems. This team investigates future architecture directions for Azure and serves as the primary architecture contact point in technical relationships with compute, memory, and device partners, as well as the primary driver of Microsoft’s memory standards activity.

Prior to joining Microsoft, Dan spent 10 years at Cray/HPE, most recently as a Distinguished Technologist in the HPC Advanced Technology team. While at Cray, Dan led multiple customer-visible collaborative pathfinding investigations into future HPC architectures and was part of the team that architected the Department of Energy’s Frontier and El Capitan Exascale systems.

Dan has served as part of multiple industry standards bodies throughout his career, including JEDEC, the CXL and CCIX consortia, and as a founding Board of Directors member of the Gen-Z Consortium.

Dan received his Ph.D. in Computer Science and Engineering from the University of Michigan, where he studied high-performance, low-power, and fault-tolerant microarchitectures. He also holds an MSE from Michigan and a BS in Computer Engineering from Iowa State University.

2:30pm - 3:00pm Technical Paper (Chair: Ronald Brightwell)

Title: Maximizing Performance Through Memory Hierarchy-Driven Data Layout Transformations

Author/Presenters: Benjamin Sepanski, Tuowen Zhao, Hans Johansen, Samuel Williams

Join Presentation Here

Abstract: Computations on structured grids using standard multidimensional array layouts can incur substantial data movement costs through the memory hierarchy. This presentation explores the benefits of using a framework (Bricks) to separate the complexity of data layout and optimized communication from the functional representation. To that end, we provide three novel contributions and evaluate them on several kernels taken from GENE, a phase-space fusion tokamak simulation code. We extend Bricks to support 6-dimensional arrays and kernels that operate on complex data types, and integrate Bricks with cuFFT. We demonstrate how to optimize Bricks for data reuse, spatial locality, and GPU hardware utilization, achieving up to a 2.67× speedup on a single A100 GPU. We conclude with insights on how to rearchitect memory subsystems.

3:00pm - 3:30pm Break

3:30pm - 5:00pm Technical Papers (Chair: Yonghong Yan)

3:30pm - 4:00pm

Title: Evaluating Emerging CXL-Enabled Memory Pooling for HPC Systems

Author/Presenters: Jacob Wahlgren, Maya Gokhale, Ivy Peng

Join Presentation Here

Abstract: Current HPC systems provide memory resources tightly coupled with compute nodes. But HPC applications are evolving – diverse workloads demand different memory resources to achieve both high performance and utilization. In this study, we evaluate a memory subsystem leveraging CXL-enabled memory to provide configurable capacity and bandwidth. We propose an emulator to explore the performance impact of various memory configurations, and a profiler to identify optimization opportunities. We evaluate the performance of seven HPC workloads and six graph workloads on the emulated system. Our results show that three and two HPC workloads have less than 10% and 18% performance impact, respectively, on 75% pooled memory. Also, a dynamically configured high-bandwidth system could effectively support bandwidth-bottlenecked workloads like grid-based solvers. Finally, we identify interference through shared memory pools as a practical challenge for HPC systems to adopt CXL-enabled memory.

4:00pm - 4:30pm

Title: Reducing Memory-Bus Energy Consumption of GPUs via Software-Based Bit-Flip Minimization

Author/Presenters: Alex Fallin, Martin Burtscher

Join Presentation Here

Abstract: Energy consumption is a major concern in high-performance computing. One important contributing factor is the number of times the wires are charged and discharged, i.e., how often they switch from '0' to '1' and vice versa. We describe a software technique to minimize this switching activity in GPUs, thereby lowering the energy usage. Our technique targets the memory bus, which comprises many high-capacitance wires that are frequently used. Our approach is to strategically change data values in the source code such that loading and storing them yields fewer bit flips. The new values are guaranteed to produce the same control flow and program output. Measurements on GPUs from two generations show that our technique allows programmers to save up to 9.3% of the whole-GPU energy consumption and 1.2% on average across eight graph-analytics CUDA codes without impacting performance.
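The switching metric described in this abstract can be illustrated with a small sketch. It assumes a simplified model that is not taken from the paper: each value occupies the full width of a hypothetical bus, and values are transferred back to back, so the number of wires that change state equals the number of differing bits between consecutive values. The function names `bit_flips` and `total_bit_flips` are illustrative only and do not reproduce the authors' technique.

```python
def bit_flips(prev: int, curr: int, width: int = 32) -> int:
    """Count the wires that change state when `curr` follows `prev`.

    Assumption: a `width`-bit bus where each value uses every wire.
    """
    mask = (1 << width) - 1
    # XOR marks the differing bit positions; count the set bits.
    return bin((prev ^ curr) & mask).count("1")


def total_bit_flips(values: list[int], width: int = 32) -> int:
    """Sum the bit flips over a sequence of values sent back to back."""
    return sum(bit_flips(a, b, width) for a, b in zip(values, values[1:]))


# Alternating between 0 and an all-ones marker toggles every wire each time.
all_ones_marker = total_bit_flips([0, 0xFFFFFFFF, 0, 0xFFFFFFFF])

# Alternating between 0 and 1 toggles a single wire each time.
small_marker = total_bit_flips([0, 1, 0, 1])
```

Under this model, a program that alternates between 0 and an all-ones marker value causes far more switching than one that alternates between 0 and 1, which is the kind of difference a value substitution can exploit when the substitution leaves program behavior unchanged.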

4:30pm - 5:00pm

Title: Assessing the Memory Wall in Complex Codes

Author/Presenters: Galen Shipman, Jered Dominguez-Trujillo, Kevin Sheridan, Sriram Swaminarayan

Join Presentation Here

Abstract: Many of Los Alamos National Laboratory's HPC codes are memory bandwidth bound. These codes exhibit high levels of sparse memory access which differ significantly from standard benchmarks. In this paper we present an analysis of the memory access of some of our most important codebases. We then generate micro-benchmarks that preserve the memory access characteristics of our codes using two approaches, one based on statistical sampling of relative memory offsets in a sliding time window at the function level and another at the loop level. The function-level approach is used to assess the impact of advanced memory technologies such as LPDDR5 and HBM3 using the gem5 simulator. Our simulation results show significant improvements for sparse memory access workloads using HBM3 relative to LPDDR5 and better scaling on a per-core basis. Assessment of two different architectures shows that higher peak memory bandwidth results in high bandwidth on sparse workloads.

5:00pm - 5:10pm Closing Remarks by Yonghong Yan and Ron Brightwell

Held in conjunction with SC22: The International Conference on High Performance Computing, Networking, Storage and Analysis, and in cooperation with the IEEE Computer Society

Introduction

The growing disparity between CPU speed and memory speed, known as the memory wall problem, has been one of the most critical and long-standing challenges in the computing industry. The situation is further complicated by the recent expansion of the memory hierarchy and the blurred boundary between memory and storage. The memory hierarchy is becoming deeper and more diversified with the adoption of new memory technologies and architectures, including 3D-stacked memory, non-volatile random-access memory (NVRAM), memristors, and hybrid software and hardware caches. Computer architecture and hardware systems, operating systems, storage and file systems, the programming stack, and performance models and tools are being enhanced, augmented, or even redesigned to address the performance, programmability, and energy efficiency challenges of increasingly complex and heterogeneous memory systems for HPC and data-intensive applications.

The MCHPC workshop aims to bring together computer and computational science researchers from industry, government labs, and academia who are concerned with the challenges of efficiently using existing and emerging memory systems. The term "performance" is used broadly for memory systems: it covers latency, bandwidth, power consumption, and reliability, from the level of hardware memory technologies up to how they manifest in application performance. The topics of interest for the MCHPC workshop include, but are not limited to:

Software, hardware, and co-design approaches that ease the adoption and optimize the use of processing-in-memory and near-memory computing technologies.
Evaluation, characterization, performance analysis, and use cases of emerging memory technologies, including non-volatile memories, high-bandwidth memory, heterogeneous memory, disaggregated memory, etc.
Programming interfaces or language extensions that improve the programmability of emerging memory technologies and systems, heterogeneous memory systems and multi-dimensional data, and unified memory systems.
Compiler, runtime, and system techniques for optimizing data layout and placement, page migration, coherence and consistency enforcement, and latency hiding, and for improving the bandwidth utilization and energy efficiency of heterogeneous memory systems.
Enhancements or new developments in operating systems, storage and file systems, and I/O systems that address the challenges of existing and emerging memory technologies, heterogeneous memory systems, and the blurred boundary between memory and storage.
Tools, modeling, evaluation, and case studies of memory system behavior and application performance that reveal the limitations and characteristics of existing memory systems.
Application development and optimization for new memory architectures and technologies.

Important Dates

Submission Deadline -- August 19, 2022
Notifications -- September 12, 2022
Workshop -- November 13, 2022

Organizers

Ron Brightwell (Sandia National Laboratories, USA)
Maya B Gokhale (Lawrence Livermore National Laboratory, USA)
Ivy B Peng (KTH Royal Institute of Technology, Sweden)
Yonghong Yan (University of North Carolina at Charlotte, USA)

Submission

Authors are invited to submit manuscripts in English structured as technical papers of up to 8 pages or as short papers of up to 5 pages, both in letter size (8.5in x 11in) and including figures, tables, and references. Submissions not conforming to these guidelines may be returned without review. Your paper should be formatted using the IEEE conference format, which can be found at https://www.ieee.org/conferences/publishing/templates.html. The workshop encourages submitters to include reproducibility information, using the Reproducibility Initiative for SC'22 Technical Papers as a guideline.

All manuscripts will be peer-reviewed and judged on correctness, originality, technical strength, significance, quality of presentation, and interest and relevance to the workshop attendees. Submitted papers must represent original unpublished research that is not currently under review for any other conference or journal. Papers not following these guidelines will be rejected without review and further action may be taken, including (but not limited to) notifications sent to the heads of the institutions of the authors and sponsors of the conference. Submissions that are received after the due date, exceed the length limit, or are not appropriately structured may also not be considered. At least one author of an accepted paper must register for and attend the workshop. Authors may contact the workshop organizers for more information.

Papers should be submitted electronically at https://submissions.supercomputing.org/ by choosing "SC22 Workshop: MCHPC'22: Workshop on Memory Centric High Performance Computing".

The final papers are planned to be published through the IEEE Computer Society. Published proceedings will be included in the IEEE Xplore digital library.

Program Committee

Ron Brightwell (Sandia National Laboratories)
Ivy Peng (KTH Royal Institute of Technology)
Stephen L Olivier (Sandia National Laboratories)
Gwendolyn Voskuilen (Sandia National Laboratories)
Seyong Lee (Oak Ridge National Laboratory)
Yonghong Yan (University of North Carolina at Charlotte)
Tom Deakin (University of Bristol)
Kyle Hale (Illinois Institute of Technology)
Dong Li (University of California, Merced)
Chunhua Liao (Lawrence Livermore National Laboratory)
Alice Koniges (Univ. of Hawaii, Maui High Performance Computing Center)