The growing disparity between CPU speed and memory speed, known as the memory wall problem, has been one of the most critical and long-standing challenges in the computing industry. The situation is further complicated by the recent expansion of the memory hierarchy, which is becoming deeper and more diversified with the adoption of new memory technologies and architectures including 3D-stacked memory, non-volatile random-access memory (NVRAM), memristors, hybrid software and hardware caches, etc. Computer architectures and hardware systems, operating systems, storage and file systems, programming stacks, performance models, and tools are being enhanced, augmented, or even redesigned to address the performance, programmability, and energy efficiency challenges of the increasingly complex and heterogeneous memory systems for HPC and data-intensive applications.
The MCHPC workshop aims to bring together computer and computational science researchers from industry, government labs, and academia who are concerned with the challenges of efficiently using existing and emerging memory systems for high performance computing. The term performance is used broadly for memory systems: it includes latency, bandwidth, power consumption, and reliability, from the level of hardware memory technologies to how these manifest in application performance. The topics of interest for the MCHPC workshop include, but are not limited to:
November 14th 2021 - Workshop
Authors are invited to submit manuscripts in English structured as technical papers up to 8 pages or as short papers up to 5 pages, both of letter size (8.5in x 11in) and including figures, tables, and references. Submissions not conforming to these guidelines may be returned without review. Your paper should be formatted using the IEEE conference format, which can be found at https://www.ieee.org/conferences/publishing/templates.html. The workshop encourages submitters to include reproducibility information, using the Reproducibility Initiative for SC'21 Technical Papers as a guideline.
All manuscripts will be peer-reviewed and judged on correctness, originality, technical strength, significance, quality of presentation, and interest and relevance to the workshop attendees. Submitted papers must represent original unpublished research that is not currently under review for any other conference or journal. Papers not following these guidelines will be rejected without review and further action may be taken, including (but not limited to) notifications sent to the heads of the institutions of the authors and sponsors of the conference. Submissions received after the due date, exceeding the length limit, or not appropriately structured may also not be considered. At least one author of an accepted paper must register for and attend the workshop. Authors may contact the workshop organizers for more information.
Papers should be submitted electronically at https://submissions.supercomputing.org/; choose "SC21 Workshop: MCHPC'21: Workshop on Memory Centric High Performance Computing".
The final papers are planned to be published through IEEE TCHPC. Published proceedings will be included in the IEEE Xplore digital library.
Abstract: Previous attempts at providing commercial memory centric computing systems have largely failed due to technologies that did not live up to expectations. The emerging CXL standard is one of the crucial technologies that will enable commercial memory centric systems in the coming years. This talk will introduce the CXL standard, explain how it will enable efficient memory centric computing, and describe some of the market factors shaping current and future versions of the standard. The HPC community will be able to leverage these high-volume market features to realize memory centric computing systems that will provide significant improvements in performance and system energy consumption.
Bio: Tony Brewer is currently the chief architect in Micron’s Near Data Computing group. He is the principal investigator on multiple government contracts and manages a team of architects and researchers focused on various processing-in-memory and processing-near-memory architectures. His career has been focused on system architecture in both the high-performance computing and telecommunications industries. Prior to joining Micron in 2015, Tony Brewer was a co-founder and Chief Technology Officer of Convey Computer. Former employers include Data General, Convex Computer, and Hewlett Packard. Tony Brewer received his MS and BS degrees in Computer Engineering from Purdue University and has over 175 filed patents.
10:30am - 11:00am Shaurya Patel, Tongping Liu, and Hui Guan; FreeLunch: Compression-based GPU Memory Management for Convolutional Neural Networks, presentation
11:00am - 11:30am Clément Foyer and Brice Goglin; Using Bandwidth Throttling to Quantify Application Sensitivity to Heterogeneous Memory, presentation
Abstract: To make memory-centric compute devices impactful and sustainable for a wide range of customers in HPC and beyond, we need to find ways of leveraging their capabilities in language standards. One of the prime targets for such efforts is the ISO C++ standard, which has been the language of choice for HPC vendors to implement programming models for accelerators, such as CUDA, HIP and SYCL. This talk will discuss existing and upcoming capabilities in the C++ standard like std::mdspan and std::linalg, which enable memory-centric application design. Based on concepts popularized in the Kokkos ecosystem for performance portability, these new features allow the design of algorithms that are memory location and memory layout aware, that leverage advanced memory access capabilities, and that provide customization points to plug special hardware-specific functionality into C++ code without relying on non-standard APIs, such as intrinsics and vendor-specific libraries.
Bio: Christian Trott is a high performance computing expert with extensive experience designing and implementing software for modern HPC systems. He is a principal member of staff at Sandia National Laboratories, where he leads the Kokkos core team developing the performance portability programming model for C++ and heads Sandia's delegation to the ISO C++ standards committee. He also serves as adviser to numerous application teams, helping them redesign their codes using Kokkos and achieve performance portability for the next generation of supercomputers. Christian is a regular contributor to numerous scientific software projects including LAMMPS and Trilinos. He earned a doctorate from the University of Technology Ilmenau in theoretical physics with a focus on computational material research.
Abstract: Accelerated in-network computations promise significant optimizations ranging from data-movement reductions to specialization opportunities in processing elements. We show updates within the sPIN (streaming Processing in the Network) network accelerator programming model - the "CUDA for networking". There, we demonstrate 2x lower required bandwidth for (sparse) reductions and a highly-optimized packet processing design based on a low-power RISC-V multi-core architecture.
Bio: Torsten is a full Professor of Computer Science at ETH Zürich, Switzerland. Before joining ETH, he led the performance modeling and simulation efforts of parallel petascale applications for the NSF-funded Blue Waters project at NCSA/UIUC. He is also a key member of the Message Passing Interface (MPI) Forum where he chairs the "Collective Operations and Topologies" working group. Torsten won best paper awards at the ACM/IEEE Supercomputing Conference 2010 (SC10), EuroMPI 2013, SC13, SC14, SC19, IPDPS'15, ACM HPDC'15 and HPDC'16, ACM OOPSLA'16, and other conferences. He has published numerous peer-reviewed scientific conference and journal articles and authored chapters of the MPI-2.2 and MPI-3.0 standards. For his work, Torsten received the ACM Gordon Bell Prize in 2019, the IEEE TCSC Award of Excellence (MCR) in 2019, ETH Zürich's Latsis Prize in 2015, the SIAM SIAG/Supercomputing Junior Scientist Prize in 2012, and the IEEE TCSC Young Achievers in Scalable Computing Award in 2013. He was also awarded the BenchCouncil Rising Star Award in 2020. Following his Ph.D., he received the Young Alumni Award 2014 from Indiana University. Torsten was elected into the first steering committee of ACM's SIGHPC in 2013 and he was re-elected in 2016 and 2019. He was the first European to receive many of those honors; he also received both an ERC Starting and an ERC Consolidator grant. His research interests revolve around the central topic of "Performance-centric System Design" and include scalable networks, parallel programming techniques, and performance modeling. Additional information about Torsten can be found on his homepage at htor.inf.ethz.ch.
3:30pm - 4:00pm Xingfu Wu, Valerie Taylor, and Zhiling Lan; Performance and Energy Improvement of the ECP Proxy App SW4lite under Various Workloads, presentation
4:00pm - 4:30pm Aaron Walden, Mohammad Zubair, Christopher Stone, and Eric Nielsen; Memory Optimizations for Sparse Linear Algebra on GPU Hardware, presentation