Resources for Parallel Programming
CSE436 and 536: Concurrent and Multicore Programming, Oakland University
Introduction to Parallel and High Performance Computing
- top500
- HPCWire Online Magazine
- Major HPC/Supercomputing (SC) conferences: SC (in the US), ISC (in Europe)
- Moore's Law
- Introduction to Parallel Computing from LLNL
Linux and C Programming
- Linux/Unix Introduction
- VI Command
- C Programming Tutorial
- Compiler, Assembler, Linker and Loader: A Brief Story
- GNU Compiler Collection (gcc) online manual
- Intel Compiler Manual
Performance Application Programming Interface (PAPI) Programming and Hardware Counters
- Short slide introduction and longer slide overview of PAPI
- PAPI Official website
- A list of derived metrics that can be calculated using PAPI events
- PAPI API Reference Manual
- PAPI Getting Started, which has an example showing the use of the PAPI_flops API
- One simple PAPI example
- RAPL access to PAPI, Reading RAPL energy measurements from Linux
- Ubuntu has a papi-examples package that includes many examples of advanced PAPI use; the files are stored in /usr/share/doc/papi-examples
- papi-tools provides utility commands for use with PAPI: papi_mem_info lists the system memory information, and papi_avail lists the events available on the system.
Parallel Algorithm Design
OpenMP
- OpenMP Official Website, which provides the official standard, technical reports, the examples document, and much other information.
- OpenMP tutorial from LLNL
Parallel Program Measurement and Analysis
- isoefficiency paper
- Strong Scaling vs. Weak Scaling (search online for these keywords)
- Amdahl's Law vs. Gustafson-Barsis' Law
Cilk and Cilkplus
- MIT Cilk
- Intel Cilkplus Runtime
- Cilkplus Tutorial
- Cilkplus Examples from Intel Compiler Manual
- Cilkplus performance tools (cilkview, cilkscreen)
PThread
Computer Architecture, Memory Hierarchy and Cache Coherence
- Lecture Notes for Computer Architecture from MIT
- A Primer on Memory Consistency and Cache Coherence from the Synthesis Lectures on Computer Architecture
- Computer Organization and Design, Revised Fourth Edition; the latest edition is the 5th (the bible for undergraduate computer architecture courses)
- Computer Architecture: A Quantitative Approach, 5th Edition (the bible for graduate computer architecture courses)
- Ulrich Drepper, What Every Programmer Should Know About Memory
GPU and CUDA Programming
- GPU and CUDA examples used during the class
- Matrix Multiplication Examples (both using global memory and shared memory)
- CUDA C Programming Guide
- CUDA Toolkit documentation, which includes CUDA installation, the C programming guide, APIs for cuBLAS, cuFFT, etc., tools, the compiler SDK, and more.
- GPU Accelerated Computing with C and C++, which also has some videos.
MPI Programming
- Examples used during the class
- MPI Official Website, which includes the standard document.
- MPI Tutorial from LLNL
PGAS and others
Other Related Topics (not covered in the class)
- MapReduce with Hadoop/Spark
- Performance Profiling and Analysis Tools (TAU, HPCToolkit, Intel VTune, nvprof, etc.)
- Algorithm/Dwarfs (Sequential, OpenMP, Cilkplus, C++11 (std::thread and std::async), CUDA and MPI versions)
  - Dense matrix (axpy, mv, mm)
  - Linear Algebra (LU, QR)
  - Reduction (sum)
  - Stencil and image processing related (jacobi, FFT)
  - Sorting (quicksort)
  - Graph (traversal, UTS)
  - Sparse matrix
  - Smith–Waterman
- Larger applications
  - Co-Design Proxy Apps
  - Mantevo miniapps