4.3. Mapping Data to GPU Devices
Efficient data management is crucial for achieving high performance when offloading computations to GPU devices. OpenMP provides several mechanisms to map data between the host and the device, allowing programmers to control data transfers and optimize memory usage. In this section, we will explore the `map` clause, implicit and explicit data mapping, array shaping, pointer mapping, and mapping structured data types.
4.3.1. The map clause
The `map` clause specifies how data is transferred between the host and the device. It is associated with device constructs such as `target`, `target data`, `target enter data`, and `target exit data`. Its syntax is as follows:
map([[map-type-modifier[,]] map-type: ] list)
The `map-type` specifies the direction of the data transfer and can be one of the following:

- `to`: Transfers data from the host to the device.
- `from`: Transfers data from the device to the host.
- `tofrom`: Transfers data from the host to the device and back to the host.
- `alloc`: Allocates memory on the device without initializing it or transferring data.

The `map-type-modifier` specifies additional behavior. For example, `always` forces the transfer to occur even if the data is already present in the device data environment, where the transfer would normally be skipped.
Example:
#pragma omp target map(to: a[0:n]) map(from: b[0:n])
{
// Compute on the device using 'a' and 'b'
}
In this example, array `a` is copied to the device before the region executes, and array `b` is copied back to the host after the computation.
4.3.2. Implicit and explicit data mapping
OpenMP supports both implicit and explicit data mapping. Implicit data mapping automatically maps variables used within a target region to the device, based on their usage and data-sharing attributes. Explicit data mapping, on the other hand, requires programmers to specify the variables and their mapping behavior using the `map` clause.
Implicit data mapping is convenient but may not always provide optimal performance. Explicit data mapping gives programmers fine-grained control over data transfers and allows for optimizations such as minimizing data movement and overlapping computation with data transfers.
Example (implicit mapping):
int a[100];
#pragma omp target
{
// 'a' is implicitly mapped 'tofrom' (the default for arrays)
for (int i = 0; i < 100; i++) {
a[i] = i;
}
}
Example (explicit mapping):
int a[100];
#pragma omp target map(tofrom: a[0:100])
{
// 'a' is explicitly mapped to the device and back to the host
for (int i = 0; i < 100; i++) {
a[i] = i;
}
}
4.3.3. Array shaping and pointer mapping
OpenMP provides mechanisms to map arrays and pointers to GPU devices efficiently. Array shaping (array sections) allows programmers to map a contiguous portion of an array to the device, enabling more precise control over data transfers. Pointer mapping allows dynamically allocated memory to be mapped to the device.
Example (array shaping):
int a[100][100];
#pragma omp target map(to: a[0:50][0:100])
{
// Only the first 50 rows of 'a' are mapped to the device
// (the mapped array section must be contiguous in memory)
// ...
}
Example (pointer mapping):
int* ptr = malloc(100 * sizeof(int));
#pragma omp target map(to: ptr[0:100])
{
// The dynamically allocated memory pointed to by 'ptr' is mapped to the device
// ...
}
4.3.4. Mapping structured data types
OpenMP supports mapping structured data types, such as arrays of structures or nested structures, to GPU devices. Programmers can use the `map` clause to map an entire structure, or individual members of a structure variable.
Example:
struct Point {
double x;
double y;
double z;
};
struct Point points[100];
#pragma omp target map(to: points[0:100])
{
// Each element of 'points', including its 'x', 'y', and 'z'
// members, is mapped to the device
// ...
}
Mapping structured data types allows for more complex data structures to be efficiently transferred between the host and the device, enabling a wider range of applications to leverage GPU acceleration.
By understanding and effectively utilizing the data mapping techniques provided by OpenMP, programmers can optimize data transfers, reduce memory overhead, and achieve better performance when offloading computations to GPU devices.
In the next section, we will explore asynchronous execution and dependencies in OpenMP, which allow for overlapping computation and data transfers, and specifying dependencies between tasks and target regions.