6.3. Pointer Mapping#

Pointers that contain host addresses require that those addresses are translated to device addresses for them to be useful in the context of a device data environment. Broadly speaking, there are two scenarios where this is important.

The first scenario is where the pointer is mapped to the device data environment, such that references to the pointer inside a target region are to the corresponding pointer. Pointer attachment ensures that the corresponding pointer will contain a device address when all of the following conditions are true:

  • the pointer is mapped by directive A to a device;

  • a list item that uses the pointer as its base pointer (call it the pointee) is mapped, to the same device, by directive B , which may be the same as A ;

  • the effect of directive B is to create either the corresponding pointer or pointee in the device data environment of the device.

Given the above conditions, pointer attachment is initiated as a result of directive B and subsequent references to the pointee list item in a target region that use the pointer will access the corresponding pointee. The corresponding pointer remains in this attached state until it is removed from the device data environment.

The second scenario, which is only applicable for C/C++, is where the pointer is implicitly privatized inside a target construct when it appears as the base pointer to a list item on the construct and does not appear explicitly as a list item in a map clause, is_device_ptr clause, or data-sharing attribute clause. This scenario can be further split into two cases: the list item is a zero-length array section (e.g., p[:0] ) or it is not.

If it is a zero-length array section, this will trigger a runtime check on entry to the target region for a previously mapped list item where the value of the pointer falls within the range of its base address and ending address. If such a match is found the private pointer is initialized to the device address corresponding to the value of the original pointer, and otherwise it is initialized to NULL (or retains its original value if the unified_address requirement is specified for that compilation unit).

If the list item (again, call it the pointee) is not a zero-length array section, the private pointer will be initialized such that references in the target region to the pointee list item that use the pointer will access the corresponding pointee.

The following example shows the basics of mapping pointers with and without associated storage on the host.

Storage for pointers ptr1 and ptr2 is created on the host. To map storage that is associated with a pointer on the host, the data can be explicitly mapped as an array section so that the compiler knows the amount of data to be assigned in the device (to the “corresponding” data storage area). On the target construct array sections are mapped; however, the pointer ptr1 is mapped, while ptr2 is not. Since ptr2 is not explicitly mapped, it is firstprivate. This creates a subtle difference in the way these pointers can be used.

As a firstprivate pointer, ptr2 can be manipulated on the device; however, as an explicitly mapped pointer, ptr1 becomes an attached pointer and cannot be manipulated. In both cases the host pointer is not updated with the device pointer address—as one would expect for distributed memory. The storage data on the host is updated from the corresponding device data at the end of the target region.

As a comparison, note that the aray array is automatically mapped, since the compiler knows the extent of the array.

The pointer ptr3 is used inside the target construct, but it does not appear in a data-mapping or data-sharing clause. Nor is there a defaultmap clause on the construct to indicate what its implicit data-mapping or data-sharing attribute should be. For such a case, ptr3 will be implicitly privatized within the construct and there will be a runtime check to see if the host memory to which it is pointing has corresponding memory in the device data environment. If this runtime check passes, the private ptr3 would be initialized to point to the corresponding memory. But in this case the check does not pass and so it is initialized to null. Since ptr3 is private, the value to which it is assigned in the target region is not returned into the original ptr3 on the host.

//%compiler: clang
//%cflags: -fopenmp

/*
* name:       target_ptr_map.1
* type:       C
* version: omp_5.0
*/
#include <stdio.h>
#include <stdlib.h>
#define N 100

int main()
{
  int *ptr1;
  int *ptr2;
  int *ptr3;
  int aray[N];

  ptr1 = (int *)malloc(sizeof(int)*N);
  ptr2 = (int *)malloc(sizeof(int)*N);

  #pragma omp target map(ptr1, ptr1[:N]) map(ptr2[:N] )
  {
     for (int i=0; i<N; i++)
     {
         ptr1[i] = i;
         ptr2[i] = i;
         aray[i] = i;
     }

   //*(++ptr1) = 9;  //NOT ALLOWED since ptr1 is an attached pointer
     *(++ptr2) = 9;  //    allowed since ptr2 is firstprivate

     ptr3=(int *)malloc(sizeof(int)*N); // ptr3 is firstprivate
                                        // ptr3 value not returned
     for (int i=0; i<N; i++) ptr3[i] = 5;

     for (int i=0; i<N; i++) ptr1[i] += ptr3[i];

     free(ptr3);     // explicitly free allocated storage on device
   }

   printf(" %d %d\n",ptr1[1],ptr2[1]);
   //        6  9

   free(ptr1);
   free(ptr2);
   return 0;
}

In the following example the global pointer p appears in a declare target directive. Hence, the pointer p will persist on the device throughout executions in all target regions.

The pointer is also used in an array section of a map clause on a target construct. When the pointer of storage associated with a declare target directive is mapped, as for the array section p[:N] in the target construct, the array section on the device is attached to the device pointer p on entry to the construct, and the value of the device pointer p becomes undefined on exit. (Of course, storage allocation for the array section on the device will occur before the pointer on the device is attached.)

//%compiler: clang
//%cflags: -fopenmp

/*
* name:       target_ptr_map.2
* type:       C
* version: omp_5.1
*/
#include <stdio.h>
#include <stdlib.h>
#define N 100

#pragma omp begin declare target
  int *p;
  extern void use_arg_p(int *p, int n);
  extern void use_global_p(     int n);
#pragma omp end declare target

int main()
{
  int i;
  p = (int *)malloc(sizeof(int)*N);

  #pragma omp target map(p[:N])  // device p attached to array section
  {
    for (i=0; i<N; i++) p[i] = i;
    use_arg_p(p, N);
    use_global_p(N);
  }                              // value of host p is preserved

  printf(" %3.3d %3.3d\n", p[1], p[N-1]);
         // 003   297   <- output

  free(p);
  return 0;
}

// A #pragma omp begin declare target is optional here
// because of prototype spec
void use_arg_p(int *q, int n)
{
  int i;
  for (i=0; i<n; i++)
    q[i] *= 2;
}

void use_global_p(int n)
{
  int i;
  for (i=0; i<n; i++)
    p[i] += i;   // valid since p is in declare target and called from
                 // inside target region where p was attached to
                 // valid memory
}
// A #pragma omp end declare target is optional here
// because of prototype spec

The following two examples illustrate subtle differences in pointer attachment to device address because of the order of data mapping.

In example target_ptr_map.3a the global pointer p1 points to array x and p2 points to array y on the host. The array section x[:N] is mapped by the target enter data directive while array y is mapped on the target construct. Since the begin declare target directive is applied to the declaration of p1 , p1 is a treated like a mapped variable on the target construct and references to p1 inside the construct will be to the corresponding p1 that exists on the device. However, the corresponding p1 will be undefined since there is no pointer attachment for it. Pointer attachment for p1 would require that (1) p1 (or an lvalue expression that refers to the same storage as p1 ) appears as a base pointer to a list item in a map clause, and (2) the construct that has the map clause causes the list item to transition from not mapped to mapped. The conditions are clearly not satisfied for this example.

The problem for p2 in this example is also subtle. It will be privatized inside the target construct, with a runtime check for whether the memory to which it is pointing has corresponding memory that is accessible on the device. If this check is successful, then the p2 inside the construct would be appropriately initialized to point to that corresponding memory. Unfortunately, despite there being an implicit map of the array y (to which p2 is pointing) on the construct, the order of this map relative to the initialization of p2 is unspecified. Therefore, the initial value of p2 will also be undefined.

Thus, referencing values via either p1 or p2 inside the target region would be invalid.

//%compiler: clang
//%cflags: -fopenmp

/*
* name:       target_ptr_map.3a
* type:       C
* version: omp_5.1
*/
#define N 100

int x[N], y[N];
#pragma omp begin declare target
int *p1;
#pragma omp end declare target
int *p2;

int foo()
{
  p1 = &x[0];
  p2 = &y[0];

  // Explicitly map array section x[:N]
  #pragma omp target enter data map(x[:N])

  #pragma omp target  // as if .. map(p1) map(p1[:0]) map(p2[:0]) map(y)
  {
    // Accessing the mapped arrays x,y is OK here.
    x[0] = 1;
    y[1] = 2;

    // Pointer attachment for p1 does not occur here
    //   because p1[:0] does not allocate a new array section and
    //   array x is present on the target construct as it was mapped
    //   before by the target enter data directive.
    p1[0] = 3;      // accessing p1 is undefined

    // The initial value of p2 in the target region is undefined
    //   because map(y) may occur after map(p2[:0]).
    p2[1] = 4;      // accessing p2 is undefined
  }

  return 0;
}

In example target_ptr_map.3b the mapping orders for arrays x and y were rearranged to allow proper pointer attachments. On the target construct, the map(x) clause triggers pointer attachment for p1 to the device address of x . Pointer p2 is assigned the device address of the previously mapped array y . Referencing values via either p1 or p2 inside the target region is now valid.

//%compiler: clang
//%cflags: -fopenmp

/*
* name:       target_ptr_map.3b
* type:       C
* version: omp_5.1
*/
#define N 100

int x[N], y[N];
#pragma omp begin declare target
int *p1;
#pragma omp end declare target
int *p2;

int foo()
{
  p1 = &x[0];
  p2 = &y[0];

  // Explicitly map array section y[:N]
  #pragma omp target enter data map(y[:N])

  #pragma omp target map(x[:N]) map(p1[:N]) map(p2[:0])
  {
    // Accessing the mapped arrays x,y is OK here.
    x[0] = 1;
    y[1] = 2;

    // Pointer attachment for p1 occurs here when array x is mapped
    //   on the target construct (as p1 = &x[0] on the device)
    p1[0] = 3;      // accessing p1 is OK

    // p2 in the target region is initialized to &y[0]
    p2[1] = 4;      // accessing p2 is OK
  }

  return 0;
}

In the following example, storage allocated on the host is not mapped in a target region if it is determined that the host memory is accessible from the device. On platforms that support host memory access from a target device, it may be more efficient to omit map clauses and avoid the potential memory allocation and data transfers that may result from the map. The omp_target_is_accessible API routine is used to determine if the host storage of size buf_size is accessible on the device, and a metadirective is used to select the directive variant (a target with/without a map clause).

The omp_target_is_accessible routine will return true if the storage indicated by the first and second arguments is accessible on the target device. In this case, the host pointer ptr may be directly dereferenced in the subsequent target region to access this storage, rather than mapping an array section based off the pointer. By explicitly specifying the host pointer in a firstprivate clause on the construct, its original value will be used directly in the target region. In OpenMP 5.1, removing the firstprivate clause will result in an implicit presence check of the storage to which ptr points, and since this storage is not mapped by the program, ptr will be NULL-initialized in the target region. In the next version of the OpenMP Specification, a false presence check without the firstprivate clause will cause the pointer to retain its original value.

//%compiler: clang
//%cflags: -fopenmp

/*
* name:       target_ptr_map.4
* type:       C
* version: omp_5.2
*/
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

void do_work(int *ptr, const int size);

int main()
{
   const int n = 1000;
   const int buf_size = sizeof(int) * n;
   const int dev = omp_get_default_device();

   int *ptr = (int *) malloc(buf_size); // possibly compiled on
                                        // Unified Shared Memory system
   const int accessible = omp_target_is_accessible(ptr, buf_size, dev);

   #pragma omp metadirective \
      when(user={condition(accessible)}: target firstprivate(ptr) ) \
      otherwise(                         target map(ptr[:n])      )
   {
      do_work(ptr, n);
   }

   free(ptr);
   return 0;
}

Similar to the previous example, the omp_target_is_accessible routine is used to discover if a deep copy is required for the platform. Here, the deep_copy map, defined in the declare mapper directive, is used if the host storage referenced by s.ptr (or s%ptr in Fortran) is not accessible from the device.

//%compiler: clang
//%cflags: -fopenmp

/*
* name:       target_ptr_map.5
* type:       C
* version: omp_5.2
*/
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

typedef struct {
   int *ptr;
   int buf_size;
} T;

#pragma omp declare mapper(deep_copy: T s) map(s, s.ptr[:s.buf_size])

void do_work(int *ptr, const int size);

int main()
{
   const int n = 1000;
   const int buf_size = sizeof(int) * n;
   T s = { 0, buf_size };
   const int dev = omp_get_default_device();
   s.ptr = (int *)malloc(buf_size);
   const int accessible =
      omp_target_is_accessible(s.ptr, s.buf_size, dev);

   #pragma omp metadirective \
      when(user={condition(accessible)}: target) \
      otherwise(target map(mapper(deep_copy),tofrom:s) )
   {
      do_work(s.ptr, n);
   }

   free(s.ptr);
   return 0;
}
!!%compiler: gfortran
!!%cflags: -fopenmp

! name:       target_ptr_map.5
! type:       F-free
! version:    omp_5.2
program main
   use omp_lib

   use, intrinsic :: iso_c_binding, only : c_loc, c_size_t, c_sizeof, c_int
   implicit none
   external :: do_work

   type T
     integer,pointer :: ptr(:)
     integer         :: buf_size
   end type

   !$omp declare mapper(deep_copy: T :: s) map(s, s%ptr(:s%buf_size))

   integer,parameter :: n = 1000
   integer(c_int)    :: dev, accessible
   integer(c_size_t) :: buf_size

   type(T) s

   allocate(s%ptr(n))

   buf_size = c_sizeof(s%ptr(1))*n
   dev = omp_get_default_device()

   accessible = omp_target_is_accessible(c_loc(s%ptr(1)), buf_size, dev)

   !$omp  begin metadirective                              &
   !$omp&       when(user={condition(accessible)}: target) &
   !$omp&       otherwise( target map(mapper(deep_copy),tofrom:s) )

      call do_work(s, n)

   !$omp  end   metadirective

   deallocate(s%ptr)

end program