6.10. target data Construct

6.10.1. Simple target data Construct

This example shows how the target data construct maps variables to a device data environment. The target data construct creates a new device data environment and maps the variables v1, v2, and p to the new device data environment. The target construct enclosed in the target data region creates a new device data environment, which inherits the variables v1, v2, and p from the enclosing device data environment. The variable N is mapped into the new device data environment from the encountering task’s data environment.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: target_data.1
* type: C
* version: omp_4.0
*/
extern void init(float*, float*, int);
extern void output(float*, int);
void vec_mult(float *p, float *v1, float *v2, int N)
{
   int i;
   init(v1, v2, N);
   #pragma omp target data map(to: v1[0:N], v2[:N]) map(from: p[0:N])
   {
      #pragma omp target
      #pragma omp parallel for
      for (i=0; i<N; i++)
         p[i] = v1[i] * v2[i];
   }
   output(p, N);
}

The Fortran code passes the arrays by reference and specifies their extent in the declaration. Because the extent is known, the map clause needs no length information, whereas C/C++ pointers require it.

!!%compiler: gfortran
!!%cflags: -fopenmp

! name: target_data.1
! type: F-free
! version:    omp_4.0
subroutine vec_mult(p, v1, v2, N)
   real    ::  p(N), v1(N), v2(N)
   integer ::  i
   call init(v1, v2, N)
   !$omp target data map(to: v1, v2) map(from: p)
   !$omp target
   !$omp parallel do
      do i=1,N
         p(i) = v1(i) * v2(i)
      end do
   !$omp end target
   !$omp end target data
   call output(p, N)
end subroutine
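
The listings above leave init and output as external routines, so they cannot be run as given. The following is a minimal sketch of a host-side harness for the C version and is not part of the original example: the bodies of init and output and the helper check_vec_mult are hypothetical stand-ins, chosen only so that the result is easy to verify.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the external init routine. */
void init(float *v1, float *v2, int N)
{
   for (int i = 0; i < N; i++) {
      v1[i] = (float)i;
      v2[i] = 2.0f;
   }
}

/* Hypothetical stand-in for the external output routine. */
void output(float *p, int N)
{
   if (N > 0)
      printf("p[0]=%g p[%d]=%g\n", p[0], N - 1, p[N - 1]);
}

/* Same mapping pattern as target_data.1. */
void vec_mult(float *p, float *v1, float *v2, int N)
{
   int i;
   init(v1, v2, N);
   #pragma omp target data map(to: v1[0:N], v2[:N]) map(from: p[0:N])
   {
      #pragma omp target
      #pragma omp parallel for
      for (i = 0; i < N; i++)
         p[i] = v1[i] * v2[i];
   }
   output(p, N);
}

/* Hypothetical helper: returns the number of wrong elements in p,
   or -1 if an allocation fails. Assumes N > 0. */
int check_vec_mult(int N)
{
   float *p  = malloc((size_t)N * sizeof *p);
   float *v1 = malloc((size_t)N * sizeof *v1);
   float *v2 = malloc((size_t)N * sizeof *v2);
   int errors = 0;

   if (p && v1 && v2) {
      vec_mult(p, v1, v2, N);
      for (int i = 0; i < N; i++)
         if (p[i] != 2.0f * (float)i)
            errors++;
   } else {
      errors = -1;
   }
   free(p);
   free(v1);
   free(v2);
   return errors;
}
```

With these stand-ins, check_vec_mult should return 0 whether the target region runs on a device or falls back to the host, because the observable result does not depend on where the region executes.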

6.10.2. target data Region Enclosing Multiple target Regions

The following examples show how the target data construct maps variables to a device data environment of a target region. The target data construct creates a device data environment and encloses target regions, which have their own device data environments. The device data environment of the target data region is inherited by the device data environment of an enclosed target region. The target data construct is used to create variables that will persist throughout the target data region.

In the following example the variables v1 and v2 are mapped at each target construct. Instead of mapping the variable p twice, once at each target construct, p is mapped once by the target data construct.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: target_data.2
* type: C
* version: omp_4.0
*/
extern void init(float*, float*, int);
extern void init_again(float*, float*, int);
extern void output(float*, int);
void vec_mult(float *p, float *v1, float *v2, int N)
{
   int i;
   init(v1, v2, N);
   #pragma omp target data map(from: p[0:N])
   {
      #pragma omp target map(to: v1[:N], v2[:N])
      #pragma omp parallel for
      for (i=0; i<N; i++)
         p[i] = v1[i] * v2[i];
      init_again(v1, v2, N);
      #pragma omp target map(to: v1[:N], v2[:N])
      #pragma omp parallel for
      for (i=0; i<N; i++)
         p[i] = p[i] + (v1[i] * v2[i]);
   }
   output(p, N);
}

The Fortran code passes the arrays by reference and specifies the extent of the p, v1, and v2 arrays. No length information is necessary in the map clause, whereas it is required with C/C++ pointers. The arrays v1 and v2 are mapped at each target construct. Instead of mapping the array p twice, once at each target construct, p is mapped once by the target data construct.

!!%compiler: gfortran
!!%cflags: -fopenmp

! name: target_data.2
! type: F-free
! version:    omp_4.0
subroutine vec_mult(p, v1, v2, N)
   real    ::  p(N), v1(N), v2(N)
   integer ::  i
   call init(v1, v2, N)
   !$omp target data map(from: p)
      !$omp target map(to: v1, v2 )
         !$omp parallel do
         do i=1,N
            p(i) = v1(i) * v2(i)
         end do
      !$omp end target
      call init_again(v1, v2, N)
      !$omp target map(to: v1, v2 )
         !$omp parallel do
         do i=1,N
            p(i) = p(i) + v1(i) * v2(i)
         end do
      !$omp end target
   !$omp end target data
   call output(p, N)
end subroutine
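
For contrast, the following sketch shows what the C version might look like without the enclosing target data construct. It is an illustration rather than part of the original example: the function name vec_mult_unhoisted and the bodies of init and init_again are hypothetical. Because no device data environment persists between the two target regions, p has to be mapped at each one, and the second region needs the tofrom map-type so that the first result reaches the device again before it is accumulated.

```c
/* Hypothetical stand-ins for the external routines in the listing. */
static void init(float *v1, float *v2, int N)
{
   for (int i = 0; i < N; i++) {
      v1[i] = (float)i;
      v2[i] = 1.0f;
   }
}

static void init_again(float *v1, float *v2, int N)
{
   for (int i = 0; i < N; i++) {
      v1[i] = 1.0f;
      v2[i] = 2.0f;
   }
}

/* Hypothetical variant of target_data.2 with no target data region. */
void vec_mult_unhoisted(float *p, float *v1, float *v2, int N)
{
   int i;
   init(v1, v2, N);

   /* p is only written here, so map(from:) is enough. */
   #pragma omp target map(to: v1[:N], v2[:N]) map(from: p[0:N])
   #pragma omp parallel for
   for (i = 0; i < N; i++)
      p[i] = v1[i] * v2[i];

   init_again(v1, v2, N);

   /* p is read and written, so it is copied to the device and back. */
   #pragma omp target map(to: v1[:N], v2[:N]) map(tofrom: p[0:N])
   #pragma omp parallel for
   for (i = 0; i < N; i++)
      p[i] = p[i] + (v1[i] * v2[i]);
}
```

With these stand-ins, p[i] ends up as i + 2, the same values the target data version would produce. The difference is the extra round trip of p between the two regions, which the target data construct avoids.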

In the following example, the array Q is mapped once at the enclosing target data region instead of at each target construct. In OpenMP 4.0, a scalar variable is implicitly mapped with the tofrom map-type. Since OpenMP 4.5, however, a scalar variable such as tmp must be explicitly mapped with the tofrom map-type at the first target construct in order to return its reduced value from the parallel loop construct to the host. At the second target construct, tmp defaults to firstprivate, which is sufficient because it is only read there.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: target_data.3
* type: C
* version: omp_4.0
*/

#include <math.h>
#define COLS 100

void gramSchmidt(float Q[][COLS], const int rows)
{
    int cols = COLS;
    #pragma omp target data map(Q[0:rows][0:cols])
    for(int k=0; k < cols; k++)
    {
        double tmp = 0.0;

        #pragma omp target map(tofrom: tmp)
        #pragma omp parallel for reduction(+:tmp)
        for(int i=0; i < rows; i++)
            tmp += (Q[i][k] * Q[i][k]);

        tmp = 1/sqrt(tmp);

        #pragma omp target
        #pragma omp parallel for
        for(int i=0; i < rows; i++)
            Q[i][k] *= tmp;
    }
}

/* Note:  The variable tmp is now mapped with tofrom, for correct
          execution with 4.5 (and pre-4.5) compliant compilers.
          See Devices Intro.
 */

!!%compiler: gfortran
!!%cflags: -fopenmp

! name: target_data.3
! type: F-free
! version:    omp_4.0
subroutine gramSchmidt(Q,rows,cols)
integer             ::   rows,cols,  i,k
double precision    :: Q(rows,cols), tmp
      !$omp target data map(Q)
      do k=1,cols
         tmp = 0.0d0
        !$omp target map(tofrom: tmp)
           !$omp parallel do reduction(+:tmp)
           do i=1,rows
              tmp = tmp + (Q(i,k) * Q(i,k))
           end do
        !$omp end target

          tmp = 1.0d0/sqrt(tmp)

        !$omp target
           !$omp parallel do
           do i=1,rows
               Q(i,k) = Q(i,k)*tmp
           enddo
        !$omp end target
      end do
      !$omp end target data
end subroutine

! Note:  The variable tmp is now mapped with tofrom, for correct
! execution with 4.5 (and pre-4.5) compliant compilers. See Devices Intro.
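
The scalar-mapping rule can be isolated in a smaller sketch. The helper below is hypothetical (its name sum_sq and its signature are not part of the original example) and assumes a contiguous input array; it shows only the explicit tofrom mapping of the reduction variable.

```c
/* Hypothetical helper: sum of squares of x[0:n], reduced in a target region. */
double sum_sq(const float *x, int n)
{
   double tmp = 0.0;

   /* Without map(tofrom: tmp), tmp would be firstprivate in the target
      region under OpenMP 4.5 and later, and the reduced value would not
      be copied back to the host variable. */
   #pragma omp target map(to: x[0:n]) map(tofrom: tmp)
   #pragma omp parallel for reduction(+:tmp)
   for (int i = 0; i < n; i++)
      tmp += (double)x[i] * (double)x[i];

   return tmp;
}
```

For an input of {1, 2, 3, 4} the function returns 30. Dropping the map clause for tmp would still compile, which is why the omission is easy to miss.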

6.10.3. target data Construct with Orphaned Call

The following two examples show how the target data construct maps variables to a device data environment. The target data construct’s device data environment encloses the target construct’s device data environment in the function vec_mult().

When the type of the variable appearing in an array section is pointer, the pointer variable and the storage location of the corresponding array section are mapped to the device data environment. The pointer variable is treated as if it had appeared in a map clause with a map-type of alloc. The array section’s storage location is mapped according to the map-type in the map clause (the default map-type is tofrom).

The target construct’s device data environment inherits the storage locations of the array sections v1[0:N], v2[:N], and p0[0:N] from the enclosing target data construct’s device data environment. Neither initialization nor assignment is performed for the array sections in the new device data environment.

The pointer variables p1, v3, and v4 are mapped into the target construct’s device data environment with an implicit map-type of alloc and they are assigned the address of the storage location associated with their corresponding array sections. Note that the following pairs of array section storage locations are equivalent: (p0[:N], p1[:N]), (v1[:N], v3[:N]), and (v2[:N], v4[:N]).

//%compiler: clang
//%cflags: -fopenmp

/*
* name: target_data.4
* type: C
* version: omp_4.0
*/
void vec_mult(float*, float*, float*, int);

extern void init(float*, float*, int);
extern void output(float*, int);


void foo(float *p0, float *v1, float *v2, int N)
{
   init(v1, v2, N);

   #pragma omp target data map(to: v1[0:N], v2[:N]) map(from: p0[0:N])
   {
      vec_mult(p0, v1, v2, N);
   }

   output(p0, N);
}


void vec_mult(float *p1, float *v3, float *v4, int N)
{
   int i;
   #pragma omp target map(to: v3[0:N], v4[:N]) map(from: p1[0:N])
   #pragma omp parallel for
   for (i=0; i<N; i++)
   {
     p1[i] = v3[i] * v4[i];
   }
}

The Fortran code maps the pointers and storage in an identical manner (same extent, but with indices from 1 to N).

The target construct’s device data environment inherits the storage locations of the arrays v1, v2, and p0 from the enclosing target data construct’s device data environment. However, in Fortran the associated data of the pointer is known, and the shape is not required.

The pointer variables p1, v3, and v4 are mapped into the target construct’s device data environment with an implicit map-type of alloc and they are assigned the address of the storage location associated with their corresponding array sections. Note that the following pairs of array storage locations are equivalent: (p0, p1), (v1, v3), and (v2, v4).

!!%compiler: gfortran
!!%cflags: -fopenmp

! name: target_data.4
! type: F-free
! version:    omp_4.0
module mults
contains
subroutine foo(p0,v1,v2,N)
real,pointer,dimension(:) :: p0, v1, v2
integer                   :: N,i

   call init(v1, v2, N)

   !$omp target data map(to: v1, v2) map(from: p0)
    call vec_mult(p0,v1,v2,N)
   !$omp end target data

   call output(p0, N)

end subroutine

subroutine vec_mult(p1,v3,v4,N)
real,pointer,dimension(:) :: p1, v3, v4
integer                   :: N,i

   !$omp target map(to: v3, v4) map(from: p1)
   !$omp parallel do
   do i=1,N
      p1(i) = v3(i) * v4(i)
   end do
   !$omp end target

end subroutine
end module
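
One consequence of keeping the map clauses on the orphaned target construct is that vec_mult can be called with or without an enclosing target data region. The sketch below illustrates this for the C version; the two caller names, mult_in_region and mult_standalone, are hypothetical and not part of the original example.

```c
/* Same routine as in target_data.4: its map clauses make it usable alone. */
void vec_mult(float *p1, float *v3, float *v4, int N)
{
   #pragma omp target map(to: v3[0:N], v4[:N]) map(from: p1[0:N])
   #pragma omp parallel for
   for (int i = 0; i < N; i++)
      p1[i] = v3[i] * v4[i];
}

/* Hypothetical caller 1: the array sections are already present on the
   device, so the map clauses in vec_mult perform no initialization or
   assignment. */
void mult_in_region(float *p0, float *v1, float *v2, int N)
{
   #pragma omp target data map(to: v1[0:N], v2[:N]) map(from: p0[0:N])
   {
      vec_mult(p0, v1, v2, N);
   }
}

/* Hypothetical caller 2: no enclosing region, so the map clauses in
   vec_mult do the mapping themselves. */
void mult_standalone(float *p0, float *v1, float *v2, int N)
{
   vec_mult(p0, v1, v2, N);
}
```

Both callers should produce the same values in p0. The enclosing region starts to pay off when several such calls share the same data, since the arrays are then mapped once for all of them.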

In the following example, the variables p1, v3, and v4 are references to the pointer variables p0, v1, and v2, respectively. The target construct’s device data environment inherits the pointer variables p0, v1, and v2 from the enclosing target data construct’s device data environment. Thus, p1, v3, and v4 are already present in the device data environment.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: target_data.5
* type: C++
* version: omp_4.0
*/
void vec_mult(float* &, float* &, float* &, int &);
extern void init(float*, float*, int);
extern void output(float*, int);
void foo(float *p0, float *v1, float *v2, int N)
{
   init(v1, v2, N);
   #pragma omp target data map(to: v1[0:N], v2[:N]) map(from: p0[0:N])
   {
      vec_mult(p0, v1, v2, N);
   }
   output(p0, N);
}
void vec_mult(float* &p1, float* &v3, float* &v4, int &N)
{
   int i;
   #pragma omp target map(to: v3[0:N], v4[:N]) map(from: p1[0:N])
   #pragma omp parallel for
   for (i=0; i<N; i++)
     p1[i] = v3[i] * v4[i];
}

In the following example, the usual Fortran approach is used for dynamic memory. The p0, v1, and v2 arrays are allocated in the main program and passed as references from one routine to another. In vec_mult, p1, v3, and v4 are references to the p0, v1, and v2 arrays, respectively. The target construct’s device data environment inherits the arrays p0, v1, and v2 from the enclosing target data construct’s device data environment. Thus, p1, v3, and v4 are already present in the device data environment.

!!%compiler: gfortran
!!%cflags: -fopenmp

! name: target_data.5
! type: F-free
! version:    omp_4.0
module my_mult
contains
subroutine foo(p0,v1,v2,N)
real,dimension(:) :: p0, v1, v2
integer           :: N,i
   call init(v1, v2, N)
   !$omp target data map(to: v1, v2) map(from: p0)
    call vec_mult(p0,v1,v2,N)
   !$omp end target data
   call output(p0, N)
end subroutine
subroutine vec_mult(p1,v3,v4,N)
real,dimension(:) :: p1, v3, v4
integer           :: N,i
   !$omp target map(to: v3, v4) map(from: p1)
   !$omp parallel do
   do i=1,N
      p1(i) = v3(i) * v4(i)
   end do
   !$omp end target
end subroutine
end module
program main
use my_mult
integer, parameter :: N=1024
real,allocatable, dimension(:) :: p, v1, v2
   allocate( p(N), v1(N), v2(N) )
   call foo(p,v1,v2,N)
   deallocate( p, v1, v2 )
end program

6.10.4. target data Construct with if Clause

The following two examples show how the target data construct maps variables to a device data environment.

In the following example, the if clause on the target data construct indicates that if the variable N is smaller than a given threshold, then the target data construct will not create a device data environment.

The target constructs enclosed in the target data region must also use an if clause on the same condition. Otherwise, the pointer variable p is implicitly mapped with a map-type of tofrom, but the storage location for the array section p[0:N] will not be mapped in the device data environments of the target constructs.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: target_data.6
* type: C
* version: omp_4.0
*/
#define THRESHOLD 1000000
extern void init(float*, float*, int);
extern void init_again(float*, float*, int);
extern void output(float*, int);
void vec_mult(float *p, float *v1, float *v2, int N)
{
   int i;
   init(v1, v2, N);
   #pragma omp target data if(N>THRESHOLD) map(from: p[0:N])
   {
      #pragma omp target if (N>THRESHOLD) map(to: v1[:N], v2[:N])
      #pragma omp parallel for
      for (i=0; i<N; i++)
        p[i] = v1[i] * v2[i];
      init_again(v1, v2, N);
      #pragma omp target if (N>THRESHOLD) map(to: v1[:N], v2[:N])
      #pragma omp parallel for
      for (i=0; i<N; i++)
        p[i] = p[i] + (v1[i] * v2[i]);
   }
   output(p, N);
}

The if clauses work the same way for the following Fortran code. The target constructs enclosed in the target data region should also use an if clause with the same condition, so that the target data region and the target region are either both created for the device, or are both ignored.

!!%compiler: gfortran
!!%cflags: -fopenmp

! name: target_data.6
! type: F-free
! version:    omp_4.0
module params
integer,parameter :: THRESHOLD=1000000
end module
subroutine vec_mult(p, v1, v2, N)
   use params
   real    ::  p(N), v1(N), v2(N)
   integer ::  i
   call init(v1, v2, N)
   !$omp target data if(N>THRESHOLD) map(from: p)
      !$omp target if(N>THRESHOLD) map(to: v1, v2)
         !$omp parallel do
         do i=1,N
            p(i) = v1(i) * v2(i)
         end do
      !$omp end target
      call init_again(v1, v2, N)
      !$omp target if(N>THRESHOLD) map(to: v1, v2)
         !$omp parallel do
         do i=1,N
            p(i) = p(i) + v1(i) * v2(i)
         end do
      !$omp end target
   !$omp end target data
   call output(p, N)
end subroutine

In the following example, when the if clause conditional expression on the target construct evaluates to false, the target region will execute on the host device. However, the target data construct created an enclosing device data environment that mapped p[0:N] to a device data environment on the default device. At the end of the target data region the array section p[0:N] will be assigned from the device data environment to the corresponding variable in the data environment of the task that encountered the target data construct, resulting in undefined values in p[0:N].

//%compiler: clang
//%cflags: -fopenmp

/*
* name: target_data.7
* type: C
* version: omp_4.0
*/
#define THRESHOLD 1000000
extern void init(float*, float*, int);
extern void output(float*, int);
void vec_mult(float *p, float *v1, float *v2, int N)
{
   int i;
   init(v1, v2, N);
   #pragma omp target data map(from: p[0:N])
   {
      #pragma omp target if (N>THRESHOLD) map(to: v1[:N], v2[:N])
      #pragma omp parallel for
      for (i=0; i<N; i++)
        p[i] = v1[i] * v2[i];
   } /* UNDEFINED behavior if N<=THRESHOLD */
   output(p, N);
}

The if clauses work the same way for the following Fortran code. When the if clause conditional expression on the target construct evaluates to false, the target region will execute on the host device. However, the target data construct created an enclosing device data environment that mapped the p array to a device data environment on the default device. At the end of the target data region the p array will be assigned from the device data environment to the corresponding variable in the data environment of the task that encountered the target data construct, resulting in undefined values in p.

!!%compiler: gfortran
!!%cflags: -fopenmp

! name: target_data.7
! type: F-free
! version:    omp_4.0
module params
integer, parameter :: THRESHOLD=1000000
end module
subroutine vec_mult(p, v1, v2, N)
   use params
   real    ::  p(N), v1(N), v2(N)
   integer ::  i
   call init(v1, v2, N)
   !$omp target data map(from: p)
      !$omp target if(N>THRESHOLD) map(to: v1, v2)
         !$omp parallel do
         do i=1,N
            p(i) = v1(i) * v2(i)
         end do
      !$omp end target
   !$omp end target data
   call output(p, N)  !*** UNDEFINED behavior if N<=THRESHOLD
end subroutine
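
One way to keep the two conditions from drifting apart, as they do in target_data.7, is to evaluate the condition once and reuse the result in every if clause. The following is a minimal sketch of that idea for the C version, not part of the original examples; the function name vec_mult_guarded and the variable offload are hypothetical.

```c
#define THRESHOLD 1000000

/* Hypothetical variant: one condition drives both constructs. */
void vec_mult_guarded(float *p, const float *v1, const float *v2, int N)
{
   /* Evaluated once, so the target data region and the target region
      are either both created for the device or both ignored. */
   const int offload = (N > THRESHOLD);

   #pragma omp target data if(offload) map(from: p[0:N])
   {
      #pragma omp target if(offload) map(to: v1[:N], v2[:N])
      #pragma omp parallel for
      for (int i = 0; i < N; i++)
         p[i] = v1[i] * v2[i];
   }
}
```

For N at or below THRESHOLD, the loop runs on the host and writes p directly, so nothing is copied from a device at the end of the target data region and p holds the computed products.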