6.1. target Construct

6.1.1. target Construct on parallel Construct

The following example shows how the target construct offloads a code region to a target device. The variables p, v1, v2, and N are implicitly mapped to the target device.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: target.1
* type: C
* version: omp_4.0
*/
extern void init(float*, float*, int);
extern void output(float*, int);
void vec_mult(int N)
{
   int i;
   float p[N], v1[N], v2[N];
   init(v1, v2, N);
   #pragma omp target
   #pragma omp parallel for private(i)
   for (i=0; i<N; i++)
     p[i] = v1[i] * v2[i];
   output(p, N);
}
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: target.1
! type: F-free
! version:    omp_4.0
subroutine vec_mult(N)
   integer ::  i,N
   real    ::  p(N), v1(N), v2(N)
   call init(v1, v2, N)
   !$omp target
   !$omp parallel do
   do i=1,N
      p(i) = v1(i) * v2(i)
   end do
   !$omp end target
   call output(p, N)
end subroutine

6.1.2. target Construct with map Clause

The following example shows how the target construct offloads a code region to a target device. The variables p, v1, and v2 are explicitly mapped to the target device using the map clause. The variable N is implicitly mapped to the target device.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: target.2
* type: C
* version: omp_4.0
*/
extern void init(float*, float*, int);
extern void output(float*, int);
void vec_mult(int N)
{
   int i;
   float p[N], v1[N], v2[N];
   init(v1, v2, N);
   #pragma omp target map(v1, v2, p)
   #pragma omp parallel for
   for (i=0; i<N; i++)
     p[i] = v1[i] * v2[i];
   output(p, N);
}
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: target.2
! type: F-free
! version:    omp_4.0
subroutine vec_mult(N)
   integer ::  i,N
   real    ::  p(N), v1(N), v2(N)
   call init(v1, v2, N)
   !$omp target map(v1,v2,p)
   !$omp parallel do
   do i=1,N
      p(i) = v1(i) * v2(i)
   end do
   !$omp end target
   call output(p, N)
end subroutine
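The map clause above has no map-type, which makes it equivalent to map(tofrom: ...). N, which appears in no clause, receives a default data-sharing attribute. The variant below is a sketch under OpenMP 4.5 and later rules, where a scalar referenced in a target region is firstprivate by default (in OpenMP 4.0 it was mapped tofrom). It writes these defaults out explicitly. The function name vec_mult_explicit, the out argument, and the fill values that stand in for init and output are hypothetical. They were added only so that the code runs on its own.

```c
#include <assert.h>

/* Hypothetical variant of target.2 with the default data-mapping
   attributes written out explicitly (OpenMP 4.5 and later rules).
   The result is copied to out so that the caller can inspect it. */
void vec_mult_explicit(int N, float *out)
{
   assert(N > 0);                  /* VLA lengths must be positive */
   float p[N], v1[N], v2[N];

   for (int i = 0; i < N; i++) {   /* stands in for init() */
      v1[i] = (float)i;
      v2[i] = 2.0f;
   }

   /* map(v1, v2, p) means map(tofrom: v1, v2, p); the scalar N is
      firstprivate by default, so stating it changes nothing. */
   #pragma omp target map(tofrom: v1, v2, p) firstprivate(N)
   #pragma omp parallel for
   for (int i = 0; i < N; i++)
      p[i] = v1[i] * v2[i];

   for (int i = 0; i < N; i++)     /* stands in for output() */
      out[i] = p[i];
}
```

Compiled without OpenMP support, the pragmas are ignored and the loop simply runs serially on the host.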

6.1.3. map Clause with to/from map-types

The following example shows how the target construct offloads a code region to a target device. In the map clause, the to and from map-types define the mapping between the original (host) data and the target (device) data. The to map-type is appropriate for data that will only be read on the device. The from map-type is appropriate for data that will only be written on the device. Stating how the data is accessed on the device allows data transfers for the target region to be reduced.

The to map-type indicates that at the start of the target region the variables v1 and v2 are initialized with the values of the corresponding variables on the host device, and at the end of the target region the variables v1 and v2 are not assigned to their corresponding variables on the host device.

The from map-type indicates that at the start of the target region the variable p is not initialized with the value of the corresponding variable on the host device, and at the end of the target region the variable p is assigned to the corresponding variable on the host device.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: target.3
* type: C
* version: omp_4.0
*/
extern void init(float*, float*, int);
extern void output(float*, int);
void vec_mult(int N)
{
   int i;
   float p[N], v1[N], v2[N];
   init(v1, v2, N);
   #pragma omp target map(to: v1, v2) map(from: p)
   #pragma omp parallel for
   for (i=0; i<N; i++)
     p[i] = v1[i] * v2[i];
   output(p, N);
}

The to and from map-types allow programmers to optimize data motion. The data for the v1 and v2 arrays are not returned, and the data for the p array are not transferred to the device. As a result, only half as much data is moved as with the default tofrom behavior of implicit mapping.
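The one-half figure can be checked with a simple byte count. The sketch below counts only the array data copied in each direction and ignores any runtime overhead. The function names bytes_default and bytes_to_from are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Implicit mapping of target.3: p, v1, and v2 are each copied to the
   device at entry and back to the host at exit (tofrom). */
size_t bytes_default(size_t N)
{
   return 3 * 2 * N * sizeof(float);
}

/* map(to: v1, v2) map(from: p): two arrays in, one array out. */
size_t bytes_to_from(size_t N)
{
   return (2 + 1) * N * sizeof(float);
}
```
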

!!%compiler: gfortran
!!%cflags: -fopenmp

! name: target.3
! type: F-free
! version:    omp_4.0
subroutine vec_mult(N)
   integer ::  i,N
   real    ::  p(N), v1(N), v2(N)
   call init(v1, v2, N)
   !$omp target map(to: v1,v2) map(from: p)
   !$omp parallel do
   do i=1,N
      p(i) = v1(i) * v2(i)
   end do
   !$omp end target
   call output(p, N)
end subroutine
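The to and from map-types in target.3 work because v1 and v2 are only read on the device and p is only written there. Suppose a variable is both read and written on the device, for example in a hypothetical accumulating variant that computes p[i] += v1[i] * v2[i]. Then map(from: p) is not sufficient, because the device copy of p would not be initialized from the host. The minimal sketch below maps p with the tofrom map-type instead. It uses the array-section notation introduced with target.4 and a hypothetical function name, vec_mult_add.

```c
#include <assert.h>

/* Hypothetical accumulating variant of target.3: p is read and written
   on the device, so it needs map(tofrom:) rather than map(from:). */
void vec_mult_add(float *p, float *v1, float *v2, int N)
{
   assert(N >= 0);
   #pragma omp target map(to: v1[0:N], v2[0:N]) map(tofrom: p[0:N])
   #pragma omp parallel for
   for (int i = 0; i < N; i++)
      p[i] += v1[i] * v2[i];   /* reads the old p[i], so host values must reach the device */
}
```
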

6.1.4. map Clause with Array Sections

The following example shows how the target construct offloads a code region to a target device. In the map clause, map-types are used to optimize the mapping of variables to the target device. Because the variables p, v1, and v2 are pointers, array section notation must be used to map the arrays. The notation :N is equivalent to 0:N.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: target.4
* type: C
* version: omp_4.0
*/
extern void init(float*, float*, int);
extern void output(float*, int);
void vec_mult(float *p, float *v1, float *v2, int N)
{
   int i;
   init(v1, v2, N);
   #pragma omp target map(to: v1[0:N], v2[:N]) map(from: p[0:N])
   #pragma omp parallel for
   for (i=0; i<N; i++)
     p[i] = v1[i] * v2[i];
   output(p, N);
}
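An array section has the form [lower-bound:length], so it need not start at element 0. In the sketch below, the hypothetical function vec_mult_range maps only elements lo through lo+len-1 of each array and computes only that slice. The remaining host elements are neither copied to the device nor overwritten.

```c
#include <assert.h>

/* Hypothetical variant of target.4 that maps and computes only the
   elements lo .. lo+len-1; [lo:len] means lower bound lo, length len. */
void vec_mult_range(float *p, float *v1, float *v2, int lo, int len)
{
   assert(lo >= 0 && len >= 0);
   #pragma omp target map(to: v1[lo:len], v2[lo:len]) map(from: p[lo:len])
   #pragma omp parallel for
   for (int i = lo; i < lo + len; i++)
      p[i] = v1[i] * v2[i];
}
```
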

In C, the length of the pointed-to array must be specified. In Fortran, the extent of the array is known and the length need not be specified. A section of the array can be specified with the usual Fortran syntax, as shown in the following example. The value 1 is assumed for the lower bound of the array section v2(:N).

!!%compiler: gfortran
!!%cflags: -fopenmp

! name: target.4
! type: F-free
! version:    omp_4.0
module mults
contains
subroutine vec_mult(p,v1,v2,N)
   real,pointer,dimension(:) :: p, v1, v2
   integer                   :: N,i
   call init(v1, v2, N)
   !$omp target map(to: v1(1:N), v2(:N)) map(from: p(1:N))
   !$omp parallel do
   do i=1,N
      p(i) = v1(i) * v2(i)
   end do
   !$omp end target
   call output(p, N)
end subroutine
end module

A more realistic situation, in which an assumed-size array is passed to vec_mult, requires that the length of the arrays be specified, because the compiler does not know the size of the storage. A section of the array must be specified with the usual Fortran syntax, as shown in the following example. The value 1 is assumed for the lower bound of the array section v2(:N).

!!%compiler: gfortran
!!%cflags: -fopenmp

! name: target.4b
! type: F-free
! version:    omp_4.0
module mults
contains
subroutine vec_mult(p,v1,v2,N)
   real,dimension(*) :: p, v1, v2
   integer           :: N,i
   call init(v1, v2, N)
   !$omp target map(to: v1(1:N), v2(:N)) map(from: p(1:N))
   !$omp parallel do
   do i=1,N
      p(i) = v1(i) * v2(i)
   end do
   !$omp end target
   call output(p, N)
end subroutine
end module

6.1.5. target Construct with if Clause

The following example shows how the target construct offloads a code region to a target device.

The if clause on the target construct indicates that if the variable N is not greater than the threshold THRESHOLD1, the target region is executed by the host device.

The if clause on the parallel construct indicates that if N is not greater than a second threshold, THRESHOLD2, the parallel region is inactive.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: target.5
* type: C
* version: omp_4.0
*/
#define THRESHOLD1 1000000
#define THRESHOLD2 1000

extern void init(float*, float*, int);
extern void output(float*, int);

void vec_mult(float *p, float *v1, float *v2, int N)
{
   int i;

   init(v1, v2, N);

   #pragma omp target if(N>THRESHOLD1) map(to: v1[0:N], v2[:N])\
 map(from: p[0:N])
   #pragma omp parallel for if(N>THRESHOLD2)
   for (i=0; i<N; i++)
     p[i] = v1[i] * v2[i];

   output(p, N);
}
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: target.5
! type: F-free
! version:    omp_4.0
module params
integer,parameter :: THRESHOLD1=1000000, THRESHOLD2=1000
end module

subroutine vec_mult(p, v1, v2, N)
   use params
   integer ::  i, N
   real    ::  p(N), v1(N), v2(N)

   call init(v1, v2, N)

   !$omp target if(N>THRESHOLD1) map(to: v1, v2 ) map(from: p)
      !$omp parallel do if(N>THRESHOLD2)
      do i=1,N
         p(i) = v1(i) * v2(i)
      end do
   !$omp end target

   call output(p, N)
end subroutine
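One way to observe the effect of the if clause on the target construct is to ask, inside the region, whether it is running on the host. The sketch below does this with the OpenMP runtime routine omp_is_initial_device and maps the answer back with map(from: ...). The helper name target_ran_on_host is hypothetical. The _OPENMP guard lets the code compile without OpenMP support, in which case everything runs on the host.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define THRESHOLD1 1000000

/* Hypothetical helper: returns 1 if the target region executed on the
   host (initial) device and 0 if it executed on a target device. */
int target_ran_on_host(int N)
{
   int on_host = 1;
   #pragma omp target if(N > THRESHOLD1) map(from: on_host)
   {
#ifdef _OPENMP
      on_host = omp_is_initial_device() ? 1 : 0;
#else
      on_host = 1;   /* built without OpenMP: the code runs on the host */
#endif
   }
   return on_host;
}
```

If N is not greater than THRESHOLD1, the result is always 1. For a larger N, it depends on whether a target device is available at run time.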

The following example is a modification of the above target.5 code to show the combined target and parallel loop directives. It uses the directive-name modifier in multiple if clauses to specify the component directive to which each clause applies.

The if clause with the target modifier applies to the target component of the combined directive, and the if clause with the parallel modifier applies to the parallel component of the combined directive.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: target.6
* type: C
* version: omp_4.5
*/
#define THRESHOLD1 1000000
#define THRESHOLD2 1000

extern void init(float*, float*, int);
extern void output(float*, int);

void vec_mult(float *p, float *v1, float *v2, int N)
{
   int i;

   init(v1, v2, N);

   #pragma omp target parallel for \
        if(target: N>THRESHOLD1) if(parallel: N>THRESHOLD2) \
        map(to: v1[0:N], v2[:N]) map(from: p[0:N])
   for (i=0; i<N; i++)
     p[i] = v1[i] * v2[i];

   output(p, N);
}
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: target.6
! type: F-free
! version:    omp_4.5
module params
integer,parameter :: THRESHOLD1=1000000, THRESHOLD2=1000
end module

subroutine vec_mult(p, v1, v2, N)
   use params
   integer ::  i, N
   real    ::  p(N), v1(N), v2(N)

   call init(v1, v2, N)

   !$omp target parallel do  &
   !$omp&   if(target: N>THRESHOLD1) if(parallel: N>THRESHOLD2) &
   !$omp&   map(to: v1, v2 ) map(from: p)
      do i=1,N
         p(i) = v1(i) * v2(i)
      end do
   !$omp end target parallel do

   call output(p, N)
end subroutine
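The if clause with the parallel modifier in target.6 can be observed in a similar way, through the size of the team that executes the loop. When N is not greater than THRESHOLD2, the parallel region is inactive and runs with a single thread. In the sketch below, the helper name team_size_for is hypothetical. Only iteration 0 writes the mapped variable team, so no data race occurs.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define THRESHOLD1 1000000
#define THRESHOLD2 1000

/* Hypothetical helper: runs the combined construct from target.6 and
   reports how many threads were in the team that executed the loop. */
int team_size_for(int N)
{
   int team = 0;
   assert(N > 0);   /* team is only written when at least one iteration runs */
   #pragma omp target parallel for map(from: team) \
           if(target: N > THRESHOLD1) if(parallel: N > THRESHOLD2)
   for (int i = 0; i < N; i++) {
      if (i == 0) {
#ifdef _OPENMP
         team = omp_get_num_threads();
#else
         team = 1;   /* built without OpenMP: a single thread */
#endif
      }
   }
   return team;
}
```
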

6.1.6. Target Reverse Offload

Beginning with OpenMP 5.0, implementations are allowed to offload back to the host (reverse offload).

In the example below, the error_handler function is executed back on the host if an erroneous value is detected in the A array on the device.

This is accomplished by specifying the ancestor device-modifier in the device clause, along with a device number of 1. Together they indicate that execution is to be performed on the immediate parent (the 1st ancestor), which is the host.

The requires directive (another 5.0 feature) uses the reverse_offload clause to require that the implementation support reverse offload.

Note that the declare target directive uses the device_type clause (another 5.0 feature) to specify that the error_handler function is compiled to execute on the host only. This ensures that no attempt will be made to create a device version of the function. This feature may be necessary if the function is defined in another compilation unit.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: target_reverse_offload.7
* type: C
* version: omp_5.2
*/

#include <stdio.h>
#include <stdlib.h>

#define N 100

#pragma omp requires reverse_offload

void error_handler(int wrong_value, int index)
{
   printf(" Error in offload: A[%d]=%d\n", index,wrong_value);
   printf("        Expecting: A[i ]=i\n");
   exit(1);
// output:  Error in offload: A[99]=-1
//                 Expecting: A[i ]=i

}
#pragma omp declare target device_type(host) enter(error_handler)

int main()
{
   int A[N];

   for (int i=0; i<N; i++) A[i] = i;

   A[N-1]=-1;

   #pragma omp target map(A)
   {
      for (int i=0; i<N; i++)
      {
         if (A[i] != i)
         {
            #pragma omp target device(ancestor: 1) map(always,to: A[i:1])
               error_handler(A[i], i);
         }
      }
   }
   return 0;
}
!!%compiler: gfortran
!!%cflags: -fopenmp

! name:       target_reverse_offload.7
! type:       F-free
! version:    omp_5.0

!$omp requires reverse_offload

subroutine error_handler(wrong_value, index)
  integer :: wrong_value,index
  !$omp declare target device_type(host)

   write( *,'("Error in offload: A(",i3,")=",i3)' ) index,wrong_value
   write( *,'("       Expecting: A(  i)=  i")'    )
   stop
     !!output: Error in offload: A( 99)= -1
     !!               Expecting: A(  i)=  i
end subroutine

program rev_off
  use omp_lib
  integer, parameter :: N=100
  integer            :: A(N) = (/ (i, i=1,100) /)

   A(N-1)=-1

   !$omp target map(A)
      do i=1,N
         if (A(i) /= i)  then
           !$omp target device(ancestor: 1) map(always,to :A(i))
               call error_handler(A(i), i)
           !$omp end target
         endif
      end do
   !$omp end target

end program
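Reverse offload depends on implementation support, and the requires directive makes that support mandatory. Where it is unavailable, one alternative is to record the error on the device and call the handler on the host after the target region ends. This is a different technique, not an equivalent of device(ancestor: 1). The sketch below assumes a hypothetical helper, find_first_error, that returns the index of the first element with A[i] != i, or -1 if there is none. Unlike target_reverse_offload.7, the handler would then run only after the whole scan has finished.

```c
#include <assert.h>

/* Hypothetical fallback for implementations without reverse offload:
   the device records the first bad element, and the host handles it
   after the target region ends. Returns the bad index, or -1. */
int find_first_error(int *A, int n, int *bad_value)
{
   int bad_index = -1, value = 0;
   #pragma omp target map(to: A[0:n]) map(from: bad_index, value)
   {
      bad_index = -1;   /* device copies of from-mapped variables start uninitialized */
      value = 0;
      for (int i = 0; i < n; i++) {
         if (A[i] != i) {
            bad_index = i;
            value = A[i];
            break;
         }
      }
   }
   *bad_value = value;
   return bad_index;
}
```

Whenever the returned index is not -1, a caller would then invoke error_handler(value, index) on the host.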