12.7. Metadirectives

A metadirective directive provides a mechanism to select a directive from one of its when clauses, depending upon one or more contexts: the implementation, the available devices and the present enclosing construct. The directive in an otherwise clause is used when no directive of a when clause is selected.

In the when clause the context selector (or just selector) defines traits that are evaluated for selection of the directive that follows the selector. This “selectable” directive is called a directive variant. Traits are grouped into construct, implementation and device sets, each used by a selector of the same name.
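Before the numbered examples, the following minimal sketch (a hypothetical scale routine, not one of the examples) shows the general shape of the directive: each when clause pairs a context selector with a directive variant, and the otherwise clause supplies the fallback. The kind(gpu) and vendor(gnu) trait values and the chosen variants are illustrative only.

// A hypothetical sketch of metadirective syntax, not a recommendation.
// The gpu variant would only be selectable in a context compiled for a
// gpu device (e.g. within a target region or a declare target function).
void scale(double *x, int n, double a)
{
   #pragma omp metadirective \
               when( device={kind(gpu)}:            teams loop )        \
               when( implementation={vendor(gnu)}:  parallel for simd ) \
               otherwise(                           parallel for )
   for (int i = 0; i < n; i++)
      x[i] *= a;
}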

In the first example the architecture trait arch of the device selector set specifies that if an nvptx architecture is active in the OpenMP context, then the teams loop directive variant is selected as the directive; otherwise, the parallel loop directive variant of the otherwise clause is selected. That is, if a device of nvptx architecture is supported by the implementation within the enclosing target construct, its directive variant is selected. The architecture names, such as nvptx, are implementation defined. Also, note that device as used in a target construct specifies a device number, while the device selector set used in the metadirective directive has the traits kind, isa and arch.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: metadirective.1
* type: C
* version: omp_5.2
*/

#define N 100
#include <stdio.h>

int main()
{
   int v1[N], v2[N], v3[N];
   for(int i=0; i<N; i++){ v1[i]=(i+1); v2[i]=-(i+1); }

   #pragma omp target map(to:v1,v2) map(from:v3) device(0)
   #pragma omp metadirective \
                   when(     device={arch("nvptx")}: teams loop) \
                   otherwise(                     parallel loop)
     for (int i= 0; i< N; i++)  v3[i] = v1[i] * v2[i];

   printf(" %d  %d\n",v3[0],v3[N-1]); //output: -1  -10000

   return 0;
}
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: metadirective.1
! type: F-free
! version: omp_5.2
program main
   integer, parameter :: N= 100
   integer ::  v1(N), v2(N), v3(N)

   do i=1,N;  v1(i)=i; v2(i)=-i;  enddo   ! initialize

   !$omp  target map(to:v1,v2) map(from:v3) device(0)
   !$omp  metadirective &
   !$omp&     when(     device={arch("nvptx")}: teams loop) &
   !$omp&     otherwise(                     parallel loop)
     do i= 1,N; v3(i) = v1(i) * v2(i); enddo
   !$omp  end target

   print *, v3(1),v3(N) !!output: -1  -10000
end program

In the second example, the implementation selector set is specified in the when clause to distinguish between platforms. Additionally, specific architectures are specified with the device selector set.

In the code, different teams constructs are employed as determined by the metadirective directive. For the specified vendor platforms and architecture traits, the number of teams is restricted by a num_teams clause and a thread limit is set by a thread_limit clause. Otherwise, the plain teams construct is used without any clauses, as prescribed by the otherwise clause.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: metadirective.2
* type: C
* version: omp_5.2
*/
#define N 100
#include <stdio.h>
#include <omp.h>

void work_on_chunk(int idev, int i);

int main()                    //Driver
{
   int i,idev;

   for (idev=0; idev<omp_get_num_devices(); idev++)
   {
      #pragma omp target device(idev)
      #pragma omp metadirective \
               when( implementation={vendor(nvidia)},            \
                                       device={arch("kepler")}:  \
                     teams num_teams(512) thread_limit(32) )     \
               when( implementation={vendor(amd)},               \
                                       device={arch("fiji"  )}:  \
                     teams num_teams(512) thread_limit(64) )     \
               otherwise(                                        \
                     teams)
      #pragma omp distribute parallel for
      for (i=0; i<N; i++) work_on_chunk(idev,i);
   }
   return 0;
}
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: metadirective.2
! type: F-free
! version: omp_5.2
program main                    !!Driver
  use omp_lib
  implicit none
  integer, parameter :: N=1000
  external           :: work_on_chunk
  integer            :: i,idev

  do idev=0,omp_get_num_devices()-1

    !$omp target device(idev)
    !$omp begin metadirective &
    !$omp&  when( implementation={vendor(nvidia)},        &
    !$omp&           device={arch("kepler")}:             &
    !$omp&        teams num_teams(512) thread_limit(32) ) &
    !$omp&  when( implementation={vendor(amd)},           &
    !$omp&           device={arch("fiji"  )}:             &
    !$omp&        teams num_teams(512) thread_limit(64) ) &
    !$omp&  otherwise( teams )
    !$omp distribute parallel do
    do i=1,N
       call work_on_chunk(idev,i)
    end do
    !$omp end metadirective
    !$omp end target

  end do

end program

In the third example, a construct selector set is specified in the when clause. Here, a metadirective directive is used within a function that is also compiled for a target device, as directed by a declare target directive. The target directive name in the construct selector ensures that the distribute parallel for/do construct is employed for the target compilation. Otherwise, for the host-compiled version, the parallel for/do simd construct is used.

In the first call to the exp_pi_diff() routine the context is a target teams construct and the distribute parallel for/do construct version of the function is invoked, while in the second call the parallel for/do simd construct version is used.

This case illustrates an important point for users who may want to hoist the target directive out of a function that contains the usual target teams distribute parallel for/do construct (in order to provide alternate constructs through the metadirective directive, as here). While this combined construct can be decomposed into target and teams distribute parallel for/do constructs, the OpenMP 5.0 specification has the restriction: “If a teams construct is nested within a target construct, that target construct must contain no statements, declarations or directives outside of the teams construct”. So, the teams construct must immediately follow the target construct without any intervening code statements (which includes function calls). Since hoisting the target construct alone out of the function would leave the function call between the target and teams constructs, the target teams construct has been hoisted out of the function, and the distribute parallel for/do construct is used as the variant directive of the metadirective directive within the function.
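As a brief sketch of this restriction (the do_work routine and the array extent are hypothetical, not part of the example), the commented-out form places a function call between the target and teams constructs and is therefore non-conforming, while the second form hoists target teams together, which is the structure used in metadirective.3 below.

// do_work is a hypothetical routine, marked with declare target elsewhere,
// that contains the teams (or variant worksharing) construct.
void do_work(double *d);

void caller(double *d)
{
   // Non-conforming if do_work contains a teams construct: the call is a
   // statement inside the target region but outside the nested teams region.
   //
   //    #pragma omp target map(tofrom: d[0:1000])
   //    do_work(d);

   // Conforming: target and teams are hoisted together, so no statement
   // intervenes; do_work then contains only the distribute parallel for
   // variant (selected by its metadirective).
   #pragma omp target teams map(tofrom: d[0:1000])
   do_work(d);
}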

//%compiler: clang
//%cflags: -fopenmp

/*
* name: metadirective.3
* type: C
* version: omp_5.2
*/
#include <stdio.h>
#include  <math.h>
#define      N 1000

#pragma omp begin declare target
void exp_pi_diff(double *d, double my_pi){
   #pragma omp metadirective \
               when(   construct={target}: distribute parallel for ) \
               otherwise(                  parallel for simd )
   for(int i = 0; i<N; i++) d[i] = exp( (M_PI-my_pi)*i );
}
#pragma omp end declare target

int main()
{
  //Calculates sequence of exponentials: (M_PI-my_pi) * index
  //M_PI is from math.h, and my_pi is user provided.

  double d[N];
  double my_pi=3.14159265358979e0;

      #pragma omp target teams map(tofrom: d[0:N])
      exp_pi_diff(d,my_pi);
                                           // value should be near 1
      printf("d[N-1] = %20.14f\n",d[N-1]); // ...= 1.00000000000311

      exp_pi_diff(d,my_pi);                // value should be near 1
      printf("d[N-1] = %20.14f\n",d[N-1]); // ...= 1.00000000000311
}
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: metadirective.3
! type: F-free
! version: omp_5.2
module params
   integer, parameter :: N=1000
   DOUBLE PRECISION, PARAMETER::M_PI=4.0d0*DATAN(1.0d0)
                                     ! 3.1415926535897932_8
end module


subroutine exp_pi_diff(d,    my_pi)
  use params
  implicit none
  integer          ::  i
  double precision ::  d(N), my_pi
  !$omp declare target

  !$omp   metadirective &
  !$omp&      when( construct={target}: distribute parallel do )  &
  !$omp&      otherwise(                parallel do simd )

  do i = 1,size(d)
     d(i) = exp( (M_PI-my_pi)*i )
  end do

end subroutine

program main
  ! Calculates sequence of exponentials: (M_PI-my_pi) * index
  ! M_PI is from usual way, and my_pi is user provided.
  ! Fortran Standard does not provide PI

  use params
  implicit none
  double precision   :: d(N)
  double precision   :: my_pi=3.14159265358979d0

      !$omp target teams map(from: d)
      call exp_pi_diff(d,my_pi)
      !$omp end target teams
                                  ! value should be near 1
      print*, "d(N) = ",d(N)      ! 1.00000000000311

      call exp_pi_diff(d,my_pi) ! value should be near 1
      print*, "d(N) = ",d(N)      ! 1.00000000000311

end program

The user selector set can be used in a metadirective to select directives at execution time when the condition(boolean-expr) selector expression is not a constant expression. In this case it is a dynamic trait set, and the selection is made at run time, rather than at compile time.
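As a small sketch of the dynamic case (the axpy routine and enable_par flag are hypothetical), a non-constant argument to condition makes the user selector dynamic, so the choice between the parallel for variant and serial execution is made each time the directive is encountered; a constant argument, as in the templated metadirective.5 example later, can instead be resolved at compile time.

// Hypothetical sketch: enable_par is not a constant expression, so the
// selection of the parallel for variant happens at run time; with no
// otherwise clause the loop simply runs serially when the condition is false.
void axpy(double *y, const double *x, double a, int n, int enable_par)
{
   #pragma omp metadirective \
               when( user={condition(enable_par)}: parallel for )
   for (int i = 0; i < n; i++)
      y[i] += a * x[i];
}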

In the following example the foo function employs the condition selector to choose a device for execution at run time. In the bar routine metadirectives are nested. At the outer level a selection between serial and parallel execution is performed at run time, followed by another run-time selection on the schedule kind at the inner level when the active construct trait is parallel.

(Note, the variable b in two of the “selected” constructs is declared private for the sole purpose of detecting and reporting that the construct is used. Since the variable is private, its value is unchanged outside of the construct region, whereas it is changed if the “unselected” construct is used.)

//%compiler: clang
//%cflags: -fopenmp

/*
* name:       metadirective.4
* type:       C
* version:    omp_5.2
*/
#define N 100
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>   // for getenv()
#include <omp.h>

void foo(int *a, int n, bool use_gpu)
{
   int b=0;   //  use b to detect if run on gpu

   #pragma omp metadirective \
               when( user={condition(use_gpu)}:           \
                     target teams distribute parallel for \
                     private(b) map(from:a[0:n]) )        \
               otherwise(                                 \
                     parallel for )
   for (int i=0; i<n; i++) {a[i]=i; if(i==n-1) b=1;}

   if(b==0) printf("PASSED 1 of 3\n");
}

void bar (int *a, int n, bool run_parallel, bool unbalanced)
{
   int b=0;
   #pragma omp metadirective \
               when(user={condition(run_parallel)}: parallel)
   {
      if(omp_in_parallel() == 1 && omp_get_thread_num() == 0)
      {printf("PASSED 2 of 3\n");}

      #pragma omp metadirective \
          when( construct={parallel}, \
                user={condition(unbalanced)}: for schedule(guided) \
                                                  private(b)) \
          when( construct={parallel}        : for schedule(static))
      for (int i=0; i<n; i++) {a[i]=i; if(i==n-1) b=1;}
   }
   // if guided b=0, because b is private
   if(b==0) printf("PASSED 3 of 3\n");
}

void foo(int *a, int n, bool use_gpu);
void bar(int *a, int n, bool run_parallel, bool unbalanced);

int main(){

   int p[N];
   // App normally sets these, dependent on input parameters
   bool use_gpu=true, run_parallel=true, unbalanced=true;

   // Testing: set Env Var MK_FAIL to anything to fail tests
   if(getenv("MK_FAIL")!=NULL) {
      use_gpu=false; run_parallel=false; unbalanced=false;
   }

   foo(p, N, use_gpu);
   bar(p, N, run_parallel,unbalanced);

}
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: metadirective.4
! type: F-free
! version: omp_5.2
subroutine foo(a, n, use_gpu)
   integer :: n, a(n)
   logical :: use_gpu

   integer :: b=0   !! use b to detect if run on gpu

   !$omp metadirective &
   !$omp&            when(user={condition(use_gpu)}:           &
   !$omp&                 target teams distribute parallel do &
   !$omp&                 private(b) map(from:a(1:n)) )        &
   !$omp&            otherwise(                                &
   !$omp&                 parallel do)
   do i = 1,n; a(i)=i; if(i==n) b=1; end do

   if(b==0) print *, "PASSED 1 of 3"  !! b is private in the gpu run
end subroutine

subroutine bar (a, n, run_parallel, unbalanced)
   use omp_lib, only : omp_in_parallel, omp_get_thread_num
   integer :: n, a(n)
   logical :: run_parallel, unbalanced

   integer :: b=0
   !$omp begin metadirective when(user={condition(run_parallel)}: parallel)

    if(omp_in_parallel() .and. omp_get_thread_num() == 0) &
       print *,"PASSED 2 of 3"

    !$omp metadirective &
    !$omp&  when(construct={parallel}, user={condition(unbalanced)}: &
     !$omp&         do schedule(guided) private(b)) &
     !$omp&  when(construct={parallel}: do schedule(static))
    do i = 1,n; a(i)=i; if(i==n) b=1; end do

   !$omp end metadirective

   if(b==0) print *, "PASSED 3 of 3"   !!if guided, b=0 since b is private
end subroutine

program meta
   use omp_lib
   integer, parameter :: N=100
   integer :: p(N)
   integer :: env_stat
                !! App normally sets these, dependent on input parameters
   logical ::  use_gpu=.true., run_parallel=.true., unbalanced=.true.

                !! Testing: set Env Var MK_FAIL to anything to fail tests
   call get_environment_variable('MK_FAIL',status=env_stat)
   if(env_stat /= 1) then                !! status is 1 when MK_FAIL is not set
      use_gpu=.false.; run_parallel=.false.; unbalanced=.false.
   endif


   call foo(p, N, use_gpu)
   call bar(p, N, run_parallel,unbalanced)

end program

Metadirectives can be used in conjunction with templates as shown in the C++ code below. Here the template definition generates two versions of the Fibonacci function. The tasking boolean template parameter is used in the condition selector to enable tasking. The true instantiation implements a parallel version with task and taskwait constructs, as in the tasking.4.c code in Section 5.1. The false instantiation implements a serial version without any tasking constructs. Note that the tasking version calls the serial version for values of n less than 8, avoiding tasking overhead on small subproblems.

//%compiler: clang
//%cflags: -fopenmp

/*
* name:       metadirective.5
* type:       C++
* version:    omp_5.0
*/
#include <stdio.h>

// revised Fibonacci from tasking.4.c example

template <bool tasking>
int fib(int n) {
  int i, j;
  if (n<2) {
    return n;
  } else if ( tasking && n<8 ) { // serial/taskless cutoff for n<8
    return fib<false>(n);
  } else {
    #pragma omp metadirective \
                when(user={condition(tasking)}: task shared(i))
    {
      i=fib<tasking>(n-1);
    }
    #pragma omp metadirective \
                when(user={condition(tasking)}: task shared(j))
    {
      j=fib<tasking>(n-2);
    }
    #pragma omp metadirective \
                when(user={condition(tasking)}: taskwait)
    return i+j;
  }
}

int main(int argc, char** argv) {
  int n = 15;
  #pragma omp parallel
  #pragma omp single
  {
    printf("fib(%i) = %i\n", n, fib<true>(n));
  }
  return 0;
}
// OUTPUT:
// fib(15) = 610