12.11. Controlling Concurrency and Reproducibility with the order Clause

The order clause controls how the iterations of one or more loops associated with a directive may execute in parallel. It takes a clause argument and an optional modifier. The only supported argument, introduced in OpenMP 5.0, is the keyword concurrent, which indicates that the loop iterations may execute concurrently, including iterations in the same chunk per the loop schedule. Because an order(concurrent) clause relaxes execution in this way, code must not assume that any cross-iteration data dependences will be preserved or that any two iterations will execute on the same thread.

The following example demonstrates the use of the order(concurrent) clause, without any modifiers, to control the parallel execution of loop iterations. The clause is applied only to the first parallel for/do construct. It cannot be used for the second construct, whose iterations carry a cross-iteration data dependence, or for the third, which accesses a threadprivate variable.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: reproducible.1
* type: C
* version: omp_5.0
*/
#include <stdio.h>
#include <omp.h>

int main()
{
   const int n = 1000;
   int v[n], u[n];
   static int sum;
   #pragma omp threadprivate(sum)

   // no data dependences, so can execute concurrently
   #pragma omp parallel for order(concurrent)
   for (int i = 0; i < n; i++) {
      u[i] = i;
      v[i] = i;
      v[i] += u[i] * u[i];
   }

   // with data dependences, so cannot execute iterations
   // concurrently with the order(concurrent) clause
   #pragma omp parallel for ordered
   for (int i = 1; i < n; i++) {
      v[i] += u[i] * u[i];
      #pragma omp ordered
      v[i] += v[i-1];
   }

   sum = 0;
   // accessing a threadprivate variable, which would not be
   // permitted if the order(concurrent) clause was present
   #pragma omp parallel for copyin(sum)
   for (int i = 0; i < n; i++) {
      sum += v[i];
   }

   #pragma omp parallel
   {
      printf("sum = %d on thread %d\n", sum, omp_get_thread_num());
   }

   return 0;
}
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: reproducible.1
! type: F-free
! version:    omp_5.0
program main
   use omp_lib
   implicit none
   integer, parameter :: n = 1000
   integer :: v(n), u(n)
   integer :: i
   integer, save :: sum
   !$omp threadprivate(sum)

   !! no data dependences, so can execute concurrently
   !$omp parallel do order(concurrent)
   do i = 1, n
      u(i) = i
      v(i) = i
      v(i) = v(i) + u(i) * u(i)
   end do

   !! with data dependences, so cannot execute iterations
   !! concurrently with the order(concurrent) clause
   !$omp parallel do ordered
   do i = 2, n
      v(i) = v(i) + u(i) * u(i)
      !$omp ordered
      v(i) = v(i) + v(i-1)
      !$omp end ordered
   end do

   sum = 0
   !! accessing a threadprivate variable, which would not be
   !! permitted if the order(concurrent) clause was present
   !$omp parallel do copyin(sum)
   do i = 1, n
      sum = sum + v(i)
   end do

   !$omp parallel
      print *,"sum = ",sum," on thread ", omp_get_thread_num()
   !$omp end parallel

end program
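
As a complement to the third loop above, the following minimal sketch shows one way a summation could be written so that order(concurrent) is usable: the accumulator is a reduction variable instead of a threadprivate one. The function name sum_concurrent is hypothetical and not part of the original example, and the sketch assumes that combining the reduction clause with order(concurrent) is accepted by the compiler in use. Unlike the original loop, which leaves a partial sum in each thread's threadprivate copy, this version produces a single total.

```c
/* Hypothetical helper, not part of reproducible.1: sums v[0..n-1].
 * The accumulator is a reduction variable, so the loop body does not
 * touch any threadprivate data and has no cross-iteration dependence
 * other than the reduction itself. */
long sum_concurrent(const int *v, int n)
{
   long sum = 0;

   /* Assumption: reduction and order(concurrent) may appear together
    * on this combined construct. */
   #pragma omp parallel for order(concurrent) reduction(+:sum)
   for (int i = 0; i < n; i++) {
      sum += v[i];
   }

   return sum;
}
```

A caller would pass an initialized array, for example sum_concurrent(v, n) after the first loop of the example. When the code is compiled without OpenMP support, the pragma is ignored and the function computes the same total serially.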

Modifiers to the order clause, introduced in OpenMP 5.1, may be specified to control the reproducibility of the loop schedule for the associated loop(s). A reproducible loop schedule will consistently yield the same mapping of iterations to threads (or SIMD lanes) if the directive name, loop schedule, iteration space, and binding region remain the same. The reproducible modifier indicates the loop schedule must be reproducible, while the unconstrained modifier indicates that the loop schedule is not reproducible. If a modifier is not specified, then the order clause does not affect the reproducibility of the loop schedule.
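
To make the notion of a reproducible schedule concrete, the sketch below records which thread executes each iteration in two worksharing loops that use the same static schedule inside one parallel region, and then counts the iterations whose thread differs between the loops. The names thread_id and count_schedule_mismatches are hypothetical. The order clause is deliberately left off these loops, on the understanding that OpenMP runtime routines such as omp_get_thread_num should not be called inside an order(concurrent) region; the sketch therefore illustrates only the reproducibility that a static schedule provides by default.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the calling thread's number, or 0 when built without OpenMP. */
static int thread_id(void)
{
#ifdef _OPENMP
   return omp_get_thread_num();
#else
   return 0;
#endif
}

/* Hypothetical helper: first[i] and second[i] receive the number of the
 * thread that ran iteration i in the first and second loop. The return
 * value is the number of iterations mapped to different threads. */
int count_schedule_mismatches(int n, int *first, int *second)
{
   int mismatches = 0;

   #pragma omp parallel
   {
      /* nowait is safe here: the two loops write to different arrays */
      #pragma omp for schedule(static) nowait
      for (int i = 0; i < n; i++) {
         first[i] = thread_id();
      }
      #pragma omp for schedule(static)
      for (int i = 0; i < n; i++) {
         second[i] = thread_id();
      }
   }

   /* serial comparison after the parallel region has completed */
   for (int i = 0; i < n; i++) {
      if (first[i] != second[i]) {
         mismatches++;
      }
   }
   return mismatches;
}
```

With identical static schedules, the same iteration count, and the same binding parallel region, the mismatch count is expected to be zero. A schedule that is not reproducible, such as one marked with the unconstrained modifier, carries no such guarantee.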

The next example demonstrates the use of the order(concurrent) clause with modifiers to additionally control the reproducibility of a loop’s schedule. The two worksharing-loop constructs in the first parallel construct specify that the loops have reproducible schedules, so memory effects from iteration i of the first loop are observable to iteration i of the second loop. In the second parallel construct, the order clause does not control the reproducibility of the loop schedules. However, since both loops specify the same static schedule, the schedules are reproducible and the data dependences between the loops are preserved by the execution. In the third parallel construct, the order clause indicates that the loop schedules are not reproducible, overriding the default reproducibility prescribed by the specified static schedule. Consequently, the nowait clause must not be used on the first worksharing-loop construct, so that its implicit barrier preserves the data dependences between the loops.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: reproducible.2
* type: C
* version: omp_5.1
*/
#include <stdio.h>

int main()
{
   const int n = 1000;
   int v[n], u[n];

   #pragma omp parallel
   {
      // reproducible schedules are used for the following two constructs
      #pragma omp for order(reproducible: concurrent) nowait
      for (int i = 0; i < n; i++) {
         u[i] = i;
         v[i] = i;
      }
      #pragma omp for order(reproducible: concurrent)
      for (int i = 0; i < n; i++) {
         v[i]  += u[i] * u[i];
      }
   }

   #pragma omp parallel
   {
      // static schedules preserve data dependences between the loops
      #pragma omp for schedule(static) order(concurrent) nowait
      for (int i = 0; i < n; i++) {
         u[i] = i;
         v[i] = i;
      }
      #pragma omp for schedule(static) order(concurrent)
      for (int i = 0; i < n; i++) {
         v[i]  += u[i] * u[i];
      }
   }

   #pragma omp parallel
   {
      // the default reproducibility by the static schedule is not
      // preserved due to the unconstrained order clause.
      // use of nowait here could result in a data race.
      #pragma omp for schedule(static) order(unconstrained: concurrent)
      for (int i = 0; i < n; i++) {
         u[i] = i;
         v[i] = i;
      }
      #pragma omp for schedule(static) order(unconstrained: concurrent)
      for (int i = 0; i < n; i++) {
         v[i]  += u[i] * u[i];
      }
   }

   return 0;
}
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: reproducible.2
! type: F-free
! version:    omp_5.1
program main
   implicit none
   integer, parameter :: n = 1000
   integer :: v(n), u(n)
   integer :: i

   !$omp parallel
      !! reproducible schedules are used for the following two constructs
      !$omp do order(reproducible: concurrent) nowait
      do i = 1, n
         u(i) = i
         v(i) = i
      end do
      !$omp do order(reproducible: concurrent)
      do i = 1, n
         v(i) = v(i) + u(i) * u(i)
      end do
   !$omp end parallel

   !$omp parallel
      !! static schedules preserve data dependences between the loops
      !$omp do schedule(static) order(concurrent) nowait
      do i = 1, n
         u(i) = i
         v(i) = i
      end do
      !$omp do schedule(static) order(concurrent)
      do i = 1, n
         v(i) = v(i) + u(i) * u(i)
      end do
   !$omp end parallel

   !$omp parallel
      !! the default reproducibility by the static schedule is not
      !! preserved due to the unconstrained order clause.
      !! use of nowait here could result in a data race.
      !$omp do schedule(static) order(unconstrained: concurrent)
      do i = 1, n
         u(i) = i
         v(i) = i
      end do
      !$omp do schedule(static) order(unconstrained: concurrent)
      do i = 1, n
         v(i) = v(i) + u(i) * u(i)
      end do
   !$omp end parallel

end program