## Doacross Loop Nest

An __ordered__ clause with an integer argument can be used on a loop construct to define the number of associated loops in a _doacross loop nest_, in which cross-iteration dependences exist. A __doacross__ clause on an __ordered__ construct within the ordered loop describes the dependences of the _doacross_ loops.

In the code below, the __doacross(sink:i-1)__ clause defines a cross-iteration dependence from iteration _i-1_ to iteration _i_. It specifies a wait point: execution proceeds to the subsequent statements only after the computation from iteration _i-1_ has completed. The __doacross(source:omp_cur_iteration)__ or __doacross(source:)__ clause signals the completion of computation from the current iteration ( _i_ ), which satisfies the cross-iteration dependence that arises from that iteration. The __omp_cur_iteration__ keyword is optional for the __source__ dependence type. For this example, the same sequential ordering could have been achieved with an __ordered__ clause without a parameter on the loop directive and a single __ordered__ directive, without the __doacross__ clause, for the statement that calls the _bar_ function.

In [None]:
//%compiler: clang
//%cflags: -fopenmp

/*
* name: doacross.1
* type: C
* version: omp_5.2
*/

float foo(int i);
float bar(float a, float b);
float baz(float b);

void work( int N, float *A, float *B, float *C )
{
  int i;

  #pragma omp for ordered(1)
  for (i=1; i<N; i++)
  {
    A[i] = foo(i);

  #pragma omp ordered doacross(sink: i-1)
    B[i] = bar(A[i], B[i-1]);
  #pragma omp ordered doacross(source: omp_cur_iteration)

    C[i] = baz(B[i]);
  }
}

In [None]:
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: doacross.1
! type: F-free
! version:    omp_5.2

subroutine work( N, A, B, C )
  integer :: N, i
  real, dimension(N) :: A, B, C
  real, external :: foo, bar, baz

  !$omp do ordered(1)
  do i=2, N
    A(i) = foo(i)

  !$omp ordered doacross(sink: i-1)
    B(i) = bar(A(i), B(i-1))
  !$omp ordered doacross(source: omp_cur_iteration)

    C(i) = baz(B(i))
  end do
end subroutine
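
The alternative mentioned for this first example, an __ordered__ clause without a parameter together with a single __ordered__ region around the call to _bar_, might look like the following minimal sketch. The bodies of _foo_, _bar_, and _baz_ are hypothetical stand-ins (the example only declares them), the name _work_ordered_ is an assumption, and a combined __parallel for__ is used here only so that the function can be called on its own.

```c
#include <assert.h>

/* Hypothetical stand-ins: the example above declares these routines
   but does not define them. */
static float foo(int i)            { return (float)i; }
static float bar(float a, float b) { return a + b; }
static float baz(float b)          { return 2.0f * b; }

void work_ordered(int N, float *A, float *B, float *C)
{
  assert(N > 0);   /* B[0] must exist and be initialized by the caller */

  /* ordered without a parameter: no doacross loop nest is declared */
  #pragma omp parallel for ordered
  for (int i = 1; i < N; i++)
  {
    A[i] = foo(i);              /* may overlap with other iterations */

    /* The ordered region executes in sequential iteration order,
       so B[i-1] is complete before it is read here. */
    #pragma omp ordered
    B[i] = bar(A[i], B[i-1]);

    C[i] = baz(B[i]);           /* may overlap with other iterations */
  }
}
```

Only the statement inside the __ordered__ region is serialized. The calls to _foo_ and _baz_ remain free to execute concurrently, as in the __doacross__ version.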

The following code is similar to the previous example, but the _doacross loop nest_ is extended to two nested loops, _i_ and _j_, as specified by the __ordered(2)__ clause on the loop directive. In the C/C++ code, the _i_ and _j_ loops are the first and second associated loops, respectively, whereas in the Fortran code, the _j_ and _i_ loops are the first and second associated loops, respectively. The __doacross(sink:i-1,j)__ and __doacross(sink:i,j-1)__ clauses in the C/C++ code define cross-iteration dependences in two dimensions from iterations ( _i-1, j_ ) and ( _i, j-1_ ) to iteration ( _i, j_ ). Likewise, the __doacross(sink:j-1,i)__ and __doacross(sink:j,i-1)__ clauses in the Fortran code define cross-iteration dependences from iterations ( _j-1, i_ ) and ( _j, i-1_ ) to iteration ( _j, i_ ).

In [None]:
//%compiler: clang
//%cflags: -fopenmp

/*
* name: doacross.2
* type: C
* version: omp_5.2
*/

float foo(int i, int j);
float bar(float a, float b, float c);
float baz(float b);

void work( int N, int M, float **A, float **B, float **C )
{
  int i, j;

  #pragma omp for ordered(2)
  for (i=1; i<N; i++)
  {
    for (j=1; j<M; j++)
    {
      A[i][j] = foo(i, j);

  #pragma omp ordered doacross(sink: i-1,j) doacross(sink: i,j-1)
      B[i][j] = bar(A[i][j], B[i-1][j], B[i][j-1]);
  #pragma omp ordered doacross(source:)

      C[i][j] = baz(B[i][j]);
    }
  }
}

In [None]:
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: doacross.2
! type: F-free
! version:    omp_5.2

subroutine work( N, M, A, B, C )
  integer :: N, M, i, j
  real, dimension(M,N) :: A, B, C
  real, external :: foo, bar, baz

  !$omp do ordered(2)
  do j=2, N
    do i=2, M
      A(i,j) = foo(i, j)

    !$omp ordered doacross(sink: j-1,i) doacross(sink: j,i-1)
      B(i,j) = bar(A(i,j), B(i-1,j), B(i,j-1))
    !$omp ordered doacross(source:)

      C(i,j) = baz(B(i,j))
    end do
  end do
end subroutine
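
To make the two-dimensional dependence pattern concrete, here is a small self-contained sketch that applies the same pair of sink clauses to a recurrence with a known answer: counting monotone lattice paths, where each entry is the sum of its upper and left neighbours. The function name _count_paths_, the flat row-major array layout, and the recurrence itself are illustrative assumptions and not part of the example above.

```c
#include <assert.h>

/* Fills B (rows x cols, row-major) so that B[i*cols + j] is the number
   of monotone lattice paths from (0,0) to (i,j). */
void count_paths(int rows, int cols, long *B)
{
  assert(rows > 0 && cols > 0);

  /* Boundary entries have exactly one path and no dependences. */
  for (int i = 0; i < rows; i++) B[i*cols] = 1;
  for (int j = 0; j < cols; j++) B[j] = 1;

  #pragma omp parallel for ordered(2)
  for (int i = 1; i < rows; i++)
  {
    for (int j = 1; j < cols; j++)
    {
      /* Wait until iterations (i-1,j) and (i,j-1) have signalled.
         A sink outside the iteration space (i-1 == 0 or j-1 == 0)
         is ignored. */
      #pragma omp ordered doacross(sink: i-1,j) doacross(sink: i,j-1)
      B[i*cols + j] = B[(i-1)*cols + j] + B[i*cols + (j-1)];

      /* Signal that entry (i,j) is complete. */
      #pragma omp ordered doacross(source:)
    }
  }
}
```

Because every entry is released only after its own value is written, iterations along an anti-diagonal are free to run concurrently while the result stays identical to a sequential sweep.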

The following example shows an incorrect use of the __ordered__ directive with a __doacross__ clause. There are two issues with the code. The first issue is a missing __ordered__ __doacross(source:)__ directive, which could cause a deadlock. The second issue is that the __doacross(sink:i+1,j)__ and __doacross(sink:i,j+1)__ clauses define dependences on lexicographically later source iterations ( _i+1, j_ ) and ( _i, j+1_ ), which could also cause a deadlock because those iterations may not start to execute until the current iteration completes.

In [None]:
//%compiler: clang
//%cflags: -fopenmp

/*
* name: doacross.3
* type: C
* version: omp_5.2
*/

#define N 100

void work_wrong(double p[][N][N])
{
  int i, j, k;

  #pragma omp parallel for ordered(2) private(i,j,k)
  for (i=1; i<N-1; i++)
  {
    for (j=1; j<N-1; j++)
    {
  #pragma omp ordered doacross(sink: i-1,j) doacross(sink: i+1,j) \
                      doacross(sink: i,j-1) doacross(sink: i,j+1)
      for (k=1; k<N-1; k++)
      {
        double tmp1 = p[i-1][j][k] + p[i+1][j][k];
        double tmp2 = p[i][j-1][k] + p[i][j+1][k];
        double tmp3 = p[i][j][k-1] + p[i][j][k+1];
        p[i][j][k] = (tmp1 + tmp2 + tmp3) / 6.0;
      }
/* missing #pragma omp ordered doacross(source:) */
    }
  }
}

In [None]:
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: doacross.3
! type: F-free
! version:    omp_5.2

subroutine work_wrong(N, p)
  integer :: N
  real(8), dimension(N,N,N) :: p
  integer :: i, j, k
  real(8) :: tmp1, tmp2, tmp3

!$omp parallel do ordered(2) private(i,j,k,tmp1,tmp2,tmp3)
  do i=2, N-1
    do j=2, N-1
    !$omp ordered doacross(sink: i-1,j) doacross(sink: i+1,j) &
    !$omp&        doacross(sink: i,j-1) doacross(sink: i,j+1)
      do k=2, N-1
        tmp1 = p(k-1,j,i) + p(k+1,j,i)
        tmp2 = p(k,j-1,i) + p(k,j+1,i)
        tmp3 = p(k,j,i-1) + p(k,j,i+1)
        p(k,j,i) = (tmp1 + tmp2 + tmp3) / 6.0
      end do
! missing !$omp ordered doacross(source:)
    end do
  end do
end subroutine
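
One possible way to repair the two issues described for this example is sketched below: the sinks on lexicographically later iterations are dropped, and a __doacross(source:)__ directive is added after the _k_ loop. This assumes that the intended result is the one produced by a sequential sweep. The names _work_fixed_ and _work_serial_ and the small grid size _NP_ are hypothetical, and the serial routine is included only as a reference for checking.

```c
#include <assert.h>
#include <stddef.h>

#define NP 16   /* hypothetical small grid size for illustration */

void work_fixed(double p[][NP][NP])
{
  assert(p != NULL);

  #pragma omp parallel for ordered(2)
  for (int i = 1; i < NP-1; i++)
  {
    for (int j = 1; j < NP-1; j++)
    {
      /* Only lexicographically earlier iterations are named as sinks. */
      #pragma omp ordered doacross(sink: i-1,j) doacross(sink: i,j-1)
      for (int k = 1; k < NP-1; k++)
      {
        double tmp1 = p[i-1][j][k] + p[i+1][j][k];
        double tmp2 = p[i][j-1][k] + p[i][j+1][k];
        double tmp3 = p[i][j][k-1] + p[i][j][k+1];
        p[i][j][k] = (tmp1 + tmp2 + tmp3) / 6.0;
      }
      /* Release the iterations that wait on (i,j). */
      #pragma omp ordered doacross(source:)
    }
  }
}

/* Serial reference sweep with the same loop body, for checking only. */
void work_serial(double p[][NP][NP])
{
  for (int i = 1; i < NP-1; i++)
    for (int j = 1; j < NP-1; j++)
      for (int k = 1; k < NP-1; k++)
      {
        double tmp1 = p[i-1][j][k] + p[i+1][j][k];
        double tmp2 = p[i][j-1][k] + p[i][j+1][k];
        double tmp3 = p[i][j][k-1] + p[i][j][k+1];
        p[i][j][k] = (tmp1 + tmp2 + tmp3) / 6.0;
      }
}
```

Placing the source point after the _k_ loop also protects the reads of `p[i+1][j][k]` and `p[i][j+1][k]`: iterations ( _i+1, j_ ) and ( _i, j+1_ ) wait on iteration ( _i, j_ ) and so cannot overwrite those values before they have been read.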

The following example illustrates the use of the __collapse__ clause for a _doacross loop nest_. The _i_ and _j_ loops are the associated loops for the collapsed loop as well as for the _doacross loop nest_. The example also shows a compliant usage in which the dependence source directive is placed before the corresponding sink directive. The check at the sink point for completion of computation from previous iterations can therefore occur after the source point of the current iteration.

In [None]:
//%compiler: clang
//%cflags: -fopenmp

/*
* name: doacross.4
* type: C
* version: omp_5.2
*/

double foo(int i, int j);

void work( int N, int M, double **A, double **B, double **C )
{
  int i, j;
  double alpha = 1.2;

  #pragma omp for collapse(2) ordered(2)
  for (i = 1; i < N-1; i++)
  {
    for (j = 1; j < M-1; j++)
    {
      A[i][j] = foo(i, j);
  #pragma omp ordered doacross(source:)

      B[i][j] = alpha * A[i][j];

  #pragma omp ordered doacross(sink: i-1,j) doacross(sink: i,j-1)
      C[i][j] = 0.2 * (A[i-1][j] + A[i+1][j] +
                A[i][j-1] + A[i][j+1] + A[i][j]);
    }
  }
}

In [None]:
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: doacross.4
! type: F-free
! version:    omp_5.2

subroutine work( N, M, A, B, C )
  integer :: N, M
  real(8), dimension(M, N) :: A, B, C
  real(8), external :: foo
  integer :: i, j
  real(8) :: alpha = 1.2

  !$omp do collapse(2) ordered(2)
  do j=2, N-1
    do i=2, M-1
      A(i,j) = foo(i, j)
    !$omp ordered doacross(source:)

      B(i,j) = alpha * A(i,j)

    !$omp ordered doacross(sink: j,i-1) doacross(sink: j-1,i)
      C(i,j) = 0.2 * (A(i-1,j) + A(i+1,j) +  &
               A(i,j-1) + A(i,j+1) + A(i,j))
    end do
  end do
end subroutine
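
The source-before-sink ordering can be isolated in a one-dimensional loop. The following is a minimal sketch under stated assumptions: the body of _foo_ is a hypothetical stand-in, the name _work_early_source_ is invented for illustration, and element 0 is filled before the loop because iteration 0 is not part of the iteration space.

```c
#include <assert.h>

/* Hypothetical stand-in for a per-element computation. */
static double foo(int i) { return 0.5 * i; }

void work_early_source(int N, double *A, double *C)
{
  assert(N > 0);

  A[0] = foo(0);   /* not produced by any loop iteration */

  #pragma omp parallel for ordered(1)
  for (int i = 1; i < N; i++)
  {
    A[i] = foo(i);
    /* Signal as soon as A[i] has been written. */
    #pragma omp ordered doacross(source:)

    /* Wait until iteration i-1 has passed its source point, which
       guarantees that A[i-1] is available. For i == 1 the sink lies
       outside the iteration space and is ignored. */
    #pragma omp ordered doacross(sink: i-1)
    C[i] = A[i-1] + A[i];
  }
}
```

Signalling early lets iteration _i+1_ pass its sink point while iteration _i_ is still computing `C[i]`, which a source directive placed at the end of the loop body would not allow.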