3.8. collapse Clause

In the following example, the k and j loops are associated with the loop construct. So the iterations of the k and j loops are collapsed into one loop with a larger iteration space, and that loop is then divided among the threads in the current team. Since the i loop is not associated with the loop construct, it is not collapsed, and the i loop is executed sequentially in its entirety in every iteration of the collapsed k and j loop.

The variable j can be omitted from the private clause when the collapse clause is used, since it is then implicitly private. If the collapse clause is omitted, however, j is shared unless it is listed in the private clause. In either case, k is implicitly private and could be omitted from the private clause.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: collapse.1
* type: C
* version: omp_3.0
*/

void bar(float *a, int i, int j, int k);

int kl, ku, ks, jl, ju, js, il, iu, is;

void sub(float *a)
{
    int i, j, k;

    #pragma omp for collapse(2) private(i, k, j)
    for (k=kl; k<=ku; k+=ks)
       for (j=jl; j<=ju; j+=js)
          for (i=il; i<=iu; i+=is)
             bar(a,i,j,k);
}

!!%compiler: gfortran
!!%cflags: -fopenmp

! name: collapse.1
! type: F-fixed
! version: omp_3.0

      subroutine sub(a)

      real a(*)
      integer kl, ku, ks, jl, ju, js, il, iu, is
      common /csub/ kl, ku, ks, jl, ju, js, il, iu, is
      integer i, j, k

!$omp do collapse(2) private(i,j,k)
      do k = kl, ku, ks
        do j = jl, ju, js
          do i = il, iu, is
            call bar(a,i,j,k)
          enddo
        enddo
      enddo
!$omp end do

      end subroutine
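
One way to picture the collapsed k and j loop is as a single loop over a flat index n from which k and j are recovered. The following is a minimal sketch of that idea, not a description of how any particular compiler implements the collapse clause. The names trip_count, unflatten and count_matching are hypothetical, and positive strides are assumed (the example above leaves the sign of ks and js open).

```c
#include <assert.h>

struct kj { int k; int j; };

/* Iterations of: for (v = lo; v <= hi; v += step). Assumes step > 0. */
long trip_count(int lo, int hi, int step)
{
    assert(step > 0);
    return (hi < lo) ? 0 : ((long)hi - lo) / step + 1;
}

/* Hypothetical helper: recover (k, j) from flat index n of the collapsed
   k-j iteration space, in sequential order (j varies fastest).
   Assumes the j loop has at least one iteration. */
struct kj unflatten(long n, int kl, int ks, int jl, int ju, int js)
{
    long nj = trip_count(jl, ju, js);
    struct kj p;
    p.k = kl + (int)(n / nj) * ks;
    p.j = jl + (int)(n % nj) * js;
    return p;
}

/* Walks the nested loops and counts how many (k, j) pairs agree with the
   pair recovered from the running flat index. Returns -1 if the number of
   visited iterations differs from nk * nj. */
long count_matching(int kl, int ku, int ks, int jl, int ju, int js)
{
    long nk = trip_count(kl, ku, ks);
    long nj = trip_count(jl, ju, js);
    long n = 0, matches = 0;

    for (int k = kl; k <= ku; k += ks)
        for (int j = jl; j <= ju; j += js, n++) {
            struct kj p = unflatten(n, kl, ks, jl, ju, js);
            if (p.k == k && p.j == j)
                matches++;
        }
    return (n == nk * nj) ? matches : -1;
}
```

For kl=1, ku=5, ks=2 and jl=0, ju=9, js=3 the nest has 3 * 4 = 12 iterations, and count_matching returns 12. It is this flat range of 12 iterations, rather than the 3 iterations of the k loop alone, that is divided among the threads.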

In the next example, the k and j loops are associated with the loop construct. So the iterations of the k and j loops are collapsed into one loop with a larger iteration space, and that loop is then divided among the threads in the current team.

The sequential execution of the iterations in the k and j loops determines the order of the iterations in the collapsed iteration space. This implies that in the sequentially last iteration of the collapsed iteration space, k will have the value 2 and j will have the value 3. Since klast and jlast are lastprivate, their values are assigned by the sequentially last iteration of the collapsed k and j loop. This example prints: 2 3.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: collapse.2
* type: C
* version: omp_3.0
*/

#include <stdio.h>
void test()
{
   int j, k, jlast, klast;
   #pragma omp parallel
   {
      #pragma omp for collapse(2) lastprivate(jlast, klast)
      for (k=1; k<=2; k++)
         for (j=1; j<=3; j++)
         {
            jlast=j;
            klast=k;
         }
      #pragma omp single
      printf("%d %d\n", klast, jlast);
   }
}

!!%compiler: gfortran
!!%cflags: -fopenmp

! name: collapse.2
! type: F-fixed
! version: omp_3.0

      program test
!$omp parallel
!$omp do private(j,k) collapse(2) lastprivate(jlast, klast)
      do k = 1,2
        do j = 1,3
          jlast=j
          klast=k
        enddo
      enddo
!$omp end do
!$omp single
      print *, klast, jlast
!$omp end single
!$omp end parallel
      end program test
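
The same rule can be checked for other loop bounds. The sketch below is a parameterized variant of the example above that returns the lastprivate values instead of printing them. The struct last_values and the function last_pair are hypothetical names, and both trip counts are assumed to be at least 1 so that a sequentially last iteration exists.

```c
#include <assert.h>

struct last_values { int k; int j; };

/* Hypothetical helper: runs an nk x nj collapsed loop nest and returns the
   values left in the lastprivate variables. Assumes nk >= 1 and nj >= 1. */
struct last_values last_pair(int nk, int nj)
{
    int j, k, jlast = 0, klast = 0;
    struct last_values r;

    assert(nk >= 1 && nj >= 1);

    /* Whichever thread runs the sequentially last collapsed iteration
       (k == nk, j == nj) copies its jlast and klast back. */
    #pragma omp parallel for collapse(2) lastprivate(jlast, klast)
    for (k = 1; k <= nk; k++)
        for (j = 1; j <= nj; j++) {
            jlast = j;
            klast = k;
        }

    r.k = klast;
    r.j = jlast;
    return r;
}
```

Calling last_pair(2, 3) gives k = 2 and j = 3, matching the output of the example, regardless of how many threads take part or which thread executes the last iteration.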

The next example illustrates the interaction of the collapse and ordered clauses.

In the example, the loop construct has both a collapse clause and an ordered clause. The collapse clause causes the iterations of the k and j loops to be collapsed into one loop with a larger iteration space, and that loop is divided among the threads in the current team. An ordered clause is added to the loop construct because an ordered region binds to the loop region arising from the loop construct.

According to Section 2.12.8 of the OpenMP 4.0 specification, during execution of an iteration of a loop region, a thread must not execute more than one ordered region that binds to the same loop region. So the collapse clause is required for the example to be conforming. With the collapse clause, the iterations of the k and j loops are collapsed into one loop, and therefore each iteration of the collapsed k and j loop executes only one ordered region. Without the collapse clause, there would be two ordered regions that bind to each iteration of the k loop (one arising from the first iteration of the j loop, and the other arising from the second iteration of the j loop).

The code prints:

0 1 1
0 1 2
0 2 1
1 2 2
1 3 1
1 3 2

//%compiler: clang
//%cflags: -fopenmp

/*
* name: collapse.3
* type: C
* version: omp_3.0
*/
#include <omp.h>
#include <stdio.h>
void work(int a, int j, int k);
void sub()
{
   int j, k, a;
   #pragma omp parallel num_threads(2)
   {
      #pragma omp for collapse(2) ordered private(j,k) schedule(static,3)
      for (k=1; k<=3; k++)
         for (j=1; j<=2; j++)
         {
            #pragma omp ordered
            printf("%d %d %d\n", omp_get_thread_num(), k, j);
            /* end ordered */
            work(a,j,k);
         }
   }
}

!!%compiler: gfortran
!!%cflags: -fopenmp

! name: collapse.3
! type: F-fixed
! version: omp_3.0
      program test
      include 'omp_lib.h'
!$omp parallel num_threads(2)
!$omp do collapse(2) ordered private(j,k) schedule(static,3)
      do k = 1,3
        do j = 1,2
!$omp ordered
          print *, omp_get_thread_num(), k, j
!$omp end ordered
          call work(a,j,k)
        enddo
      enddo
!$omp end do
!$omp end parallel
      end program test
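
The thread numbers in the printed output follow from the schedule(static,3) clause applied to the six collapsed iterations: chunks of three consecutive iterations are assigned to the threads round-robin in thread-number order. The sketch below reproduces that assignment in plain C. The functions static_owner and owner_of are hypothetical names, and the team is assumed to really contain two threads, which num_threads(2) requests but does not guarantee.

```c
/* Thread that runs flat iteration n under schedule(static, chunk):
   chunks of `chunk` consecutive iterations are dealt to the threads
   round-robin in thread-number order. */
int static_owner(int n, int chunk, int nthreads)
{
    return (n / chunk) % nthreads;
}

/* Hypothetical helper for the 3 x 2 nest above: thread that executes
   iteration (k, j) with collapse(2), schedule(static,3) and 2 threads. */
int owner_of(int k, int j)
{
    int n = (k - 1) * 2 + (j - 1);   /* flat index, j varies fastest */
    return static_owner(n, 3, 2);
}
```

The flat indices 0 to 2, that is (k,j) = (1,1), (1,2), (2,1), fall in the first chunk and go to thread 0. The flat indices 3 to 5, that is (2,2), (3,1), (3,2), fall in the second chunk and go to thread 1. The ordered construct then forces the print statements to execute in flat-index order.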

The following example illustrates the collapse of a non-rectangular loop nest, a new feature in OpenMP 5.0. In a loop nest, a non-rectangular loop has a loop bound that references the iteration variable of an enclosing loop.

The example below motivates this feature: it builds a symmetric correlation matrix for a set of variables. Because the matrix is symmetric, only the entries with j >= i are computed and then mirrored, so the initial value of the second (j) loop depends on the index variable of the first (i) loop, and these are the two loops to be collapsed. The data are stored in a 2D array in which each row corresponds to a variable and each column to a sample of that variable; the last two columns hold the sample mean and standard deviation (for Fortran, rows and columns are swapped).
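
Read from the loop body of the example, each matrix entry appears to be a Pearson correlation coefficient. As a sketch in the notation of the C version, and assuming the stored standard deviation is the sample standard deviation with divisor N - 1 (which the source does not state):

```latex
b_{ij} = b_{ji}
       = \frac{\sum_{k=0}^{N-1} \left(a_{ik} - \bar{a}_i\right)\left(a_{jk} - \bar{a}_j\right)}
              {(N-1)\, s_i\, s_j},
\qquad \bar{a}_i = a_{i,N}, \quad s_i = a_{i,N+1}, \quad 0 \le i \le j < M
```

Under that assumption the diagonal entries evaluate to 1.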

//%compiler: clang
//%cflags: -fopenmp

/*
* name: collapse.4
* type: C
* version: omp_5.0
*/
#include <stdio.h>
#define N 20
#define M 10

// routine to calculate a
// For variable a[i]:
// a[i][0],...,a[i][n-1]   contains the n samples
// a[i][n]                 contains the sample mean
// a[i][n+1]               contains the standard deviation
extern void calc_a(int n,int m, float a[][N+2]);

int main(){
  float a[M][N+2], b[M][M];

  calc_a(N,M,a);

  #pragma omp parallel for collapse(2)
  for (int i = 0; i < M; i++)
     for (int j = i; j < M; j++)
     {
        float temp = 0.0f;
        for (int k = 0; k < N; k++)
           temp += (a[i][k]-a[i][N])*(a[j][k]-a[j][N]);

        b[i][j] = temp / (a[i][N+1] * a[j][N+1] * (N - 1));
        b[j][i] = b[i][j];
     }

  printf("b[0][0] = %f, b[M-1][M-1] = %f\n", b[0][0], b[M-1][M-1]);

  return 0;
}

!!%compiler: gfortran
!!%cflags: -fopenmp

! name: collapse.4
! type: F-free
! version: omp_5.0
module calc_m
  interface
  subroutine calc_a(n, m, a)
  integer n, m
  real a(n+2,m)
  ! routine to calculate a
  ! For variable a(*,j):
  ! a(1,j),...,a(n,j)  contains the n samples
  ! a(n+1,j)           contains the sample mean
  ! a(n+2,j)           contains the standard deviation
  end subroutine
  end interface
end module

program main
  use calc_m
  integer, parameter :: N=20, M=10
  real a(N+2,M), b(M,M)
  real temp
  integer i, j, k

  call calc_a(N,M,a)

  !$omp parallel do collapse(2) private(k,temp)
  do i = 1, M
     do j = i, M
        temp = 0.0
        do k = 1, N
           temp = temp + (a(k,i)-a(N+1,i))*(a(k,j)-a(N+1,j))
        end do

        b(i,j) = temp / (a(N+2,i) * a(N+2,j) * (N - 1))
        b(j,i) = b(i,j)
     end do
  end do

  print *,"b(1,1) = ",b(1,1),", b(M,M) = ",b(M,M)

end program
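
Collapsing the i and j loops of this example yields a triangular iteration space with M(M+1)/2 iterations instead of M*M. The following minimal sketch, which uses the zero-based indexing of the C version, shows one way a flat index over that space could be mapped back to (i, j). It illustrates the iteration space only and is not a description of what a compiler generates; triangle_count, triangle_unflatten and triangle_matches are hypothetical names.

```c
#include <assert.h>

struct ij { int i; int j; };

/* Number of (i, j) pairs with 0 <= i <= j < m. */
long triangle_count(int m)
{
    return (long)m * (m + 1) / 2;
}

/* Hypothetical helper: recover (i, j) from flat index n, where row i
   contributes m - i iterations (j = i, ..., m - 1).
   Requires 0 <= n < triangle_count(m). */
struct ij triangle_unflatten(long n, int m)
{
    struct ij p = { 0, 0 };

    assert(n >= 0 && n < triangle_count(m));
    while (n >= m - p.i) {      /* skip the whole rows that precede n */
        n -= m - p.i;
        p.i++;
    }
    p.j = p.i + (int)n;
    return p;
}

/* Counts flat indices whose recovered (i, j) agrees with the order in
   which the nested loops of the example visit the pairs. */
long triangle_matches(int m)
{
    long n = 0, matches = 0;

    for (int i = 0; i < m; i++)
        for (int j = i; j < m; j++, n++) {
            struct ij p = triangle_unflatten(n, m);
            if (p.i == i && p.j == j)
                matches++;
        }
    return matches;
}
```

For M = 10 the collapsed space has 55 iterations, and triangle_matches(10) returns 55. Dividing these 55 iterations among the threads balances the work better than dividing only the 10 iterations of the i loop, whose rows shrink from 10 inner iterations to 1.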