3.8. collapse Clause
In the following example, the k and j loops are associated with the loop construct. So the iterations of the k and j loops are collapsed into one loop with a larger iteration space, and that loop is then divided among the threads in the current team. Since the i loop is not associated with the loop construct, it is not collapsed, and the i loop is executed sequentially in its entirety in every iteration of the collapsed k and j loop.
The variable j can be omitted from the private clause when the collapse clause is used, since it is then implicitly private. However, if the collapse clause is omitted, j will be shared unless it is listed in the private clause. In either case, k is implicitly private and could be omitted from the private clause.
//%compiler: clang
//%cflags: -fopenmp
/*
* name: collapse.1
* type: C
* version: omp_3.0
*/
void bar(float *a, int i, int j, int k);
int kl, ku, ks, jl, ju, js, il, iu, is;
void sub(float *a)
{
    int i, j, k;
    #pragma omp for collapse(2) private(i, k, j)
    for (k=kl; k<=ku; k+=ks)
        for (j=jl; j<=ju; j+=js)
            for (i=il; i<=iu; i+=is)
                bar(a,i,j,k);
}
!!%compiler: gfortran
!!%cflags: -fopenmp
! name: collapse.1
! type: F-fixed
! version: omp_3.0
      subroutine sub(a)
      real a(*)
      integer kl, ku, ks, jl, ju, js, il, iu, is
      common /csub/ kl, ku, ks, jl, ju, js, il, iu, is
      integer i, j, k
!$omp do collapse(2) private(i,j,k)
      do k = kl, ku, ks
        do j = jl, ju, js
          do i = il, iu, is
            call bar(a,i,j,k)
          enddo
        enddo
      enddo
!$omp end do
      end subroutine
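To make the "larger iteration space" of the first example concrete, the following is a minimal sketch of what collapsing the k and j loops amounts to conceptually: the two loops are flattened by hand into one loop, and k and j are recovered from the flat index. The names trip_count, nested_sum, and flat_sum, and the checksum that stands in for bar, are illustrative assumptions and not part of the example above. Positive strides are assumed, and an actual implementation is free to collapse loops differently.

```c
/* Trip count of "for (v = lo; v <= hi; v += step)"; assumes step > 0. */
static long trip_count(int lo, int hi, int step)
{
    return (hi < lo) ? 0 : ((long)hi - lo) / step + 1;
}

/* Reference version: plain nested k and j loops accumulating a checksum. */
long nested_sum(int kl, int ku, int ks, int jl, int ju, int js)
{
    long sum = 0;
    for (int k = kl; k <= ku; k += ks)
        for (int j = jl; j <= ju; j += js)
            sum += 100L * k + j;   /* hypothetical stand-in for bar() */
    return sum;
}

/* Hand-collapsed version: one loop over nk*nj logical iterations. */
long flat_sum(int kl, int ku, int ks, int jl, int ju, int js)
{
    long nk = trip_count(kl, ku, ks);
    long nj = trip_count(jl, ju, js);
    long total = nk * nj;
    long sum = 0;

    #pragma omp parallel for reduction(+:sum)
    for (long idx = 0; idx < total; idx++) {
        int k = kl + (int)(idx / nj) * ks;   /* slower-varying index */
        int j = jl + (int)(idx % nj) * js;   /* faster-varying index */
        sum += 100L * k + j;
    }
    return sum;
}
```

Both functions visit the same (k, j) pairs, so they return the same checksum for the same bounds. The flat version only shows the index arithmetic that the collapse clause makes unnecessary to write by hand.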
In the next example, the k and j loops are associated with the loop construct. So the iterations of the k and j loops are collapsed into one loop with a larger iteration space, and that loop is then divided among the threads in the current team.
The sequential execution of the iterations in the k and j loops determines the order of the iterations in the collapsed iteration space. This implies that in the sequentially last iteration of the collapsed iteration space, k will have the value 2 and j will have the value 3. Since klast and jlast are lastprivate, their values are assigned by the sequentially last iteration of the collapsed k and j loop. This example prints: 2 3.
//%compiler: clang
//%cflags: -fopenmp
/*
* name: collapse.2
* type: C
* version: omp_3.0
*/
#include <stdio.h>
void test()
{
    int j, k, jlast, klast;
    #pragma omp parallel
    {
        #pragma omp for collapse(2) lastprivate(jlast, klast)
        for (k=1; k<=2; k++)
            for (j=1; j<=3; j++)
            {
                jlast=j;
                klast=k;
            }
        #pragma omp single
        printf("%d %d\n", klast, jlast);
    }
}
!!%compiler: gfortran
!!%cflags: -fopenmp
! name: collapse.2
! type: F-fixed
! version: omp_3.0
      program test
!$omp parallel
!$omp do private(j,k) collapse(2) lastprivate(jlast, klast)
      do k = 1,2
        do j = 1,3
          jlast=j
          klast=k
        enddo
      enddo
!$omp end do
!$omp single
      print *, klast, jlast
!$omp end single
!$omp end parallel
      end program test
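The ordering argument of the second example can be checked with a small variant that records which (k, j) pair each logical iteration of the collapsed space executes. This is a sketch under stated assumptions: the function collapsed_order, the order array, the out-parameters, and the encoding 10*k + j are hypothetical and introduced only for illustration.

```c
#define NK 2   /* trip count of the k loop, as in the example */
#define NJ 3   /* trip count of the j loop, as in the example */

/* Records the (k, j) pair of each logical iteration of the collapsed loop
 * as 10*k + j, and returns the lastprivate values through the out-parameters. */
void collapsed_order(int order[NK * NJ], int *klast_out, int *jlast_out)
{
    int j, k, jlast = 0, klast = 0;

    #pragma omp parallel for collapse(2) lastprivate(jlast, klast)
    for (k = 1; k <= NK; k++)
        for (j = 1; j <= NJ; j++) {
            /* position of (k, j) in the sequential order of the k and j loops */
            order[(k - 1) * NJ + (j - 1)] = 10 * k + j;
            jlast = j;
            klast = k;
        }

    *klast_out = klast;   /* value from the sequentially last iteration */
    *jlast_out = jlast;
}
```

After the call, order holds 11, 12, 13, 21, 22, 23, and the out-parameters receive 2 and 3, matching the values printed by the example regardless of which thread executed the last iteration.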
The next example illustrates the interaction of the collapse and ordered clauses.
In the example, the loop construct has both a collapse clause and an ordered clause. The collapse clause causes the iterations of the k and j loops to be collapsed into one loop with a larger iteration space, and that loop is divided among the threads in the current team. An ordered clause is added to the loop construct because an ordered region binds to the loop region arising from the loop construct.
According to Section 2.12.8 of the OpenMP 4.0 specification, a thread must not execute more than one ordered region that binds to the same loop region during a single iteration of that loop. So the collapse clause is required for the example to be conforming. With the collapse clause, the iterations of the k and j loops are collapsed into one loop, and therefore each iteration of the collapsed k and j loop executes only one ordered region. Without the collapse clause, there would be two ordered regions that bind to each iteration of the k loop (one arising from the first iteration of the j loop, and the other arising from the second iteration of the j loop).
The code prints:
0 1 1
0 1 2
0 2 1
1 2 2
1 3 1
1 3 2
//%compiler: clang
//%cflags: -fopenmp
/*
* name: collapse.3
* type: C
* version: omp_3.0
*/
#include <omp.h>
#include <stdio.h>
void work(int a, int j, int k);
void sub()
{
    int j, k, a;
    #pragma omp parallel num_threads(2)
    {
        #pragma omp for collapse(2) ordered private(j,k) schedule(static,3)
        for (k=1; k<=3; k++)
            for (j=1; j<=2; j++)
            {
                #pragma omp ordered
                printf("%d %d %d\n", omp_get_thread_num(), k, j);
                /* end ordered */
                work(a,j,k);
            }
    }
}
!!%compiler: gfortran
!!%cflags: -fopenmp
! name: collapse.3
! type: F-fixed
! version: omp_3.0
      program test
      include 'omp_lib.h'
!$omp parallel num_threads(2)
!$omp do collapse(2) ordered private(j,k) schedule(static,3)
      do k = 1,3
        do j = 1,2
!$omp ordered
          print *, omp_get_thread_num(), k, j
!$omp end ordered
          call work(a,j,k)
        enddo
      enddo
!$omp end do
!$omp end parallel
      end program test
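The thread numbers in the printed output of the third example follow from the schedule(static,3) clause: the six collapsed iterations are split into chunks of three, and the chunks are assigned to the threads round-robin in thread-number order. The sketch below reproduces that arithmetic without any OpenMP runtime calls. The functions static_owner and expected_output are hypothetical helpers, and the result assumes that the team really has two threads, as requested by num_threads(2).

```c
#include <stdio.h>

/* Thread that owns logical iteration idx under schedule(static, chunk):
 * chunks are assigned round-robin in thread-number order. */
int static_owner(int idx, int chunk, int nthreads)
{
    return (idx / chunk) % nthreads;
}

/* Writes the expected "thread k j" lines for the 3x2 collapsed loop with
 * schedule(static,3) and two threads into buf, and returns the number of
 * characters written. buf must hold at least 37 bytes. */
int expected_output(char *buf, size_t size)
{
    int len = 0;
    for (int k = 1; k <= 3; k++)
        for (int j = 1; j <= 2; j++) {
            int idx = (k - 1) * 2 + (j - 1);   /* collapsed iteration number */
            len += snprintf(buf + len, size - (size_t)len, "%d %d %d\n",
                            static_owner(idx, 3, 2), k, j);
        }
    return len;
}
```

Iterations 0 to 2 fall into the first chunk and go to thread 0, and iterations 3 to 5 fall into the second chunk and go to thread 1. The ordered region then forces the lines to appear in collapsed-iteration order.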
The following example illustrates the collapse of a non-rectangular loop nest, a new feature in OpenMP 5.0. In a loop nest, a non-rectangular loop has a loop bound that references the iteration variable of an enclosing loop.
The motivation for this feature is illustrated in the example below, which creates a symmetric correlation matrix for a set of variables. Note that, for the two loops being collapsed, the initial value of the second loop depends on the index variable of the first loop. Here the data are represented by a 2D array in which each row corresponds to a variable and each column corresponds to a sample of the variable; the last two columns hold the sample mean and standard deviation (for Fortran, rows and columns are swapped).
//%compiler: clang
//%cflags: -fopenmp
/*
* name: collapse.4
* type: C
* version: omp_5.0
*/
#include <stdio.h>
#define N 20
#define M 10
// routine to calculate a
// For variable a[i]:
// a[i][0],...,a[i][n-1] contains the n samples
// a[i][n] contains the sample mean
// a[i][n+1] contains the standard deviation
extern void calc_a(int n,int m, float a[][N+2]);
int main(){
    float a[M][N+2], b[M][M];
    calc_a(N,M,a);
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < M; i++)
        for (int j = i; j < M; j++)
        {
            float temp = 0.0f;
            for (int k = 0; k < N; k++)
                temp += (a[i][k]-a[i][N])*(a[j][k]-a[j][N]);
            b[i][j] = temp / (a[i][N+1] * a[j][N+1] * (N - 1));
            b[j][i] = b[i][j];
        }
    printf("b[0][0] = %f, b[M-1][M-1] = %f\n", b[0][0], b[M-1][M-1]);
    return 0;
}
!!%compiler: gfortran
!!%cflags: -fopenmp
! name: collapse.4
! type: F-free
! version: omp_5.0
module calc_m
  interface
    subroutine calc_a(n, m, a)
      integer n, m
      real a(n+2,m)
      ! routine to calculate a
      ! For variable a(*,j):
      ! a(1,j),...,a(n,j) contains the n samples
      ! a(n+1,j) contains the sample mean
      ! a(n+2,j) contains the standard deviation
    end subroutine
  end interface
end module
program main
  use calc_m
  integer, parameter :: N=20, M=10
  real a(N+2,M), b(M,M)
  real temp
  integer i, j, k
  call calc_a(N,M,a)
  !$omp parallel do collapse(2) private(k,temp)
  do i = 1, M
    do j = i, M
      temp = 0.0
      do k = 1, N
        temp = temp + (a(k,i)-a(N+1,i))*(a(k,j)-a(N+1,j))
      end do
      b(i,j) = temp / (a(N+2,i) * a(N+2,j) * (N - 1))
      b(j,i) = b(i,j)
    end do
  end do
  print *,"b(1,1) = ",b(1,1),", b(M,M) = ",b(M,M)
end program
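To see what collapsing the triangular i and j loops involves, the following sketch flattens the same iteration space (all pairs with i <= j < M) by hand, as one might do with a compiler that lacks OpenMP 5.0 support. The helpers tri_count, tri_unflatten, and visit_upper are hypothetical names introduced for illustration, and the sketch counts visits rather than computing correlations. It is not meant to describe how an OpenMP implementation maps the collapsed index internally.

```c
/* Number of (i, j) pairs with 0 <= i <= j < m. */
int tri_count(int m)
{
    return m * (m + 1) / 2;
}

/* Maps a flat index idx in [0, tri_count(m)) back to (i, j) with i <= j < m. */
void tri_unflatten(int idx, int m, int *i, int *j)
{
    int row = 0;
    int row_len = m;            /* row 0 holds j = 0 .. m-1 */
    while (idx >= row_len) {    /* skip whole rows before the target row */
        idx -= row_len;
        row++;
        row_len--;              /* each later row is one element shorter */
    }
    *i = row;
    *j = row + idx;
}

/* Visits every upper-triangular cell once through a single flat loop and
 * counts the visits in hits, an m*m array zeroed by the caller. */
void visit_upper(int m, int *hits)
{
    int total = tri_count(m);

    #pragma omp parallel for
    for (int idx = 0; idx < total; idx++) {
        int i, j;
        tri_unflatten(idx, m, &i, &j);
        hits[i * m + j] += 1;   /* each cell is reached by exactly one idx */
    }
}
```

For M = 10 the flat loop has 55 iterations instead of the 100 of a rectangular 10 x 10 nest, which is the work the non-rectangular collapse distributes among the threads.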