4.3. Affinity Display
The following examples illustrate ways to display thread affinity. Automatic display of affinity can be invoked by setting the OMP_DISPLAY_AFFINITY environment variable to TRUE. The format of the output can be customized by setting the OMP_AFFINITY_FORMAT environment variable to an appropriate string. API routines are also available for displaying thread affinity at selected locations within the code.
For the first example the environment variable OMP_DISPLAY_AFFINITY has been set to TRUE, and execution occurs on an 8-core system with OMP_NUM_THREADS set to 8.
The affinity for the primary thread is reported through a call to the omp_display_affinity() API routine. For default affinity settings the report shows that the primary thread can execute on any of the cores. In the following parallel region the affinity for each of the team threads is reported automatically since the OMP_DISPLAY_AFFINITY environment variable has been set to TRUE.
These two reports are often useful, for example in hybrid codes that use both MPI and OpenMP, for observing the affinity of an MPI task before the parallel region and the thread affinities during an OpenMP parallel region. Note that the next parallel region uses the same number of threads as the previous parallel region and the affinities are unchanged, so affinity is not reported.
In the last parallel region, the thread affinities are reported because the thread affinity has changed.
//%compiler: clang
//%cflags: -fopenmp
/*
* name: affinity_display.1
* type: C
* version: omp_5.0
*/
#include <stdio.h>
#include <omp.h>
int main(void){ //MAX threads = 8, single socket system
//API call-- Displays Affinity of Primary Thread
omp_display_affinity(NULL);
// API CALL OUTPUT (default format):
// team_num= 0, nesting_level= 0, thread_num= 0,
// thread_affinity= 0,1,2,3,4,5,6,7
// OMP_DISPLAY_AFFINITY=TRUE, OMP_NUM_THREADS=8
#pragma omp parallel num_threads(omp_get_num_procs())
{
if(omp_get_thread_num()==0)
printf("1st Parallel Region -- Affinity Reported \n");
// DISPLAY OUTPUT (default format) has been sorted:
// team_num= 0, nesting_level= 1, thread_num= 0, thread_affinity= 0
// team_num= 0, nesting_level= 1, thread_num= 1, thread_affinity= 1
// ...
// team_num= 0, nesting_level= 1, thread_num= 7, thread_affinity= 7
// doing work here
}
#pragma omp parallel num_threads( omp_get_num_procs() )
{
if(omp_get_thread_num()==0)
printf("%s%s\n","Same Affinity as in Previous Parallel Region",
" -- no Affinity Reported\n");
// NO AFFINITY OUTPUT:
//(output in 1st parallel region only for OMP_DISPLAY_AFFINITY=TRUE)
// doing more work here
}
// Report Affinity for 1/2 number of threads
#pragma omp parallel num_threads( omp_get_num_procs()/2 )
{
if(omp_get_thread_num()==0)
printf("Report Affinity for using 1/2 of max threads.\n");
// DISPLAY OUTPUT (default format) has been sorted:
// team_num= 0, nesting_level= 1, thread_num= 0, thread_affinity= 0,1
// team_num= 0, nesting_level= 1, thread_num= 1, thread_affinity= 2,3
// team_num= 0, nesting_level= 1, thread_num= 2, thread_affinity= 4,5
// team_num= 0, nesting_level= 1, thread_num= 3, thread_affinity= 6,7
// do work
}
return 0;
}
!!%compiler: gfortran
!!%cflags: -fopenmp
! name: affinity_display.1
! type: F-free
! version: omp_5.0
program affinity_display ! MAX threads = 8, single socket system
use omp_lib
implicit none
character(len=0) :: null
! API call - Displays Affinity of Primary Thread
call omp_display_affinity(null)
! API CALL OUTPUT (default format):
! team_num= 0, nesting_level= 0, thread_num= 0, &
! thread_affinity= 0,1,2,3,4,5,6,7
! OMP_DISPLAY_AFFINITY=TRUE, OMP_NUM_THREADS=8
!$omp parallel num_threads(omp_get_num_procs())
if(omp_get_thread_num()==0) then
print*, "1st Parallel Region -- Affinity Reported"
endif
! DISPLAY OUTPUT (default format) has been sorted:
! team_num= 0, nesting_level= 1, thread_num= 0, thread_affinity= 0
! team_num= 0, nesting_level= 1, thread_num= 1, thread_affinity= 1
! ...
! team_num= 0, nesting_level= 1, thread_num= 7, thread_affinity= 7
! doing work here
!$omp end parallel
!$omp parallel num_threads( omp_get_num_procs() )
if(omp_get_thread_num()==0) then
print*, "Same Affinity in Parallel Region -- no Affinity Reported"
endif
! NO AFFINITY OUTPUT:
! (output in 1st parallel region only for
! OMP_DISPLAY_AFFINITY=TRUE)
! doing more work here
!$omp end parallel
! Report Affinity for 1/2 number of threads
!$omp parallel num_threads( omp_get_num_procs()/2 )
if(omp_get_thread_num()==0) then
print*, "Altered Affinity in Parallel Region -- Affinity Reported"
endif
! DISPLAY OUTPUT (default format) has been sorted:
! team_num= 0, nesting_level= 1, thread_num= 0, &
! thread_affinity= 0,1
! team_num= 0, nesting_level= 1, thread_num= 1, &
! thread_affinity= 2,3
! team_num= 0, nesting_level= 1, thread_num= 2, &
! thread_affinity= 4,5
! team_num= 0, nesting_level= 1, thread_num= 3, &
! thread_affinity= 6,7
! do work
!$omp end parallel
end program
In the following example 2 threads are forked, and each executes on a socket. Next, a nested parallel region runs half of the available threads on each socket.
These OpenMP environment variables have been set:
OMP_PROC_BIND="TRUE"
OMP_NUM_THREADS="2,4"
OMP_PLACES="{0,2,4,6},{1,3,5,7}"
OMP_AFFINITY_FORMAT="nest_level= %L, parent_thrd_num= %a, thrd_num= %n, thrd_affinity= %A"
where the numbers correspond to core ids for the system. Note, OMP_DISPLAY_AFFINITY is not set and is FALSE by default. This example shows how to use API routines to perform affinity display operations.
For each of the two first-level threads the OMP_PLACES variable specifies a place with all the core-ids of the socket ({0,2,4,6} for one thread and {1,3,5,7} for the other). (As is sometimes the case in 2-socket systems, one socket may consist of the even id numbers, while the other may have the odd id numbers.) The affinities are printed according to the OMP_AFFINITY_FORMAT format, which provides the parallel nesting level (%L), the ancestor thread number (%a), the thread number (%n) and the thread affinity (%A). In the nested parallel region within the socket_work routine the affinities for the threads on each socket are printed according to the same format.
//%compiler: clang
//%cflags: -fopenmp
/*
* name: affinity_display.2
* type: C
* version: omp_5.0
*/
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
void socket_work(int socket_num, int n_thrds);
int main(void)
{
int n_sockets, socket_num, n_thrds_on_socket;
omp_set_nested(1); // or env var= OMP_NESTED=true (deprecated in OpenMP 5.0)
omp_set_max_active_levels(2); // or env var= OMP_MAX_ACTIVE_LEVELS=2
n_sockets = omp_get_num_places();
n_thrds_on_socket = omp_get_place_num_procs(0);
// OMP_NUM_THREADS=2,4
// OMP_PLACES="{0,2,4,6},{1,3,5,7}" #2 sockets; even/odd proc-ids
// OMP_AFFINITY_FORMAT=\
// "nest_level= %L, parent_thrd_num= %a, thrd_num= %n, thrd_affinity= %A"
#pragma omp parallel num_threads(n_sockets) private(socket_num)
{
socket_num = omp_get_place_num();
if(socket_num==0)
printf(" LEVEL 1 AFFINITIES 1 thread/socket, %d sockets:\n\n",
n_sockets);
// not needed if OMP_DISPLAY_AFFINITY=TRUE
omp_display_affinity(NULL);
// OUTPUT:
// LEVEL 1 AFFINITIES 1 thread/socket, 2 sockets:
// nest_level= 1, parent_thrd_num= 0, thrd_num= 0, thrd_affinity= 0,2,4,6
// nest_level= 1, parent_thrd_num= 0, thrd_num= 1, thrd_affinity= 1,3,5,7
socket_work(socket_num, n_thrds_on_socket);
}
return 0;
}
void socket_work(int socket_num, int n_thrds)
{
#pragma omp parallel num_threads(n_thrds)
{
if(omp_get_thread_num()==0)
printf(" LEVEL 2 AFFINITIES, %d threads on socket %d\n",
n_thrds, socket_num);
// not needed if OMP_DISPLAY_AFFINITY=TRUE
omp_display_affinity(NULL);
// OUTPUT:
// LEVEL 2 AFFINITIES, 4 threads on socket 0
// nest_level= 2, parent_thrd_num= 0, thrd_num= 0, thrd_affinity= 0
// nest_level= 2, parent_thrd_num= 0, thrd_num= 1, thrd_affinity= 2
// nest_level= 2, parent_thrd_num= 0, thrd_num= 2, thrd_affinity= 4
// nest_level= 2, parent_thrd_num= 0, thrd_num= 3, thrd_affinity= 6
// LEVEL 2 AFFINITIES, 4 threads on socket 1
// nest_level= 2, parent_thrd_num= 1, thrd_num= 0, thrd_affinity= 1
// nest_level= 2, parent_thrd_num= 1, thrd_num= 1, thrd_affinity= 3
// nest_level= 2, parent_thrd_num= 1, thrd_num= 2, thrd_affinity= 5
// nest_level= 2, parent_thrd_num= 1, thrd_num= 3, thrd_affinity= 7
// ... Do Some work on Socket
}
}
!!%compiler: gfortran
!!%cflags: -fopenmp
! name: affinity_display.2
! type: F-free
! version: omp_5.0
program affinity_display
use omp_lib
implicit none
character(len=0) :: null
integer :: n_sockets, socket_num, n_thrds_on_socket
call omp_set_nested(.true.) ! or env var= OMP_NESTED=true (deprecated in OpenMP 5.0)
call omp_set_max_active_levels(2) ! or env var= OMP_MAX_ACTIVE_LEVELS=2
n_sockets = omp_get_num_places()
n_thrds_on_socket = omp_get_place_num_procs(0)
! OMP_NUM_THREADS=2,4
! OMP_PLACES="{0,2,4,6},{1,3,5,7}" #2 sockets; even/odd proc-ids
! OMP_AFFINITY_FORMAT=\
!"nest_level= %L, parent_thrd_num= %a, thrd_num= %n, thrd_affinity= %A"
!$omp parallel num_threads(n_sockets) private(socket_num)
socket_num = omp_get_place_num()
if(socket_num==0) then
write(*,'("LEVEL 1 AFFINITIES 1 thread/socket, ",i0," sockets:")') &
n_sockets
endif
call omp_display_affinity(null) ! not needed
! if OMP_DISPLAY_AFFINITY=TRUE
! OUTPUT:
! LEVEL 1 AFFINITIES 1 thread/socket, 2 sockets:
! nest_level= 1, parent_thrd_num= 0, thrd_num= 0, &
! thrd_affinity= 0,2,4,6
! nest_level= 1, parent_thrd_num= 0, thrd_num= 1, &
! thrd_affinity= 1,3,5,7
call socket_work(socket_num, n_thrds_on_socket)
!$omp end parallel
end program
subroutine socket_work(socket_num, n_thrds)
use omp_lib
implicit none
integer :: socket_num, n_thrds
character(len=0) :: null
!$omp parallel num_threads(n_thrds)
if(omp_get_thread_num()==0) then
write(*,'("LEVEL 2 AFFINITIES, ",i0," threads on socket ",i0)') &
n_thrds,socket_num
endif
call omp_display_affinity(null) ! not needed
! if OMP_DISPLAY_AFFINITY=TRUE
! OUTPUT:
! LEVEL 2 AFFINITIES, 4 threads on socket 0
! nest_level= 2, parent_thrd_num= 0, thrd_num= 0, thrd_affinity= 0
! nest_level= 2, parent_thrd_num= 0, thrd_num= 1, thrd_affinity= 2
! nest_level= 2, parent_thrd_num= 0, thrd_num= 2, thrd_affinity= 4
! nest_level= 2, parent_thrd_num= 0, thrd_num= 3, thrd_affinity= 6
! LEVEL 2 AFFINITIES, 4 threads on socket 1
! nest_level= 2, parent_thrd_num= 1, thrd_num= 0, thrd_affinity= 1
! nest_level= 2, parent_thrd_num= 1, thrd_num= 1, thrd_affinity= 3
! nest_level= 2, parent_thrd_num= 1, thrd_num= 2, thrd_affinity= 5
! nest_level= 2, parent_thrd_num= 1, thrd_num= 3, thrd_affinity= 7
! ... Do Some work on Socket
!$omp end parallel
end subroutine
The next example illustrates more details about affinity formatting. First, the omp_get_affinity_format() API routine is used to obtain the default format. The code checks that the storage provides enough space to hold the format. Next, the omp_set_affinity_format() API routine sets a user-defined format: host=%20H thrd_num=%0.4n binds_to=%A.
The host, thread number and affinity fields are specified by %20H, %0.4n and %A. H, n and A are the single-character “short names” for the host, thread_num and thread_affinity data to be printed, with field sizes of 20, 4 and “size as needed”, respectively. The period (.) indicates that the field is displayed right-justified (the default is left-justified), and the “0” indicates that any unused space is filled with leading zeros (e.g., “0001” is displayed instead of “1” for a field size of 4).
Within the parallel region the affinity for each thread is captured by omp_capture_affinity() into a buffer array with elements indexed by the thread number (thrd_num). After the parallel region, the thread affinities are printed in thread-number order.
If the storage area in buffer is inadequate for holding the affinity data, the stored affinity data is truncated. The maximum value for the number of characters (nchars) returned by omp_capture_affinity is captured by the reduction(max:max_req_store) clause and the if(nchars > max_req_store) max_req_store=nchars statement. It is used to report possible truncation: in C, truncation has occurred if max_req_store >= BUFFER_STORE, because one character of the buffer is reserved for the terminating null byte; in Fortran, it has occurred if max_req_store > BUFFER_STORE.
//%compiler: clang
//%cflags: -fopenmp
/*
* name: affinity_display.3
* type: C
* version: omp_5.0
*/
#include <stdio.h>
#include <stdlib.h> // also null is in <stddef.h>
#include <stddef.h>
#include <string.h>
#include <omp.h>
#define FORMAT_STORE 80
#define BUFFER_STORE 80
int main(void){
int i, n, thrd_num, max_req_store;
size_t nchars;
char default_format[FORMAT_STORE];
char my_format[] = "host=%20H thrd_num=%0.4n binds_to=%A";
char **buffer;
// CODE SEGMENT 1 AFFINITY FORMAT
// Get and Display Default Affinity Format
nchars = omp_get_affinity_format(default_format,(size_t)FORMAT_STORE);
printf("Default Affinity Format is: %s\n",default_format);
if(nchars >= FORMAT_STORE){
printf("Caution: Reported Format is truncated. Increase\n");
printf(" FORMAT_STORE to %zu.\n", nchars+1);
}
// Set Affinity Format
omp_set_affinity_format(my_format);
printf("Affinity Format set to: %s\n",my_format);
// CODE SEGMENT 2 CAPTURE AFFINITY
// Set up buffer for affinity of n threads
n = omp_get_num_procs();
buffer = (char **)malloc( sizeof(char *) * n );
for(i=0;i<n;i++){
buffer[i]=(char *)malloc( sizeof(char) * BUFFER_STORE);
buffer[i][0]='\0'; // stays an empty string if no thread fills this buffer
}
// Capture Affinity using Affinity Format set above.
// Use max reduction to check size of buffer areas
max_req_store = 0;
#pragma omp parallel private(thrd_num,nchars) \
reduction(max:max_req_store)
{
//safety: don't exceed # of buffers
if(omp_get_num_threads()>n) exit(1);
thrd_num=omp_get_thread_num();
nchars=omp_capture_affinity(buffer[thrd_num],
(size_t)BUFFER_STORE,NULL);
if(nchars > max_req_store) max_req_store=nchars;
// ...
}
for(i=0;i<n;i++){
printf("thrd_num= %d, affinity: %s\n", i,buffer[i]);
}
// For 4 threads with OMP_PLACES='{0,1},{2,3},{4,5},{6,7}'
// Format host=%20H thrd_num=%0.4n binds_to=%A
// affinity: host=hpc.cn567 thrd_num=0000 binds_to=0,1
// affinity: host=hpc.cn567 thrd_num=0001 binds_to=2,3
// affinity: host=hpc.cn567 thrd_num=0002 binds_to=4,5
// affinity: host=hpc.cn567 thrd_num=0003 binds_to=6,7
if(max_req_store>=BUFFER_STORE){
printf("Caution: Affinity string truncated. Increase\n");
printf(" BUFFER_STORE to %d\n",max_req_store+1);
}
for(i=0;i<n;i++) free(buffer[i]);
free (buffer);
return 0;
}
!!%compiler: gfortran
!!%cflags: -fopenmp
! name: affinity_display.3
! type: F-free
! version: omp_5.0
program affinity_display
use omp_lib
implicit none
integer, parameter :: FORMAT_STORE=80
integer, parameter :: BUFFER_STORE=80
integer :: i, n, thrd_num, nchars, max_req_store
character(FORMAT_STORE) :: default_format
character(*), parameter :: my_format = &
"host=%20H thrd_num=%0.4n binds_to=%A"
character(:), allocatable :: buffer(:)
character(len=0) :: null
! CODE SEGMENT 1 AFFINITY FORMAT
! Get and Display Default Affinity Format
nchars = omp_get_affinity_format(default_format)
print*,"Default Affinity Format: ", trim(default_format)
if( nchars > FORMAT_STORE) then
print*,"Caution: Reported Format is truncated. Increase"
print*," FORMAT_STORE to ", nchars
endif
! Set Affinity Format
call omp_set_affinity_format(my_format)
print*,"Affinity Format set to: ", my_format
! CODE SEGMENT 2 CAPTURE AFFINITY
! Set up buffer for affinity of n threads
n = omp_get_num_procs()
allocate( character(len=BUFFER_STORE)::buffer(0:n-1) )
buffer(:) = "" ! blank entries for buffers that no thread fills
! Capture Affinity using Affinity Format set above.
! Use max reduction to check size of buffer areas
max_req_store = 0
!$omp parallel private(thrd_num,nchars) reduction(max:max_req_store)
if(omp_get_num_threads()>n) stop "ERROR: increase buffer lines"
thrd_num=omp_get_thread_num()
nchars=omp_capture_affinity(buffer(thrd_num),null)
if(nchars>max_req_store) max_req_store=nchars
! ...
!$omp end parallel
do i = 0, n-1
print*, "thrd_num= ",i," affinity:", trim(buffer(i))
end do
! For 4 threads with OMP_PLACES='{0,1},{2,3},{4,5},{6,7}'
! Format: host=%20H thrd_num=%0.4n binds_to=%A
! affinity: host=hpc.cn567 thrd_num=0000 binds_to=0,1
! affinity: host=hpc.cn567 thrd_num=0001 binds_to=2,3
! affinity: host=hpc.cn567 thrd_num=0002 binds_to=4,5
! affinity: host=hpc.cn567 thrd_num=0003 binds_to=6,7
if(max_req_store > BUFFER_STORE) then
print*, "Caution: Affinity string truncated. Increase"
print*, " BUFFER_STORE to ",max_req_store
endif
deallocate(buffer)
end program