4.3. Affinity Display

The following examples illustrate ways to display thread affinity. Automatic display of affinity can be invoked by setting the OMP_DISPLAY_AFFINITY environment variable to TRUE. The format of the output can be customized by setting the OMP_AFFINITY_FORMAT environment variable to an appropriate string. Also, there are API calls for the user to display thread affinity at selected locations within code.
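As a minimal sketch of the API route mentioned above, the helper below reports the calling thread's affinity at a chosen point in the code. The name report_affinity_here and the format string are illustrative assumptions, not part of OpenMP; the sketch also assumes an OpenMP 5.0 capable compiler and runtime when built with OpenMP enabled, and falls back to doing nothing otherwise.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>   /* assumes a runtime that provides the OpenMP 5.0 API */
#endif

/* Hypothetical helper (the name is ours, not OpenMP's): print a label,
 * then the affinity of the calling thread.
 * Returns 1 if OpenMP support was compiled in, 0 otherwise. */
int report_affinity_here(const char *label)
{
    assert(label != NULL);
    printf("[%s]\n", label);
#ifdef _OPENMP
    /* A non-NULL, non-empty format applies to this call only. Passing NULL
     * instead would use the current affinity format (OMP_AFFINITY_FORMAT
     * or the implementation default). */
    omp_display_affinity("thrd_num=%n binds_to=%A");
    return 1;
#else
    /* Built without OpenMP: there is no affinity information to display. */
    return 0;
#endif
}
```

The helper can be called before a parallel region to see the affinity of the primary thread, or inside a parallel region, where each thread that calls it prints its own line.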

For the first example the environment variable OMP_DISPLAY_AFFINITY has been set to TRUE, and execution occurs on an 8-core system with OMP_NUM_THREADS set to 8.

The affinity for the primary thread is reported through a call to the omp_display_affinity() API routine. With default affinity settings, the report shows that the primary thread can execute on any of the cores. In the following parallel region the affinity for each of the team threads is reported automatically since the OMP_DISPLAY_AFFINITY environment variable has been set to TRUE.

These two reports are often useful, for example in hybrid codes that use both MPI and OpenMP, for observing the affinity of an MPI task before the parallel region and the thread affinities within an OpenMP parallel region. Note: the next parallel region uses the same number of threads as the previous parallel region and the affinities are not changed, so affinity is NOT reported.

In the last parallel region, the thread affinities are reported because the thread affinity has changed.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: affinity_display.1
* type: C
* version: omp_5.0
*/
#include <stdio.h>
#include <omp.h>

int main(void){                  //MAX threads = 8, single socket system

   //API call-- Displays Affinity of Primary Thread
   omp_display_affinity(NULL);

   // API CALL OUTPUT (default format):
   // team_num= 0, nesting_level= 0, thread_num= 0,
   // thread_affinity= 0,1,2,3,4,5,6,7

   // OMP_DISPLAY_AFFINITY=TRUE, OMP_NUM_THREADS=8
   #pragma omp parallel num_threads(omp_get_num_procs())
   {
     if(omp_get_thread_num()==0)
        printf("1st Parallel Region -- Affinity Reported \n");

   // DISPLAY OUTPUT (default format) has been sorted:
   // team_num= 0, nesting_level= 1, thread_num= 0, thread_affinity= 0
   // team_num= 0, nesting_level= 1, thread_num= 1, thread_affinity= 1
   // ...
   // team_num= 0, nesting_level= 1, thread_num= 7, thread_affinity= 7

      // doing work here
   }

   #pragma omp parallel num_threads( omp_get_num_procs() )
   {
      if(omp_get_thread_num()==0)
         printf("%s%s\n","Same Affinity as in Previous Parallel Region",
                         " -- no Affinity Reported\n");

   // NO AFFINITY OUTPUT:
   //(output in 1st parallel region only for OMP_DISPLAY_AFFINITY=TRUE)

      // doing more work here
   }

   // Report Affinity for 1/2 number of threads
   #pragma omp parallel num_threads( omp_get_num_procs()/2 )
   {
     if(omp_get_thread_num()==0)
        printf("Report Affinity for using 1/2 of max threads.\n");

   // DISPLAY OUTPUT (default format) has been sorted:
   // team_num= 0, nesting_level= 1, thread_num= 0, thread_affinity= 0,1
   // team_num= 0, nesting_level= 1, thread_num= 1, thread_affinity= 2,3
   // team_num= 0, nesting_level= 1, thread_num= 2, thread_affinity= 4,5
   // team_num= 0, nesting_level= 1, thread_num= 3, thread_affinity= 6,7

     // do work
   }

   return 0;
}
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: affinity_display.1
! type: F-free
! version: omp_5.0
program affinity_display        ! MAX threads = 8, single socket system

   use omp_lib
   implicit none
   character(len=0) :: null

   ! API call - Displays Affinity of Primary Thread
   call omp_display_affinity(null)

   ! API CALL OUTPUT (default format):
   ! team_num= 0, nesting_level= 0, thread_num= 0, &
   !   thread_affinity= 0,1,2,3,4,5,6,7


   ! OMP_DISPLAY_AFFINITY=TRUE, OMP_NUM_THREADS=8

   !$omp parallel num_threads(omp_get_num_procs())

     if(omp_get_thread_num()==0) then
        print*, "1st Parallel Region  -- Affinity Reported"
     endif

     ! DISPLAY OUTPUT (default format) has been sorted:
     ! team_num= 0, nesting_level= 1, thread_num= 0, thread_affinity= 0
     ! team_num= 0, nesting_level= 1, thread_num= 1, thread_affinity= 1
     ! ...
     ! team_num= 0, nesting_level= 1, thread_num= 7, thread_affinity= 7

      ! doing work here

   !$omp end parallel

   !$omp parallel num_threads( omp_get_num_procs() )

     if(omp_get_thread_num()==0) then
        print*, "Same Affinity in Parallel Region -- no Affinity Reported"
     endif

     ! NO AFFINITY OUTPUT:
     ! (output in 1st parallel region only for
     !  OMP_DISPLAY_AFFINITY=TRUE)

      ! doing more work here

   !$omp end parallel

   ! Report Affinity for 1/2 number of threads
   !$omp parallel num_threads( omp_get_num_procs()/2 )

     if(omp_get_thread_num()==0) then
        print*, "Altered Affinity in Parallel Region -- Affinity Reported"
     endif

     ! DISPLAY OUTPUT (default format) has been sorted:
     ! team_num= 0, nesting_level= 1, thread_num= 0, &
     !   thread_affinity= 0,1
     ! team_num= 0, nesting_level= 1, thread_num= 1, &
     !   thread_affinity= 2,3
     ! team_num= 0, nesting_level= 1, thread_num= 2, &
     !   thread_affinity= 4,5
     ! team_num= 0, nesting_level= 1, thread_num= 3, &
     !   thread_affinity= 6,7

      ! do work

   !$omp end parallel

end program

In the following example 2 threads are forked, and each executes on a socket. Next, a nested parallel region runs half of the available threads on each socket.

These OpenMP environment variables have been set:

  • OMP_PROC_BIND="TRUE"

  • OMP_NUM_THREADS="2,4"

  • OMP_PLACES="{0,2,4,6},{1,3,5,7}"

  • OMP_AFFINITY_FORMAT="nest_level= %L, parent_thrd_num= %a, thrd_num= %n, thrd_affinity= %A"

where the numbers correspond to core ids for the system. Note, OMP_DISPLAY_AFFINITY is not set and is FALSE by default. This example shows how to use API routines to perform affinity display operations.

For each of the two first-level threads the OMP_PLACES variable specifies a place with all the core-ids of the socket ({0,2,4,6} for one thread and {1,3,5,7} for the other). (As is sometimes the case in 2-socket systems, one socket may consist of the even id numbers, while the other may have the odd id numbers.) The affinities are printed according to the OMP_AFFINITY_FORMAT format: providing the parallel nesting level (%L), the ancestor thread number (%a), the thread number (%n) and the thread affinity (%A). In the nested parallel region within the socket_work routine the affinities for the threads on each socket are printed according to this format.
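To make the place list concrete, the sketch below counts the places and the processor ids per place in an explicit OMP_PLACES string, which is why the example ends up with 2 sockets and 4 threads per socket. The helpers count_places and count_procs_in_place are hypothetical and are not OpenMP routines; they assume only the plain {id,id,...} form (no intervals or abstract names). In the example itself these values come from omp_get_num_places() and omp_get_place_num_procs(0).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of places in an explicit list such as
 * "{0,2,4,6},{1,3,5,7}". Each '{' opens one place. */
int count_places(const char *spec)
{
    assert(spec != NULL);
    int places = 0;
    for (const char *p = spec; *p != '\0'; p++)
        if (*p == '{')
            places++;
    return places;
}

/* Hypothetical helper: number of processor ids in place `place` (0-based).
 * Returns -1 if the list has no such place. */
int count_procs_in_place(const char *spec, int place)
{
    assert(spec != NULL);
    int current = -1;                 /* index of the place being scanned */
    for (const char *p = spec; *p != '\0'; p++) {
        if (*p != '{')
            continue;
        current++;
        if (current != place)
            continue;
        int procs = 0, in_number = 0;
        /* Count each run of digits up to the closing brace as one id. */
        for (p++; *p != '\0' && *p != '}'; p++) {
            if (*p >= '0' && *p <= '9') {
                if (!in_number) { procs++; in_number = 1; }
            } else {
                in_number = 0;
            }
        }
        return procs;
    }
    return -1;
}
```

With the setting used here, count_places gives the value the example stores in n_sockets, and count_procs_in_place for place 0 gives the value stored in n_thrds_on_socket.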

//%compiler: clang
//%cflags: -fopenmp

/*
* name:       affinity_display.2
* type:       C
* version: omp_5.0
*/
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

void socket_work(int socket_num, int n_thrds);

int main(void)
{
   int n_sockets, socket_num, n_thrds_on_socket;

   omp_set_nested(1);            // or env var= OMP_NESTED=true
   omp_set_max_active_levels(2); // or env var= OMP_MAX_ACTIVE_LEVELS=2

   n_sockets         = omp_get_num_places();
   n_thrds_on_socket = omp_get_place_num_procs(0);

// OMP_NUM_THREADS=2,4
// OMP_PLACES="{0,2,4,6},{1,3,5,7}"  #2 sockets; even/odd proc-ids
// OMP_AFFINITY_FORMAT=\
// "nest_level= %L, parent_thrd_num= %a, thrd_num= %n, thrd_affinity= %A"

   #pragma omp parallel num_threads(n_sockets) private(socket_num)
   {
      socket_num = omp_get_place_num();

      if(socket_num==0)
         printf(" LEVEL 1 AFFINITIES 1 thread/socket, %d sockets:\n\n",
                n_sockets);

      // not needed if OMP_DISPLAY_AFFINITY=TRUE
      omp_display_affinity(NULL);

// OUTPUT:
// LEVEL 1 AFFINITIES 1 thread/socket, 2 sockets:
// nest_level= 1, parent_thrd_num= 0, thrd_num= 0, thrd_affinity= 0,2,4,6
// nest_level= 1, parent_thrd_num= 0, thrd_num= 1, thrd_affinity= 1,3,5,7

      socket_work(socket_num, n_thrds_on_socket);
   }

 return 0;
}

void socket_work(int socket_num, int n_thrds)
{
   #pragma omp parallel num_threads(n_thrds)
   {
      if(omp_get_thread_num()==0)
         printf(" LEVEL 2 AFFINITIES, %d threads on socket %d\n",
                n_thrds, socket_num);

         // not needed if OMP_DISPLAY_AFFINITY=TRUE
         omp_display_affinity(NULL);

 // OUTPUT:
 // LEVEL 2 AFFINITIES, 4 threads on socket 0
 // nest_level= 2, parent_thrd_num= 0, thrd_num= 0, thrd_affinity= 0
 // nest_level= 2, parent_thrd_num= 0, thrd_num= 1, thrd_affinity= 2
 // nest_level= 2, parent_thrd_num= 0, thrd_num= 2, thrd_affinity= 4
 // nest_level= 2, parent_thrd_num= 0, thrd_num= 3, thrd_affinity= 6

 // LEVEL 2 AFFINITIES, 4 threads on socket 1
 // nest_level= 2, parent_thrd_num= 1, thrd_num= 0, thrd_affinity= 1
 // nest_level= 2, parent_thrd_num= 1, thrd_num= 1, thrd_affinity= 3
 // nest_level= 2, parent_thrd_num= 1, thrd_num= 2, thrd_affinity= 5
 // nest_level= 2, parent_thrd_num= 1, thrd_num= 3, thrd_affinity= 7

    // ... Do Some work on Socket
   }
}
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: affinity_display.2
! type: F-free
! version: omp_5.0
program affinity_display

   use omp_lib
   implicit none
   character(len=0) :: null
   integer          :: n_sockets, socket_num, n_thrds_on_socket

   call omp_set_nested(.true.)        ! or env var= OMP_NESTED=true
   call omp_set_max_active_levels(2)  ! or env var= OMP_MAX_ACTIVE_LEVELS=2

   n_sockets         = omp_get_num_places()
   n_thrds_on_socket = omp_get_place_num_procs(0)

    ! OMP_NUM_THREADS=2,4
    ! OMP_PLACES="{0,2,4,6},{1,3,5,7}"  #2 sockets; even/odd proc-ids
    ! OMP_AFFINITY_FORMAT=\
    !"nest_level= %L, parent_thrd_num= %a, thrd_num= %n, thrd_affinity= %A"

   !$omp parallel num_threads(n_sockets) private(socket_num)

     socket_num = omp_get_place_num()

     if(socket_num==0) then
       write(*,'("LEVEL 1 AFFINITIES 1 thread/socket ",i0," sockets")') &
             n_sockets
     endif

     call omp_display_affinity(null)  ! not needed
                                      ! if OMP_DISPLAY_AFFINITY=TRUE

       ! OUTPUT:
       ! LEVEL 1 AFFINITIES 1 thread/socket, 2 sockets:
       ! nest_level= 1, parent_thrd_num= 0, thrd_num= 0, &
       !   thrd_affinity= 0,2,4,6
       ! nest_level= 1, parent_thrd_num= 0, thrd_num= 1, &
       !   thrd_affinity= 1,3,5,7

     call socket_work(socket_num, n_thrds_on_socket)

   !$omp end parallel

end program

subroutine socket_work(socket_num, n_thrds)
   use omp_lib
   implicit none
   integer :: socket_num, n_thrds
   character(len=0) :: null

   !$omp parallel num_threads(n_thrds)

      if(omp_get_thread_num()==0) then
      write(*,'("LEVEL 2 AFFINITIES, ",i0," threads on socket ",i0)') &
            n_thrds,socket_num
      endif

      call omp_display_affinity(null)  ! not needed
                                       ! if OMP_DISPLAY_AFFINITY=TRUE

      ! OUTPUT:
      ! LEVEL 2 AFFINITIES, 4 threads on socket 0
      ! nest_level= 2, parent_thrd_num= 0, thrd_num= 0, thrd_affinity= 0
      ! nest_level= 2, parent_thrd_num= 0, thrd_num= 1, thrd_affinity= 2
      ! nest_level= 2, parent_thrd_num= 0, thrd_num= 2, thrd_affinity= 4
      ! nest_level= 2, parent_thrd_num= 0, thrd_num= 3, thrd_affinity= 6

      ! LEVEL 2 AFFINITIES, 4 threads on socket 1
      ! nest_level= 2, parent_thrd_num= 1, thrd_num= 0, thrd_affinity= 1
      ! nest_level= 2, parent_thrd_num= 1, thrd_num= 1, thrd_affinity= 3
      ! nest_level= 2, parent_thrd_num= 1, thrd_num= 2, thrd_affinity= 5
      ! nest_level= 2, parent_thrd_num= 1, thrd_num= 3, thrd_affinity= 7

      ! ... Do Some work on Socket

   !$omp end parallel

end subroutine

The next example illustrates more details about affinity formatting. First, the omp_get_affinity_format() API routine is used to obtain the default format. The code checks to make sure the storage provides enough space to hold the format. Next, the omp_set_affinity_format() API routine sets a user-defined format: host=%20H thrd_num=%0.4n binds_to=%A.

The host, thread number and affinity fields are specified by %20H, %0.4n and %A: H, n and A are single-character “short names” for the host, thread_num and thread_affinity data to be printed, with format sizes of 20, 4, and “size as needed”. The period (.) indicates that the field is displayed right-justified (default is left-justified) and the “0” indicates that any unused space is to be prefixed with zeros (e.g. instead of “1”, “0001” is displayed for the field size of 4).
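The field rules above have close printf analogues, which the following sketch uses to lay out one report line. This is only an analogy under stated assumptions: format_affinity_line is a hypothetical helper, not an OpenMP routine, and the host name, thread number and affinity string are passed in by the caller rather than obtained from the runtime.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: mimic the layout described for the format
 * "host=%20H thrd_num=%0.4n binds_to=%A" with printf conversions:
 *   %20H  -> "%-20s"  size 20, left-justified (the default)
 *   %0.4n -> "%04d"   size 4, right-justified, unused space filled with 0
 *   %A    -> "%s"     size as needed
 * Returns the number of characters the complete line requires, excluding
 * the terminating null (the value returned by snprintf). */
int format_affinity_line(char *out, size_t out_size, const char *host,
                         int thrd_num, const char *affinity)
{
    assert(out != NULL && host != NULL && affinity != NULL);
    return snprintf(out, out_size, "host=%-20s thrd_num=%04d binds_to=%s",
                    host, thrd_num, affinity);
}
```

Called with a host name shorter than 20 characters, the helper pads the host field with trailing blanks, so the thrd_num columns line up across threads in the same way as in the sample output of the example.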

Within the parallel region the affinity for each thread is captured by omp_capture_affinity() into a buffer array with elements indexed by the thread number (thrd_num). After the parallel region, the thread affinities are printed in thread-number order.

If the storage area in buffer is inadequate for holding the affinity data, the stored affinity data is truncated. The maximum value for the number of characters (nchars) returned by omp_capture_affinity is captured by the reduction(max:max_req_store) clause and the if(nchars > max_req_store) max_req_store=nchars statement. It is used to report possible truncation: in C/C++ the string is truncated if max_req_store >= BUFFER_STORE, because the buffer must also hold the terminating null character, and in Fortran if max_req_store > BUFFER_STORE.
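The C/C++ truncation test can be sketched without an OpenMP runtime, because snprintf follows the same return convention: it reports the number of characters the full string needs, excluding the terminating null. The helpers below are hypothetical, and capture_stand_in is only a stand-in for omp_capture_affinity, not a replacement for it.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if a string that needs nchars characters
 * (excluding the terminating null) did not fit into a buffer of size bytes. */
int is_truncated(size_t nchars, size_t size)
{
    return nchars >= size;
}

/* Hypothetical helper: smallest buffer size, in bytes, that avoids
 * truncation (one extra byte for the terminating null). */
size_t required_store(size_t nchars)
{
    return nchars + 1;
}

/* Stand-in for omp_capture_affinity (an assumption made for this sketch):
 * copies text into buf, truncating if needed, and returns the length the
 * untruncated string requires. */
size_t capture_stand_in(char *buf, size_t size, const char *text)
{
    assert(text != NULL);
    int n = snprintf(buf, size, "%s", text);
    return n < 0 ? 0 : (size_t)n;
}
```

When the returned length equals the buffer size exactly, the string is still truncated by one character, which is why the C check uses >= and the suggested new size is the returned length plus one.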

//%compiler: clang
//%cflags: -fopenmp

/*
* name: affinity_display.3
* type: C
* version: omp_5.0
*/
#include <stdio.h>
#include <stdlib.h>  // NULL is also defined in <stddef.h>
#include <stddef.h>
#include <string.h>
#include <omp.h>

#define FORMAT_STORE   80
#define BUFFER_STORE   80

int main(void){

   int i, n, thrd_num, max_req_store;
   size_t nchars;

   char default_format[FORMAT_STORE];
   char my_format[]  = "host=%20H thrd_num=%0.4n binds_to=%A";
   char **buffer;


   // CODE SEGMENT 1         AFFINITY FORMAT

   // Get and Display Default Affinity Format

   nchars = omp_get_affinity_format(default_format,(size_t)FORMAT_STORE);
   printf("Default Affinity Format is: %s\n",default_format);

   if(nchars >= FORMAT_STORE){
      printf("Caution: Reported Format is truncated.  Increase\n");
      printf("         FORMAT_STORE to %zu.\n", nchars+1);
   }

   // Set Affinity Format

   omp_set_affinity_format(my_format);
   printf("Affinity Format set to: %s\n",my_format);


   // CODE SEGMENT 2         CAPTURE AFFINITY

   // Set up buffer for affinity of n threads

   n = omp_get_num_procs();
   buffer = (char **)malloc( sizeof(char *) * n );
   for(i=0;i<n;i++){
      buffer[i]=(char *)malloc( sizeof(char) * BUFFER_STORE);
      buffer[i][0]='\0';  // stays an empty string if no thread fills it
   }

   // Capture Affinity using Affinity Format set above.
   // Use max reduction to check size of buffer areas
   max_req_store = 0;
   #pragma omp parallel private(thrd_num,nchars) \
                        reduction(max:max_req_store)
   {
      //safety: don't exceed # of buffers
      if(omp_get_num_threads()>n) exit(1);

      thrd_num=omp_get_thread_num();
      nchars=omp_capture_affinity(buffer[thrd_num],
                                  (size_t)BUFFER_STORE,NULL);
      if((int)nchars > max_req_store) max_req_store=(int)nchars;

      // ...
   }

   for(i=0;i<n;i++){
      printf("thrd_num= %d, affinity: %s\n", i,buffer[i]);
   }
      // For 4 threads with OMP_PLACES='{0,1},{2,3},{4,5},{6,7}'
      // Format    host=%20H thrd_num=%0.4n binds_to=%A

      // affinity: host=hpc.cn567            thrd_num=0000 binds_to=0,1
      // affinity: host=hpc.cn567            thrd_num=0001 binds_to=2,3
      // affinity: host=hpc.cn567            thrd_num=0002 binds_to=4,5
      // affinity: host=hpc.cn567            thrd_num=0003 binds_to=6,7


   if(max_req_store>=BUFFER_STORE){
      printf("Caution: Affinity string truncated.  Increase\n");
      printf("         BUFFER_STORE to %d\n",max_req_store+1);
   }

   for(i=0;i<n;i++) free(buffer[i]);
   free (buffer);

   return 0;
}
!!%compiler: gfortran
!!%cflags: -fopenmp

! name: affinity_display.3
! type: F-free
! version: omp_5.0
program affinity_display
   use omp_lib
   implicit none
   integer, parameter :: FORMAT_STORE=80
   integer, parameter :: BUFFER_STORE=80

   integer            :: i, n, thrd_num, nchars, max_req_store

   character(FORMAT_STORE)     :: default_format
   character(*), parameter     :: my_format = &
                                  "host=%20H thrd_num=%0.4n binds_to=%A"
   character(:), allocatable   :: buffer(:)
   character(len=0)            :: null


!  CODE SEGMENT 1         AFFINITY FORMAT

!                         Get and Display Default Affinity Format

   nchars = omp_get_affinity_format(default_format)
   print*,"Default Affinity Format: ", trim(default_format)

   if( nchars > FORMAT_STORE) then
      print*,"Caution: Reported Format is truncated.  Increase"
      print*,"         FORMAT_STORE to ", nchars
   endif

!                         Set Affinity Format

   call omp_set_affinity_format(my_format)
   print*,"Affinity Format set to: ", my_format


!  CODE SEGMENT 2         CAPTURE AFFINITY

!                         Set up buffer for affinity of n threads

   n = omp_get_num_procs()
   allocate( character(len=BUFFER_STORE)::buffer(0:n-1) )

!                         Capture Affinity using Affinity Format set above.
!                         Use max reduction to check size of buffer areas
   max_req_store = 0
   !$omp parallel private(thrd_num,nchars) reduction(max:max_req_store)

      if(omp_get_num_threads()>n) stop "ERROR: increase buffer lines"

      thrd_num=omp_get_thread_num()
      nchars=omp_capture_affinity(buffer(thrd_num),null)
      if(nchars>max_req_store) max_req_store=nchars
      !  ...

   !$omp end parallel

   do i = 0, n-1
      print*, "thrd_num= ",i,"   affinity:", trim(buffer(i))
   end do
         !  For 4 threads with OMP_PLACES='{0,1},{2,3},{4,5},{6,7}'
         !  Format:   host=%20H thrd_num=%0.4n binds_to=%A

         !  affinity: host=hpc.cn567            thrd_num=0000 binds_to=0,1
         !  affinity: host=hpc.cn567            thrd_num=0001 binds_to=2,3
         !  affinity: host=hpc.cn567            thrd_num=0002 binds_to=4,5
         !  affinity: host=hpc.cn567            thrd_num=0003 binds_to=6,7

   if(max_req_store > BUFFER_STORE) then
      print*,  "Caution: Affinity string truncated.  Increase"
      print*,  "         BUFFER_STORE to ",max_req_store
   endif

   deallocate(buffer)
end program