4.4. Affinity Query Functions

In the example below a team of threads is generated on each socket of the system, using nested parallelism. Several query functions are used to gather information to support the creation of the teams and to obtain socket and thread numbers.

For proper execution of the code, the user must create a place partition in which each place lists the core numbers of one socket. For example, in a 2-socket system with 8 cores in each socket, and sequential numbering of the cores within each socket, the OMP_PLACES variable would be set to "{0:8},{8:8}", using the place syntax { lower_bound : length : stride } with the default stride of 1.
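The interval notation can be checked by expanding it by hand. The following is a minimal sketch, not part of the OpenMP API: the helper name expand_place is hypothetical, and it handles only a single place written as {lower_bound:length} or {lower_bound:length:stride}, not explicit lists or abstract names such as sockets.

```c
#include <stdio.h>

/* Hypothetical helper (not an OpenMP routine): expand one place written in
 * interval notation, "{lower_bound:length[:stride]}", into explicit
 * processor numbers. Returns the number of entries written to ids, or -1
 * if the text cannot be parsed or does not fit in max_ids entries. */
int expand_place(const char *place, int *ids, int max_ids)
{
    int lower, length, stride = 1;   /* stride defaults to 1 when omitted */

    /* sscanf stops at the first mismatch, so "{0:8}" yields 2 conversions
     * and "{0:4:2}" yields 3. */
    int n = sscanf(place, "{%d:%d:%d", &lower, &length, &stride);
    if (n < 2 || length < 0 || length > max_ids)
        return -1;

    for (int i = 0; i < length; i++)
        ids[i] = lower + i * stride;
    return length;
}
```

For the two places in the text, expand_place("{0:8}", ids, 16) fills ids with 0 through 7, and expand_place("{8:8}", ids, 16) fills it with 8 through 15.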

The code determines the number of sockets ( n_sockets ) using the omp_get_num_places() query function. In this example each place is constructed with a list of each socket’s core numbers, hence the number of places is equal to the number of sockets.
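Whether each place really holds one socket's cores can be checked with the place query functions. The sketch below is illustrative only: the helper name report_places is hypothetical, and the _OPENMP guard is an assumption added so that the code still compiles (and reports zero places) when OpenMP support is not enabled.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: print the processor numbers of every place in the
 * place list. Returns the number of places, 0 when compiled without
 * OpenMP support, or -1 if memory allocation fails. */
int report_places(void)
{
    int n_places = 0;
#ifdef _OPENMP
    n_places = omp_get_num_places();
    for (int p = 0; p < n_places; p++) {
        int n_procs = omp_get_place_num_procs(p);
        /* Allocate at least one element so an empty place is harmless. */
        int *ids = malloc((n_procs > 0 ? n_procs : 1) * sizeof *ids);
        if (ids == NULL)
            return -1;

        omp_get_place_proc_ids(p, ids);   /* fills ids[0..n_procs-1] */
        printf("place %d:", p);
        for (int i = 0; i < n_procs; i++)
            printf(" %d", ids[i]);
        printf("\n");
        free(ids);
    }
#endif
    return n_places;
}
```

Called from main in a program built with OpenMP enabled and OMP_PLACES set as described above, this should print one line per socket listing that socket's core numbers.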

The outer parallel region forms a team of threads, and each thread executes on a socket (place) because the proc_bind clause uses spread in the outer parallel construct. Next, in the socket_init function, an inner parallel region creates a team whose size equals the number of elements (core numbers) in the place of the parent thread. Because the outer parallel construct uses a spread affinity policy, each of its threads inherits a subpartition of the original partition. Hence, the omp_get_place_num_procs query function returns the number of elements (here procs = cores) in the subpartition of the thread. After each parent thread creates its nested parallel region on the socket, the socket number and thread number are reported.
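The subpartition that each outer thread inherits can be inspected with omp_get_partition_num_places. The following is a sketch under stated assumptions: the helper name largest_subpartition is hypothetical, the fallback to one thread when no place list exists is a choice made for this illustration, and the _OPENMP guard lets the code compile without OpenMP support.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: start one thread per place with a spread policy and
 * return the largest subpartition size, in places, seen by any thread.
 * Returns 0 when compiled without OpenMP support. */
int largest_subpartition(void)
{
    int largest = 0;
#ifdef _OPENMP
    int n_places = omp_get_num_places();
    if (n_places < 1)
        n_places = 1;   /* no place list defined: fall back to one thread */

    #pragma omp parallel num_threads(n_places) proc_bind(spread) \
                         reduction(max:largest)
    {
        /* Number of places in this thread's place partition. */
        int n = omp_get_partition_num_places();
        printf("thread %d: place %d, subpartition of %d place(s)\n",
               omp_get_thread_num(), omp_get_place_num(), n);
        if (n > largest)
            largest = n;
    }
#endif
    return largest;
}
```

With as many threads as places, a spread policy is expected to give each thread a subpartition of a single place, which is why the nested team in socket_init stays on one socket.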

Note: Portable tools like hwloc (Portable HardWare LOCality package), which support many common operating systems, can be used to determine the configuration of a system. On some systems there are utilities, files, or user guides that provide configuration information. For instance, the socket number and proc_ids for a socket can be found in the /proc/cpuinfo text file on Linux systems.
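As a rough illustration of the /proc/cpuinfo approach, the sketch below counts distinct socket numbers in text of that format. It assumes the x86 Linux layout, in which each logical processor has a "physical id" line. Other architectures may omit that field, in which case the function returns 0. The name count_sockets and the MAX_SOCKETS limit are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SOCKETS 64   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: count the distinct "physical id" values found in a
 * stream formatted like /proc/cpuinfo on x86 Linux. */
int count_sockets(FILE *cpuinfo)
{
    char line[512];
    int seen[MAX_SOCKETS] = {0};
    int n_sockets = 0;

    while (fgets(line, sizeof line, cpuinfo) != NULL) {
        int id;

        /* Only lines such as "physical id\t: 0" are of interest. */
        if (strncmp(line, "physical id", 11) != 0)
            continue;

        const char *colon = strchr(line, ':');
        if (colon == NULL || sscanf(colon + 1, "%d", &id) != 1)
            continue;

        if (id >= 0 && id < MAX_SOCKETS && !seen[id]) {
            seen[id] = 1;
            n_sockets++;
        }
    }
    return n_sockets;
}
```

On a Linux system the stream would typically come from fopen("/proc/cpuinfo", "r"). The result can then be compared with the value returned by omp_get_num_places() to confirm that the place list matches the hardware.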

//%compiler: clang
//%cflags: -fopenmp

/*
* name: affinity_query.1
* type: C
* version: omp_4.5
*/
#include <stdio.h>
#include <omp.h>

void socket_init(int socket_num)
{
   int n_procs;

   n_procs = omp_get_place_num_procs(socket_num);
   #pragma omp parallel num_threads(n_procs) proc_bind(close)
   {
      printf("Reporting in from socket num, thread num:  %d %d\n",
                                socket_num,omp_get_thread_num() );
   }
}

int main()
{
   int n_sockets, socket_num;

   omp_set_nested(1);              // or export OMP_NESTED=true (deprecated since OpenMP 5.0)
   omp_set_max_active_levels(2);   // or export OMP_MAX_ACTIVE_LEVELS=2

   n_sockets = omp_get_num_places();
   #pragma omp parallel num_threads(n_sockets) private(socket_num) \
                        proc_bind(spread)
   {
      socket_num = omp_get_place_num();
      socket_init(socket_num);
   }

   return 0;
}

!!%compiler: gfortran
!!%cflags: -fopenmp

! name: affinity_query.1
! type: F-free
! version: omp_4.5
subroutine socket_init(socket_num)
   use omp_lib
   integer  :: socket_num, n_procs

   n_procs = omp_get_place_num_procs(socket_num)
   !$omp parallel num_threads(n_procs) proc_bind(close)

      print*,"Reporting in from socket num, thread num: ",  &
                                socket_num,omp_get_thread_num()
   !$omp end parallel
end subroutine

program numa_teams
   use omp_lib
   integer :: n_sockets, socket_num

   call omp_set_nested(.true.)       ! or export OMP_NESTED=true (deprecated since OpenMP 5.0)
   call omp_set_max_active_levels(2) ! or export OMP_MAX_ACTIVE_LEVELS=2

   n_sockets = omp_get_num_places()
   !$omp parallel num_threads(n_sockets) private(socket_num) &
   !$omp&         proc_bind(spread)

      socket_num = omp_get_place_num()
      call socket_init(socket_num)

   !$omp end parallel
end program