3. Parallel Execution

A single thread, the initial thread, begins sequential execution of an OpenMP-enabled program, as if the whole program is in an implicit parallel region consisting of an implicit task executed by the initial thread.

A parallel construct encloses code, forming a parallel region. An initial thread encountering a parallel region forks (creates) a team of threads at the beginning of the parallel region, and joins them (removes them from execution) at the end of the region. The initial thread becomes the primary thread of the team, with a thread number equal to zero; the other threads are numbered from 1 to the number of threads minus 1. A team may consist of just a single thread.
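The fork, thread numbering, and join described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name fork_join is hypothetical, and the serial fallbacks under #else are an assumption added so that the sketch also builds when OpenMP support is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallbacks so the sketch builds without OpenMP support. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: forks a team, adds up the thread numbers of all
   team members, and returns the team size after the join. */
int fork_join(int *id_sum)
{
    int team_size = 0;
    int sum = 0;

    #pragma omp parallel            /* fork: a team of threads is created */
    {
        int tid = omp_get_thread_num();

        #pragma omp atomic          /* each thread adds its own number */
        sum += tid;

        if (tid == 0)               /* thread number 0 is the primary thread */
            team_size = omp_get_num_threads();
    }                               /* join: only the primary thread continues */

    *id_sum = sum;
    return team_size;
}
```

Because the threads are numbered 0 to n-1 for a team of size n, the value stored in id_sum is n(n-1)/2, whatever team size the runtime provides.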

Each thread of a team is assigned an implicit task consisting of code within the parallel region. The task that creates a parallel region is suspended while the tasks of the team are executed. A thread is tied to its task; that is, only the thread assigned to the task can execute that task. After completion of the parallel region, the primary thread resumes execution of the generating task.

Any task within a parallel region is allowed to encounter another parallel region to form a nested parallel region. The parallelism of a nested parallel region (whether it forks additional threads, or is executed serially by the encountering task) can be controlled by the OMP_NESTED environment variable or the omp_set_nested() API routine with arguments indicating true or false. Both are deprecated as of OpenMP 5.0 in favor of the OMP_MAX_ACTIVE_LEVELS environment variable and the omp_set_max_active_levels() routine.
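A nested parallel region can be sketched as below. The sketch uses omp_set_max_active_levels(), which supersedes omp_set_nested() as of OpenMP 5.0, and num_threads(2) on both levels. The function name count_inner_tasks is hypothetical, and the serial fallback is an assumption for builds without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallback so the sketch builds without OpenMP support. */
static void omp_set_max_active_levels(int levels) { (void)levels; }
#endif

/* Hypothetical helper: counts how many implicit tasks execute the
   inner (nested) parallel region. */
int count_inner_tasks(void)
{
    int count = 0;

    omp_set_max_active_levels(2);   /* permit two active nested levels */

    #pragma omp parallel num_threads(2)         /* outer region */
    {
        #pragma omp parallel num_threads(2)     /* nested region */
        {
            #pragma omp atomic
            count++;
        }
    }
    return count;
}
```

The result is 4 when both levels fork two threads, and smaller when the runtime executes a level serially or provides fewer threads than requested.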

The number of threads of a parallel region can be set by the OMP_NUM_THREADS environment variable, the omp_set_num_threads() routine, or on the parallel directive with the num_threads clause. The routine overrides the environment variable, and the clause overrides both. Use the OMP_DYNAMIC environment variable or the omp_set_dynamic() routine to allow the OpenMP implementation to adjust the number of threads for parallel regions dynamically. The default setting for dynamic adjustment is implementation defined. When dynamic adjustment is on and the number of threads is specified, that number becomes an upper limit on the number of threads provided by the OpenMP runtime.
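The precedence between the routine and the clause can be sketched as follows. The function name team_size and its two parameters are hypothetical, and the serial fallbacks are an assumption for builds without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallbacks so the sketch builds without OpenMP support. */
static void omp_set_num_threads(int n) { (void)n; }
static void omp_set_dynamic(int on)    { (void)on; }
static int  omp_get_thread_num(void)   { return 0; }
static int  omp_get_num_threads(void)  { return 1; }
#endif

/* Hypothetical helper: requests from_routine threads through the routine
   and from_clause threads through the clause, then returns the team size
   that the runtime actually provided. */
int team_size(int from_routine, int from_clause)
{
    int n = 0;

    omp_set_dynamic(0);                  /* request no dynamic adjustment */
    omp_set_num_threads(from_routine);   /* overrides OMP_NUM_THREADS */

    /* The clause overrides the routine for this region only. */
    #pragma omp parallel num_threads(from_clause)
    {
        if (omp_get_thread_num() == 0)
            n = omp_get_num_threads();
    }
    return n;
}
```

A call such as team_size(8, 2) reports at most 2, because the clause takes precedence; the runtime may still provide fewer threads than requested.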

WORKSHARING CONSTRUCTS

A worksharing construct distributes the execution of the associated region among the members of the team that encounter it. There is an implied barrier at the end of the worksharing region (there is no barrier at the beginning). The worksharing constructs are:

  • loop constructs: for and do

  • sections

  • single

  • workshare

The for and do constructs (loop constructs) create a region consisting of a loop. A loop controlled by a loop construct is called an associated loop. Nested loops can form a single region when the collapse clause (with an integer argument) designates the number of associated loops to be executed in parallel, by forming a “single iteration space” for the specified number of nested loops. The ordered clause can also control multiple associated loops.
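As a minimal sketch of the collapse clause, the two nested loops below are merged into a single iteration space of ROWS*COLS iterations that is divided among the team. The names fill_grid, ROWS, and COLS are hypothetical.

```c
#define ROWS 4   /* hypothetical grid dimensions for this sketch */
#define COLS 6

/* Hypothetical helper: stores the linear index of every cell in the grid. */
void fill_grid(int grid[ROWS][COLS])
{
    #pragma omp parallel
    {
        /* collapse(2): both loops are associated with the construct. */
        #pragma omp for collapse(2)
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                grid[i][j] = i * COLS + j;
    }
}
```

Without collapse(2), only the ROWS iterations of the outer loop would be distributed, which limits the available parallelism when ROWS is smaller than the team size.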

An associated loop must adhere to a “canonical form” (specified in the Canonical Loop Form section of the OpenMP Specifications document) which allows the iteration count (of all associated loops) to be computed before the (outermost) loop is executed. Most common loops comply with the canonical form, including loops over C++ iterators.
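To illustrate why the canonical form matters, the iteration count of a loop of the shape for (i = lb; i < ub; i += step) with a positive step can be computed before the loop runs. The helper trip_count is hypothetical and covers only this one loop shape; the specification defines the full set of permitted forms.

```c
/* Hypothetical helper: iteration count of
       for (i = lb; i < ub; i += step)
   assuming step > 0. */
long trip_count(long lb, long ub, long step)
{
    if (ub <= lb)
        return 0;                          /* the loop body never runs */
    return (ub - lb + step - 1) / step;    /* ceiling of (ub - lb) / step */
}
```

With the count known in advance, a runtime can divide the iterations among the threads before any iteration executes.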

A single construct forms a region in which only one thread (any one of the team) executes the region. The other threads wait at the implied barrier at the end, unless the nowait clause is specified.
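The behavior of the single construct and its implied barrier can be sketched as follows. The function name single_runs is hypothetical.

```c
/* Hypothetical helper: returns 1 if the single region ran exactly once
   and every thread observed its effect after the implied barrier. */
int single_runs(void)
{
    int runs = 0;
    int ok = 1;

    #pragma omp parallel
    {
        #pragma omp single
        runs++;                  /* executed by one thread of the team */

        /* Implied barrier at the end of single: all threads wait here,
           so each of them now reads runs == 1. */
        if (runs != 1) {
            #pragma omp atomic write
            ok = 0;
        }
    }
    return runs == 1 && ok;
}
```

Adding nowait to the single directive would remove the barrier, and the check after the region would then no longer be safe.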

The sections construct forms a region that contains one or more structured blocks. Each block of a sections directive is constructed with a section construct, and executed once by one of the threads (any one) in the team. (If only one block is formed in the region, the section construct, which is used to separate blocks, is not required.) The other threads wait at the implied barrier at the end, unless the nowait clause is specified.
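A minimal sketch of the sections construct with two blocks follows. The function name run_sections and the executed array are hypothetical.

```c
/* Hypothetical helper: executed[k] counts how many times block k ran. */
void run_sections(int executed[2])
{
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            { executed[0]++; }   /* run once, by any one thread */

            #pragma omp section
            { executed[1]++; }   /* run once, possibly by another thread */
        }   /* implied barrier, since nowait is not specified */
    }
}
```

Each counter is incremented exactly once regardless of the team size, because each block is executed by only one thread.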

The workshare construct is a Fortran feature that consists of a region with a single structured block (section of code). Statements in the workshare region are divided into units of work, and executed (once) by threads of the team.

MASKED CONSTRUCT

The masked construct is not a worksharing construct. With no filter clause, the masked region is executed only by the primary thread. There is no implicit barrier (and flush) at the end of the masked region; hence the other threads of the team continue execution past the masked region without waiting. The master construct, which has been deprecated in OpenMP 5.1, has identical semantics to the masked construct with no filter clause.
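A minimal sketch of the masked construct follows. Since compilers differ in the OpenMP version they report, the sketch falls back to the deprecated master construct when the _OPENMP macro is below 202011 (the OpenMP 5.1 value); that fallback, the function name masked_thread, and the serial stub are assumptions of this sketch.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallback so the sketch builds without OpenMP support. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: returns the number of the thread that executed
   the masked region. */
int masked_thread(void)
{
    int who = -1;

    #pragma omp parallel
    {
#if defined(_OPENMP) && _OPENMP >= 202011
        #pragma omp masked       /* OpenMP 5.1 and later */
#else
        #pragma omp master       /* deprecated equivalent, no filter clause */
#endif
        who = omp_get_thread_num();

        /* No barrier here: the other threads do not wait for the
           masked region to complete. */
    }   /* The barrier at the end of the parallel region makes who visible. */

    return who;
}
```

The function returns 0, the thread number of the primary thread.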