offloading API overview

  • Target construct:
  • A. Syntax
    An OpenMP program starts executing on the host device. When the host thread encounters a target construct, the target region is executed on the target device, and the host thread waits until execution on the device completes. If no target device is available, the target region is executed on the host device instead. In C, a target region is created by placing the following directive before the region:

    #pragma omp target [clause[ [,] clause] ... ] new-line
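
    As a minimal sketch of the syntax above, the function below offloads a simple loop. The name scale_on_target and its parameters are hypothetical and are not taken from Example.1; the map clause it uses is discussed later in this tutorial.

```c
/* Hypothetical helper: scales x in place inside a target region.
 * If no target device is available, the region runs on the host. */
void scale_on_target(double *x, int n, double a)
{
    /* x is a pointer, so its extent is given as an array section. */
    #pragma omp target map(tofrom: x[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        x[i] *= a;
}
```

    When the code is compiled without OpenMP support, the pragmas are ignored and the loop runs sequentially on the host with the same result.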

    The figure below shows how threads are created for the host and the devices when an OpenMP accelerator is used.

    Example.1 for OpenMP offloading is available here.

    This example uses the target data construct and the map clause, which are discussed later in this tutorial.

    To build and run the first example, you can use this Makefile.

    B. Clauses
    • device clause

    In a target construct, a specific target device can be selected by adding the device clause. If no device clause is given, the default device is used as the target device (as in Example.1).

    #pragma omp target device (0)

    Example.2:

    
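          #pragma omp target device(0)
          /* parallel for creates a team of threads on the device; a bare "for" would run on a single thread */
          #pragma omp parallel for private(i)
          for (i = 0; i < n; i++)
              y[i] += a * x[i];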
     

    If device 0 is an Intel MIC (Xeon Phi) coprocessor, you can compile the example with the Intel compiler as shown below. The Makefile for Example.1 uses the same command.

    icc -O0 -openmp example2_offload.c -o example_off
    
    • map clause

    The map clause is a data-mapping attribute clause. It explicitly maps original variables on the host device to corresponding variables in the data environment of the target device.

    #pragma omp target map(to:x[0:n]) map(from:y[0:n])

    With the to map type, the corresponding variables on the device are initialized at the start of the target region with the values of the original variables on the host. With the from map type, the corresponding variables are not initialized at the start of the target region, but at the end of the region their values are assigned to the original variables on the host device.

    Map clauses tell the compiler which data to move between the host and the device, and in which direction. Without them, a target region may read uninitialized data or fail to copy its results back, so the program can produce incorrect results. The figure below illustrates the data movement of the map clause.

    map construct

    There are various forms of the map clause:
  • map(to:variables): initializes the variables on the target device with their values from the host.
  • map(from:variables): assigns the values of the variables on the target device to the corresponding data on the host device.
  • map(tofrom:variables): initializes the variables on the target device from the host and also writes them back from the target device to the corresponding data on the host device.
  • map(alloc:variables): allocates the variables on the target device without initializing them.
  • map(variables): if no map type is specified, the clause is treated as map(tofrom:variables).
  • map(to: x[0:N]): array section notation must be used when x is a pointer.

    Example.3 shows how the map clause can be used in a program. You can also download the runnable code here.

    Example.3:

    
      /* y is read and written (+=), so it is mapped tofrom rather than from */
      #pragma omp target map(to:x[0:n]) map(tofrom:y[0:n])
      #pragma omp parallel for private(i)
      for (i = 0; i < n; i++)
          y[i] += a * x[i];
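
    To show how the map types listed earlier can be combined in one region, here is a small sketch. The function saxpy_squared and the scratch buffer are hypothetical names introduced only for illustration.

```c
/* Hypothetical helper: computes y[i] += a * x[i] * x[i],
 * using a scratch buffer that is only allocated on the device. */
void saxpy_squared(const double *x, double *y, double *scratch, int n, double a)
{
    /* to:     x is only read on the device
     * tofrom: y is read and then written back
     * alloc:  scratch holds intermediates and is never copied */
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n]) map(alloc: scratch[0:n])
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            scratch[i] = x[i] * x[i];

        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            y[i] += a * scratch[i];
    }
}
```

    Because scratch is mapped with alloc, its host contents should not be relied on after the call when the region actually ran on a device.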
    
    
  • if clause
  • An if clause on the target construct specifies a condition: the device data environment is created, and the region is offloaded, only if the condition is met; otherwise the region is executed on the host. Target constructs enclosed in a target data region should use the same if condition as the target data construct, so that data and execution end up on the same device. Example.4 shows how a condition can be added to a target construct. In Example.9 the condition is applied to both the target data and the target constructs. You can download Example.9 here.

    #pragma omp target if(n>THRESHOLD) map(from: p[0:N])

    Example.4:

    
       {
        /* y receives the result, so it is mapped from the device */
        #pragma omp target if(n>THRESHOLD) map(to:x[0:n], z[0:n]) map(from:y[0:n])
        #pragma omp parallel for
        for (i = 0; i < n; i++)
            y[i] = x[i] * z[i];
       }
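
    A self-contained variant of the same idea is sketched below. The function name vec_mult and the value of THRESHOLD are assumptions chosen for illustration; a suitable cutoff depends on the system.

```c
#define THRESHOLD 1000  /* hypothetical cutoff, tune per system */

/* Hypothetical helper: element-wise product, offloaded only for large n. */
void vec_mult(const double *x, const double *z, double *y, int n)
{
    /* When n <= THRESHOLD the condition is false and the region runs on the host. */
    #pragma omp target if(n > THRESHOLD) map(to: x[0:n], z[0:n]) map(from: y[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = x[i] * z[i];
}
```

    The result is the same on either path; the condition only decides where the loop runs.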
    
  • Asynchronous clauses: nowait and depend
  • Asynchronous execution of a target region can be accomplished by creating an explicit task around the target region, as shown in Example.5. The explicit task that contains the target region is generated when a thread encounters the task construct. The thread that executes this task encounters the target region and waits for it to complete. While waiting, the thread is at a task scheduling point, which allows it to switch back to the encountering task or to one of the previously generated explicit tasks.

    Example.5:

    
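    /* x and y are assumed to be already mapped by an enclosing target data region */
    for (c = 0; c < n; c += CHUNKSZ)
    {
        #pragma omp target update to(x[c:CHUNKSZ])
        #pragma omp task shared(x,y) firstprivate(c)
        #pragma omp target
        #pragma omp parallel for
        for (i = c; i < c + CHUNKSZ; i++)   /* each task works on its own chunk */
            y[i] += a*x[i];
    }
    /* wait for all chunk tasks before copying the results back to the host */
    #pragma omp taskwait
    #pragma omp target update from(y[0:n])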
     

    The runnable version of this example is available here.

    The nowait and depend clauses were added to the target construct in OpenMP 4.5 to improve support for asynchronous execution of target regions.
    - nowait clause: With the nowait clause, the encountering thread does not wait for the target region to complete. The target region is executed as a deferred target task, and the thread can perform other work in the meantime.

    Example.6:

    
    /* y is read and written (+=), so it is mapped tofrom */
    #pragma omp target map(to:x[0:n]) map(tofrom:y[0:n]) nowait
    {
        int i;
        #pragma omp parallel for private(i)
        for (i = 0; i < n; i++)
            y[i] += a * x[i];
    }
        
    

    The code of this example is available here.
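
    A target region started with nowait still has to be synchronized before its results are used on the host. The sketch below overlaps the target region with independent host work and then waits with taskwait. The function saxpy_overlap and the host array h are hypothetical names used only for illustration.

```c
/* Hypothetical helper: starts y += a * x as a deferred target task,
 * sums an unrelated host array in the meantime, then synchronizes. */
double saxpy_overlap(const double *x, double *y, int n, double a,
                     const double *h, int m)
{
    double host_sum = 0.0;

    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n]) nowait
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];

    /* Independent host work that does not touch x or y. */
    for (int j = 0; j < m; j++)
        host_sum += h[j];

    /* Wait for the target task before y is read on the host. */
    #pragma omp taskwait

    return host_sum;
}
```

    Whether the two parts really overlap depends on the OpenMP implementation and on the device; the code is correct either way because of the taskwait.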

    - depend clause: The depend clause is used to synchronize the target task with other tasks. The following example uses several flow dependences. Because of the first two dependences, the target task does not execute until the two preceding explicit tasks have finished. The last dependence is produced by the target task: the final task does not execute until the target task has finished.

    Example.7:

    
     #pragma omp parallel num_threads(2)
     {
         #pragma omp single
         {
             #pragma omp task depend(out:v1)
             init(v1,n);
             #pragma omp task depend(out:v2)
             init(v2,n);
             /* y is read and written (+=), so it is mapped tofrom */
             #pragma omp target nowait depend(in:v1,v2) depend(out:y) \
                                map(to:v1,v2) map(tofrom:y)
             #pragma omp parallel for private(i)
             for (i = 0; i < n; i++)
                 y[i] += v1[i] * v2[i];
             /* the target task produces y, so the output task depends on y */
             #pragma omp task depend(in:y)
             output(y, n);
         }
     }
    
    

    This example is available here for download.

  • Target data construct
  • A. Syntax
    The target data construct creates a new device data environment and maps the variables listed in its map clauses to that environment. A target construct enclosed in the target data region creates its own device data environment, which inherits the variables mapped by the target data construct.

    #pragma omp target data clause[ [ [,] clause] ... ] new-line

    A map clause on a target data construct can serve several target regions, which avoids repeated data transfers. Example.8 shows how the map clauses of one target data construct are used by more than one target region. Example.8 is available here for download.

    Example.8:

    
      /* y and z are read and written (+=), so they are mapped tofrom */
      #pragma omp target data map(to:x[0:n],k[0:n]) map(tofrom:y[0:n], z[0:n])
      {
          int i;
          #pragma omp target
          #pragma omp parallel for private(i)
          for (i = 0; i < n; i++)
              y[i] += a * x[i];
          #pragma omp target
          #pragma omp parallel for private(i)
          for (i = 0; i < n; i++)
              z[i] += a * k[i];
      }
    
    B. Clauses
  • if clause
  • An if clause on the target data construct specifies that the device data environment is created only if the condition is met. Target constructs enclosed in the target data region should use the same if condition.

    #pragma omp target data if(N>THRESHOLD) map(from: p[0:N])

    Example.9:

    
    /* y is read and written (+=), so it is mapped tofrom */
    #pragma omp target data if(n>THRESHOLD) map(tofrom: y[0:n])
    {
        int i;
        #pragma omp target if(n>THRESHOLD) map(to: x[0:n], z[0:n])
        #pragma omp parallel for private(i)
        for (i = 0; i < n; i++)
            y[i] += z[i] * x[i];
    }
     
    

    This example can be downloaded from this link.

    • target enter data and target exit data constructs
    A structured data construct such as target data provides persistent data on a device for one or more target constructs. Unstructured data constructs such as target enter data and target exit data, on the other hand, allow data to be created and deleted on the target device at any point in the host code. The target enter data construct can use the alloc map type in the map clause to avoid copying values to the device, and the target exit data construct can use the delete map type to avoid copying data back to the host device.

    #pragma omp target enter data map(alloc:x[0:len])
    #pragma omp target exit data map(delete:x[0:len])

    Example.10:

    
      void init_matrix(int n, double v[])
      {
          #pragma omp target enter data map(alloc:v[0:n])
      }
      void free_matrix(int n, double v[])
      {
          #pragma omp target exit data map(delete:v[0:n])
      }
    

    This example can be downloaded from this link.
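
    To show the whole lifetime of unstructured device data in one place, the sketch below allocates a buffer on the device, fills it in a target region, copies the result back, and then deletes the device copy. The function fill_and_sum is a hypothetical name and is not part of Example.10.

```c
#include <stdlib.h>

/* Hypothetical helper: fills a device-resident buffer with 2*i and
 * returns the sum computed on the host, or -1.0 if allocation fails. */
double fill_and_sum(int n)
{
    double *v = malloc((size_t)n * sizeof *v);
    double sum = 0.0;
    if (v == NULL)
        return -1.0;

    /* Allocate v on the device without copying host values. */
    #pragma omp target enter data map(alloc: v[0:n])

    /* v is already present on the device, so this region reuses that copy. */
    #pragma omp target
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        v[i] = 2.0 * i;

    /* Copy the results back explicitly; the delete below would discard them. */
    #pragma omp target update from(v[0:n])

    for (int i = 0; i < n; i++)
        sum += v[i];

    #pragma omp target exit data map(delete: v[0:n])
    free(v);
    return sum;
}
```

    Using map(from: v[0:n]) on the exit construct instead of delete would combine the copy back and the deallocation in one step.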

  • Target update construct
  • A. Syntax
    The target update construct is used to synchronize the values of mapped variables. It maintains consistency between the original variables on the host device and the corresponding data on the target device.

    #pragma omp target update clause[ [ [,] clause] ... ] new-line

    As shown in Example.11, after the first target region the variable v1 is assigned new values on the host, and the target update construct is then used before the second target region to copy the new values of v1 from the host device to the corresponding data on the target device.

    Example.11 is available here for download.

    Example.11:

    
    /* y is read and written (+=), so it is mapped tofrom */
    #pragma omp target data map(to:v1[0:n],v2[0:n]) map(tofrom:y[0:n])
    {
        #pragma omp target
        #pragma omp parallel for private(i)
        for (i = 0; i < n; i++)
            y[i] += v1[i] * v2[i];
        init(v1,n);   /* v1 gets new values on the host */
        #pragma omp target update to(v1[0:n])
        #pragma omp target
        #pragma omp parallel for private(i)
        for (i = 0; i < n; i++)
            y[i] += v1[i] * a;
    }
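
    The update construct also works in the other direction. The sketch below uses target update from to read an intermediate result on the host between two target regions. The function two_pass is a hypothetical name, and the sketch assumes n is at least 1.

```c
/* Hypothetical helper: scales y on the device, takes a host-side
 * snapshot of y[0], then adds 1.0 to every element on the device. */
double two_pass(double *y, int n, double a)
{
    double mid = 0.0;

    #pragma omp target data map(tofrom: y[0:n])
    {
        #pragma omp target
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            y[i] *= a;

        /* Refresh the host copy so the intermediate values can be read. */
        #pragma omp target update from(y[0:n])
        mid = y[0];

        #pragma omp target
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            y[i] += 1.0;
    }
    return mid;
}
```

    Without the update, the host copy of y would still hold its original values inside the target data region when the code runs on a device with separate memory.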
     
    B. Clauses
    • if clause