Intel Compiler

2021-04-28

  • 1 Basics
    • 1.1 Compilation and linking
      • 1.1.1 Create static library
    • 1.2 Syntax
      • 1.2.1 Character arrays with varying component lengths
      • 1.2.2 Array SIZE>0 when not allocated
  • 2 Advanced
    • 2.1 Compiler configuration
      • 2.1.1 Floating Point
      • 2.1.2 AVX may be better than AVX2
      • 2.1.3 Optimization may give inaccurate results
      • 2.1.4 Brief explanation of the optimization options
      • 2.1.5 -m, -xHost
      • 2.1.6 Recommended Intel Compiler Debugging Options
    • 2.2 Installation
    • 2.3 GDB
      • 2.3.1 Backspace does not work in gdb-ia of 2019 version
    • 2.4 MPI
      • 2.4.1 MPI benchmark test
  • 3 MKL
    • 3.1 Data fitting
      • 3.1.1 Cubic interpolation
  • 4 vtune
    • 4.1 Microarchitecture Exploration Analysis

1 Basics

1.1 Compilation and linking

1.1.1 Create static library

  1. Use the -c option to generate object files from the source files:
ifort -c my_source1.f90 my_source2.f90 my_source3.f90
  2. Use the Intel® xiar tool to create the library file from the object files:
xiar rc my_lib.a my_source1.o my_source2.o my_source3.o
  3. Compile and link your project with your new library:
ifort main.f90 my_lib.a

Reference

1.2 Syntax

1.2.1 Character arrays with varying component lengths

ifort accepts an array constructor whose string elements have different lengths; gfortran rejects it. For example, the following definition is valid in ifort but invalid in gfortran.

program test
character(10), dimension(5) :: models = (/"feddes.swp", "jarvis89.swp", "jarvis10.swp" ,   "pem.swp", "van.swp"/)
end

To make it valid in gfortran, give the array constructor an explicit type specification:

character(len=12), dimension(5) :: models = [character(len=12) :: "feddes.swp", &
                "jarvis89.swp", "jarvis10.swp", "pem.swp", "van.swp"]

Reference

Note that the strings have a fixed length: text that is shorter is padded on the right with spaces, while text that is longer is truncated.

Reference
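
The padding and truncation rule can be checked with a minimal sketch. The program and variable names are illustrative only and do not come from the original notes.

```fortran
program string_padding
  implicit none
  ! Hypothetical names, chosen only for this illustration.
  character(len=5) :: short_text, long_text
  character(len=12), dimension(2) :: models

  short_text = "ab"         ! stored as "ab   ": padded on the right with blanks
  long_text  = "abcdefgh"   ! stored as "abcde": truncated to 5 characters

  ! The type specification makes every element exactly 12 characters long.
  models = [character(len=12) :: "pem.swp", "jarvis89.swp"]

  write(*, '(3a)') "[", short_text, "]"   ! prints [ab   ]
  write(*, '(3a)') "[", long_text, "]"    ! prints [abcde]
  write(*, '(3a)') "[", models(1), "]"    ! prints [pem.swp     ]
  write(*, '(i0)') len_trim(models(1))    ! prints 7 (length without trailing blanks)
end program string_padding
```

Use trim() or len_trim() whenever the trailing blanks must not take part in a comparison or a file name.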

1.2.2 Array SIZE>0 when not allocated

program allocator
  double precision, dimension(:), allocatable:: x
  allocate(x(10))
  write(*,*) size(x),allocated(x)  ! >> 10  t
  deallocate(x)
  write(*,*) size(x),allocated(x)  ! >> 10  f
end program

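The above program compiles and runs with ifort, but the second call to size is not standard-conforming, because the array is no longer allocated. The value it returns (10 here) is meaningless. The size is only reliable if the array has been allocated.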
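
A minimal sketch of a safer pattern is to test allocated() before calling size(). The program name and the variable n are assumptions made for this example.

```fortran
program safe_size
  implicit none
  double precision, dimension(:), allocatable :: x
  integer :: n   ! hypothetical helper variable for this illustration

  allocate(x(10))
  deallocate(x)

  ! Query the size only when the array is allocated.
  ! An if block is used because merge() would evaluate size(x) in both cases.
  if (allocated(x)) then
    n = size(x)
  else
    n = 0
  end if

  write(*, *) n, allocated(x)   ! n is 0 here, and allocated(x) is false
end program safe_size
```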

2 Advanced

2.1 Compiler configuration

2.1.1 Floating Point

-fpe: Allows some control over floating-point exception handling for the main program at run time. With -fpe0, execution aborts when a floating-point invalid, divide-by-zero, or overflow exception occurs. Set it in debug mode.
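
The following sketch gives a small program for trying the option. The program name and the build line in the comments are assumptions; only the -fpe0 flag comes from the notes above.

```fortran
program fpe_demo
  implicit none
  double precision :: zero, ratio

  ! Take the divisor from a run-time value (0 when no arguments are given)
  ! so that the compiler cannot evaluate the division at compile time.
  zero = real(command_argument_count(), kind(1.0d0))

  ! Assumed debug build line:
  !   ifort -g -O0 -traceback -fpe0 fpe_demo.f90
  ! With -fpe0 the division below should trap and abort the run.
  ! Without it, the program is expected to continue and print an infinity.
  ratio = 1.0d0 / zero

  write(*, *) ratio
end program fpe_demo
```

Run the executable without arguments to trigger the division by zero.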

2.1.2 AVX may be better than AVX2

AVX2 doubles the width of integer vector instructions to 256 bits and is accompanied by FMA (fused multiply-add) instructions.

Reference: Maybe in some cases AVX runs faster on an AVX2 platform

2.1.3 Optimization may give inaccurate results

O2/O3 optimization with Intel compiler 2017 Update 2 can change the numerical results. This also occurs for NFS, so I changed the OPT flags to -O2 -xHost -fp-model precise. The -fp-model precise flag is critical, and it does not affect NFS's performance.

Reference
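
One reason optimized builds can differ is that floating-point addition is not associative, so a compiler that reorders operations may change the result. The sketch below illustrates this with values chosen for the example; the program name is hypothetical and the code is not taken from NFS.

```fortran
program fp_assoc
  implicit none
  double precision :: a, b, c

  a =  1.0d16
  b = -1.0d16
  c =  1.0d0

  ! Under IEEE double-precision arithmetic evaluated exactly as written:
  !   (a + b) + c = 0 + 1 = 1
  !   a + (b + c) = 0, because b + c rounds back to -1.0d16
  ! (neighboring doubles near 1.0d16 are 2 apart).
  write(*, *) (a + b) + c
  write(*, *) a + (b + c)

  ! Flags from the notes above that keep the evaluation value-safe:
  !   ifort -O2 -xHost -fp-model precise fp_assoc.f90
end program fp_assoc
```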

2.1.4 Brief explanation of the optimization options

Reference: Code Optimization: Special Compiler Options

2.1.5 -m, -xHost

-xHost tells the compiler to generate instructions for the highest instruction set available on the compilation host processor. The specialized code generated by this option may only run on a subset of Intel® processors. The -x options enable additional optimizations not enabled with options -m.

-m tells the compiler which features it may target, including which instruction sets it may generate. Code generated with these options should execute on any compatible, non-Intel processor with support for the corresponding instruction set.

Options -x and -m are mutually exclusive. If both are specified, the compiler uses the last one specified and generates a warning.

So if you want to run programs on AMD processors, use an -m option such as -mavx instead of -xHost.

2.1.6 Recommended Intel Compiler Debugging Options

Check Recommended Intel Compiler Debugging Options. Or check this PDF.

2.2 Installation

The official installer URL of Intel Parallel Studio XE 2020 is http://registrationcenter-download.intel.com/akdlm/IRC_NAS/tec/16744/parallel_studio_xe_2020_update2_cluster_edition.tgz. This URL comes from the Arch Linux repository page for the Intel Fortran Compiler; check that page for other versions.

  • Intel 2018 Update 3 Composer Edition: http://registrationcenter-download.intel.com/akdlm/irc_nas/tec/13002/parallel_studio_xe_2018_update3_composer_edition.tgz
  • Intel MPI 2018 Update 3: http://registrationcenter-download.intel.com/akdlm/irc_nas/tec/13112/l_mpi_2018.3.222.tgz
  • Intel MKL 2018 Update 3: http://registrationcenter-download.intel.com/akdlm/irc_nas/tec/13005/l_mkl_2018.3.222.tgz
  • Intel 2019.1.144 Cluster Edition: http://registrationcenter-download.intel.com/akdlm/irc_nas/tec/14850/parallel_studio_xe_2019_update1_cluster_edition.tgz
  • Intel 2019 Update 5 Cluster Edition: http://registrationcenter-download.intel.com/akdlm/irc_nas/tec/15809/parallel_studio_xe_2019_update5_cluster_edition.tgz

2.3 GDB

2.3.1 Backspace does not work in gdb-ia of 2019 version

According to Intel, this was a known bug in some of the 2019 versions of their products. Update the 2019 Intel products to Update 6 to get the fix, or switch to the 2020 versions if available.

Reference

2.4 MPI

2.4.1 MPI benchmark test

Intel MPI Benchmarks User Guide

For NFS, the most heavily used MPI subroutines should be MPI_Isend and MPI_Irecv. These two subroutines are tested by IMB-MPI1 Exchange.

With the turbulence generation BC, MPI_Bcast is also used heavily. It is tested by IMB-MPI1 Bcast.
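
For orientation, the communication pattern described above can be sketched as a periodic neighbor exchange followed by a broadcast. This is an illustration only: it is neither the NFS code nor the IMB source, and the program name, buffer names, and the build and run lines in the comments are assumptions.

```fortran
program neighbor_exchange
  use mpi
  implicit none
  integer :: ierr, rank, nprocs, left, right
  integer :: requests(4)
  double precision :: to_left, to_right, from_left, from_right, inflow

  ! Assumed build and run lines with Intel MPI:
  !   mpiifort neighbor_exchange.f90 -o neighbor_exchange
  !   mpirun -np 4 ./neighbor_exchange
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Periodic chain: every rank has a left and a right neighbor.
  left  = modulo(rank - 1, nprocs)
  right = modulo(rank + 1, nprocs)
  to_left  = dble(rank)
  to_right = dble(rank)

  ! Post the receives first, then the sends.
  ! Tag 0 marks data travelling to the right, tag 1 data travelling to the left.
  call MPI_Irecv(from_left,  1, MPI_DOUBLE_PRECISION, left,  0, MPI_COMM_WORLD, requests(1), ierr)
  call MPI_Irecv(from_right, 1, MPI_DOUBLE_PRECISION, right, 1, MPI_COMM_WORLD, requests(2), ierr)
  call MPI_Isend(to_right,   1, MPI_DOUBLE_PRECISION, right, 0, MPI_COMM_WORLD, requests(3), ierr)
  call MPI_Isend(to_left,    1, MPI_DOUBLE_PRECISION, left,  1, MPI_COMM_WORLD, requests(4), ierr)
  call MPI_Waitall(4, requests, MPI_STATUSES_IGNORE, ierr)

  ! Rank 0 broadcasts one value to all ranks.
  inflow = 0.0d0
  if (rank == 0) inflow = 42.0d0
  call MPI_Bcast(inflow, 1, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

  write(*, *) rank, from_left, from_right, inflow
  call MPI_Finalize(ierr)
end program neighbor_exchange
```

Each rank should report the ranks of its left and right neighbors and the broadcast value.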

3 MKL

3.1 Data fitting

3.1.1 Cubic interpolation

Example.

Main program file

include 'mkl_df.f90'

program main

use MKL_DF_TYPE
use MKL_DF

implicit none

  integer, parameter :: wp = 8
  integer, parameter :: xhint = DF_NON_UNIFORM_PARTITION
  integer, parameter :: yhint = DF_NO_HINT
  integer, parameter :: sorder = DF_PP_CUBIC
  integer, parameter :: stype = DF_PP_NATURAL
  integer, parameter :: bc_type = DF_BC_NOT_A_KNOT
  integer, parameter :: scoeffhint = DF_NO_HINT
  integer, parameter :: sitehint = DF_NON_UNIFORM_PARTITION
  integer, parameter :: ndorder = 1
  integer, dimension(1), parameter :: dorder = [0]
  integer, parameter :: rhint = DF_MATRIX_STORAGE_ROWS

  TYPE (DF_TASK) :: task
  integer :: errcode, i, nx, nvar
  real(wp), dimension(:), allocatable :: x, y, xi, yi, scoeff

continue

  nx = 7
  allocate(x(nx))
  do i = 1, nx
    x(i) = real(i**2)
  end do
  nvar = 1
  allocate(y(nvar*nx))
  do i = 1, nx
    y(i) = x(i)**3
  end do

  write(*, *) x
  write(*, *) y

  allocate(xi(nx), source=0.0_wp)
  allocate(yi(nx), source=0.0_wp)
  do i = 1, nx
    xi(i) = (real(i)/2.0_wp)**2+1.0_wp
  end do

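  ! Storage for the spline coefficients: sorder coefficients per interval.
  allocate(scoeff((nx-1)*sorder))

  ! Create the task from the breakpoints x and the function values y.
  errcode = dfdNewTask1D( task, nx, x, xhint, nvar, y, yhint )
  if (errcode /= DF_STATUS_OK) stop 'dfdNewTask1D failed'

  ! Set the spline order, type, and boundary condition.
  errcode = dfdEditPPSpline1D( task, sorder, stype, bc_type, scoeff=scoeff, scoeffhint=scoeffhint )
  if (errcode /= DF_STATUS_OK) stop 'dfdEditPPSpline1D failed'

  ! Compute the spline coefficients.
  errcode = dfdConstruct1D( task, DF_PP_SPLINE, DF_METHOD_STD )
  if (errcode /= DF_STATUS_OK) stop 'dfdConstruct1D failed'

  ! Evaluate the spline at the nx sites in xi.
  errcode = dfdInterpolate1D( task, DF_INTERP, DF_METHOD_PP, nx, xi, sitehint, ndorder, dorder, r=yi, rhint=rhint )
  if (errcode /= DF_STATUS_OK) stop 'dfdInterpolate1D failed'

  write(*, *) xi
  write(*, *) yi      ! interpolated values
  write(*, *) xi**3   ! analytical values

  ! Release the resources held by the task.
  errcode = dfDeleteTask( task )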

end program

Compile command

ifort -mkl -o dfd_test dfd_test.f90

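The output lists, in order, x, y, xi, the interpolated yi, and the analytical xi**3. The interpolated values match the analytical ones to within rounding error.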

   1.00000000000000        4.00000000000000        9.00000000000000 
   16.0000000000000        25.0000000000000        36.0000000000000 
   49.0000000000000
   1.00000000000000        64.0000000000000        729.000000000000 
   4096.00000000000        15625.0000000000        46656.0000000000 
   117649.000000000
   1.25000000000000        2.00000000000000        3.25000000000000 
   5.00000000000000        7.25000000000000        10.0000000000000 
   13.2500000000000
   1.95312500000001        8.00000000000001        34.3281250000000 
   125.000000000000        381.078125000000        1000.00000000000 
   2326.20312500000
   1.95312500000000        8.00000000000000        34.3281250000000 
   125.000000000000        381.078125000000        1000.00000000000 
   2326.20312500000

4 vtune

4.1 Microarchitecture Exploration Analysis

Use this type of analysis to check the instructions retired.

It requires hardware event-based sampling collection. To enable it, you need to build and install the sampling driver. Check this Intel official webpage for reference.

Created on 2021-04-28 with pandoc