Co-array Fortran on the GPC


Versions 12 and higher of the Intel Fortran compiler, and version 5.1 and up of the GNU Fortran compiler, support almost all of Co-array Fortran, and are installed on the GPC.

This page will briefly sketch how to compile and run Co-array Fortran programs using these compilers.

Example

Here is an example of a co-array fortran program:
<source lang="fortran">
program Hello_World
  integer :: i      ! Local variable
  integer :: num[*] ! scalar coarray
  if (this_image() == 1) then
    write(*,'(a)') 'Enter a number: '
    read(*,'(i80)') num
    ! Distribute information to other images
    do i = 2, num_images()
      num[i] = num
    end do
  end if
  sync all ! Barrier to make sure the data has arrived
  ! I/O from all nodes
  write(*,'(a,i0,a,i0)') 'Hello ',num,' from image ', this_image()
end program Hello_World
</source>
(Adapted from [http://en.wikipedia.org/wiki/Co-array_Fortran].)

How you compile, link and run co-array fortran programs differs depending on whether you will run the program on a single node (with 8 cores) or on several nodes, and on which compiler you are using, Intel or GNU.

Intel compiler instructions for Coarray Fortran

Loading necessary modules

First, you need to load the module for version 12 or greater of the Intel compilers, as well as Intel MPI.

module load intel/14.0.1 intelmpi/4.1.2.040

There are two modes in which the Intel compiler supports coarray fortran:

1. Single node usage

2. Multiple node usage

The way you compile and run for these two cases is different. However, we're working on making coarray fortran compilation and running more uniform across these two cases, as well as with the as yet experimental gfortran coarray support. See Uniformized Usage below.

Note: For multiple-node usage, it makes sense that you have to load the IntelMPI module, since Intel's implementation of Co-array Fortran uses MPI. However, the Intel MPI module is needed even for single-node usage, simply in order to link successfully.

Single node usage

Compilation

<source lang="bash">
ifort -O3 -xHost -coarray=shared -c [sourcefile] -o [objectfile]
</source>

Linking

<source lang="bash">
ifort -coarray=shared [objectfile] -o [executable]
</source>

Running

To run this co-array program on one node with 16 images (an image is the co-array counterpart of what OpenMP calls a thread and MPI calls a process), you simply put
<source lang="bash">
./[executable]
</source>
in your job submission script. The reason that this gives 16 images is that HyperThreading is enabled on the GPC nodes, which makes it seem to the system as if there are 16 computing units on a node, even though physically there are only 8.
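If you want to check how many images your program actually runs with, a tiny test program like the following can help. This is just a minimal sketch (the program name count_images is made up for illustration); compile and run it in the same way as described above.
<source lang="fortran">
program count_images
  ! Each image reports itself; image 1 also reports the total count.
  implicit none
  if (this_image() == 1) then
    write(*,'(a,i0,a)') 'Running with ', num_images(), ' images'
  end if
  write(*,'(a,i0)') 'Hello from image ', this_image()
end program count_images
</source>
With the default settings described above, this should report 16 images on a single GPC node.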

To control the number of images, you can change the FOR_COARRAY_NUM_IMAGES environment variable:
<source lang="bash">
export FOR_COARRAY_NUM_IMAGES=2
./[executable]
</source>
This can be useful for testing.

An example submission script would look as follows:

<source lang="bash">
#!/bin/bash
# MOAB/Torque submission script for SciNet GPC (Intel Coarray Fortran)
#
#PBS -l nodes=1:ppn=8,walltime=1:00:00
#PBS -N test

# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from
cd $PBS_O_WORKDIR

# LOAD MODULES THAT THE APPLICATION WAS COMPILED WITH
module load intel/14.0.1 intelmpi/4.1.2.040

# RUN THE APPLICATION WITH 16 IMAGES
export FOR_COARRAY_NUM_IMAGES=16
./[executable]
</source>

Multiple nodes usage

Compilation

<source lang="bash">
ifort -O3 -xHost -coarray=distributed -c [sourcefile] -o [objectfile]
</source>

Linking

<source lang="bash">
ifort -coarray=distributed [objectfile] -o [executable]
</source>

Running

Because distributed co-array fortran is based on MPI, we need to launch the MPI processes on the different nodes. The easiest way to do this is to set the number of images using FOR_COARRAY_NUM_IMAGES and then to use mpirun without an -np parameter:
<source lang="bash">
export I_MPI_PROCESS_MANAGER=mpd
export FOR_COARRAY_NUM_IMAGES=32
mpirun ./[executable]
</source>
Note that the total number of images is set explicitly, and should not be given to mpirun. You can still pass other parameters to mpirun, though. Also note that the default process manager for IntelMPI is not MPD, but Coarray Fortran needs MPD, which is why I_MPI_PROCESS_MANAGER needs to be set to mpd.

An example submission script would look as follows:

<source lang="bash">
#!/bin/bash
# MOAB/Torque submission script for SciNet GPC (ethernet)
#
#PBS -l nodes=4:ppn=8,walltime=1:00:00
#PBS -N test

# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from
cd $PBS_O_WORKDIR

# LOAD MODULES THAT THE APPLICATION WAS COMPILED WITH
module load intel/14.0.1 intelmpi/4.1.2.040

# EXECUTION COMMAND; FOR_COARRAY_NUM_IMAGES = nodes*ppn
export I_MPI_PROCESS_MANAGER=mpd
export FOR_COARRAY_NUM_IMAGES=32
mpirun ./[executable]
</source>

Features known not to work in earlier versions

There are a few features that are known not to work in earlier versions of the Intel Fortran compiler (v12.0), such as character coarrays. See section 3.2.3.3 of the [http://software.intel.com/en-us/articles/intel-fortran-composer-xe-2011-release-notes/ official release notes]. These issues may be fixed in later releases.
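For reference, a character coarray is simply a coarray whose type is character. The short sketch below (the program and variable names are made up for illustration) shows the kind of declaration and coindexed access involved; code of this sort is what the v12 limitation affected, while the newer compilers described on this page are expected to accept it.
<source lang="fortran">
program character_coarray_demo
  ! A scalar character coarray: each image holds its own copy of 'tag'.
  implicit none
  character(len=20) :: tag[*]
  write(tag,'(a,i0)') 'image ', this_image()
  sync all ! make sure every image has filled in its tag
  if (this_image() == 1) write(*,'(a)') trim(tag[num_images()]) ! read the tag of the last image
end program character_coarray_demo
</source>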

Uniformized Usage

If you load the additional module

module load caf/intel/any

you get access to a compilation and linking wrapper called caf and a wrapper for running the application called cafrun.

Compilation

<source lang="bash">
caf -O3 -xhost -c [sourcefile] -o [objectfile]
</source>

Linking

<source lang="bash">
caf [objectfile] -o [executable]
</source>

Running

To run this co-array program on one node with 8 images, you simply put
<source lang="bash">
cafrun ./[executable]
</source>
in your job submission script. This runs 8 images, not 16.

To control the number of images, you can change the run command to
<source lang="bash">
cafrun -np 2 ./[executable]
</source>
This can be useful for testing.

To control the number of images per node, add the -N [images-per-node] option.

Note: currently, the uniformized mode does not explicitly exploit the optimization opportunities offered by the single-node mode, although it will work on one node.


GNU compiler instructions for Coarray Fortran

Coarray fortran is supported in the GNU compiler suite (GCC) starting from version 5.1. To implement coarrays, it uses the OpenCoarrays library, which in turn uses OpenMPI (or at least, that is how it has been set up on the GPC).

Issues with the gcc/OpenCoarrays fortran compilers seem to exist, particularly with multidimensional arrays. We're still investigating the cause, but for now, one should consider the coarray fortran support by gcc as experimental.
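To make concrete what such code looks like, here is a minimal sketch of a coarray whose local data is a two-dimensional array (all names are made up for illustration). If your application depends on constructs like this with the gcc/OpenCoarrays setup, test them carefully before trusting production results.
<source lang="fortran">
program multidim_coarray_demo
  ! A coarray of a 2D array: every image holds its own 4x4 tile.
  implicit none
  integer :: tile(4,4)[*]
  integer :: i
  tile = this_image() ! fill the local tile with this image's index
  sync all
  if (this_image() == 1) then
    ! Image 1 reads one element from every image's tile.
    do i = 1, num_images()
      write(*,'(a,i0,a,i0)') 'tile(1,1) on image ', i, ' = ', tile(1,1)[i]
    end do
  end if
end program multidim_coarray_demo
</source>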

Loading necessary modules

First, you need to load the module for version 5.2 or greater of the GNU compilers (version 5.1 would've worked, but we skipped that release on the GPC), as well as OpenMPI.

module load gcc/5.2.0 openmpi/gcc/1.8.3 use.experimental caf/gcc/5.2.0-openmpi

The caf/gcc/5.2.0-openmpi module comes with a compilation and linking wrapper called caf and a wrapper for running the application called cafrun.

Compilation

<source lang="bash">
caf -O3 -march=native -c [sourcefile] -o [objectfile]
</source>

Linking

<source lang="bash">
caf [objectfile] -o [executable]
</source>

Running

To run this co-array program on one node with 8 images (an image is the co-array counterpart of what OpenMP calls a thread and MPI calls a process), you simply put
<source lang="bash">
cafrun ./[executable]
</source>
in your job submission script. In contrast with the Intel compiler, this does not run 16 images, but only 8. The reason is that the gcc/OpenCoarrays implementation uses MPI, and MPI is not aware of HyperThreading.

To control the number of images, you can change the run command to
<source lang="bash">
cafrun -np 2 ./[executable]
</source>
This can be useful for testing, or to exploit HyperThreading.

An example submission script would look as follows:

<source lang="bash">
#!/bin/bash
# MOAB/Torque submission script for SciNet GPC (GCC Coarray Fortran)
#
#PBS -l nodes=1:ppn=8,walltime=1:00:00
#PBS -N test

# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from
cd $PBS_O_WORKDIR

# LOAD MODULES THAT THE APPLICATION WAS COMPILED WITH
module load gcc/5.2.0 openmpi/gcc/1.8.3

# RUN WITH 16 IMAGES ON 1 NODE
cafrun -np 16 ./[executable]
</source>

Multiple nodes usage

Because the GNU implementation of Coarray Fortran in the gcc/5.2.0 module is based on MPI, running on multiple nodes is no different from the single-node usage. An example multi-node submission script would look as follows:

<source lang="bash">
#!/bin/bash
# MOAB/Torque submission script for SciNet GPC (GCC Coarray Fortran on multiple nodes)
#
#PBS -l nodes=4:ppn=8,walltime=1:00:00
#PBS -N test

# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from
cd $PBS_O_WORKDIR

# LOAD MODULES THAT THE APPLICATION WAS COMPILED WITH
module load gcc/5.2.0 openmpi/gcc/1.8.3

# EXECUTION with 32 images (nodes*ppn)
cafrun -np 32 ./[executable]
</source>