<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>https://oldwiki.scinet.utoronto.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Ljdursi</id>
	<title>oldwiki.scinet.utoronto.ca - User contributions [en-gb]</title>
	<link rel="self" type="application/atom+xml" href="https://oldwiki.scinet.utoronto.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Ljdursi"/>
	<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php/Special:Contributions/Ljdursi"/>
	<updated>2026-05-10T22:52:45Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.35.12</generator>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=PWC_Python&amp;diff=7434</id>
		<title>PWC Python</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=PWC_Python&amp;diff=7434"/>
		<updated>2014-12-02T20:38:01Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
This page contains the slides for the PWC Python class.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Slides ==&lt;br /&gt;
&lt;br /&gt;
* [http://wiki.scinethpc.ca/wiki/images/3/3c/PWCintro.pdf Morning of the first day].&lt;br /&gt;
* [http://wiki.scinethpc.ca/wiki/images/2/2d/PWCFirstAfternoon.pdf Afternoon of the first day].&lt;br /&gt;
&lt;br /&gt;
* [[Media:pwcfunctions.pdf | Second day ]]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/CourseVideo/pwcfunctions.zip Code ]&lt;br /&gt;
** [http://support.scinet.utoronto.ca/~ljdursi/pwc/mapreduce.py mapreduce.py]&lt;br /&gt;
** [http://support.scinet.utoronto.ca/~ljdursi/pwc/mapreduce-ans.py mapreduce partial answer]&lt;br /&gt;
* [[Media:pwcobjects.pdf | Second day, objects ]]&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=PWC_Python&amp;diff=7431</id>
		<title>PWC Python</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=PWC_Python&amp;diff=7431"/>
		<updated>2014-12-02T18:29:27Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* Slides */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
This page contains the slides for the PWC Python class.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Slides ==&lt;br /&gt;
&lt;br /&gt;
* [http://wiki.scinethpc.ca/wiki/images/3/3c/PWCintro.pdf Morning of the first day].&lt;br /&gt;
* [http://wiki.scinethpc.ca/wiki/images/2/2d/PWCFirstAfternoon.pdf Afternoon of the first day].&lt;br /&gt;
&lt;br /&gt;
* [[Media:pwcfunctions.pdf | Second day ]]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/CourseVideo/pwcfunctions.zip Code ]&lt;br /&gt;
** [http://support.scinet.utoronto.ca/~ljdursi/pwc/mapreduce.py mapreduce.py]&lt;br /&gt;
* [[Media:pwcobjects.pdf | Second day, objects ]]&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Python&amp;diff=7256</id>
		<title>Python</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Python&amp;diff=7256"/>
		<updated>2014-09-16T22:00:24Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* Python on the GPC */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[http://www.python.org/ Python] is a programming language that continues to grow in popularity for scientific computing.   It is very fast to write code in, but the resulting software is much slower than C or Fortran; one should be wary of doing too much compute-intensive work in Python.     &lt;br /&gt;
&lt;br /&gt;
There is a dizzying amount of documentation available for programming in Python on the [http://python.org/ Python.org webpage]; SciNet gave an 8-lecture mini-course on [[Research Computing with Python]] in the Fall of 2013.&lt;br /&gt;
An excellent set of material for teaching scientists to program in Python is also available at the [http://software-carpentry.org/4_0/python/ Software Carpentry homepage].&lt;br /&gt;
&lt;br /&gt;
__FORCETOC__ &lt;br /&gt;
&lt;br /&gt;
== Python on the GPC ==&lt;br /&gt;
&lt;br /&gt;
We currently have python 2.7.2, 2.7.3, 2.7.5, and 3.3.4 installed, compiled against fast intel math libraries.  To load the python modules, type the following commands:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Version&lt;br /&gt;
! Command&lt;br /&gt;
|-&lt;br /&gt;
|2.7.2&lt;br /&gt;
|&amp;lt;tt&amp;gt;module load gcc intel python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|2.7.3&lt;br /&gt;
|&amp;lt;tt&amp;gt;module load gcc intel/13.1.1 python/2.7.3&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|2.7.5&lt;br /&gt;
|&amp;lt;tt&amp;gt;module load gcc intel/13.1.1 python/2.7.5&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|3.3.4&lt;br /&gt;
|&amp;lt;tt&amp;gt;module load gcc intel/14.0.1 python/3.3.4&amp;lt;/tt&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
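&lt;br /&gt;
For example, to load the 3.3.4 stack and do a quick sanity check that the right interpreter is found (a minimal sketch, assuming the module names in the table above):&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
module load gcc intel/14.0.1 python/3.3.4&lt;br /&gt;
which python       # should point into the python module's install tree, not /usr/bin&lt;br /&gt;
python --version   # should report Python 3.3.4&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;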
&lt;br /&gt;
== Modules installed system-wide ==&lt;br /&gt;
&lt;br /&gt;
Many optional packages are available for Python which greatly extend the language, adding important new functionality.  Those packages which are likely to be important to all of our users (e.g., [http://numpy.scipy.org/ NumPy], [http://www.scipy.org/ SciPy], and [http://matplotlib.sourceforge.net/ Matplotlib]) are installed system-wide.&lt;br /&gt;
&lt;br /&gt;
Below is a list of the packages currently installed system-wide.&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
!{{Hl2}}| Module  &lt;br /&gt;
!{{Hl2}}| python/2.7.2 &lt;br /&gt;
!{{Hl2}}| python/2.7.3 &lt;br /&gt;
!{{Hl2}}| python/2.7.5 &lt;br /&gt;
!{{Hl2}}| python/3.3.4&lt;br /&gt;
!{{Hl2}}| Comments&lt;br /&gt;
|-  &lt;br /&gt;
|[http://www.scipy.org/ SciPy]&lt;br /&gt;
|  0.10.0&lt;br /&gt;
|  0.11.0&lt;br /&gt;
|  0.14.0&lt;br /&gt;
|  0.14.0&lt;br /&gt;
| Open-source software for mathematics, science, and engineering.  The version for Python 2.7.x is linked against the very fast MKL numerical libraries. &lt;br /&gt;
|-&lt;br /&gt;
|[http://numpy.scipy.org/ NumPy]&lt;br /&gt;
| 1.6.1&lt;br /&gt;
| 1.7.0&lt;br /&gt;
| 1.7.0&lt;br /&gt;
| 1.8.1&lt;br /&gt;
| NumPy is the fundamental package needed for scientific computing with Python. Contains fast arrays, tools for integrating C/C++ and Fortran code, linear algebra solvers, etc.  SciPy is built on top of NumPy.&lt;br /&gt;
|-&lt;br /&gt;
| [http://mpi4py.scipy.org/ mpi4py]&lt;br /&gt;
| 1.2.2&lt;br /&gt;
| 1.2.2&lt;br /&gt;
| 1.2.2&lt;br /&gt;
| 1.2.2&lt;br /&gt;
| A pythonic interface to MPI.   Available with openmpi; you must load an openmpi module for this to work.  (There is an issue with openmpi 1.4.x + InfiniBand; it does, however, appear to work fine with IntelMPI.)  A quick smoke test is sketched after this table.&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.scipy.org/SciPyPackages/NumExpr Numexpr]&lt;br /&gt;
| 2.0&lt;br /&gt;
| 2.0.1&lt;br /&gt;
| 2.2.1&lt;br /&gt;
| 2.4_rc2&lt;br /&gt;
| Fast, memory-efficient elementwise operations on Numpy arrays.&lt;br /&gt;
|-&lt;br /&gt;
| [http://dirac.cnrs-orleans.fr/plone/software/scientificpython/ ScientificPython]&lt;br /&gt;
| 2.8 &lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| A collection of scientific python utilities.   Does not include MPI support.  No longer supported.&lt;br /&gt;
|-&lt;br /&gt;
| [http://yt.enzotools.org/ yt]&lt;br /&gt;
| 2.2&lt;br /&gt;
| 2.5.3&lt;br /&gt;
| 2.5.5&lt;br /&gt;
| -&lt;br /&gt;
| A collection of python tools for analyzing astrophysical simulation output.&lt;br /&gt;
|-&lt;br /&gt;
| [http://ipython.scipy.org/moin/ iPython]&lt;br /&gt;
| 0.11 &lt;br /&gt;
| 0.13.1&lt;br /&gt;
| 1.0.0&lt;br /&gt;
| 1.2.1&lt;br /&gt;
| An enhanced interactive python.&lt;br /&gt;
|-&lt;br /&gt;
| [http://matplotlib.sourceforge.net/ Matplotlib], pylab&lt;br /&gt;
| 1.1.0&lt;br /&gt;
| 1.2.0&lt;br /&gt;
| 1.3.0&lt;br /&gt;
| 1.3.1&lt;br /&gt;
| Matlab-like plotting for python.&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.pytables.org/moin PyTables]&lt;br /&gt;
| 2.3.1 &lt;br /&gt;
| 2.4.0&lt;br /&gt;
| 3.0.0&lt;br /&gt;
| 3.1.1&lt;br /&gt;
| Fast and efficient access to HDF5 files (and HDF5-format NetCDF4 files.)   Requires the &amp;lt;tt&amp;gt;hdf5/184-p1-v18-serial-gcc&amp;lt;/tt&amp;gt; module to be loaded. &lt;br /&gt;
|-&lt;br /&gt;
| [http://code.google.com/p/netcdf4-python/ NetCDF4-python]&lt;br /&gt;
| 0.9.8&lt;br /&gt;
| 1.0.4&lt;br /&gt;
| 1.1.1&lt;br /&gt;
| 1.1.0&lt;br /&gt;
| Python interface to NetCDF4 files.   Requires the &amp;lt;tt&amp;gt;netcdf/4.0.1_hdf5_v18-serial.shared-nofortran&amp;lt;/tt&amp;gt; module to be loaded. &lt;br /&gt;
|-&lt;br /&gt;
| [http://www.pyngl.ucar.edu/Nio.shtml pyNIO]&lt;br /&gt;
| 1.4.1&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| Yet another Python interface to NetCDF4 files; again, requires the &amp;lt;tt&amp;gt;netcdf/4.0.1_hdf5_v18-serial.shared-nofortran&amp;lt;/tt&amp;gt; module.  No longer supported.&lt;br /&gt;
|-&lt;br /&gt;
| [http://alfven.org/wp/hdf5-for-python/ h5py]&lt;br /&gt;
| 2.0.1&lt;br /&gt;
| 2.1.3&lt;br /&gt;
| 2.2.0&lt;br /&gt;
| 2.3.0&lt;br /&gt;
| Yet another Python interface to HDF5 files; again, requires an HDF5 module to be loaded.&lt;br /&gt;
|-&lt;br /&gt;
| [http://pysvn.tigris.org/ PySVN]&lt;br /&gt;
| 1.7.1&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
| Python interface to the svn version control system. &lt;br /&gt;
|-&lt;br /&gt;
| [http://mercurial.selenic.com/ Mercurial]&lt;br /&gt;
| 2.0.1&lt;br /&gt;
| 2.6.2&lt;br /&gt;
| 2.7.1&lt;br /&gt;
| -&lt;br /&gt;
| A distributed version-control system written in Python.&lt;br /&gt;
|-&lt;br /&gt;
| [http://cython.org/ Cython]&lt;br /&gt;
| 0.15.1&lt;br /&gt;
| 0.18&lt;br /&gt;
| 0.19.1&lt;br /&gt;
| 0.20.1&lt;br /&gt;
| Cython is a compiler which compiles Python-like code files to C code and allows them to be easily called from Python.&lt;br /&gt;
|-&lt;br /&gt;
| [http://code.google.com/p/python-nose/ nose]&lt;br /&gt;
| 1.1.2&lt;br /&gt;
| 1.2.1&lt;br /&gt;
| 1.3.0&lt;br /&gt;
| 1.3.0&lt;br /&gt;
| A unit-testing framework for python.&lt;br /&gt;
|- &lt;br /&gt;
| [http://pypi.python.org/pypi/setuptools setuptools]&lt;br /&gt;
| 0.6c11&lt;br /&gt;
| 0.6c11&lt;br /&gt;
| 1.1&lt;br /&gt;
| 5.1&lt;br /&gt;
| Enables easy installation of new python modules&lt;br /&gt;
|-&lt;br /&gt;
| [http://pandas.pydata.org/ pandas]&lt;br /&gt;
| 0.13.0&lt;br /&gt;
| 0.13.0&lt;br /&gt;
| 0.13.0&lt;br /&gt;
| 0.14.1&lt;br /&gt;
| high-performance, easy-to-use data structures and data analysis tools.&lt;br /&gt;
|- &lt;br /&gt;
|}&lt;br /&gt;
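&lt;br /&gt;
As the quick smoke test mentioned in the mpi4py entry above (a minimal sketch, assuming the 2.7.2 stack plus the openmpi module described in the table), each rank prints its rank and the communicator size:&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
module load gcc intel python openmpi&lt;br /&gt;
mpirun -np 2 python -c 'from mpi4py import MPI; c = MPI.COMM_WORLD; print(c.Get_rank(), c.Get_size())'&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;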
&lt;br /&gt;
== Installing your own Python Modules ==&lt;br /&gt;
&lt;br /&gt;
Python provides an easy way for users to install the libraries they need in their home directories rather than having them installed system-wide. There are so many optional  packages for Python people could potentially want (see e.g. http://pypi.python.org/pypi), that we recommend users install these additional packages locally in their home directories.  This is almost certainly the easiest way to deal with the wide range of packages, ensure they're up to date, and ensure that users' package choices don't conflict. &lt;br /&gt;
&lt;br /&gt;
To install your own Python modules, follow the instructions below.   Where the instructions say &amp;lt;tt&amp;gt;python2.X&amp;lt;/tt&amp;gt;, type &amp;lt;tt&amp;gt;python2.6&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;python2.7&amp;lt;/tt&amp;gt; depending on the version of python you are using.&lt;br /&gt;
&lt;br /&gt;
* First, create a directory in your home directory, &amp;lt;tt&amp;gt;${HOME}/lib/python2.X/site-packages&amp;lt;/tt&amp;gt;, where the packages will go.&lt;br /&gt;
* Next, in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt;, *after* you &amp;lt;tt&amp;gt;module load python&amp;lt;/tt&amp;gt; and in the &amp;quot;GPC&amp;quot; section, add the following line:&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
export PYTHONPATH=${PYTHONPATH}:${HOME}/lib/python2.X/site-packages/&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Re-load the modified .bashrc by typing &amp;lt;tt&amp;gt;source ~/.bashrc&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
* Now, if it's a standard python package and the instructions say that you can use easy_install to install it,&lt;br /&gt;
** install with the following command, where &amp;lt;tt&amp;gt;packagename&amp;lt;/tt&amp;gt; is the name of the package you are installing: &lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
easy_install --prefix=${HOME} -O1 [packagename]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
** Continue doing this until all of the packages you need to install are successfully installed.&lt;br /&gt;
** If, upon importing the new python package, you get error messages like &amp;lt;tt&amp;gt;undefined symbol: __stack_chk_guard&amp;lt;/tt&amp;gt;, you may need to use the following command instead:&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
LDFLAGS=-fstack-protector easy_install --prefix=${HOME} -O1 [packagename]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
* If easy_install isn't an option for your package, and the installation instructions instead talk about downloading a file and using &amp;lt;tt&amp;gt;python setup.py install&amp;lt;/tt&amp;gt; then instead:&lt;br /&gt;
** Download the relevant files&lt;br /&gt;
** You will probably have to uncompress and untar them: &amp;lt;tt&amp;gt;tar -xzvf packagename.tgz&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;tar -xjvf packagename.bz2&amp;lt;/tt&amp;gt;.&lt;br /&gt;
** cd into the newly created directory, and run &lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
python setup.py install --prefix=${HOME}&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
* Now, the install process may have added some .egg files or directories to your &amp;lt;tt&amp;gt;site-packages&amp;lt;/tt&amp;gt; directory.  For each .egg directory, add it to your &amp;lt;tt&amp;gt;PYTHONPATH&amp;lt;/tt&amp;gt; in your .bashrc as well, in the same place where you updated &amp;lt;tt&amp;gt;PYTHONPATH&amp;lt;/tt&amp;gt; before; e.g.,&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
export PYTHONPATH=${PYTHONPATH}:${HOME}/lib/python2.X/site-packages:${HOME}/lib/python2.X/site-packages/packagename1-x.y.z-py2.X.egg:${HOME}/lib/python2.X/site-packages/packagename2-a.b.c-py2.X.egg&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You should now be done!   Re-source your .bashrc and test your new python modules.&lt;br /&gt;
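** One simple way to test (a sketch; &amp;lt;tt&amp;gt;packagename&amp;lt;/tt&amp;gt; below is just a placeholder for whatever you installed) is to import the package and print where it was loaded from:&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
# should print a path under ${HOME}/lib/python2.X/site-packages&lt;br /&gt;
python -c 'import packagename; print(packagename.__file__)'&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;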
&lt;br /&gt;
* In order to keep your .bashrc relatively uncluttered, and to avoid potential conflicts among software modules, we recommend that users create their own  modules (for the &amp;quot;module&amp;quot; system, not specifically python modules).  &lt;br /&gt;
&lt;br /&gt;
[[Brian|Here]] is an example module for the [[Brian]] package, including instructions for the installation of the python [[Brian]] package itself.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Python&amp;diff=7255</id>
		<title>Python</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Python&amp;diff=7255"/>
		<updated>2014-09-16T21:42:48Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* Modules installed system-wide */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[http://www.python.org/ Python] is a programming language that continues to grow in popularity for scientific computing.   It is very fast to write code in, but the resulting software is much slower than C or Fortran; one should be wary of doing too much compute-intensive work in Python.     &lt;br /&gt;
&lt;br /&gt;
There is a dizzying amount of documentation available for programming in Python on the [http://python.org/ Python.org webpage]; SciNet gave an 8-lecture mini-course on [[Research Computing with Python]] in the Fall of 2013.&lt;br /&gt;
An excellent set of material for teaching scientists to program in Python is also available at the [http://software-carpentry.org/4_0/python/ Software Carpentry homepage].&lt;br /&gt;
&lt;br /&gt;
__FORCETOC__ &lt;br /&gt;
&lt;br /&gt;
== Python on the GPC ==&lt;br /&gt;
&lt;br /&gt;
We currently have python 2.7.2 installed, compiled against fast intel math libraries.   To use this version,&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
module load gcc intel python&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Modules installed system-wide ==&lt;br /&gt;
&lt;br /&gt;
Many optional packages are available for Python which greatly extend the language, adding important new functionality.  Those packages which are likely to be important to all of our users (e.g., [http://numpy.scipy.org/ NumPy], [http://www.scipy.org/ SciPy], and [http://matplotlib.sourceforge.net/ Matplotlib]) are installed system-wide.&lt;br /&gt;
&lt;br /&gt;
Below is a list of the packages currently installed system-wide.&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
!{{Hl2}}| Module  &lt;br /&gt;
!{{Hl2}}| python/2.7.2 &lt;br /&gt;
!{{Hl2}}| python/2.7.3 &lt;br /&gt;
!{{Hl2}}| python/2.7.5 &lt;br /&gt;
!{{Hl2}}| python/3.3.4&lt;br /&gt;
!{{Hl2}}| Comments&lt;br /&gt;
|-  &lt;br /&gt;
|[http://www.scipy.org/ SciPy]&lt;br /&gt;
|  0.10.0&lt;br /&gt;
|  0.11.0&lt;br /&gt;
|  0.14.0&lt;br /&gt;
|  0.14.0&lt;br /&gt;
| Open-source software for mathematics, science, and engineering.  The version for Python 2.7.x is linked against the very fast MKL numerical libraries. &lt;br /&gt;
|-&lt;br /&gt;
|[http://numpy.scipy.org/ NumPy]&lt;br /&gt;
| 1.6.1&lt;br /&gt;
| 1.7.0&lt;br /&gt;
| 1.7.0&lt;br /&gt;
| 1.8.1&lt;br /&gt;
| NumPy is the fundamental package needed for scientific computing with Python. Contains fast arrays, tools for integrating C/C++ and Fortran code, linear algebra solvers, etc.  SciPy is built on top of NumPy.&lt;br /&gt;
|-&lt;br /&gt;
| [http://mpi4py.scipy.org/ mpi4py]&lt;br /&gt;
| 1.2.2&lt;br /&gt;
| 1.2.2&lt;br /&gt;
| 1.2.2&lt;br /&gt;
| 1.2.2&lt;br /&gt;
| A pythonic interface to MPI.   Available with openmpi; you must load an openmpi module for this to work.  (There is an issue with openmpi 1.4.x + InfiniBand; it does, however, appear to work fine with IntelMPI.)&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.scipy.org/SciPyPackages/NumExpr Numexpr]&lt;br /&gt;
| 2.0&lt;br /&gt;
| 2.0.1&lt;br /&gt;
| 2.2.1&lt;br /&gt;
| 2.4_rc2&lt;br /&gt;
| Fast, memory-efficient elementwise operations on Numpy arrays.&lt;br /&gt;
|-&lt;br /&gt;
| [http://dirac.cnrs-orleans.fr/plone/software/scientificpython/ ScientificPython]&lt;br /&gt;
| 2.8 &lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| A collection of scientific python utilities.   Does not include MPI support.  No longer supported.&lt;br /&gt;
|-&lt;br /&gt;
| [http://yt.enzotools.org/ yt]&lt;br /&gt;
| 2.2&lt;br /&gt;
| 2.5.3&lt;br /&gt;
| 2.5.5&lt;br /&gt;
| -&lt;br /&gt;
| A collection of python tools for analyzing astrophysical simulation output.&lt;br /&gt;
|-&lt;br /&gt;
| [http://ipython.scipy.org/moin/ iPython]&lt;br /&gt;
| 0.11 &lt;br /&gt;
| 0.13.1&lt;br /&gt;
| 1.0.0&lt;br /&gt;
| 1.2.1&lt;br /&gt;
| An enhanced interactive python.&lt;br /&gt;
|-&lt;br /&gt;
| [http://matplotlib.sourceforge.net/ Matplotlib], pylab&lt;br /&gt;
| 1.1.0&lt;br /&gt;
| 1.2.0&lt;br /&gt;
| 1.3.0&lt;br /&gt;
| 1.3.1&lt;br /&gt;
| Matlab-like plotting for python.&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.pytables.org/moin PyTables]&lt;br /&gt;
| 2.3.1 &lt;br /&gt;
| 2.4.0&lt;br /&gt;
| 3.0.0&lt;br /&gt;
| 3.1.1&lt;br /&gt;
| Fast and efficient access to HDF5 files (and HDF5-format NetCDF4 files.)   Requires the &amp;lt;tt&amp;gt;hdf5/184-p1-v18-serial-gcc&amp;lt;/tt&amp;gt; module to be loaded. &lt;br /&gt;
|-&lt;br /&gt;
| [http://code.google.com/p/netcdf4-python/ NetCDF4-python]&lt;br /&gt;
| 0.9.8&lt;br /&gt;
| 1.0.4&lt;br /&gt;
| 1.1.1&lt;br /&gt;
| 1.1.0&lt;br /&gt;
| Python interface to NetCDF4 files.   Requires the &amp;lt;tt&amp;gt;netcdf/4.0.1_hdf5_v18-serial.shared-nofortran&amp;lt;/tt&amp;gt; module to be loaded. &lt;br /&gt;
|-&lt;br /&gt;
| [http://www.pyngl.ucar.edu/Nio.shtml pyNIO]&lt;br /&gt;
| 1.4.1&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| Yet another Python interface to NetCDF4 files; again, requires the &amp;lt;tt&amp;gt;netcdf/4.0.1_hdf5_v18-serial.shared-nofortran&amp;lt;/tt&amp;gt; module.  No longer supported.&lt;br /&gt;
|-&lt;br /&gt;
| [http://alfven.org/wp/hdf5-for-python/ h5py]&lt;br /&gt;
| 2.0.1&lt;br /&gt;
| 2.1.3&lt;br /&gt;
| 2.2.0&lt;br /&gt;
| 2.3.0&lt;br /&gt;
| Yet another Python interface to HDF5 files; again, requires an HDF5 module to be loaded.&lt;br /&gt;
|-&lt;br /&gt;
| [http://pysvn.tigris.org/ PySVN]&lt;br /&gt;
| 1.7.1&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
| Python interface to the svn version control system. &lt;br /&gt;
|-&lt;br /&gt;
| [http://mercurial.selenic.com/ Mercurial]&lt;br /&gt;
| 2.0.1&lt;br /&gt;
| 2.6.2&lt;br /&gt;
| 2.7.1&lt;br /&gt;
| -&lt;br /&gt;
| A distributed version-control system written in Python.&lt;br /&gt;
|-&lt;br /&gt;
| [http://cython.org/ Cython]&lt;br /&gt;
| 0.15.1&lt;br /&gt;
| 0.18&lt;br /&gt;
| 0.19.1&lt;br /&gt;
| 0.20.1&lt;br /&gt;
| Cython is a compiler which compiles Python-like code files to C code and allows them to be easily called from Python.&lt;br /&gt;
|-&lt;br /&gt;
| [http://code.google.com/p/python-nose/ nose]&lt;br /&gt;
| 1.1.2&lt;br /&gt;
| 1.2.1&lt;br /&gt;
| 1.3.0&lt;br /&gt;
| 1.3.0&lt;br /&gt;
| A unit-testing framework for python.&lt;br /&gt;
|- &lt;br /&gt;
| [http://pypi.python.org/pypi/setuptools setuptools]&lt;br /&gt;
| 0.6c11&lt;br /&gt;
| 0.6c11&lt;br /&gt;
| 1.1&lt;br /&gt;
| 5.1&lt;br /&gt;
| Enables easy installation of new python modules&lt;br /&gt;
|-&lt;br /&gt;
| [http://pandas.pydata.org/ pandas]&lt;br /&gt;
| 0.13.0&lt;br /&gt;
| 0.13.0&lt;br /&gt;
| 0.13.0&lt;br /&gt;
| 0.14.1&lt;br /&gt;
| high-performance, easy-to-use data structures and data analysis tools.&lt;br /&gt;
|- &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Installing your own Python Modules ==&lt;br /&gt;
&lt;br /&gt;
Python provides an easy way for users to install the libraries they need in their home directories rather than having them installed system-wide. There are so many optional  packages for Python people could potentially want (see e.g. http://pypi.python.org/pypi), that we recommend users install these additional packages locally in their home directories.  This is almost certainly the easiest way to deal with the wide range of packages, ensure they're up to date, and ensure that users' package choices don't conflict. &lt;br /&gt;
&lt;br /&gt;
To install your own Python modules, follow the instructions below.   Where the instructions say &amp;lt;tt&amp;gt;python2.X&amp;lt;/tt&amp;gt;, type &amp;lt;tt&amp;gt;python2.6&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;python2.7&amp;lt;/tt&amp;gt; depending on the version of python you are using.&lt;br /&gt;
&lt;br /&gt;
* First, create a directory in your home directory, &amp;lt;tt&amp;gt;${HOME}/lib/python2.X/site-packages&amp;lt;/tt&amp;gt;, where the packages will go.&lt;br /&gt;
* Next, in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt;, *after* you &amp;lt;tt&amp;gt;module load python&amp;lt;/tt&amp;gt; and in the &amp;quot;GPC&amp;quot; section, add the following line:&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
export PYTHONPATH=${PYTHONPATH}:${HOME}/lib/python2.X/site-packages/&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Re-load the modified .bashrc by typing &amp;lt;tt&amp;gt;source ~/.bashrc&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
* Now, if it's a standard python package and the instructions say that you can use easy_install to install it,&lt;br /&gt;
** install with the following command, where &amp;lt;tt&amp;gt;packagename&amp;lt;/tt&amp;gt; is the name of the package you are installing: &lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
easy_install --prefix=${HOME} -O1 [packagename]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
** Continue doing this until all of the packages you need to install are successfully installed.&lt;br /&gt;
** If, upon importing the new python package, you get error messages like &amp;lt;tt&amp;gt;undefined symbol: __stack_chk_guard&amp;lt;/tt&amp;gt;, you may need to use the following command instead:&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
LDFLAGS=-fstack-protector easy_install --prefix=${HOME} -O1 [packagename]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
* If easy_install isn't an option for your package, and the installation instructions instead talk about downloading a file and using &amp;lt;tt&amp;gt;python setup.py install&amp;lt;/tt&amp;gt; then instead:&lt;br /&gt;
** Download the relevant files&lt;br /&gt;
** You will probably have to uncompress and untar them: &amp;lt;tt&amp;gt;tar -xzvf packagename.tgz&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;tar -xjvf packagename.bz2&amp;lt;/tt&amp;gt;.&lt;br /&gt;
** cd into the newly created directory, and run &lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
python setup.py install --prefix=${HOME}&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
* Now, the install process may have added some .egg files or directories to your &amp;lt;tt&amp;gt;site-packages&amp;lt;/tt&amp;gt; directory.  For each .egg directory, add it to your &amp;lt;tt&amp;gt;PYTHONPATH&amp;lt;/tt&amp;gt; in your .bashrc as well, in the same place where you updated &amp;lt;tt&amp;gt;PYTHONPATH&amp;lt;/tt&amp;gt; before; e.g.,&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
export PYTHONPATH=${PYTHONPATH}:${HOME}/lib/python2.X/site-packages:${HOME}/lib/python2.X/site-packages/packagename1-x.y.z-py2.X.egg:${HOME}/lib/python2.X/site-packages/packagename2-a.b.c-py2.X.egg&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You should now be done!   Re-source your .bashrc and test your new python modules.&lt;br /&gt;
&lt;br /&gt;
* In order to keep your .bashrc relatively uncluttered, and to avoid potential conflicts among software modules, we recommend that users create their own  modules (for the &amp;quot;module&amp;quot; system, not specifically python modules).  &lt;br /&gt;
&lt;br /&gt;
[[Brian|Here]] is an example module for the [[Brian]] package, including instructions for the installation of the python [[Brian]] package itself.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Python&amp;diff=7254</id>
		<title>Python</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Python&amp;diff=7254"/>
		<updated>2014-09-16T21:42:14Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* Modules installed system-wide */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[http://www.python.org/ Python] is a programming language that continues to grow in popularity for scientific computing.   It is very fast to write code in, but the resulting software is much slower than C or Fortran; one should be wary of doing too much compute-intensive work in Python.     &lt;br /&gt;
&lt;br /&gt;
There is a dizzying amount of documentation available for programming in Python on the [http://python.org/ Python.org webpage]; SciNet gave an 8-lecture mini-course on [[Research Computing with Python]] in the Fall of 2013.&lt;br /&gt;
An excellent set of material for teaching scientists to program in Python is also available at the [http://software-carpentry.org/4_0/python/ Software Carpentry homepage].&lt;br /&gt;
&lt;br /&gt;
__FORCETOC__ &lt;br /&gt;
&lt;br /&gt;
== Python on the GPC ==&lt;br /&gt;
&lt;br /&gt;
We currently have python 2.7.2 installed, compiled against fast intel math libraries.   To use this version,&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
module load gcc intel python&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Modules installed system-wide ==&lt;br /&gt;
&lt;br /&gt;
Many optional packages are available for Python which greatly extend the language, adding important new functionality.  Those packages which are likely to be important to all of our users (e.g., [http://numpy.scipy.org/ NumPy], [http://www.scipy.org/ SciPy], and [http://matplotlib.sourceforge.net/ Matplotlib]) are installed system-wide.&lt;br /&gt;
&lt;br /&gt;
Below is a list of the packages currently installed system-wide.&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
!{{Hl2}}| Module  &lt;br /&gt;
!{{Hl2}}| python/2.7.2 &lt;br /&gt;
!{{Hl2}}| python/2.7.3 &lt;br /&gt;
!{{Hl2}}| python/2.7.5 &lt;br /&gt;
!{{Hl2}}| python/3.3.4&lt;br /&gt;
!{{Hl2}}| Comments&lt;br /&gt;
|-  &lt;br /&gt;
|[http://www.scipy.org/ SciPy]&lt;br /&gt;
|  0.10.0&lt;br /&gt;
|  0.11.0&lt;br /&gt;
|  0.14.0&lt;br /&gt;
|  0.14.0&lt;br /&gt;
| Open-source software for mathematics, science, and engineering.  The version for Python 2.7.x is linked against the very fast MKL numerical libraries. &lt;br /&gt;
|-&lt;br /&gt;
|[http://numpy.scipy.org/ NumPy]&lt;br /&gt;
| 1.6.1&lt;br /&gt;
| 1.7.0&lt;br /&gt;
| 1.7.0&lt;br /&gt;
| 1.8.1&lt;br /&gt;
| NumPy is the fundamental package needed for scientific computing with Python. Contains fast arrays, tools for integrating C/C++ and Fortran code, linear algebra solvers, etc.  SciPy is built on top of NumPy.&lt;br /&gt;
|-&lt;br /&gt;
| [http://mpi4py.scipy.org/ mpi4py]&lt;br /&gt;
| 1.2.2&lt;br /&gt;
| 1.2.2&lt;br /&gt;
| 1.2.2&lt;br /&gt;
| 1.2.2&lt;br /&gt;
| A pythonic interface to MPI.   Available with openmpi; you must load an openmpi module for this to work.  (There is an issue with openmpi 1.4.x + InfiniBand; it does, however, appear to work fine with IntelMPI.)&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.scipy.org/SciPyPackages/NumExpr Numexpr]&lt;br /&gt;
| 2.0&lt;br /&gt;
| 2.0.1&lt;br /&gt;
| 2.2.1&lt;br /&gt;
| 2.4_rc2&lt;br /&gt;
| Fast, memory-efficient elementwise operations on Numpy arrays.&lt;br /&gt;
|-&lt;br /&gt;
| [http://dirac.cnrs-orleans.fr/plone/software/scientificpython/ ScientificPython]&lt;br /&gt;
| 2.8 &lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| A collection of scientific python utilities.   Does not include MPI support.  No longer supported.&lt;br /&gt;
|-&lt;br /&gt;
| [http://yt.enzotools.org/ yt]&lt;br /&gt;
| 2.2&lt;br /&gt;
| 2.5.3&lt;br /&gt;
| 2.5.5&lt;br /&gt;
| -&lt;br /&gt;
| A collection of python tools for analyzing astrophysical simulation output.&lt;br /&gt;
|-&lt;br /&gt;
| [http://ipython.scipy.org/moin/ iPython]&lt;br /&gt;
| 0.11 &lt;br /&gt;
| 0.13.1&lt;br /&gt;
| 1.0.0&lt;br /&gt;
| 1.2.1&lt;br /&gt;
| An enhanced interactive python.&lt;br /&gt;
|-&lt;br /&gt;
| [http://matplotlib.sourceforge.net/ Matplotlib], pylab&lt;br /&gt;
| 1.1.0&lt;br /&gt;
| 1.2.0&lt;br /&gt;
| 1.3.0&lt;br /&gt;
| 1.3.1&lt;br /&gt;
| Matlab-like plotting for python.&lt;br /&gt;
|-&lt;br /&gt;
| [http://www.pytables.org/moin PyTables]&lt;br /&gt;
| 2.3.1 &lt;br /&gt;
| 2.4.0&lt;br /&gt;
| 3.0.0&lt;br /&gt;
| 3.1.1&lt;br /&gt;
| Fast and efficient access to HDF5 files (and HDF5-format NetCDF4 files.)   Requires the &amp;lt;tt&amp;gt;hdf5/184-p1-v18-serial-gcc&amp;lt;/tt&amp;gt; module to be loaded. &lt;br /&gt;
|-&lt;br /&gt;
| [http://code.google.com/p/netcdf4-python/ NetCDF4-python]&lt;br /&gt;
| 0.9.8&lt;br /&gt;
| 1.0.4&lt;br /&gt;
| 1.1.1&lt;br /&gt;
| 1.1.0&lt;br /&gt;
| Python interface to NetCDF4 files.   Requires the &amp;lt;tt&amp;gt;netcdf/4.0.1_hdf5_v18-serial.shared-nofortran&amp;lt;/tt&amp;gt; module to be loaded. &lt;br /&gt;
|-&lt;br /&gt;
| [http://www.pyngl.ucar.edu/Nio.shtml pyNIO]&lt;br /&gt;
| 1.4.1&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| Yet another Python interface to NetCDF4 files; again, requires the &amp;lt;tt&amp;gt;netcdf/4.0.1_hdf5_v18-serial.shared-nofortran&amp;lt;/tt&amp;gt; module.  No longer supported.&lt;br /&gt;
|-&lt;br /&gt;
| [http://alfven.org/wp/hdf5-for-python/ h5py]&lt;br /&gt;
| 2.0.1&lt;br /&gt;
| 2.1.3&lt;br /&gt;
| 2.2.0&lt;br /&gt;
| 2.3.0&lt;br /&gt;
| Yet another Python interface to HDF5 files; again, requires an HDF5 module to be loaded.&lt;br /&gt;
|-&lt;br /&gt;
| [http://pysvn.tigris.org/ PySVN]&lt;br /&gt;
| 1.7.1&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
| Python interface to the svn version control system.  Requires the &amp;lt;tt&amp;gt;svn&amp;lt;/tt&amp;gt; module to be loaded on CentOS5.&lt;br /&gt;
|-&lt;br /&gt;
| [http://mercurial.selenic.com/ Mercurial]&lt;br /&gt;
| 2.0.1&lt;br /&gt;
| 2.6.2&lt;br /&gt;
| 2.7.1&lt;br /&gt;
| -&lt;br /&gt;
| A distributed version-control system written in Python.&lt;br /&gt;
|-&lt;br /&gt;
| [http://cython.org/ Cython]&lt;br /&gt;
| 0.15.1&lt;br /&gt;
| 0.18&lt;br /&gt;
| 0.19.1&lt;br /&gt;
| 0.20.1&lt;br /&gt;
| Cython is a compiler which compiles Python-like code files to C code and allows them to be easily called from Python.&lt;br /&gt;
|-&lt;br /&gt;
| [http://code.google.com/p/python-nose/ nose]&lt;br /&gt;
| 1.1.2&lt;br /&gt;
| 1.2.1&lt;br /&gt;
| 1.3.0&lt;br /&gt;
| 1.3.0&lt;br /&gt;
| A unit-testing framework for python.&lt;br /&gt;
|- &lt;br /&gt;
| [http://pypi.python.org/pypi/setuptools setuptools]&lt;br /&gt;
| 0.6c11&lt;br /&gt;
| 0.6c11&lt;br /&gt;
| 1.1&lt;br /&gt;
| 5.1&lt;br /&gt;
| Enables easy installation of new python modules&lt;br /&gt;
|-&lt;br /&gt;
| [http://pandas.pydata.org/ pandas]&lt;br /&gt;
| 0.13.0&lt;br /&gt;
| 0.13.0&lt;br /&gt;
| 0.13.0&lt;br /&gt;
| 0.14.1&lt;br /&gt;
| high-performance, easy-to-use data structures and data analysis tools.&lt;br /&gt;
|- &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Installing your own Python Modules ==&lt;br /&gt;
&lt;br /&gt;
Python provides an easy way for users to install the libraries they need in their home directories rather than having them installed system-wide. There are so many optional  packages for Python people could potentially want (see e.g. http://pypi.python.org/pypi), that we recommend users install these additional packages locally in their home directories.  This is almost certainly the easiest way to deal with the wide range of packages, ensure they're up to date, and ensure that users' package choices don't conflict. &lt;br /&gt;
&lt;br /&gt;
To install your own Python modules, follow the instructions below.   Where the instructions say &amp;lt;tt&amp;gt;python2.X&amp;lt;/tt&amp;gt;, type &amp;lt;tt&amp;gt;python2.6&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;python2.7&amp;lt;/tt&amp;gt; depending on the version of python you are using.&lt;br /&gt;
&lt;br /&gt;
* First, create a directory in your home directory, &amp;lt;tt&amp;gt;${HOME}/lib/python2.X/site-packages&amp;lt;/tt&amp;gt;, where the packages will go.&lt;br /&gt;
* Next, in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt;, *after* you &amp;lt;tt&amp;gt;module load python&amp;lt;/tt&amp;gt; and in the &amp;quot;GPC&amp;quot; section, add the following line:&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
export PYTHONPATH=${PYTHONPATH}:${HOME}/lib/python2.X/site-packages/&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Re-load the modified .bashrc by typing &amp;lt;tt&amp;gt;source ~/.bashrc&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
* Now, if it's a standard python package and the instructions say that you can use easy_install to install it,&lt;br /&gt;
** install with the following command, where &amp;lt;tt&amp;gt;packagename&amp;lt;/tt&amp;gt; is the name of the package you are installing: &lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
easy_install --prefix=${HOME} -O1 [packagename]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
** Continue doing this until all of the packages you need to install are successfully installed.&lt;br /&gt;
** If, upon importing the new python package, you get error messages like &amp;lt;tt&amp;gt;undefined symbol: __stack_chk_guard&amp;lt;/tt&amp;gt;, you may need to use the following command instead:&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
LDFLAGS=-fstack-protector easy_install --prefix=${HOME} -O1 [packagename]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
* If easy_install isn't an option for your package, and the installation instructions instead talk about downloading a file and using &amp;lt;tt&amp;gt;python setup.py install&amp;lt;/tt&amp;gt; then instead:&lt;br /&gt;
** Download the relevant files&lt;br /&gt;
** You will probably have to uncompress and untar them: &amp;lt;tt&amp;gt;tar -xzvf packagename.tgz&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;tar -xjvf packagename.bz2&amp;lt;/tt&amp;gt;.&lt;br /&gt;
** cd into the newly created directory, and run &lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
python setup.py install --prefix=${HOME}&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
* Now, the install process may have added some .egg files or directories to your &amp;lt;tt&amp;gt;site-packages&amp;lt;/tt&amp;gt; directory.  For each .egg directory, add it to your &amp;lt;tt&amp;gt;PYTHONPATH&amp;lt;/tt&amp;gt; in your .bashrc as well, in the same place where you updated &amp;lt;tt&amp;gt;PYTHONPATH&amp;lt;/tt&amp;gt; before; e.g.,&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
export PYTHONPATH=${PYTHONPATH}:${HOME}/lib/python2.X/site-packages:${HOME}/lib/python2.X/site-packages/packagename1-x.y.z-py2.X.egg:${HOME}/lib/python2.X/site-packages/packagename2-a.b.c-py2.X.egg&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You should now be done!   Re-source your .bashrc and test your new python modules.&lt;br /&gt;
&lt;br /&gt;
* In order to keep your .bashrc relatively uncluttered, and to avoid potential conflicts among software modules, we recommend that users create their own  modules (for the &amp;quot;module&amp;quot; system, not specifically python modules).  &lt;br /&gt;
&lt;br /&gt;
[[Brian|Here]] is an example module for the [[Brian]] package, including instructions for the installation of the python [[Brian]] package itself.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Using_Paraview&amp;diff=7232</id>
		<title>Using Paraview</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Using_Paraview&amp;diff=7232"/>
		<updated>2014-09-10T20:04:37Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* Connect Client and Server */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[http://www.paraview.org/ ParaView] is a powerful, parallel, client-server based visualization system that allows you to render data on SciNet's GPC nodes and manipulate the results interactively on your own desktop.   Using the ParaView server on SciNet is much like using it locally, but there is an additional step: setting up a connection directly between your desktop and the compute nodes.&lt;br /&gt;
&lt;br /&gt;
[[Image:Paraview.png|thumb|right|320px|The ParaView Client GUI]]&lt;br /&gt;
&lt;br /&gt;
===Installing ParaView===&lt;br /&gt;
&lt;br /&gt;
To use ParaView, you will need the client software installed on your system; download ParaView from [http://www.paraview.org/paraview/resources/software.html the Paraview website].  Binaries exist for Linux, Mac, and Windows systems.   The client version must exactly match the version installed on the server, currently 3.12 or 3.14.1.   The client has all the functionality of the server, and can analyze data locally.&lt;br /&gt;
&lt;br /&gt;
===SSH Forwarding For ParaView===&lt;br /&gt;
&lt;br /&gt;
To interactively use the ParaView server on GPC, you will have to work some ssh magic to allow the client on your desktop to connect to the server through the scinet login nodes.  The steps required are&lt;br /&gt;
&lt;br /&gt;
* Have an SSH key that you can use to log into SciNet&lt;br /&gt;
* Submit an interactive job, with a shell on the head node that you'll be running the server on&lt;br /&gt;
* Start ssh forwarding&lt;br /&gt;
* Start paraview server&lt;br /&gt;
* Connecting client and server&lt;br /&gt;
&lt;br /&gt;
====SSH Keys====&lt;br /&gt;
&lt;br /&gt;
To be able to log into the compute nodes where ParaView will be running, you'll have to have an [[Ssh_keys | SSH key]] set up, as password authentication won't work.    Our [[Ssh_keys | SSH Keys and SciNet]] page describes how to do this.&lt;br /&gt;
&lt;br /&gt;
====Log into node====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to go to the node from which you'll start the ParaView server.   This is typically done by starting an interactive job on the GPC, perhaps on the [[Moab#debug | debug ]] queue or sandybridge [[GPC_Quickstart#Memory_Configuration | large memory]] nodes.   Paraview can in principle make use of as many nodes as you throw at it.  So one might  begin jobs as below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=1:m128g:ppn=16,walltime=1:00:00 -q sandy -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=2:ppn=8,walltime=1:00:00 -q debug -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once this job has started, you'll be placed in a shell on the head node of the job; typing &amp;lt;tt&amp;gt;hostname&amp;lt;/tt&amp;gt; will tell you the name of the host, e.g.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ hostname&lt;br /&gt;
gpc-f148n089-ib0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ hostname&lt;br /&gt;
gpc-f107n045-ib0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
you will need this hostname in the following steps.&lt;br /&gt;
&lt;br /&gt;
====Start SSH port forwarding====&lt;br /&gt;
&lt;br /&gt;
Once you have the node's hostname, the port forwarding can be started with the following command (on your local machine, in a terminal window), using the hostname from above; here we'll take the example of gpc-f148n089-ib0:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export gpcnode=&amp;quot;gpc-f148n089-ib0&amp;quot;&lt;br /&gt;
$ ssh -N -L 20080:${gpcnode}:22 -L 20090:${gpcnode}:11111 login.scinet.utoronto.ca&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This command will not return anything until the forwarding is terminated, and will just look like it's sitting there.  It doesn't start a remote shell or command (-N), but it will connect to login.scinet.utoronto.ca, and from there it will redirect your local (-L) port 20080 to &amp;lt;tt&amp;gt;${gpcnode}&amp;lt;/tt&amp;gt; port 22, and similarly local port 20090 to &amp;lt;tt&amp;gt;${gpcnode}&amp;lt;/tt&amp;gt; port 11111.  We'll use the first for ssh'ing to the remote node (mainly for testing), and the second to connect the local paraview client to the remote paraview server.&lt;br /&gt;
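&lt;br /&gt;
If you would rather not leave that terminal tied up, a variant (the same forwarding, just letting ssh put itself in the background once the tunnel is up) is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -f -N -L 20080:${gpcnode}:22 -L 20090:${gpcnode}:11111 login.scinet.utoronto.ca&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;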
&lt;br /&gt;
To make sure the port forwarding is working correctly, in another window try sshing directly to the compute node from your desktop:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -p 20080 [your-scinet-username]@localhost&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and this should land you directly on the compute node.   If it does not, then something is wrong with the ssh forwarding.&lt;br /&gt;
&lt;br /&gt;
====Start Server====&lt;br /&gt;
&lt;br /&gt;
Now that the tunnel is set up, on the compute node you can start the paraview server.    To do this, you will have to have the following modules loaded:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load Xlibraries intel gcc python openmpi paraview&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(You can replace openmpi with intelmpi, and of course any module that is already loaded does not have to be loaded again.)&lt;br /&gt;
&lt;br /&gt;
Then start the paraview server with mpirun, as with any MPI job:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -np [NP] pvserver --use-offscreen-rendering&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where NP is the number of processors; 16 processors per node on the largemem nodes, or 8 per node otherwise.    &lt;br /&gt;
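&lt;br /&gt;
For instance, for the two-node debug job requested above (2 nodes with 8 processors each), this works out to:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -np 16 pvserver --use-offscreen-rendering&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;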
&lt;br /&gt;
Once running, the ParaView server should output&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Listen on port: 11111&lt;br /&gt;
Waiting for client...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Connect Client and Server====&lt;br /&gt;
&lt;br /&gt;
[[Image:Configure.png|thumb|right|320px|Configuring the client]]&lt;br /&gt;
&lt;br /&gt;
Once the server is running, you can connect the client.   Start the ParaView client on your desktop, and choose File-&amp;gt;Connect.   Click `Add Server', give the server a name (say, GPC), and give the port number 20090.   The other values should be correct by default; host is &amp;lt;tt&amp;gt;localhost&amp;lt;/tt&amp;gt;, and the server type is Client/Server.  Click `Configure'.&lt;br /&gt;
&lt;br /&gt;
On the next window, you'll be asked for a command to start up the server; select `Manual', and ok.&lt;br /&gt;
&lt;br /&gt;
Once the server is selected, click `Connect'.  On the compute node, the server should respond `Client connected'.   In the client window, when you (for instance) select File-&amp;gt;Open, you will see the files on the GPC rather than those on your local host.&lt;br /&gt;
&lt;br /&gt;
From here, the [http://paraview.org/Wiki/ParaView ParaView Wiki] can give you instructions as to how to plot your data.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Using_Paraview&amp;diff=7231</id>
		<title>Using Paraview</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Using_Paraview&amp;diff=7231"/>
		<updated>2014-09-10T20:03:15Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* SSH Forwarding For ParaView */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[http://www.paraview.org/ ParaView] is a powerful, parallel, client-server based visualization system that allows you to render data on SciNet's GPC nodes and manipulate the results interactively on your own desktop.   Using the ParaView server on SciNet is much like using it locally, but there is an additional step: setting up a connection directly between your desktop and the compute nodes.&lt;br /&gt;
&lt;br /&gt;
[[Image:Paraview.png|thumb|right|320px|The ParaView Client GUI]]&lt;br /&gt;
&lt;br /&gt;
===Installing ParaView===&lt;br /&gt;
&lt;br /&gt;
To use ParaView, you will need the client software installed on your system; download ParaView from [http://www.paraview.org/paraview/resources/software.html the Paraview website].  Binaries exist for Linux, Mac, and Windows systems.   The client version must exactly match the version installed on the server, currently 3.12 or 3.14.1.   The client has all the functionality of the server, and can analyze data locally.&lt;br /&gt;
&lt;br /&gt;
===SSH Forwarding For ParaView===&lt;br /&gt;
&lt;br /&gt;
To interactively use the ParaView server on GPC, you will have to work some ssh magic to allow the client on your desktop to connect to the server through the scinet login nodes.  The steps required are&lt;br /&gt;
&lt;br /&gt;
* Have an SSH key that you can use to log into SciNet&lt;br /&gt;
* Submit an interactive job, with a shell on the head node that you'll be running the server on&lt;br /&gt;
* Start ssh forwarding&lt;br /&gt;
* Start paraview server&lt;br /&gt;
* Connecting client and server&lt;br /&gt;
&lt;br /&gt;
====SSH Keys====&lt;br /&gt;
&lt;br /&gt;
To be able to log into the compute nodes where ParaView will be running, you'll have to have an [[Ssh_keys | SSH key]] set up, as password authentication won't work.    Our [[Ssh_keys | SSH Keys and SciNet]] page describes how to do this.&lt;br /&gt;
&lt;br /&gt;
====Log into node====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to go to the node from which you'll start the ParaView server.   This is typically done by starting an interactive job on the GPC, perhaps on the [[Moab#debug | debug ]] queue or sandybridge [[GPC_Quickstart#Memory_Configuration | large memory]] nodes.   Paraview can in principle make use of as many nodes as you throw at it.  So one might  begin jobs as below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=1:m128g:ppn=16,walltime=1:00:00 -q sandy -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=2:ppn=8,walltime=1:00:00 -q debug -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once this job has started, you'll be placed in a shell on the head node of the job; typing &amp;lt;tt&amp;gt;hostname&amp;lt;/tt&amp;gt; will tell you the name of the host, e.g.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ hostname&lt;br /&gt;
gpc-f148n089-ib0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ hostname&lt;br /&gt;
gpc-f107n045-ib0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
you will need this hostname in the following steps.&lt;br /&gt;
&lt;br /&gt;
====Start SSH port forwarding====&lt;br /&gt;
&lt;br /&gt;
Once you have the node's hostname, the port forwarding can be started with the following command (on your local machine, in a terminal window), using the hostname from above; here we'll take the example of gpc-f148n089-ib0:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export gpcnode=&amp;quot;gpc-f148n089-ib0&amp;quot;&lt;br /&gt;
$ ssh -N -L 20080:${gpcnode}:22 -L 20090:${gpcnode}:11111 login.scinet.utoronto.ca&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This command will not return anything until the forwarding is terminated, and will just look like it's sitting there.  It doesn't start a remote shell or command (-N), but it will connect to login.scinet.utoronto.ca, and from there it will redirect your local (-L) port 20080 to &amp;lt;tt&amp;gt;${gpcnode}&amp;lt;/tt&amp;gt; port 22, and similarly local port 20090 to &amp;lt;tt&amp;gt;${gpcnode}&amp;lt;/tt&amp;gt; port 11111.  We'll use the first for ssh'ing to the remote node (mainly for testing), and the second to connect the local paraview client to the remote paraview server.&lt;br /&gt;
&lt;br /&gt;
To make sure the port forwarding is working correctly, in another window try sshing directly to the compute node from your desktop:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -p 20080 [your-scinet-username]@localhost&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and this should land you directly on the compute node.   If it does not, then something is wrong with the ssh forwarding.&lt;br /&gt;
&lt;br /&gt;
====Start Server====&lt;br /&gt;
&lt;br /&gt;
Now that the tunnel is set up, on the compute node you can start the paraview server.    To do this, you will have to have the following modules loaded:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load Xlibraries intel gcc python openmpi paraview&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(You can replace openmpi with intelmpi, and of course any module that is already loaded does not have to be loaded again.)&lt;br /&gt;
&lt;br /&gt;
Then start the paraview server with mpirun, as with any MPI job:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -np [NP] pvserver --use-offscreen-rendering&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where NP is the number of processors; 16 processors per node on the largemem nodes, or 8 per node otherwise.    &lt;br /&gt;
&lt;br /&gt;
Once running, the ParaView server should output&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Listen on port: 11111&lt;br /&gt;
Waiting for client...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Connect Client and Server====&lt;br /&gt;
&lt;br /&gt;
[[Image:Configure.png|thumb|right|320px|Configuring the client]]&lt;br /&gt;
&lt;br /&gt;
Once the server is running, you can connect the client.   Start the ParaView client on your desktop, and choose File-&amp;gt;Connect.   Click `Add Server', give the server a name (say, GPC), and give the port number 20090.   The other values should be correct by default; host is &amp;lt;tt&amp;gt;localhost&amp;lt;/tt&amp;gt;, and the server type is Client/Server.  Click `Configure'.&lt;br /&gt;
&lt;br /&gt;
On the next window, you'll be asked for a command to start up the server; select `Manual', and ok.&lt;br /&gt;
&lt;br /&gt;
In future runs, you'll be able to re-use this server, even if the host is different, because the correct host will be set in your &amp;lt;tt&amp;gt;.ssh/config&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Once the server is selected, click `Connect'.  On the compute node, the server should respond `Client connected'.   In the client window, when you (for instance) select File-&amp;gt;Open, you will see the files on the GPC rather than those on your local host.&lt;br /&gt;
&lt;br /&gt;
From here, the [http://paraview.org/Wiki/ParaView ParaView Wiki] can give you instructions as to how to plot your data.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Using_Paraview&amp;diff=7230</id>
		<title>Using Paraview</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Using_Paraview&amp;diff=7230"/>
		<updated>2014-09-10T19:57:23Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[http://www.paraview.org/ ParaView] is a powerful, parallel, client-server based visualization system that allows you to render data on SciNet's GPC nodes and manipulate the results interactively on your own desktop.   Using the ParaView server on SciNet is much like using it locally, but there is an additional step: setting up a connection directly between your desktop and the compute nodes.&lt;br /&gt;
&lt;br /&gt;
[[Image:Paraview.png|thumb|right|320px|The ParaView Client GUI]]&lt;br /&gt;
&lt;br /&gt;
===Installing ParaView===&lt;br /&gt;
&lt;br /&gt;
To use ParaView, you will need the client software installed on your system; download ParaView from [http://www.paraview.org/paraview/resources/software.html the Paraview website].  Binaries exist for Linux, Mac, and Windows systems.   The client version must exactly match the version installed on the server, currently 3.12 or 3.14.1.   The client has all the functionality of the server, and can analyze data locally.&lt;br /&gt;
&lt;br /&gt;
===SSH Forwarding For ParaView===&lt;br /&gt;
&lt;br /&gt;
To interactively use the ParaView server on GPC, you will have to work some ssh magic to allow the client on your desktop to connect to the server through the scinet login nodes.  The steps required are&lt;br /&gt;
&lt;br /&gt;
* Have an SSH key that you can use to log into SciNet&lt;br /&gt;
* Log into the head node that you'll be using the server on&lt;br /&gt;
* (Mac or Linux): Edit your local &amp;lt;tt&amp;gt;~/.ssh/config&amp;lt;/tt&amp;gt; to enable forwarding to that node&lt;br /&gt;
* Start ssh forwarding&lt;br /&gt;
* Start server&lt;br /&gt;
* Connecting client and server&lt;br /&gt;
&lt;br /&gt;
====SSH Keys====&lt;br /&gt;
&lt;br /&gt;
To be able to log into the compute nodes where ParaView will be running, you'll have to have an [[Ssh_keys | SSH key]] set up, as password authentication won't work.    Our [[Ssh_keys | SSH Keys and SciNet]] page describes how to do this.&lt;br /&gt;
&lt;br /&gt;
====Log into node====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to go to the node from which you'll start the ParaView server.   This is typically done by starting an interactive job on the GPC, perhaps on the [[Moab#debug | debug ]] queue or sandybridge [[GPC_Quickstart#Memory_Configuration | large memory]] nodes.   Paraview can in principle make use of as many nodes as you throw at it.  So one might  begin jobs as below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=1:m128g:ppn=16,walltime=1:00:00 -q sandy -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=2:ppn=8,walltime=1:00:00 -q debug -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once this job has started, you'll be placed in a shell on the head node of the job; typing &amp;lt;tt&amp;gt;hostname&amp;lt;/tt&amp;gt; will tell you the name of the host, e.g.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ hostname&lt;br /&gt;
gpc-f148n089-ib0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ hostname&lt;br /&gt;
gpc-f107n045-ib0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
you will need this hostname in the following steps.&lt;br /&gt;
&lt;br /&gt;
====Start SSH port forwarding====&lt;br /&gt;
&lt;br /&gt;
The port forwarding can now be started with the following command (on your local machine, in a terminal window), using the node hostname from above - here we'll take the example of gpc-f148n089-ib0:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export gpcnode=&amp;quot;gpc-f148n089-ib0&amp;quot;&lt;br /&gt;
$ ssh -N -L 20080:${gpcnode}:22 -L 20090:${gpcnode}:11111 login.scinet.utoronto.ca&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
this command will not return anything until the forwarding is terminated, and will just look like it's sitting there.  It doesn't start a remote shell or command (-N), but it will connect to login.scinet.utoronto.ca, and from there it will redirect your local (-L) port 20080 to &amp;lt;tt&amp;gt;${gpcnode}&amp;lt;/tt&amp;gt; port 22, and similarly local port 20090 to &amp;lt;tt&amp;gt;${gpcnode}&amp;lt;/tt&amp;gt; port 11111.  We'll use the first for ssh'ing to the remote node (mainly for testing), and the second to connect the local paraview client to the remote paraview server.&lt;br /&gt;
&lt;br /&gt;
To make sure the port forwarding is working correctly, in another window try sshing directly to the compute node from your desktop:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -p 20080 [your-scinet-username]@localhost&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and this should land you directly on the compute node.   If it does not, then something is wrong with the ssh forwarding.&lt;br /&gt;
&lt;br /&gt;
====Start Server====&lt;br /&gt;
&lt;br /&gt;
Now that the tunnel is set up, on the compute node you can start the paraview server.    To do this, you will have to have the following modules loaded:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load Xlibraries intel gcc python openmpi paraview&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(You can replace openmpi with intelmpi, and of course any module that is already loaded does not have to be loaded again.)&lt;br /&gt;
&lt;br /&gt;
Then start the paraview server with mpirun, as with any MPI job:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -np [NP] pvserver --use-offscreen-rendering&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where NP is the number of processors; 16 processors per node on the largemem nodes, or 8 per node otherwise.    &lt;br /&gt;
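&lt;br /&gt;
For example, for the two-node debug job requested above (2 nodes at 8 processors each), that would be:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -np 16 pvserver --use-offscreen-rendering&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;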
&lt;br /&gt;
Once running, the ParaView server should output&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Listen on port: 11111&lt;br /&gt;
Waiting for client...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Connect Client and Server====&lt;br /&gt;
&lt;br /&gt;
[[Image:Configure.png|thumb|right|320px|Configuring the client]]&lt;br /&gt;
&lt;br /&gt;
Once the server is running, you can connect the client.   Start the ParaView client on your desktop, and choose File-&amp;gt;Connect.   Click `Add Server', give the server a name (say, GPC), and give the port number 20090.   The other values should be correct by default; host is &amp;lt;tt&amp;gt;localhost&amp;lt;/tt&amp;gt;, and the server type is Client/Server.  Click `Configure'.&lt;br /&gt;
&lt;br /&gt;
On the next window, you'll be asked for a command to start up the server; select `Manual', and ok.&lt;br /&gt;
&lt;br /&gt;
In future runs, you'll be able to re-use this server entry, even if the compute node is different, because the client always connects to &amp;lt;tt&amp;gt;localhost&amp;lt;/tt&amp;gt; port 20090; only the &amp;lt;tt&amp;gt;gpcnode&amp;lt;/tt&amp;gt; variable in the forwarding command needs to change.&lt;br /&gt;
&lt;br /&gt;
Once the server is selected, click `Connect'.  On the compute node, the server should respond `Client connected'.   In the client window, when you (for instance) select File-&amp;gt;Open, you will be seeing the files on the GPC, rather than the local host.&lt;br /&gt;
&lt;br /&gt;
From here, the [http://paraview.org/Wiki/ParaView ParaView Wiki] can give you instructions as to how to plot your data.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Using_Paraview&amp;diff=7229</id>
		<title>Using Paraview</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Using_Paraview&amp;diff=7229"/>
		<updated>2014-09-10T19:52:50Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[http://www.paraview.org/ ParaView] is a powerful, parallel, client-server based visualization system that allows you to use SciNet's GPC nodes to render data on SciNet, and manipulate the results interactively on your own desktop.   To use the paraview server on SciNet is much like using it locally, but there is an additional step in setting up a connection directly between your desktop and the compute nodes.&lt;br /&gt;
&lt;br /&gt;
[[Image:Paraview.png|thumb|right|320px|The ParaView Client GUI]]&lt;br /&gt;
&lt;br /&gt;
===Installing ParaView===&lt;br /&gt;
&lt;br /&gt;
To use ParaView, you must have the client software installed on your system; download it from [http://www.paraview.org/paraview/resources/software.html the ParaView website].  Binaries exist for Linux, Mac, and Windows systems.   The client version must exactly match the version installed on the server, currently 3.12 or 3.14.1.   The client has all the functionality of the server, and can also analyze data locally.&lt;br /&gt;
&lt;br /&gt;
===SSH Forwarding For ParaView===&lt;br /&gt;
&lt;br /&gt;
To interactively use the ParaView server on GPC, you will have to work some ssh magic to allow the client on your desktop to connect to the server through the scinet login nodes.  The steps required are&lt;br /&gt;
&lt;br /&gt;
* Have an SSH key that you can use to log into SciNet&lt;br /&gt;
* Log into the head node that you'll be using the server on&lt;br /&gt;
* (Mac or Linux): Edit your local &amp;lt;tt&amp;gt;~/.ssh/config&amp;lt;/tt&amp;gt; to enable forwarding to that node&lt;br /&gt;
* Start ssh forwarding&lt;br /&gt;
* Start server&lt;br /&gt;
* Connect client and server&lt;br /&gt;
&lt;br /&gt;
====SSH Keys====&lt;br /&gt;
&lt;br /&gt;
To be able to log into the compute nodes where ParaView will be running, you'll have to have an [[Ssh_keys | SSH key]] set up, as password authentication won't work.    Our [[Ssh_keys | SSH Keys and SciNet]] page describes how to do this.&lt;br /&gt;
&lt;br /&gt;
====Log into node====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to go to the node from which you'll start the ParaView server.   This is typically done by starting an interactive job on the GPC, perhaps on the [[Moab#debug | debug ]] queue or sandybridge [[GPC_Quickstart#Memory_Configuration | large memory]] nodes.   Paraview can in principle make use of as many nodes as you throw at it.  So one might  begin jobs as below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=1:m128g:ppn=16,walltime=1:00:00 -q sandy -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=2:ppn=8,walltime=1:00:00 -q debug -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once this job has started, you'll be placed in a shell on the head node of the job; typing &amp;lt;tt&amp;gt;hostname&amp;lt;/tt&amp;gt; will tell you the name of the host, e.g.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ hostname&lt;br /&gt;
gpc-f148n089-ib0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ hostname&lt;br /&gt;
gpc-f107n045-ib0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
you will need this hostname in the following steps; on your local machine, in the terminal, set a variable named &amp;lt;tt&amp;gt;gpcnode&amp;lt;/tt&amp;gt; to the remote node name, e.g.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export gpcnode=gpc-f148n089-ib0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Start SSH port forwarding====&lt;br /&gt;
&lt;br /&gt;
The port forwarding can then be started with the following command (on your desktop, in the same terminal where you set the gpcnode variable above):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -N -L 20080:${gpcnode}:22 -L 20090:${gpcnode}:11111 login.scinet.utoronto.ca&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
this command will not return anything until the forwarding is terminated, and will just look like it's sitting there.  It doesn't start a remote shell or command (-N), but it will connect to login.scinet.utoronto.ca, and from there it will redirect your local (-L) port 20080 to &amp;lt;tt&amp;gt;${gpcnode}&amp;lt;/tt&amp;gt; port 22, and similarly local port 20090 to &amp;lt;tt&amp;gt;${gpcnode}&amp;lt;/tt&amp;gt; port 11111.  We'll use the first for ssh'ing to the remote node (mainly for testing), and the second to connect the local paraview client to the remote paraview server.&lt;br /&gt;
&lt;br /&gt;
To make sure the port forwarding is working correctly, in another window try sshing directly to the compute node from your desktop:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -p 20080 [your-scinet-username]@localhost&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and this should land you directly on the compute node.   If it does not, then something is wrong with the ssh forwarding.&lt;br /&gt;
&lt;br /&gt;
====Start Server====&lt;br /&gt;
&lt;br /&gt;
Now that the tunnel is set up, on the compute node you can start the paraview server.    To do this, you will have to have the following modules loaded:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load Xlibraries intel gcc python openmpi paraview&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(You can replace openmpi with intelmpi, and of course any module that is already loaded does not have to be loaded again.)&lt;br /&gt;
&lt;br /&gt;
Then start the paraview server with mpirun, as with any MPI job:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -np [NP] pvserver --use-offscreen-rendering&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where NP is the number of processors; 16 processors per node on the largemem nodes, or 8 per node otherwise.    &lt;br /&gt;
&lt;br /&gt;
Once running, the ParaView server should output&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Listen on port: 11111&lt;br /&gt;
Waiting for client...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Connect Client and Server====&lt;br /&gt;
&lt;br /&gt;
[[Image:Configure.png|thumb|right|320px|Configuring the client]]&lt;br /&gt;
&lt;br /&gt;
Once the server is running, you can connect the client.   Start the ParaView client on your desktop, and choose File-&amp;gt;Connect.   Click `Add Server', give the server a name (say, GPC), and give the port number 20090.   The other values should be correct by default; host is &amp;lt;tt&amp;gt;localhost&amp;lt;/tt&amp;gt;, and the server type is Client/Server.  Click `Configure'.&lt;br /&gt;
&lt;br /&gt;
On the next window, you'll be asked for a command to start up the server; select `Manual', and ok.&lt;br /&gt;
&lt;br /&gt;
In future runs, you'll be able to re-use this server entry, even if the compute node is different, because the client always connects to &amp;lt;tt&amp;gt;localhost&amp;lt;/tt&amp;gt; port 20090; only the &amp;lt;tt&amp;gt;gpcnode&amp;lt;/tt&amp;gt; variable in the forwarding command needs to change.&lt;br /&gt;
&lt;br /&gt;
Once the server is selected, click `Connect'.  On the compute node, the server should respond `Client connected'.   In the client window, when you (for instance) select File-&amp;gt;Open, you will be seeing the files on the GPC, rather than the local host.&lt;br /&gt;
&lt;br /&gt;
From here, the [http://paraview.org/Wiki/ParaView ParaView Wiki] can give you instructions as to how to plot your data.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Using_Paraview&amp;diff=7228</id>
		<title>Using Paraview</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Using_Paraview&amp;diff=7228"/>
		<updated>2014-09-10T19:49:21Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* Log into node */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[http://www.paraview.org/ ParaView] is a powerful, parallel, client-server based visualization system that allows you to use SciNet's GPC nodes to render data on SciNet, and manipulate the results interactively on your own desktop.   To use the paraview server on SciNet is much like using it locally, but there is an additional step in setting up a connection directly between your desktop and the compute nodes.&lt;br /&gt;
&lt;br /&gt;
[[Image:Paraview.png|thumb|right|320px|The ParaView Client GUI]]&lt;br /&gt;
&lt;br /&gt;
===Installing ParaView===&lt;br /&gt;
&lt;br /&gt;
To use ParaView, you must have the client software installed on your system; download it from [http://www.paraview.org/paraview/resources/software.html the ParaView website].  Binaries exist for Linux, Mac, and Windows systems.   The client version must exactly match the version installed on the server, currently 3.12 or 3.14.1.   The client has all the functionality of the server, and can also analyze data locally.&lt;br /&gt;
&lt;br /&gt;
===SSH Forwarding For ParaView===&lt;br /&gt;
&lt;br /&gt;
To interactively use the ParaView server on GPC, you will have to work some ssh magic to allow the client on your desktop to connect to the server through the scinet login nodes.  The steps required are&lt;br /&gt;
&lt;br /&gt;
* Have an SSH key that you can use to log into SciNet&lt;br /&gt;
* Log into the head node that you'll be using the server on&lt;br /&gt;
* (Mac or Linux): Edit your local &amp;lt;tt&amp;gt;~/.ssh/config&amp;lt;/tt&amp;gt; to enable forwarding to that node&lt;br /&gt;
* Start ssh forwarding&lt;br /&gt;
* Start server&lt;br /&gt;
* Connect client and server&lt;br /&gt;
&lt;br /&gt;
====SSH Keys====&lt;br /&gt;
&lt;br /&gt;
To be able to log into the compute nodes where ParaView will be running, you'll have to have an [[Ssh_keys | SSH key]] set up, as password authentication won't work.    Our [[Ssh_keys | SSH Keys and SciNet]] page describes how to do this.&lt;br /&gt;
&lt;br /&gt;
====Log into node====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to go to the node from which you'll start the ParaView server.   This is typically done by starting an interactive job on the GPC, perhaps on the [[Moab#debug | debug ]] queue or sandybridge [[GPC_Quickstart#Memory_Configuration | large memory]] nodes.   Paraview can in principle make use of as many nodes as you throw at it.  So one might  begin jobs as below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=1:m128g:ppn=16,walltime=1:00:00 -q sandy -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=2:ppn=8,walltime=1:00:00 -q debug -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once this job has started, you'll be placed in a shell on the head node of the job; typing &amp;lt;tt&amp;gt;hostname&amp;lt;/tt&amp;gt; will tell you the name of the host, e.g.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ hostname&lt;br /&gt;
gpc-f148n089-ib0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ hostname&lt;br /&gt;
gpc-f107n045-ib0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
you will need this hostname in the following steps; on your local machine, in the terminal, set a variable named &amp;lt;tt&amp;gt;gpcnode&amp;lt;/tt&amp;gt; to the remote node name, e.g.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export gpcnode=gpc-f148n089-ib0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Edit ssh config (MacOS/Linux)====&lt;br /&gt;
&lt;br /&gt;
You will now need to edit your ssh config to set up ssh forwarding so that you can connect (seemingly directly) to the compute node above.   Add the following lines to your &amp;lt;tt&amp;gt;~/.ssh/config&amp;lt;/tt&amp;gt; file on MacOS or Linux; Windows users will have to consult their ssh client's documentation for how to set up the forwarding:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Host gpc_gw&lt;br /&gt;
   HostName login.scinet.utoronto.ca&lt;br /&gt;
   User [username]&lt;br /&gt;
   LocalForward 20080 [hostname]:22&lt;br /&gt;
   LocalForward 20090 [hostname]:11111&lt;br /&gt;
&lt;br /&gt;
Host gpcnode&lt;br /&gt;
   HostName localhost&lt;br /&gt;
   HostKeyAlias gpcnode&lt;br /&gt;
   User [username]&lt;br /&gt;
   Port 20080 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Replace &amp;lt;tt&amp;gt;[username]&amp;lt;/tt&amp;gt; with your username, and &amp;lt;tt&amp;gt;[hostname]&amp;lt;/tt&amp;gt; with the name of the host from the previous step.  This sets up two ssh port forwards: one to port 11111 of the compute node, which is needed by ParaView, and one to the usual SSH port 22, which can be used for testing.   In future runs of the server, only the hostname in the first stanza needs to be changed.&lt;br /&gt;
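&lt;br /&gt;
As a filled-in illustration, using the example node from above and a made-up username &amp;lt;tt&amp;gt;jdoe&amp;lt;/tt&amp;gt; (substitute your own):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Host gpc_gw&lt;br /&gt;
   HostName login.scinet.utoronto.ca&lt;br /&gt;
   User jdoe&lt;br /&gt;
   LocalForward 20080 gpc-f148n089-ib0:22&lt;br /&gt;
   LocalForward 20090 gpc-f148n089-ib0:11111&lt;br /&gt;
&lt;br /&gt;
Host gpcnode&lt;br /&gt;
   HostName localhost&lt;br /&gt;
   HostKeyAlias gpcnode&lt;br /&gt;
   User jdoe&lt;br /&gt;
   Port 20080&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;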
&lt;br /&gt;
====Edit ssh config (Windows, Cygwin)====&lt;br /&gt;
&lt;br /&gt;
If you have Cygwin X installed on Windows, including the &amp;lt;tt&amp;gt;openssh&amp;lt;/tt&amp;gt; package, take the following steps. First run in your Cygwin Bash Shell:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh-user-config&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
There is no need to create any of the keys, but this will create the ssh directory where you need to put the config file. &lt;br /&gt;
&lt;br /&gt;
Next, go to the directory &amp;lt;tt&amp;gt;cygwin\home\[username]\.ssh\&amp;lt;/tt&amp;gt;, where &amp;lt;tt&amp;gt;username&amp;lt;/tt&amp;gt; is your computer login name. In this directory, create a file called &amp;lt;tt&amp;gt;config&amp;lt;/tt&amp;gt; containing the stanzas from the previous section, with the placeholders replaced by the appropriate hostname and username. Make the file read-only. &lt;br /&gt;
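One way to make the file read-only is from the Cygwin Bash Shell, for instance (you can equally use the Windows file-properties dialog):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ chmod 400 ~/.ssh/config&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;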
The ssh port forwarding is now set up. Open a Cygwin Bash Shell and follow the rest of the instructions below.&lt;br /&gt;
&lt;br /&gt;
====Start SSH port forwarding====&lt;br /&gt;
&lt;br /&gt;
Once the ssh configuration is set, the port forwarding can be started with the command (on your desktop)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -N gpc_gw&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
this command will not return anything until the forwarding is terminated, and will just look like it's sitting there.  To make sure the port forwarding is working correctly, in another window try sshing directly to the compute node from your desktop:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh gpcnode&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and this should land you directly on the compute node.   If it does not, then something is wrong with the ssh forwarding.&lt;br /&gt;
&lt;br /&gt;
====Start Server====&lt;br /&gt;
&lt;br /&gt;
Now that the tunnel is set up, on the compute node you can start the paraview server.    To do this, you will have to have the following modules loaded:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load Xlibraries intel gcc python openmpi paraview&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(You can replace openmpi with intelmpi, and of course any module that is already loaded does not have to be loaded again.)&lt;br /&gt;
&lt;br /&gt;
Then start the paraview server with mpirun, as with any MPI job:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -np [NP] pvserver --use-offscreen-rendering&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where NP is the number of processors; 16 processors per node on the largemem nodes, or 8 per node otherwise.    &lt;br /&gt;
&lt;br /&gt;
Once running, the ParaView server should output&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Listen on port: 11111&lt;br /&gt;
Waiting for client...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Connect Client and Server====&lt;br /&gt;
&lt;br /&gt;
[[Image:Configure.png|thumb|right|320px|Configuring the client]]&lt;br /&gt;
&lt;br /&gt;
Once the server is running, you can connect the client.   Start the ParaView client on your desktop, and choose File-&amp;gt;Connect.   Click `Add Server', give the server a name (say, GPC), and give the port number 20090.   The other values should be correct by default; host is &amp;lt;tt&amp;gt;localhost&amp;lt;/tt&amp;gt;, and the server type is Client/Server.  Click `Configure'.&lt;br /&gt;
&lt;br /&gt;
On the next window, you'll be asked for a command to start up the server; select `Manual', and ok.&lt;br /&gt;
&lt;br /&gt;
In future runs, you'll be able to re-use this server, even if the host is different, because the correct host will be set in your &amp;lt;tt&amp;gt;.ssh/config&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Once the server is selected, click `Connect'.  On the compute node, the server should respond `Client connected'.   In the client window, when you (for instance) select File-&amp;gt;Open, you will be seeing the files on the GPC, rather than the local host.&lt;br /&gt;
&lt;br /&gt;
From here, the [http://paraview.org/Wiki/ParaView ParaView Wiki] can give you instructions as to how to plot your data.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7206</id>
		<title>Hadoop for HPCers</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7206"/>
		<updated>2014-09-03T17:28:55Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
This is a ~3 hour class that will introduce Hadoop to HPC users with a background in numerical simulation.  We will walk through a brief overview of:&lt;br /&gt;
* The Hadoop File System (HDFS)&lt;br /&gt;
* Map Reduce &lt;br /&gt;
* Pig&lt;br /&gt;
* Spark&lt;br /&gt;
&lt;br /&gt;
Most examples will be written in Python.&lt;br /&gt;
&lt;br /&gt;
=VM Instructions=&lt;br /&gt;
&lt;br /&gt;
This course will feature hands-on work with a 1-node Hadoop cluster running on your laptop.  The VMs are created with [http://www.vagrantup.com Vagrant].  Before the course, ensure this is up and running:&lt;br /&gt;
&lt;br /&gt;
* Install [https://www.virtualbox.org VirtualBox] on your laptop and start it. (Note!  At time of writing, the newest version, 4.3.14, is broken on at least Mac and Windows; you'll want to install 4.3.12 from [https://www.virtualbox.org/wiki/Download_Old_Builds_4_3 &amp;quot;older builds&amp;quot;].)&lt;br /&gt;
* Under Settings or Preferences, go to Network, then Host-only networks, and add/create two host-only networks.&lt;br /&gt;
* Then download the virtual machine image you want to use:&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI/GUI-VM.ova Full Size VM with GUI (requires a peak of ~8GB of free disk space)]&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Text-VM.ova Smaller, Text-only (requires a peak of ~6GB of free disk space)]&lt;br /&gt;
* &amp;quot;Import Appliance&amp;quot;, and select the downloaded image; this will uncompress the image which will take some minutes.&lt;br /&gt;
* Start the new virtual machine.&lt;br /&gt;
&lt;br /&gt;
If you get any warnings about shared folders not existing, that's fine.&lt;br /&gt;
&lt;br /&gt;
The GUI VM will start up a console with a full desktop environment; you can open a terminal and begin working.  For the text VM, you will have to log in to the console; the username/password is vagrant/vagrant.  For either machine, you can also ssh into the VM from a terminal on your laptop: &amp;lt;pre&amp;gt;ssh vagrant@192.168.33.10&amp;lt;/pre&amp;gt; (or &amp;lt;pre&amp;gt;ssh -p 2222 vagrant@localhost&amp;lt;/pre&amp;gt;) or to the laptop from the VM with &amp;lt;pre&amp;gt;ssh [username]@192.168.33.1&amp;lt;/pre&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
(If that particular address pair doesn't work, from a window within the VM, type &amp;quot;ifconfig&amp;quot; to find a line like &amp;quot;inet addr: 192.168....&amp;quot; or &amp;quot;inet addr: 10. ..&amp;quot;; that's the VM's IP address)&lt;br /&gt;
&lt;br /&gt;
Then make sure everything is working:&lt;br /&gt;
* From a terminal, start up the hadoop cluster by typing &amp;lt;pre&amp;gt;~/bin/init.sh&amp;lt;/pre&amp;gt;  You may have to answer &amp;quot;yes&amp;quot; a few times to start up some servers.&lt;br /&gt;
* Go to one of the example directories by typing &amp;lt;pre&amp;gt;cd ~/examples/wordcount/streaming&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Then start the example by typing &amp;lt;pre&amp;gt;make&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You've now run your (maybe) first Hadoop job!&lt;br /&gt;
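&lt;br /&gt;
For reference, a Hadoop-streaming word count in Python looks roughly like the sketch below; the mapper and reducer shipped in &amp;lt;tt&amp;gt;~/examples/wordcount/streaming&amp;lt;/tt&amp;gt; may differ in detail.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
# mapper.py: read text on stdin, emit one tab-separated (word, 1) pair per word&lt;br /&gt;
import sys&lt;br /&gt;
&lt;br /&gt;
for line in sys.stdin:&lt;br /&gt;
    for word in line.split():&lt;br /&gt;
        print('%s\t%d' % (word, 1))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/usr/bin/env python&lt;br /&gt;
# reducer.py: sum the counts for each word; the shuffle delivers keys in sorted order&lt;br /&gt;
import sys&lt;br /&gt;
&lt;br /&gt;
current, count = None, 0&lt;br /&gt;
for line in sys.stdin:&lt;br /&gt;
    line = line.rstrip('\n')&lt;br /&gt;
    if not line:&lt;br /&gt;
        continue&lt;br /&gt;
    word, _, n = line.partition('\t')&lt;br /&gt;
    if word == current:&lt;br /&gt;
        count += int(n)&lt;br /&gt;
    else:&lt;br /&gt;
        if current is not None:&lt;br /&gt;
            print('%s\t%d' % (current, count))&lt;br /&gt;
        current, count = word, int(n)&lt;br /&gt;
if current is not None:&lt;br /&gt;
    print('%s\t%d' % (current, count))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;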
&lt;br /&gt;
If you'd like, you can also create the virtual machine image yourself by downloading [http://www.vagrantup.com Vagrant] and the Vagrantfile for the [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI/Vagrantfile GUI] or [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Vagrantfile text] image and running &amp;quot;vagrant up&amp;quot;.  If you vagrant-up the GUI VM, you will have to &amp;quot;vagrant reload&amp;quot; after installation is completed to restart with all the software installed.&lt;br /&gt;
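&lt;br /&gt;
That is, from the directory holding the downloaded Vagrantfile, something like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ vagrant up&lt;br /&gt;
$ vagrant reload      # GUI VM only, once provisioning has finished&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;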
&lt;br /&gt;
If you can't get the VM working for whatever reason, please contact us and we will make alternate arrangements.&lt;br /&gt;
&lt;br /&gt;
= Updated Examples =&lt;br /&gt;
&lt;br /&gt;
If you've downloaded the image before Wednesday morning, from within the VM you may want to download the updated examples from [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/examples.tgz https://support.scinet.utoronto.ca/~ljdursi/Hadoop/examples.tgz]&lt;br /&gt;
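&lt;br /&gt;
From within the VM, one way to fetch them is, for example (wget is assumed to be available, and the tarball is assumed to unpack over &amp;lt;tt&amp;gt;~/examples&amp;lt;/tt&amp;gt; - you can check its contents first with &amp;lt;tt&amp;gt;tar tzf examples.tgz&amp;lt;/tt&amp;gt;):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ cd ~&lt;br /&gt;
$ wget https://support.scinet.utoronto.ca/~ljdursi/Hadoop/examples.tgz&lt;br /&gt;
$ tar xzf examples.tgz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;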
&lt;br /&gt;
= Slides =&lt;br /&gt;
&lt;br /&gt;
You can download [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/presentation.pdf the slides from here].&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7205</id>
		<title>Hadoop for HPCers</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7205"/>
		<updated>2014-09-03T16:59:30Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* VM Instructions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
This is a ~3 hour class that will introduce Hadoop to HPC users with a background in numerical simulation.  We will walk through a brief overview of:&lt;br /&gt;
* The Hadoop File System (HDFS)&lt;br /&gt;
* Map Reduce &lt;br /&gt;
* Pig&lt;br /&gt;
* Spark&lt;br /&gt;
&lt;br /&gt;
Most examples will be written in Python.&lt;br /&gt;
&lt;br /&gt;
=VM Instructions=&lt;br /&gt;
&lt;br /&gt;
This course will feature hands-on work with a 1-node Hadoop cluster running on your laptop.  The VMs are created with [http://www.vagrantup.com Vagrant].  Before the course, ensure this is up and running:&lt;br /&gt;
&lt;br /&gt;
* Install [https://www.virtualbox.org VirtualBox] on your laptop and start it. (Note!  At time of writing, the newest version, 4.3.14, is broken on at least Mac and Windows; you'll want to install 4.3.12 from [https://www.virtualbox.org/wiki/Download_Old_Builds_4_3 &amp;quot;older builds&amp;quot;].)&lt;br /&gt;
* Under Settings or Preferences, go to Network, then Host-only networks, and add/create two host-only networks.&lt;br /&gt;
* Then download the virtual machine image you want to use:&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI/GUI-VM.ova Full Size VM with GUI (require peak of ~8GB free disk space)]&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Text-VM.ova Smaller, Text-only (require peak of ~6GB free disk space)]&lt;br /&gt;
* &amp;quot;Import Appliance&amp;quot;, and select the downloaded image; this will uncompress the image which will take some minutes.&lt;br /&gt;
* Start the new virtual machine.&lt;br /&gt;
&lt;br /&gt;
If you get any warnings about shared folders not existing, that's fine.&lt;br /&gt;
&lt;br /&gt;
The GUI VM will start up a console with a full desktop environment; you can open a terminal and begin working.  For the text VM, you will have to login to the console; the username/password is vagrant/vagrant.  For either machine, you can also ssh into the VM from your laptop from the terminal: &amp;lt;pre&amp;gt;ssh vagrant@192.168.33.10&amp;lt;/pre&amp;gt; (or &amp;lt;pre&amp;gt;ssh -p 2222 vagrant@localhost&amp;lt;/pre&amp;gt;) or to the laptop from the VM with &amp;lt;pre&amp;gt;ssh [username]@192.168.33.1&amp;lt;/pre&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
(If that particular address pair doesn't work, from a window within the VM, type &amp;quot;ifconfig&amp;quot; to find a line like &amp;quot;inet addr: 192.168....&amp;quot; or &amp;quot;inet addr: 10. ..&amp;quot;; that's the VM's IP address)&lt;br /&gt;
&lt;br /&gt;
Then make sure everything is working:&lt;br /&gt;
* From a terminal, start up the hadoop cluster by typing &amp;lt;pre&amp;gt;~/bin/init.sh&amp;lt;/pre&amp;gt;  You may have to answer &amp;quot;yes&amp;quot; a few times to start up some servers.&lt;br /&gt;
* Go to one of the example directories by typing &amp;lt;pre&amp;gt;cd ~/examples/wordcount/streaming&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Then start the example by typing &amp;lt;pre&amp;gt;make&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You've now run your (maybe) first Hadoop job!&lt;br /&gt;
&lt;br /&gt;
If you'd like, you can also create the virtual machine image yourself by downloading [http://www.vagrantup.com Vagrant] and the Vagrantfile for the [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI/Vagrantfile GUI] or [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Vagrantfile text] image and running &amp;quot;vagrant up&amp;quot;.  If you vagrant-up the GUI VM, you will have to &amp;quot;vagrant reload&amp;quot; after installation is completed to restart with all the software installed.&lt;br /&gt;
&lt;br /&gt;
If you can't get the VM working for whatever reason, please contact us and we will make alternate arrangements.&lt;br /&gt;
&lt;br /&gt;
= Updated Examples =&lt;br /&gt;
&lt;br /&gt;
If you've downloaded the image before Wednesday morning, from within the VM you may want to download the updated examples from [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/examples.tgz https://support.scinet.utoronto.ca/~ljdursi/Hadoop/examples.tgz]&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7202</id>
		<title>Hadoop for HPCers</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7202"/>
		<updated>2014-09-03T13:08:52Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* VM Instructions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
This is a ~3 hour class that will introduce Hadoop to HPC users with a background in numerical simulation.  We will walk through a brief overview of:&lt;br /&gt;
* The Hadoop File System (HDFS)&lt;br /&gt;
* Map Reduce &lt;br /&gt;
* Pig&lt;br /&gt;
* Spark&lt;br /&gt;
&lt;br /&gt;
Most examples will be written in Python.&lt;br /&gt;
&lt;br /&gt;
=VM Instructions=&lt;br /&gt;
&lt;br /&gt;
This course will feature hands-on work with a 1-node Hadoop cluster running on your laptop.  The VMs are created with [http://www.vagrantup.com Vagrant].  Before the course, ensure this is up and running:&lt;br /&gt;
&lt;br /&gt;
* Install [https://www.virtualbox.org VirtualBox] on your laptop and start it. (Note!  At time of writing, the newest version, 4.3.14, is broken on at least Mac and Windows; you'll want to install 4.3.12 from [https://www.virtualbox.org/wiki/Download_Old_Builds_4_3 &amp;quot;older builds&amp;quot;].)&lt;br /&gt;
* Under Settings or Preferences, go to Network, then Host-only networks, and add/create two host-only networks.&lt;br /&gt;
* Then download the virtual machine image you want to use:&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI/GUI-VM.ova Full Size VM with GUI (require peak of ~8GB free disk space)]&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Text-VM.ova Smaller, Text-only (require peak of ~6GB free disk space)]&lt;br /&gt;
* &amp;quot;Import Appliance&amp;quot;, and select the downloaded image; this will uncompress the image which will take some minutes.&lt;br /&gt;
* Start the new virtual machine.&lt;br /&gt;
&lt;br /&gt;
If you get any warnings about shared folders not existing, that's fine.&lt;br /&gt;
&lt;br /&gt;
The GUI VM will start up a console with a full desktop environment; you can open a terminal and begin working.  For the text VM, you will have to login to the console; the username/password is vagrant/vagrant.  For either machine, you can also ssh into the VM from your laptop from the terminal: &amp;lt;pre&amp;gt;ssh vagrant@192.168.33.10&amp;lt;/pre&amp;gt; (or &amp;lt;pre&amp;gt;ssh -p 2222 vagrant@localhost&amp;lt;/pre&amp;gt;) or to the laptop from the VM with &amp;lt;pre&amp;gt;ssh [username]@192.168.33.1&amp;lt;/pre&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
(If that particular address pair doesn't work, from a window within the VM, type &amp;quot;ifconfig&amp;quot; to find a line like &amp;quot;inet addr: 192.168....&amp;quot; or &amp;quot;inet addr: 10. ..&amp;quot;; that's the VM's IP address)&lt;br /&gt;
&lt;br /&gt;
Then make sure everything is working:&lt;br /&gt;
* From a terminal, start up the hadoop cluster by typing &amp;lt;pre&amp;gt;~/bin/init.sh&amp;lt;/pre&amp;gt;  You may have to answer &amp;quot;yes&amp;quot; a few times to start up some servers.&lt;br /&gt;
* Go to one of the example directories by typing &amp;lt;pre&amp;gt;cd ~/examples/wordcount/streaming&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Then start the example by typing &amp;lt;pre&amp;gt;make&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You've now run your (maybe) first Hadoop job!&lt;br /&gt;
&lt;br /&gt;
If you'd like, you can also create the virtual machine image yourself by downloading [http://www.vagrantup.com Vagrant] and the Vagrantfile for the [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI/Vagrantfile GUI] or [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Vagrantfile text] image and running &amp;quot;vagrant up&amp;quot;.  If you vagrant-up the GUI VM, you will have to &amp;quot;vagrant reload&amp;quot; after installation is completed to restart with all the software installed.&lt;br /&gt;
&lt;br /&gt;
If you can't get the VM working for whatever reason, please contact us and we will make alternate arrangements.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7201</id>
		<title>Hadoop for HPCers</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7201"/>
		<updated>2014-09-03T13:07:33Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* VM Instructions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
This is a ~3 hour class that will introduce Hadoop to HPC users with a background in numerical simulation.  We will walk through a brief overview of:&lt;br /&gt;
* The Hadoop File System (HDFS)&lt;br /&gt;
* Map Reduce &lt;br /&gt;
* Pig&lt;br /&gt;
* Spark&lt;br /&gt;
&lt;br /&gt;
Most examples will be written in Python.&lt;br /&gt;
&lt;br /&gt;
=VM Instructions=&lt;br /&gt;
&lt;br /&gt;
This course will feature hands-on work with a 1-node Hadoop cluster running on your laptop.  The VMs are created with [http://www.vagrantup.com Vagrant].  Before the course, ensure this is up and running:&lt;br /&gt;
&lt;br /&gt;
* Install [https://www.virtualbox.org VirtualBox] on your laptop and start it. (Note!  At time of writing, the newest version, 4.3.14, is broken on at least Mac and Windows; you'll want to install 4.3.12 from [https://www.virtualbox.org/wiki/Download_Old_Builds_4_3 &amp;quot;older builds&amp;quot;].)&lt;br /&gt;
* Under Settings or Preferences, go to Network, then Host-only networks, and add/create two host-only networks.&lt;br /&gt;
* Then download the virtual machine image you want to use:&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI/GUI-VM.ova Full Size VM with GUI (require peak of ~8GB free disk space)]&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Text-VM.ova Smaller, Text-only (require peak of ~6GB free disk space)]&lt;br /&gt;
* &amp;quot;Import Appliance&amp;quot;, and select the downloaded image; this will uncompress the image which will take some minutes.&lt;br /&gt;
* Start the new virtual machine.&lt;br /&gt;
&lt;br /&gt;
The GUI VM will start up a console with a full desktop environment; you can open a terminal and begin working.  For the text VM, you will have to login to the console; the username/password is vagrant/vagrant.  For either machine, you can also ssh into the VM from your laptop from the terminal: &amp;lt;pre&amp;gt;ssh vagrant@192.168.33.10&amp;lt;/pre&amp;gt; or to the laptop from the VM with &amp;lt;pre&amp;gt;ssh [username]@192.168.33.1&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(If that particular address pair doesn't work, from a window within the VM, type &amp;quot;ifconfig | grep 192&amp;quot; to find a line like &amp;quot;inet addr: 192.168....&amp;quot;; that's the VM's IP address)&lt;br /&gt;
&lt;br /&gt;
Then make sure everything is working:&lt;br /&gt;
* From a terminal, start up the hadoop cluster by typing &amp;lt;pre&amp;gt;~/bin/init.sh&amp;lt;/pre&amp;gt;  You may have to answer &amp;quot;yes&amp;quot; a few times to start up some servers.&lt;br /&gt;
* Go to one of the example directories by typing &amp;lt;pre&amp;gt;cd ~/examples/wordcount/streaming&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Then start the example by typing &amp;lt;pre&amp;gt;make&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You've now run your (maybe) first Hadoop job!&lt;br /&gt;
&lt;br /&gt;
If you'd like, you can also create the virtual machine image yourself by downloading [http://www.vagrantup.com Vagrant] and the Vagrantfile for the [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI/Vagrantfile GUI] or [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Vagrantfile text] image and running &amp;quot;vagrant up&amp;quot;.  If you vagrant-up the GUI VM, you will have to &amp;quot;vagrant reload&amp;quot; after installation is completed to restart with all the software installed.&lt;br /&gt;
&lt;br /&gt;
If you can't get the VM working for whatever reason, please contact us and we will make alternate arrangements.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7200</id>
		<title>Hadoop for HPCers</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7200"/>
		<updated>2014-09-03T13:05:58Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* VM Instructions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
This is a ~3 hour class that will introduce Hadoop to HPC users with a background in numerical simulation.  We will walk through a brief overview of:&lt;br /&gt;
* The Hadoop File System (HDFS)&lt;br /&gt;
* Map Reduce &lt;br /&gt;
* Pig&lt;br /&gt;
* Spark&lt;br /&gt;
&lt;br /&gt;
Most examples will be written in Python.&lt;br /&gt;
&lt;br /&gt;
=VM Instructions=&lt;br /&gt;
&lt;br /&gt;
This course will feature hands-on work with a 1-node Hadoop cluster running on your laptop.  The VMs are created with [http://www.vagrantup.com Vagrant].  Before the course, ensure this is up and running:&lt;br /&gt;
&lt;br /&gt;
* Install [https://www.virtualbox.org VirtualBox] on your laptop and start it. (Note!  At time of writing, the newest version, 4.3.14, is broken on at least Mac and Windows; you'll want to install 4.3.12 from &amp;quot;older builds&amp;quot;.)&lt;br /&gt;
* Under Settings or Preferences, go to Network, then Host-only networks, and add/create two host-only networks.&lt;br /&gt;
* Then download the virtual machine image you want to use:&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI/GUI-VM.ova Full Size VM with GUI (require peak of ~8GB free disk space)]&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Text-VM.ova Smaller, Text-only (require peak of ~6GB free disk space)]&lt;br /&gt;
* &amp;quot;Import Appliance&amp;quot;, and select the downloaded image; this will uncompress the image which will take some minutes.&lt;br /&gt;
* Start the new virtual machine.&lt;br /&gt;
&lt;br /&gt;
The GUI VM will start up a console with a full desktop environment; you can open a terminal and begin working.  For the text VM, you will have to login to the console; the username/password is vagrant/vagrant.  For either machine, you can also ssh into the VM from your laptop from the terminal: &amp;lt;pre&amp;gt;ssh vagrant@192.168.33.10&amp;lt;/pre&amp;gt; or to the laptop from the VM with &amp;lt;pre&amp;gt;ssh [username]@192.168.33.1&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(If that particular address pair doesn't work, from a window within the VM, type &amp;quot;ifconfig | grep 192&amp;quot; to find a line like &amp;quot;inet addr: 192.168....&amp;quot;; that's the VM's IP address)&lt;br /&gt;
&lt;br /&gt;
Then make sure everything is working:&lt;br /&gt;
* From a terminal, start up the hadoop cluster by typing &amp;lt;pre&amp;gt;~/bin/init.sh&amp;lt;/pre&amp;gt;  You may have to answer &amp;quot;yes&amp;quot; a few times to start up some servers.&lt;br /&gt;
* Go to one of the example directories by typing &amp;lt;pre&amp;gt;cd ~/examples/wordcount/streaming&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Then start the example by typing &amp;lt;pre&amp;gt;make&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You've now run your (maybe) first Hadoop job!&lt;br /&gt;
&lt;br /&gt;
If you'd like, you can also create the virtual machine image yourself by downloading [http://www.vagrantup.com Vagrant] and the Vagrantfile for the [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI/Vagrantfile GUI] or [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Vagrantfile text] image and running &amp;quot;vagrant up&amp;quot;.  If you vagrant-up the GUI VM, you will have to &amp;quot;vagrant reload&amp;quot; after installation is completed to restart with all the software installed.&lt;br /&gt;
&lt;br /&gt;
If you can't get the VM working for whatever reason, please contact us and we will make alternate arrangements.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7187</id>
		<title>Hadoop for HPCers</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7187"/>
		<updated>2014-08-31T17:09:41Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* VM Instructions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
This is a ~3 hour class that will introduce Hadoop to HPC users with a background in numerical simulation.  We will walk through a brief overview of:&lt;br /&gt;
* The Hadoop File System (HDFS)&lt;br /&gt;
* Map Reduce &lt;br /&gt;
* Pig&lt;br /&gt;
* Spark&lt;br /&gt;
&lt;br /&gt;
Most examples will be written in Python.&lt;br /&gt;
&lt;br /&gt;
=VM Instructions=&lt;br /&gt;
&lt;br /&gt;
This course will feature hands-on work with a 1-node Hadoop cluster running on your laptop.  The VMs are created with [http://www.vagrantup.com Vagrant].  Before the course, ensure this is up and running:&lt;br /&gt;
&lt;br /&gt;
* Install [https://www.virtualbox.org VirtualBox] on your laptop and start it.&lt;br /&gt;
* Under Settings or Preferences, go to Network, then Host-only networks, and add/create two host-only networks.&lt;br /&gt;
* Then download the virtual machine image you want to use:&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI/GUI-VM.ova Full Size VM with GUI (require peak of ~8GB free disk space)]&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Text-VM.ova Smaller, Text-only (require peak of ~6GB free disk space)]&lt;br /&gt;
* &amp;quot;Import Appliance&amp;quot;, and select the downloaded image; this will uncompress the image which will take some minutes.&lt;br /&gt;
* Start the new virtual machine.&lt;br /&gt;
&lt;br /&gt;
The GUI VM will start up a console with a full desktop environment; you can open a terminal and begin working.  For the text VM, you will have to login to the console; the username/password is vagrant/vagrant.  For either machine, you can also ssh into the VM from your laptop from the terminal: &amp;lt;pre&amp;gt;ssh vagrant@192.168.33.10&amp;lt;/pre&amp;gt; or to the laptop from the VM with &amp;lt;pre&amp;gt;ssh [username]@192.168.33.1&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(If that particular address pair doesn't work, from a window within the VM, type &amp;quot;ifconfig | grep 192&amp;quot; to find a line like &amp;quot;inet addr: 192.168....&amp;quot;; that's the VM's IP address)&lt;br /&gt;
&lt;br /&gt;
Then make sure everything is working:&lt;br /&gt;
* From a terminal, start up the hadoop cluster by typing &amp;lt;pre&amp;gt;~/bin/init.sh&amp;lt;/pre&amp;gt;  You may have to answer &amp;quot;yes&amp;quot; a few times to start up some servers.&lt;br /&gt;
* Go to one of the example directories by typing &amp;lt;pre&amp;gt;cd ~/examples/wordcount/streaming&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Then start the example by typing &amp;lt;pre&amp;gt;make&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You've now run your (maybe) first Hadoop job!&lt;br /&gt;
&lt;br /&gt;
If you'd like, you can also create the virtual machine image yourself by downloading [http://www.vagrantup.com Vagrant] and the Vagrantfile for the [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI/Vagrantfile GUI] or [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Vagrantfile text] image and running &amp;quot;vagrant up&amp;quot;.  If you vagrant-up the GUI VM, you will have to &amp;quot;vagrant reload&amp;quot; after installation is completed to restart with all the software installed.&lt;br /&gt;
&lt;br /&gt;
If you can't get the VM working for whatever reason, please contact us and we will make alternate arrangements.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7186</id>
		<title>Hadoop for HPCers</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7186"/>
		<updated>2014-08-29T18:23:06Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* VM Instructions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
This is a ~3 hour class that will introduce Hadoop to HPC users with a background in numerical simulation.  We will walk through a brief overview of:&lt;br /&gt;
* The Hadoop File System (HDFS)&lt;br /&gt;
* Map Reduce &lt;br /&gt;
* Pig&lt;br /&gt;
* Spark&lt;br /&gt;
&lt;br /&gt;
Most examples will be written in Python.&lt;br /&gt;
&lt;br /&gt;
=VM Instructions=&lt;br /&gt;
&lt;br /&gt;
This course will feature hands-on work with a 1-node Hadoop cluster running on your laptop.  The VMs are created with [http://www.vagrantup.com Vagrant].  Before the course, ensure this is up and running:&lt;br /&gt;
&lt;br /&gt;
* Install [https://www.virtualbox.org VirtualBox] on your laptop and start it.&lt;br /&gt;
* Under Settings or Preferences, go to Network, then Host-only networks, and add/create two host-only networks.&lt;br /&gt;
* Then download the virtual machine image you want to use:&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI/GUI-VM.ova Full Size VM with GUI (require peak of ~8GB free disk space)]&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Text-VM.ova Smaller, Text-only (require peak of ~6GB free disk space)]&lt;br /&gt;
* &amp;quot;Import Appliance&amp;quot;, and select the downloaded image; this will uncompress the image which will take some minutes.&lt;br /&gt;
* Start the new virtual machine.&lt;br /&gt;
&lt;br /&gt;
The GUI VM will start up a console with a full desktop environment; you can open a terminal and begin working.  For the text VM, you will have to login to the console; the username/password is vagrant/vagrant.  For either machine, you can also ssh into the VM from your laptop from the terminal: &amp;lt;pre&amp;gt;ssh vagrant@192.168.33.10&amp;lt;/pre&amp;gt; or to the laptop from the VM with &amp;lt;pre&amp;gt;ssh [username]@192.168.33.1&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(If that particular address pair doesn't work, from a window within the VM, type &amp;quot;ifconfig | grep 192&amp;quot; to find a line like &amp;quot;inet addr: 192.168....&amp;quot;; that's the VM's IP address)&lt;br /&gt;
&lt;br /&gt;
Then make sure everything is working:&lt;br /&gt;
* From a terminal, start up the hadoop cluster by typing &amp;lt;pre&amp;gt;~/bin/init.sh&amp;lt;/pre&amp;gt;  You may have to answer &amp;quot;yes&amp;quot; a few times to start up some servers.&lt;br /&gt;
* Go to one of the example directories by typing &amp;lt;pre&amp;gt;cd ~/examples/wordcount/streaming&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Then start the example by typing &amp;lt;pre&amp;gt;make&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You've now run your (maybe) first Hadoop job!&lt;br /&gt;
&lt;br /&gt;
If you'd like, you can also create the virtual machine image yourself by downloading [http://www.vagrantup.com Vagrant] and the Vagrantfile for the [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI/Vagrantfile GUI] or [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Vagrantfile text] image and running &amp;quot;vagrant up&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
If you can't get the VM working for whatever reason, please contact us and we will make alternate arrangements.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7185</id>
		<title>Hadoop for HPCers</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7185"/>
		<updated>2014-08-29T14:06:38Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
This is a ~3 hour class that will introduce Hadoop to HPC users with a background in numerical simulation.  We will walk through a brief overview of:&lt;br /&gt;
* The Hadoop File System (HDFS)&lt;br /&gt;
* Map Reduce &lt;br /&gt;
* Pig&lt;br /&gt;
* Spark&lt;br /&gt;
&lt;br /&gt;
Most examples will be written in Python.&lt;br /&gt;
&lt;br /&gt;
=VM Instructions=&lt;br /&gt;
&lt;br /&gt;
This course will feature hands-on work with a 1-node Hadoop cluster running on your laptop.  The VMs are created with [http://www.vagrantup.com Vagrant].  Before the course, ensure this is up and running:&lt;br /&gt;
&lt;br /&gt;
* Install [https://www.virtualbox.org VirtualBox] on your laptop&lt;br /&gt;
* Download the virtual machine image you want to use:&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI/GUI-VM.ova Full Size VM with GUI (require peak of ~8GB free disk space)]&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Text-VM.ova Smaller, Text-only (require peak of ~6GB free disk space)]&lt;br /&gt;
* Start VirtualBox&lt;br /&gt;
* &amp;quot;Import Appliance&amp;quot;, and select the downloaded image; this will uncompress the image which will take some minutes.&lt;br /&gt;
* Start the new virtual machine.&lt;br /&gt;
&lt;br /&gt;
The GUI VM will start up a console with a full desktop environment; you can open a terminal and begin working.  For the text VM, you will have to login to the console; the username/password is vagrant/vagrant.  For either machine, you can also ssh into the VM from your laptop from the terminal: &amp;lt;pre&amp;gt;ssh vagrant@192.168.33.10&amp;lt;/pre&amp;gt; or to the laptop from the VM with &amp;lt;pre&amp;gt;ssh [username]@192.168.33.1&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then make sure everything is working:&lt;br /&gt;
* From a terminal, start up the hadoop cluster by typing &amp;lt;pre&amp;gt;~/bin/init.sh&amp;lt;/pre&amp;gt;  You may have to answer &amp;quot;yes&amp;quot; a few times to start up some servers.&lt;br /&gt;
* Go to one of the example directories by typing &amp;lt;pre&amp;gt;cd ~/examples/wordcount/streaming&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Then start the example by typing &amp;lt;pre&amp;gt;make&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You've now run your (maybe) first Hadoop job!&lt;br /&gt;
&lt;br /&gt;
If you'd like, you can also create the virtual machine image yourself by downloading [http://www.vagrantup.com Vagrant] and the Vagrantfile for the [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI/Vagrantfile GUI] or [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Vagrantfile text] image and running &amp;quot;vagrant up&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
If you can't get the VM working for whatever reason, please contact us and we will make alternate arrangements.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7184</id>
		<title>Hadoop for HPCers</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7184"/>
		<updated>2014-08-29T12:32:56Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
This is a ~3 hour class that will introduce Hadoop to HPC users with a background in numerical simulation.  We will walk through a brief overview of:&lt;br /&gt;
* The Hadoop File System (HDFS)&lt;br /&gt;
* Map Reduce &lt;br /&gt;
* Pig&lt;br /&gt;
* Spark&lt;br /&gt;
&lt;br /&gt;
Most examples will be written in Python.&lt;br /&gt;
&lt;br /&gt;
=VM Instructions=&lt;br /&gt;
&lt;br /&gt;
This course will feature hands-on work with a 1-node Hadoop cluster running on your laptop.  The VMs are created with [http://www.vagrantup.com Vagrant].  Before the course, ensure this is up and running:&lt;br /&gt;
&lt;br /&gt;
* Install [https://www.virtualbox.org VirtualBox] on your laptop&lt;br /&gt;
* Download the virtual machine image you want to use:&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI/GUI-VM.ova Full Size VM with GUI (require peak of ~8GB free disk space)]&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Text-VM.ova Smaller, Text-only (require peak of ~6GB free disk space)]&lt;br /&gt;
* Start VirtualBox&lt;br /&gt;
* &amp;quot;Import Appliance&amp;quot;, and select the downloaded image; this will uncompress the image which will take some minutes.&lt;br /&gt;
* Start the new virtual machine.&lt;br /&gt;
&lt;br /&gt;
The GUI VM will start up a console with a full desktop environment; you can open a terminal and begin working.  For the text VM, you will have to login to the console; the username/password is vagrant/vagrant.  For either machine, you can also ssh into the VM from your laptop from the terminal: &amp;lt;pre&amp;gt;ssh vagrant@192.168.33.10&amp;lt;/pre&amp;gt; or to the laptop from the VM with &amp;lt;pre&amp;gt;ssh [username]@192.168.33.1&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then make sure everything is working:&lt;br /&gt;
* From a terminal, start up the hadoop cluster by typing &amp;lt;pre&amp;gt;~/bin/init.sh&amp;lt;/pre&amp;gt;  You may have to answer &amp;quot;yes&amp;quot; a few times to start up some servers.&lt;br /&gt;
* Go to one of the example directories by typing &amp;lt;pre&amp;gt;cd ~/examples/wordcount/streaming&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Then start the example by typing &amp;lt;pre&amp;gt;make&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You've now run your (maybe) first Hadoop job!&lt;br /&gt;
&lt;br /&gt;
If you'd like, you can also create the virtual machine image yourself by downloading [http://www.vagrantup.com Vagrant] and the Vagrantfile for the [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/Gui/Vagrantfile GUI] or [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Vagrantfile text] image and running &amp;quot;vagrant up&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
If you can't get the VM working for whatever reason, please contact us and we will make alternate arrangements.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7183</id>
		<title>Hadoop for HPCers</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7183"/>
		<updated>2014-08-29T12:20:57Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
This is a ~3 hour class that will introduce Hadoop to HPC users with a background in numerical simulation.  We will walk through a brief overview of:&lt;br /&gt;
* The Hadoop File System (HDFS)&lt;br /&gt;
* Map Reduce &lt;br /&gt;
* Pig&lt;br /&gt;
* Spark&lt;br /&gt;
&lt;br /&gt;
Most examples will be written in Python.&lt;br /&gt;
&lt;br /&gt;
=VM Instructions=&lt;br /&gt;
&lt;br /&gt;
This course will feature hands-on work with a 1-node Hadoop cluster running on your laptop.  The VMs are created with [http://www.vagrantup.com Vagrant].  Before the course, ensure this is up and running:&lt;br /&gt;
&lt;br /&gt;
* Install [https://www.virtualbox.org VirtualBox] on your laptop&lt;br /&gt;
* Download the virtual machine image you want to use:&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI/GUI-VM.ova Full Size VM with GUI (requires a peak of ~8GB of free disk space)]&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Text-VM.ova Smaller, Text-only (requires a peak of ~6GB of free disk space)]&lt;br /&gt;
* Start VirtualBox&lt;br /&gt;
* &amp;quot;Import Appliance&amp;quot;, and select the downloaded image; this will uncompress the image which will take some minutes.&lt;br /&gt;
* Start the new virtual machine.&lt;br /&gt;
&lt;br /&gt;
The GUI VM will start up a console with a full desktop environment; you can open a terminal and begin working.  For the text VM, you will have to log in to the console; the username/password is vagrant/vagrant.  For either machine, you can also ssh into the VM from a terminal on your laptop: &amp;lt;pre&amp;gt;ssh vagrant@192.168.33.10&amp;lt;/pre&amp;gt; or to the laptop from the VM with &amp;lt;pre&amp;gt;ssh [username]@192.168.33.1&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then make sure everything is working:&lt;br /&gt;
* From a terminal, start up the Hadoop cluster by typing &amp;lt;pre&amp;gt;~/bin/init.sh&amp;lt;/pre&amp;gt;  You may have to answer &amp;quot;yes&amp;quot; a few times to start up some servers.&lt;br /&gt;
* Go to one of the example directories by typing &amp;lt;pre&amp;gt;cd ~/examples/wordcount/streaming&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Then start the example by typing &amp;lt;pre&amp;gt;make&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You've now run your (maybe) first Hadoop job!&lt;br /&gt;
&lt;br /&gt;
If you'd like, you can also create the virtual machine image yourself by downloading [http://www.vagrantup.com Vagrant] and the Vagrantfile for the [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/Gui/Vagrantfile GUI] or [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/text/Vagrantfile text] image and running &amp;quot;vagrant up&amp;quot;.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7182</id>
		<title>Hadoop for HPCers</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Hadoop_for_HPCers&amp;diff=7182"/>
		<updated>2014-08-29T12:18:44Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: Created page with &amp;quot; =Overview=  This is a ~3 hour class that will introduce Hadoop to HPC users with a background in numerical simulation.  We will walk through a brief overview of: * The Hadoop...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
This is a ~3 hour class that will introduce Hadoop to HPC users with a background in numerical simulation.  We will walk through a brief overview of:&lt;br /&gt;
* The Hadoop File System (HDFS)&lt;br /&gt;
* Map Reduce &lt;br /&gt;
* Pig&lt;br /&gt;
* Spark&lt;br /&gt;
&lt;br /&gt;
Most examples will be written in Python.&lt;br /&gt;
&lt;br /&gt;
=VM Instructions=&lt;br /&gt;
&lt;br /&gt;
This course will feature hands-on work with a 1-node Hadoop cluster running on your laptop.  The VMs are created with [http://www.vagrantup.com Vagrant].  Before the course, ensure this is up and running:&lt;br /&gt;
&lt;br /&gt;
* Install [https://www.virtualbox.org VirtualBox] on your laptop&lt;br /&gt;
* Download the virtual machine image you want to use:&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/GUI-VM.ova Full Size VM with GUI (requires a peak of ~8GB of free disk space)]&lt;br /&gt;
** [https://support.scinet.utoronto.ca/~ljdursi/Hadoop/VMs/Text-VM.ova Smaller, Text-only (requires a peak of ~6GB of free disk space)]&lt;br /&gt;
* Start VirtualBox&lt;br /&gt;
* &amp;quot;Import Appliance&amp;quot;, and select the downloaded image; this will uncompress the image which will take some minutes.&lt;br /&gt;
* Start the new virtual machine.&lt;br /&gt;
&lt;br /&gt;
The GUI VM will start up a console with a full desktop environment; you can open a terminal and begin working.  For the text VM, you will have to log in to the console; the username/password is vagrant/vagrant.  For either machine, you can also ssh into the VM from a terminal on your laptop: &amp;lt;pre&amp;gt;ssh vagrant@192.168.33.10&amp;lt;/pre&amp;gt; or to the laptop from the VM with &amp;lt;pre&amp;gt;ssh [username]@192.168.33.1&amp;lt;/pre&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Then make sure everything is working:&lt;br /&gt;
* From a terminal, start up the Hadoop cluster by typing &amp;lt;pre&amp;gt;~/bin/init.sh&amp;lt;/pre&amp;gt;  You may have to answer &amp;quot;yes&amp;quot; a few times to start up some servers.&lt;br /&gt;
* Go to one of the example directories by typing &amp;lt;pre&amp;gt;cd ~/examples/wordcount/streaming&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Then start the example by typing &amp;lt;pre&amp;gt;make&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You've now run your (maybe) first Hadoop job!&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=FAQ&amp;diff=7168</id>
		<title>FAQ</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=FAQ&amp;diff=7168"/>
		<updated>2014-08-20T20:09:52Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: Time to stop talking about 2012 allocations.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__TOC__&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==The Basics==&lt;br /&gt;
===Whom do I contact for support?===&lt;br /&gt;
&lt;br /&gt;
Whom do I contact if I have problems or questions about how to use the SciNet systems?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
E-mail [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;]  &lt;br /&gt;
&lt;br /&gt;
In your email, please include the following information:&lt;br /&gt;
&lt;br /&gt;
* your username on SciNet&lt;br /&gt;
* the cluster that your question pertains to (GPC or TCS; SciNet is not a cluster!),&lt;br /&gt;
* any relevant error messages&lt;br /&gt;
* the commands you typed before the errors occurred&lt;br /&gt;
* the path to your code (if applicable)&lt;br /&gt;
* the location of the job scripts (if applicable)&lt;br /&gt;
* the directory from which it was submitted (if applicable)&lt;br /&gt;
* a description of what it is supposed to do (if applicable)&lt;br /&gt;
* if your problem is about connecting to SciNet, the type of computer you are connecting from.&lt;br /&gt;
&lt;br /&gt;
Note that your password should never, never, never be sent to us, even if your question is about your account.&lt;br /&gt;
&lt;br /&gt;
Try to avoid sending email only to specific individuals at SciNet. Your chances of a quick reply increase significantly if you email our team!&lt;br /&gt;
&lt;br /&gt;
===What does ''code scaling'' mean?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Introduction_To_Performance#Parallel_Speedup|A Performance Primer]]&lt;br /&gt;
&lt;br /&gt;
===What do you mean by ''throughput''?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Introduction_To_Performance#Throughput|A Performance Primer]].&lt;br /&gt;
&lt;br /&gt;
Here is a simple example:&lt;br /&gt;
&lt;br /&gt;
Suppose you need to do 10 computations.  Say each of these runs for&lt;br /&gt;
1 day on 8 cores, but they take &amp;quot;only&amp;quot; 18 hours on 16 cores.  What is the&lt;br /&gt;
fastest way to get all 10 computations done - as 8-core jobs or as&lt;br /&gt;
16-core jobs?  Let us assume you have 2 nodes at your disposal.&lt;br /&gt;
The answer, after some simple arithmetic, is that running your 10&lt;br /&gt;
jobs as 8-core jobs will take 5 days: two 8-core jobs fit on your two nodes&lt;br /&gt;
at once, so the 10 jobs finish in 5 rounds of 24 hours.  If you ran them as&lt;br /&gt;
16-core jobs, only one job could run at a time, so the 10 jobs would take&lt;br /&gt;
10 times 18 hours, or 7.5 days.  Draw your own conclusions...&lt;br /&gt;
&lt;br /&gt;
===I changed my .bashrc/.bash_profile and now nothing works===&lt;br /&gt;
&lt;br /&gt;
The default startup scripts provided by SciNet, and guidelines for them, can be found [[Important_.bashrc_guidelines|here]].  Certain things - like sourcing &amp;lt;tt&amp;gt;/etc/profile&amp;lt;/tt&amp;gt;&lt;br /&gt;
and &amp;lt;tt&amp;gt;/etc/bashrc&amp;lt;/tt&amp;gt; are ''required'' for various SciNet routines to work!   &lt;br /&gt;
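&lt;br /&gt;
As a rough sketch only (the guidelines page above has the full recommended startup files), the essential part looks something like the following; your own settings would go below it:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# source the system-wide settings that SciNet routines rely on&lt;br /&gt;
if [ -f /etc/profile ]; then . /etc/profile; fi&lt;br /&gt;
if [ -f /etc/bashrc ]; then . /etc/bashrc; fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;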
&lt;br /&gt;
If the situation is so bad that you cannot even log in, please send email [mailto:support@scinet.utoronto.ca support].&lt;br /&gt;
&lt;br /&gt;
===Could I have my login shell changed to (t)csh?===&lt;br /&gt;
&lt;br /&gt;
The login shell used on our systems is bash. While the tcsh is available on the GPC and the TCS, we do not support it as the default login shell at present.  So &amp;quot;chsh&amp;quot; will not work, but you can always run tcsh interactively. Also, csh scripts will be executed correctly provided that they have the correct &amp;quot;shebang&amp;quot; &amp;lt;tt&amp;gt;#!/bin/tcsh&amp;lt;/tt&amp;gt; at the top.&lt;br /&gt;
&lt;br /&gt;
===How can I run Matlab / IDL / Gaussian / my favourite commercial software at SciNet?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Because SciNet serves such a disparate group of user communities, there is just no way we can buy licenses for everyone's commercial package.   The only commercial software we have purchased is that which in principle can benefit everyone -- fast compilers and math libraries (Intel's on GPC, and IBM's on TCS).&lt;br /&gt;
&lt;br /&gt;
If your research group requires a commercial package that you already have or are willing to buy licenses for, contact us at [mailto:support@scinet.utoronto.ca support@scinet] and we can work together to find out if it is feasible to implement the package's licensing arrangement on the SciNet clusters, and if so, what the best way to do it is.&lt;br /&gt;
&lt;br /&gt;
Note that it is important that you contact us before installing commercially licensed software on SciNet machines, even if you have a way to do it in your own directory without requiring sysadmin intervention.   It puts us in a very awkward position if someone is found to be running unlicensed or invalidly licensed software on our systems, so we need to be aware of what is being installed where.&lt;br /&gt;
&lt;br /&gt;
===Do you have a recommended ssh program that will allow SciNet access from Windows machines?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The [[Ssh#SSH_for_Windows_Users | SSH for Windows users]] programs we recommend are:&lt;br /&gt;
&lt;br /&gt;
* [http://mobaxterm.mobatek.net/en/ MobaXterm] is a tabbed ssh client with some Cygwin tools, including ssh and X, all wrapped up into one executable.&lt;br /&gt;
* [http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY]  - this is a terminal for Windows that connects via ssh.  It is a quick install and will get you up and running quickly.&amp;lt;br&amp;gt;To set up your passphrase-protected ssh key with PuTTY, see [http://the.earth.li/~sgtatham/putty/0.61/htmldoc/Chapter8.html#pubkey here].&lt;br /&gt;
* [http://www.cygwin.com/ Cygwin] - this is a whole Linux-like environment for Windows, which also includes an X window server so that you can display remote windows on your desktop.  Make sure you include the openssh and X window system packages in the installation for full functionality.  This is recommended if you will be doing a lot of work on Linux machines, as it makes a very similar environment available on your computer.&amp;lt;br&amp;gt;To set up your ssh keys, follow the Linux instructions on the [[Ssh keys]] page.&lt;br /&gt;
&lt;br /&gt;
===My ssh key does not work! WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
[[Ssh_keys#Testing_Your_Key | Testing Your Key]]&lt;br /&gt;
&lt;br /&gt;
* If this doesn't work, you should be able to log in using your password, and investigate the problem. For example, if during a login session you get a message similar to the one below, just follow the instructions and delete the offending key on line 3 (you can use vi to jump to that line with ESC plus : plus 3). This only means that you may have logged in from your home computer to SciNet in the past, and that key is obsolete.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh USERNAME@login.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @&lt;br /&gt;
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!&lt;br /&gt;
Someone could be eavesdropping on you right now (man-in-the-middle&lt;br /&gt;
attack)!&lt;br /&gt;
It is also possible that the RSA host key has just been changed.&lt;br /&gt;
The fingerprint for the RSA key sent by the remote host is&lt;br /&gt;
53:f9:60:71:a8:0b:5d:74:83:52:fe:ea:1a:9e:cc:d3.&lt;br /&gt;
Please contact your system administrator.&lt;br /&gt;
Add correct host key in /home/&amp;lt;user&amp;gt;/.ssh/known_hosts to get rid of&lt;br /&gt;
this message.&lt;br /&gt;
Offending key in /home/&amp;lt;user&amp;gt;/.ssh/known_hosts:3&lt;br /&gt;
RSA host key for login.scinet.utoronto.ca has&lt;br /&gt;
changed and you have requested&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
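&lt;br /&gt;
Instead of editing known_hosts by hand, recent versions of OpenSSH can remove the offending entry for you:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh-keygen -R login.scinet.utoronto.ca&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;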
&lt;br /&gt;
* If you get the message below, you may need to log out of your gnome session and log back in, since the ssh-agent needs to be&lt;br /&gt;
restarted with the new passphrase-protected ssh key.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh USERNAME@login.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
Agent admitted failure to sign using the key.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
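&lt;br /&gt;
Depending on your desktop setup, it may also be enough to hand the key to the running agent yourself rather than logging out, e.g.:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh-add ~/.ssh/id_rsa&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;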
&lt;br /&gt;
===Can't forward X:  &amp;quot;Warning: No xauth data; using fake authentication data&amp;quot;, or &amp;quot;X11 connection rejected because of wrong authentication.&amp;quot;===&lt;br /&gt;
&lt;br /&gt;
I used to be able to forward X11 windows from SciNet to my home machine, but now I'm getting these messages; what's wrong?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This very likely means that ssh/xauth can't update your ${HOME}/.Xauthority file. &lt;br /&gt;
&lt;br /&gt;
The simplest possible reason for this is that you've filled your 10GB /home quota and so can't write anything to your home directory.   Use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load extras&lt;br /&gt;
$ diskUsage&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
to check how close you are to your quota on ${HOME}.&lt;br /&gt;
&lt;br /&gt;
Alternately, this could mean your .Xauthority file has become broken/corrupted/confused somehow, in which case you can delete that file, and when you next log in you'll get a similar warning message involving creating .Xauthority, but things should work.&lt;br /&gt;
&lt;br /&gt;
===How come I cannot log in to the TCS?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
A SciNet account doesn't automatically entitle you to TCS access. At a minimum, TCS jobs need to run on at least 32 cores (64 preferred because of Simultaneous Multi Threading - [[TCS_Quickstart#Node_configuration|SMT]] - on these nodes) and need the large memory (4GB/core) and bandwidth on the system. Essentially you need to be able to explain why the work can't be done on the GPC.&lt;br /&gt;
&lt;br /&gt;
===How can I reset the password for my Compute Canada account?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
You can reset your password for your Compute Canada account here:&lt;br /&gt;
&lt;br /&gt;
https://ccdb.computecanada.org/security/forgot&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===How can I change or reset the password for my SciNet account?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
To reset your password at SciNet please go to [https://portal.scinet.utoronto.ca/password_resets Password reset page].&lt;br /&gt;
&lt;br /&gt;
If you know your old password and want to change it, that can be done here:&lt;br /&gt;
&lt;br /&gt;
https://portal.scinet.utoronto.ca/change_password&lt;br /&gt;
&lt;br /&gt;
===Why am I getting the error &amp;quot;Permission denied (publickey,gssapi-with-mic,password)&amp;quot;?===&lt;br /&gt;
&lt;br /&gt;
This error can pop up in a variety of situations: when trying to log in, or, after a job has finished, when the error and output files fail to be copied (there are other possible reasons for this failure as well -- see [[FAQ#My_GPC_job_died.2C_telling_me_.60Copy_Stageout_Files_Failed.27|My GPC job died, telling me:Copy Stageout Files Failed]]).&lt;br /&gt;
In most cases, the &amp;quot;Permission denied&amp;quot; error is caused by incorrect permissions on the (hidden) .ssh directory. Ssh is used for logging in as well as for copying the standard error and output files after a job. &lt;br /&gt;
&lt;br /&gt;
For security reasons, &lt;br /&gt;
the .ssh directory should be readable and writable only by you; if it is &lt;br /&gt;
readable by everybody, ssh refuses to use it and the copy fails.  You can fix &lt;br /&gt;
this with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   chmod 700 ~/.ssh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
And to be sure, also do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   chmod 600 ~/.ssh/id_rsa ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===ERROR:102: Tcl command execution failed? when loading modules ===&lt;br /&gt;
Modules sometimes require other modules to be loaded first.&lt;br /&gt;
The module command will let you know if you haven't loaded them.&lt;br /&gt;
For example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module purge&lt;br /&gt;
$ module load python&lt;br /&gt;
python/2.6.2(11):ERROR:151: Module 'python/2.6.2' depends on one of the module(s) 'gcc/4.4.0'&lt;br /&gt;
python/2.6.2(11):ERROR:102: Tcl command execution failed: prereq gcc/4.4.0&lt;br /&gt;
$ module load gcc python&lt;br /&gt;
$&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Compiling your Code==&lt;br /&gt;
&lt;br /&gt;
===How can I get g77 to work?===&lt;br /&gt;
&lt;br /&gt;
The Fortran 77 compilers on the GPC are ifort and gfortran. We have dropped support for g77.  This has been a conscious decision. g77 (and the associated library libg2c) were completely replaced six years ago (Apr 2005) by the gcc 4.x branch, and haven't undergone any updates at all, even bug fixes, for over five years.  &lt;br /&gt;
If we were to install g77 and libg2c, we would have to deal with the inevitable confusion caused when users accidentally link against the old, broken, wrong versions of the gcc libraries instead of the correct current versions.   &lt;br /&gt;
&lt;br /&gt;
If your code for some reason specifically requires five-plus-year-old libraries,  availability, compatibility, and unfixed-known-bug problems are only going to get worse for you over time, and this might be as good an opportunity as any to address those issues. &lt;br /&gt;
&lt;br /&gt;
''A note on porting to gfortran or ifort:''&lt;br /&gt;
&lt;br /&gt;
While gfortran and ifort are rather compatible with g77, one &lt;br /&gt;
important difference is that by default, gfortran does not preserve &lt;br /&gt;
local variables between function calls, while g77 does.   Preserved &lt;br /&gt;
local variables are for instance often used in implementations of quasi-random number &lt;br /&gt;
generators.  Proper Fortran requires such variables to be declared SAVE, &lt;br /&gt;
but not all old code does this.&lt;br /&gt;
Luckily, you can change gfortran's default behavior with the flag &lt;br /&gt;
&amp;lt;tt&amp;gt;-fno-automatic&amp;lt;/tt&amp;gt;.   For ifort, the corresponding flag is &amp;lt;tt&amp;gt;-noautomatic&amp;lt;/tt&amp;gt;.&lt;br /&gt;
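&lt;br /&gt;
For example (the file names here are just placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# gfortran: keep local variables static between calls, as g77 did by default&lt;br /&gt;
gfortran -O2 -fno-automatic -o mycode mycode.f&lt;br /&gt;
# ifort: the corresponding flag&lt;br /&gt;
ifort -O2 -noautomatic -o mycode mycode.f&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;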
&lt;br /&gt;
===Where is libg2c.so?===&lt;br /&gt;
&lt;br /&gt;
libg2c.so is part of the g77 compiler, for which we dropped support. See [[#How can I get g77 to work?]] for our reasons.&lt;br /&gt;
&lt;br /&gt;
===Autoparallelization does not work!===&lt;br /&gt;
&lt;br /&gt;
I compiled my code with the &amp;lt;tt&amp;gt;-qsmp=omp,auto&amp;lt;/tt&amp;gt; option, and then I specified that it should be run with 64 threads - with &lt;br /&gt;
 export OMP_NUM_THREADS=64&lt;br /&gt;
&lt;br /&gt;
However, when I check the load using &amp;lt;tt&amp;gt;llq1 -n&amp;lt;/tt&amp;gt;, it shows a load on the node of 1.37.  Why?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Using the autoparallelization will only get you so far.  In fact, it usually does not do too much.  What is helpful is to run the compiler with the &amp;lt;tt&amp;gt;-qreport&amp;lt;/tt&amp;gt; option, and then read the output listing carefully to see where the compiler thought it could parallelize, where it could not, and the reasons for this.  Then you can go back to your code and carefully try to address each of the issues brought up by the compiler.&lt;br /&gt;
We ''emphasize'' that this is just a rough first guide, and that the compilers are still not magical!   For more sophisticated approaches to parallelizing your code, email us at [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;]  to set up an appointment with one&lt;br /&gt;
of our technical analysts.&lt;br /&gt;
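&lt;br /&gt;
As a rough illustration (the file name is a placeholder; check the XL compiler documentation for exactly where the listing file ends up), a TCS compile line with the report enabled might look like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# ask the compiler to report what it did and did not manage to parallelize&lt;br /&gt;
xlf_r -O3 -qsmp=omp,auto -qreport -c mycode.f&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;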
&lt;br /&gt;
===How do I link against the Intel Math Kernel Library?===&lt;br /&gt;
&lt;br /&gt;
If you need to link in the Intel Math Kernel Library (MKL) libraries, you are well advised to use the Intel(R) Math Kernel Library Link Line Advisor: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ for help in devising the list of libraries to link with your code.&lt;br /&gt;
&lt;br /&gt;
'''''Note that this gives the link line for the command line. When using this in Makefiles, replace $MKLPATH by ${MKLPATH}.'''''&lt;br /&gt;
&lt;br /&gt;
'''''Note too that, unless the integer arguments you will be passing to the MKL libraries are actually 64-bit integers, rather than the normal int or INTEGER types, you want to specify 32-bit integers (lp64).'''''&lt;br /&gt;
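&lt;br /&gt;
Purely as an illustration (do not copy this blindly; the Link Line Advisor's output for your compiler and MKL version takes precedence), a sequential, 32-bit-integer (lp64) link line might look like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
icc mycode.c -o mycode -L${MKLPATH} -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;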
&lt;br /&gt;
===Can the compilers on the login nodes be disabled to prevent accidentally using them?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
You can accomplish this by modifying your .bashrc to not load the compiler modules. See [[Important .bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
===&amp;quot;relocation truncated to fit: R_X86_64_PC32&amp;quot;: Huh?===&lt;br /&gt;
&lt;br /&gt;
What does this mean, and why can't I compile this code?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Welcome to the joys of the x86 architecture!  You're probably having trouble building arrays larger than 2GB, individually or together.   Generally, you have to use the medium or large x86 `memory model'.   For the Intel compilers, this is specified with the compile options&lt;br /&gt;
&lt;br /&gt;
  -mcmodel=medium -shared-intel&lt;br /&gt;
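&lt;br /&gt;
For example, with the Intel Fortran compiler (the file names are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ifort -O2 -mcmodel=medium -shared-intel -o bigrun bigarrays.f90&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;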
&lt;br /&gt;
===&amp;quot;feupdateenv is not implemented and will always fail&amp;quot;===&lt;br /&gt;
&lt;br /&gt;
How do I get rid of this and what does it mean?&lt;br /&gt;
 &lt;br /&gt;
'''Answer:'''&lt;br /&gt;
First note that, as ominous as it sounds, this is really just a warning, and has to do with the Intel math library. You can ignore it (unless you really are trying to manually change the exception handlers for floating point exceptions such as divide by zero), or take the safe road and get rid of it by linking with the Intel math functions library:&amp;lt;pre&amp;gt;-limf&amp;lt;/pre&amp;gt;See also [[#How do I link against the Intel Math Kernel Library?]]&lt;br /&gt;
&lt;br /&gt;
===Cannot find rdmacm library when compiling on GPC===&lt;br /&gt;
&lt;br /&gt;
I get the following error building my code on GPC: &amp;quot;&amp;lt;tt&amp;gt;ld: cannot find -lrdmacm&amp;lt;/tt&amp;gt;&amp;quot;.  Where can I find this library?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This library is part of the MPI libraries; if your compiler is having problems picking it up, it probably means you are mistakenly trying to compile on the login nodes (scinet01..scinet04).  The login nodes aren't part of the GPC; they are for logging into the data centre only.  From there you must go to the GPC or TCS development nodes to do any real work.&lt;br /&gt;
&lt;br /&gt;
=== Why do I get this error when I try to compile: &amp;quot;icpc: error #10001: could not find directory in which /usr/bin/g++41 resides&amp;quot; ?===&lt;br /&gt;
&lt;br /&gt;
You are trying to compile on the login nodes.   As described in the wiki (https://support.scinet.utoronto.ca/wiki/index.php/GPC_Quickstart#Login) and in the user's guide you received with your account, SciNet supports two main clusters with very different architectures.  Compilation must be done on the development nodes of the appropriate cluster (in this case, gpc01-04).   Thus, log into gpc01, gpc02, gpc03, or gpc04, and compile from there.&lt;br /&gt;
&lt;br /&gt;
==Testing your Code==&lt;br /&gt;
&lt;br /&gt;
=== Can I run something for a short time on the development nodes? ===&lt;br /&gt;
&lt;br /&gt;
I am in the process of playing around with the MPI calls in my code to get it to work. I do a lot of tests, and each of them takes only a couple of seconds.  Can I do this on the development nodes?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Yes, as long as it's very brief (a few minutes).   Other people use the development nodes&lt;br /&gt;
for their work, and you don't want to bog the nodes down for them; testing a real&lt;br /&gt;
code can chew up a lot more resources than compiling, etc.    The procedures differ&lt;br /&gt;
depending on which machine you're using.&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
On the TCS you can run small MPI jobs on the tcs02 node, which is meant for &lt;br /&gt;
development use.  But even for this test run on one node, you'll need a host file --&lt;br /&gt;
a list of hosts (in this case, all tcs-f11n06, which is the `real' name of tcs02)&lt;br /&gt;
that the job will run on.  Create a file called `hostfile' containing the following:&lt;br /&gt;
&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
&lt;br /&gt;
for a 4-task run.  When you invoke &amp;quot;poe&amp;quot; or &amp;quot;mpirun&amp;quot;, there are runtime&lt;br /&gt;
arguments that you specify pointing to this file.  You can also specify it&lt;br /&gt;
in an environment variable MP_HOSTFILE, so, if your file is in your /scratch directory, say &lt;br /&gt;
${SCRATCH}/hostfile, then you would do&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 export MP_HOSTFILE=${SCRATCH}/hostfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
in your shell.  You will also need to create a &amp;lt;tt&amp;gt;.rhosts&amp;lt;/tt&amp;gt; file in your &lt;br /&gt;
home directory, again listing &amp;lt;tt&amp;gt;tcs-f11n06&amp;lt;/tt&amp;gt; so that &amp;lt;tt&amp;gt;poe&amp;lt;/tt&amp;gt;&lt;br /&gt;
can start jobs.   After that you can simply run your program.  You can use&lt;br /&gt;
mpiexec:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 mpiexec -n 4 my_test_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
adding &amp;lt;tt&amp;gt; -hostfile /path/to/my/hostfile&amp;lt;/tt&amp;gt; if you did not set the environment&lt;br /&gt;
variable above.  Alternatively, you can run it with the poe command (do a &amp;quot;man poe&amp;quot; for details), or even by&lt;br /&gt;
just directly running it.  In this case the number of MPI processes will by default&lt;br /&gt;
be the number of entries in your hostfile.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
On the GPC, one can run short test jobs on the [[GPC_Quickstart#Compile.2FDevel_Nodes | development nodes ]] &amp;lt;tt&amp;gt;gpc01&amp;lt;/tt&amp;gt;..&amp;lt;tt&amp;gt;gpc04&amp;lt;/tt&amp;gt;;&lt;br /&gt;
if they are single-node jobs (which they should be) they don't need a hostfile.  Even better, though, is to request an [[ Moab#Interactive | interactive ]] job and run the tests either in the regular batch queue or in the short, high-availability [[ Moab#debug | debug ]] queue that is reserved for this purpose.&lt;br /&gt;
&lt;br /&gt;
=== How do I run a longer (but still shorter than an hour) test job quickly ? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer'''&lt;br /&gt;
&lt;br /&gt;
On the GPC there is a high turnover short queue called [[ Moab#debug | debug ]] that is designed for&lt;br /&gt;
this purpose.  You can use it by adding &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -q debug&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to your submission script.&lt;br /&gt;
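&lt;br /&gt;
You can also select the debug queue directly on the qsub command line, or combine it with an interactive session (the script name is a placeholder):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -q debug -l nodes=1:ppn=8,walltime=0:45:00 myscript.sh&lt;br /&gt;
qsub -q debug -l nodes=1:ppn=8,walltime=0:45:00 -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;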
&lt;br /&gt;
==Running your jobs==&lt;br /&gt;
&lt;br /&gt;
===My job can't write to /home===&lt;br /&gt;
&lt;br /&gt;
My code works fine when I test on the development nodes, but when I submit a job, or even run interactively in the development queue on GPC, it fails.  What's wrong?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
As [[Data_Management#Home_Disk_Space | discussed]] [https://support.scinet.utoronto.ca/wiki/images/5/54/SciNet_Tutorial.pdf elsewhere], &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is mounted read-only on the compute nodes; you can only write to &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; from the login nodes and devel nodes.  (The [[GPC_Quickstart#128Glargemem | largemem nodes]] on GPC, in this respect, are more like devel nodes than compute nodes).   In general, to run jobs you can read from &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; but you'll have to write to &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; (or, if you were allocated space through the RAC process, on &amp;lt;tt&amp;gt;/project&amp;lt;/tt&amp;gt;).  More information on SciNet filesytems can be found on our [[Data_Management | Data Management]] page.&lt;br /&gt;
&lt;br /&gt;
===Error Submitting My Job: qsub: Bad UID for job execution MSG=ruserok failed ===&lt;br /&gt;
&lt;br /&gt;
I write up a submission script as in the examples, but when I attempt to submit the job, I get the above error.  What's wrong?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This error will occur if you try to submit a job from the login nodes.   The login nodes are the gateway to all of SciNet's systems (GPC, TCS, P7, ARC), which have different hardware and queueing systems.  To submit a job, you must log into a development node for the particular cluster you are submitting to and submit from there.&lt;br /&gt;
&lt;br /&gt;
===OpenMP on the TCS===&lt;br /&gt;
&lt;br /&gt;
How do I run an OpenMP job on the TCS?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please look at the [[TCS_Quickstart#Submission_Script_for_an_OpenMP_Job | TCS Quickstart ]] page.&lt;br /&gt;
&lt;br /&gt;
===Can I use hybrid codes consisting of MPI and OpenMP on the GPC?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Yes. Please look at the [[GPC_Quickstart#Hybrid_MPI.2FOpenMP_jobs | GPC Quickstart ]] page.&lt;br /&gt;
&lt;br /&gt;
===How do I run serial jobs on GPC?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''':&lt;br /&gt;
&lt;br /&gt;
So it should be said first that SciNet is a parallel computing resource, &lt;br /&gt;
and our priority will always be parallel jobs.   Having said that, if &lt;br /&gt;
you can make efficient use of the resources using serial jobs and get &lt;br /&gt;
good science done, that's good too, and we're happy to help you.&lt;br /&gt;
&lt;br /&gt;
The GPC nodes each have 8 processing cores, and making efficient use of these &lt;br /&gt;
nodes means using all eight cores.  As a result, we'd like to have the &lt;br /&gt;
users take up whole nodes (e.g., run jobs in multiples of 8) at a time.  &lt;br /&gt;
&lt;br /&gt;
It depends on the nature of your job what the best strategy is. Several approaches are presented on the [[User_Serial|serial run wiki page]].&lt;br /&gt;
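&lt;br /&gt;
For instance, a minimal sketch of the whole-node approach (the serial run page has more robust versions; &amp;lt;tt&amp;gt;./myprog&amp;lt;/tt&amp;gt; and the run directories are placeholders) simply starts 8 serial processes in the background, one per core, and waits for all of them:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialx8&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
# launch 8 independent serial runs, one per core, then wait for all of them to finish&lt;br /&gt;
for i in $(seq 1 8); do&lt;br /&gt;
  (cd run$i &amp;amp;&amp;amp; ./myprog &amp;gt; out.txt) &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;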
&lt;br /&gt;
===Why can't I request only a single cpu for my job on GPC?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''':&lt;br /&gt;
&lt;br /&gt;
On the GPC, compute resources are allocated by the node - that is, in chunks of 8 processors.   If you want to run jobs that each require only one processor, you need to bundle them into groups of 8, so as not to waste the other 7 cores for up to 48 hours. See the [[User_Serial|serial run wiki page]].&lt;br /&gt;
&lt;br /&gt;
===How do I run serial jobs on TCS?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''': You don't.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===But in the queue I found a user who is running jobs on GPC, each of which is using only one processor, so why can't I?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''':&lt;br /&gt;
&lt;br /&gt;
The pradat* and atlaspt* jobs, amongst others, are jobs of the ATLAS high energy physics project. That they are reported as single-cpu jobs is an artifact of the Moab scheduler. They are in fact automatically bundled into groups of 8, but have to run individually to be compatible with their international grid-based systems.&lt;br /&gt;
&lt;br /&gt;
===How do I use the ramdisk on GPC?===&lt;br /&gt;
&lt;br /&gt;
To use the ramdisk, create, write to, and read from files in /dev/shm just as one would in (e.g.) ${SCRATCH}. Only the amount of RAM needed to store the files will be taken up by the temporary file system; thus if you have 8 serial jobs each requiring 1 GB of RAM, and 1GB is taken up by various OS services, you would still have approximately 7GB available to use as ramdisk on a 16GB node. However, if you were to write 8 GB of data to the RAM disk, this would exceed the available memory and your job would likely crash.&lt;br /&gt;
&lt;br /&gt;
It is very important to delete your files from ram disk at the end of your job. If you do not do this, the next user to use that node will have less RAM available than they might expect, and this might kill their jobs.&lt;br /&gt;
&lt;br /&gt;
''More details on how to setup your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].''&lt;br /&gt;
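&lt;br /&gt;
As a minimal sketch (program and file names are placeholders; see the Ramdisk page for the full recommended setup), the relevant part of a job script might stage data into the ramdisk, run there, copy the results back, and clean up:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# stage input into the ramdisk and run from there&lt;br /&gt;
mkdir -p /dev/shm/$USER&lt;br /&gt;
cp $SCRATCH/input.dat /dev/shm/$USER/&lt;br /&gt;
cd /dev/shm/$USER&lt;br /&gt;
$SCRATCH/mycode input.dat output.dat&lt;br /&gt;
# copy results back to scratch and always clean up the ramdisk&lt;br /&gt;
cp output.dat $SCRATCH/&lt;br /&gt;
cd $SCRATCH&lt;br /&gt;
rm -rf /dev/shm/$USER&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;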
&lt;br /&gt;
===How can I automatically resubmit a job?===&lt;br /&gt;
&lt;br /&gt;
Commonly you may have a job that you know will take longer to run than what is &lt;br /&gt;
permissible in the queue.  As long as your program contains [[Checkpoints|checkpoint]] or &lt;br /&gt;
restart capability, you can have one job automatically submit the next. In&lt;br /&gt;
the following example it is assumed that the program finishes before &lt;br /&gt;
the 48 hour limit and then resubmits itself by logging into one&lt;br /&gt;
of the development nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque example submission script for auto resubmission&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=48:00:00&lt;br /&gt;
#PBS -N my_job&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# YOUR CODE HERE&lt;br /&gt;
./run_my_code&lt;br /&gt;
&lt;br /&gt;
# RESUBMIT 10 TIMES HERE&lt;br /&gt;
# default NUM to 0 if it was not passed in on the very first submission&lt;br /&gt;
num=${NUM:-0}&lt;br /&gt;
if [ $num -lt 10 ]; then&lt;br /&gt;
      num=$(($num+1))&lt;br /&gt;
      ssh gpc01 &amp;quot;cd $PBS_O_WORKDIR; qsub ./script_name.sh -v NUM=$num&amp;quot;;&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub script_name.sh -v NUM=0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can alternatively use [[ Moab#Job_Dependencies | Job dependencies ]] through the queuing system which will not start one job until another job has completed.&lt;br /&gt;
&lt;br /&gt;
If your job can't be made to automatically stop before the 48 hour queue window, but it does write out checkpoints, you can use the timeout command to stop the program while you still have time to resubmit; for instance&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
    timeout 2850m ./run_my_code argument1 argument2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will run the program for 47.5 hours (2850 minutes), and then send it SIGTERM to exit the program.&lt;br /&gt;
&lt;br /&gt;
===How can I pass in arguments to my submission script?===&lt;br /&gt;
&lt;br /&gt;
If you wish to make your scripts more generic, you can use qsub's ability &lt;br /&gt;
to pass environment variables into your script as arguments.&lt;br /&gt;
The following example shows a case where an input and an output &lt;br /&gt;
file are passed in on the qsub line. Multiple variables can be &lt;br /&gt;
passed in using the qsub &amp;quot;-v&amp;quot; option, comma-delimited. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque example of passing in arguments&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
# &lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=48:00:00&lt;br /&gt;
#PBS -N my_job&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# YOUR CODE HERE&lt;br /&gt;
./run_my_code -f $INFILE -o $OUTFILE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub script_name.sh -v INFILE=input.txt,OUTFILE=outfile.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== How can I run a job longer than 48 hours? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The SciNet queues have a walltime limit of 48 hours.   This is pretty typical for systems of this size in Canada and elsewhere, and larger systems commonly have shorter limits.   The limits are there to ensure that every user gets a fair share of the system (so that no one user ties up lots of nodes for a long time), and for safety (so that if one memory board in one node fails in the middle of a very long job, you haven't lost a month's worth of work).&lt;br /&gt;
&lt;br /&gt;
Since many of us have simulations that require more than that much time, most widely-used scientific applications have &amp;quot;checkpoint-restart&amp;quot; functionality, where every so often the complete state of the calculation is stored as a checkpoint file, and one can restart a simulation from one of these.   In fact, these restart files tend to be quite useful for a number of purposes.&lt;br /&gt;
&lt;br /&gt;
If your job will take longer, you will have to submit your job in multiple parts, restarting from a checkpoint each time.  In this way, one can run a simulation much longer than the queue limit.  In fact, one can even write job scripts which automatically re-submit themselves until a run is completed, using [[FAQ#How_can_I_automatically_resubmit_a_job.3F | automatic resubmission. ]]&lt;br /&gt;
&lt;br /&gt;
=== Why did showstart say it would take 3 hours for my job to start before, and now it says my job will start in 10 hours? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please look at the [[FAQ#How_do_priorities_work.2Fwhy_did_that_job_jump_ahead_of_mine_in_the_queue.3F | How do priorities work/why did that job jump ahead of mine in the queue? ]] page.&lt;br /&gt;
&lt;br /&gt;
===How do priorities work/why did that job jump ahead of mine in the queue?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The [[Moab | queueing system]] used on SciNet machines is a [http://en.wikipedia.org/wiki/Priority_queue Priority Queue].  Jobs enter the queue at the back of the queue, and slowly make their way to the front as those ahead of them are run; but a job that enters the queue with a higher priority can `cut in line'.&lt;br /&gt;
&lt;br /&gt;
The main factor which determines priority is whether or not the user (or their PI) has an [http://wiki.scinethpc.ca/wiki/index.php/Application_Process RAC allocation].  These are competitively allocated grants of computer time; there is a call for proposals towards the end of every calendar year.    Users with an allocation have high priorities in an attempt to make sure that they can use the amount of computer time the committees granted them.   Their priority decreases as they approach their allotted usage over the current window of time; by the time that they have exhausted that allotted usage, their priority is the same as users with no allocation (unallocated, or `default' users).    Unallocated users have a fixed, low, priority.&lt;br /&gt;
&lt;br /&gt;
This priority system is called `fairshare'; the scheduler attempts to make sure everyone has their fair share of the machines, where the share that's fair has been determined by the allocation committee.    The fairshare window is a rolling window of two weeks; that is, any time you have a job in the queue, the fairshare calculation of its priority is given by how much of your allocation of the machine has been used in the last 14 days.&lt;br /&gt;
&lt;br /&gt;
A particular allocation might have some fraction of GPC - say 4% of the machine (if the PI had been allocated 10 million CPU hours on GPC). The allocations have labels (called `Resource Allocation Proposal Identifiers', or RAPIs); they look something like&lt;br /&gt;
&lt;br /&gt;
  abc-123-ab&lt;br /&gt;
&lt;br /&gt;
where abc-123 is the PI's CCRI, and the suffix specifies which of the allocations granted to the PI is to be used.  These can be specified on a job-by-job basis.  On GPC, one adds the line&lt;br /&gt;
 #PBS -A RAPI&lt;br /&gt;
to your script; on TCS, one uses&lt;br /&gt;
 # @ account_no = RAPI&lt;br /&gt;
If the allocation to charge isn't specified, a default is used; each user has such a default, which can be changed at the same portal where one changes one's password, &lt;br /&gt;
&lt;br /&gt;
 https://portal.scinet.utoronto.ca/&lt;br /&gt;
&lt;br /&gt;
A job's priority is determined primarily by the fairshare priority of the allocation it is being charged to; the previous 14 days' worth of use under that allocation is calculated and compared to the allocated fraction (here, 4%) of the machine over that window (here, 14 days).   The fairshare priority is a decreasing function of the allocation left; if there is no allocation left (eg, jobs running under that allocation have already used 379,038 CPU hours in the past 14 days), the priority is the same as that of a user with no granted allocation.   (This last part has been the topic of some debate; as the machine gets more utilized, it will probably be the case that we allow RAC users who have greatly overused their quota to have their priorities drop below those of unallocated users, to give the unallocated users some chance to run on our increasingly crowded system; this would have no undue effect on our allocated users as they still would be able to use the amount of resources they had been allocated by the committees.)   Note that all jobs charging the same allocation get the same fairshare priority.&lt;br /&gt;
&lt;br /&gt;
There are other factors that go into calculating priority, but fairshare is the most significant.   Other factors include&lt;br /&gt;
* amount of time waiting in queue (measured in units of the requested runtime). A waiting queue job gains priority as it sits in the queue to avoid job starvation. &lt;br /&gt;
* User adjustment of priorities ( See below ).&lt;br /&gt;
&lt;br /&gt;
The major effect of these subdominant terms is to shuffle the order of jobs running under the same allocation.&lt;br /&gt;
&lt;br /&gt;
===How do we manage job priorities within our research group?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Obviously, managing shared resources within a large group - whether it &lt;br /&gt;
is conference funding or CPU time - takes some doing.   &lt;br /&gt;
&lt;br /&gt;
It's important to note that the fairshare periods are intentionally kept &lt;br /&gt;
quite short - just two weeks long. So, for example, let us say that in your resource &lt;br /&gt;
allocation you have about 10% of the machine.   Then for someone to use &lt;br /&gt;
up the whole two week amount of time in 2 days, they'd have to use 70% &lt;br /&gt;
of the machine in those two days - which is unlikely to happen by &lt;br /&gt;
accident.  If that does happen,  &lt;br /&gt;
those using the same allocation as the person who used 70% of the &lt;br /&gt;
machine over the two days will suffer by having much lower priority for &lt;br /&gt;
their jobs, but only for the next 12 days - and even then, if there are &lt;br /&gt;
idle cpus they'll still be able to compute.&lt;br /&gt;
&lt;br /&gt;
There will be online tools for seeing how the allocation is being used, &lt;br /&gt;
and those people who are in charge in your group will be able to use &lt;br /&gt;
that information to manage the users, telling them to dial it down or &lt;br /&gt;
up.   We know that managing a large research group is hard, and we want &lt;br /&gt;
to make sure we provide you the information you need to do your job &lt;br /&gt;
effectively.&lt;br /&gt;
&lt;br /&gt;
One way for users within a group to manage their priorities within the group&lt;br /&gt;
is with [[Moab#Adjusting_Job_Priority | user-adjusted priorities]]; this is&lt;br /&gt;
described in more detail on the [[Moab | Scheduling System]] page.&lt;br /&gt;
&lt;br /&gt;
=== How do I charge jobs to my RAC allocation? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see the [[Moab#Accounting|accounting section of Moab page]].&lt;br /&gt;
&lt;br /&gt;
=== How does one check the amount of used CPU-hours in a project, and how does one get statistics for each user in the project? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This information is available on the SciNet portal, https://portal.scinet.utoronto.ca. See also [[SciNet Usage Reports]].&lt;br /&gt;
&lt;br /&gt;
==Monitoring jobs in the queue==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Why hasn't my job started?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Use the moab command &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
checkjob -v jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and the last couple of lines should explain why a job hasn't started.  &lt;br /&gt;
&lt;br /&gt;
Please see [[Moab| Job Scheduling System (Moab) ]] for more detailed information&lt;br /&gt;
&lt;br /&gt;
===How do I figure out when my job will run?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Moab#Available_Resources| Job Scheduling System (Moab) ]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ===My GPC job is Held, and checkjob says &amp;quot;Batch:PolicyViolation&amp;quot; ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
When this happens, you'll see your job stuck in a BatchHold state.  &lt;br /&gt;
This happens because the job you've submitted breaks one of the rules of the queues, and is being held until you modify it or kill it and re-submit a conforming job.  The most common problems are:&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===I submit my GPC job, and I get an email saying it was rejected===&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This happens because the job you've submitted breaks one of the rules of the queues and is rejected. An email&lt;br /&gt;
is sent with the JOBID, JOBNAME, and the reason it was rejected.  The following is an example where a job&lt;br /&gt;
requested more than 48 hours and was rejected.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PBS Job Id: 3462493.gpc-sched&lt;br /&gt;
Job Name:   STDIN&lt;br /&gt;
job deleted&lt;br /&gt;
Job deleted at request of root@gpc-sched&lt;br /&gt;
MOAB_INFO:  job was rejected - job violates class configuration 'wclimit too high for class 'batch_ib' (345600 &amp;gt; 172800)'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Jobs on the TCS or GPC may only run for 48 hours at a time; this restriction greatly increases responsiveness of the queue and queue throughput for all our users.  If your computation requires longer than that, as many do, you will have to [[ Checkpoints | checkpoint ]] your job and restart it after each 48-hour queue window.   You can manually re-submit jobs, or if you can have your job cleanly exit before the 48 hour window, there are ways to [[ FAQ#How_can_I_automatically_resubmit_a_job.3F | automatically resubmit jobs ]].&lt;br /&gt;
&lt;br /&gt;
Other rejections return a more cryptic error saying &amp;quot;job violates class configuration&amp;quot; such as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PBS Job Id: 3462409.gpc-sched&lt;br /&gt;
Job Name:   STDIN&lt;br /&gt;
job deleted&lt;br /&gt;
Job deleted at request of root@gpc-sched&lt;br /&gt;
MOAB_INFO:  job was rejected - job violates class configuration 'user required by class 'batch''&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The most common problems that result in this error are:&lt;br /&gt;
&lt;br /&gt;
* '''Incorrect number of processors per node''': Jobs on the GPC are scheduled per-node not per-core and since each node has 8 processor cores (ppn=8) the smallest job allowed is one node with 8 cores (nodes=1:ppn=8).  For serial jobs users must bundle or batch them together in groups of 8. See [[ FAQ#How_do_I_run_serial_jobs_on_GPC.3F | How do I run serial jobs on GPC? ]]&lt;br /&gt;
* '''No number of nodes specified''': Jobs submitted to the main queue must request a specific number of nodes, either in the submission script (with a line like &amp;lt;tt&amp;gt;#PBS -l nodes=2:ppn=8&amp;lt;/tt&amp;gt;) or on the command line (eg, &amp;lt;tt&amp;gt;qsub -l nodes=2:ppn=8,walltime=5:00:00 script.pbs&amp;lt;/tt&amp;gt;).  Note that for the debug queue, you can get away without specifying a number of nodes and a default of one will be assigned; for both technical and policy reasons, we do not enforce such a default for the main (&amp;quot;batch&amp;quot;) queue.&lt;br /&gt;
* '''There is a 15-minute walltime minimum''' on all queues except debug; if you request less walltime than this, the job will be rejected.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Running checkjob on my job gives me messages about JobFail and rejected===&lt;br /&gt;
&lt;br /&gt;
Running checkjob on my job gives me messages that suggest my job has failed, as below: what did I do wrong?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
AName: test&lt;br /&gt;
State: Idle &lt;br /&gt;
Creds:  user:xxxxxx  group:xxxxxxxx  account:xxxxxxxx  class:batch_ib  qos:ibqos&lt;br /&gt;
WallTime:   00:00:00 of 8:00:00&lt;br /&gt;
BecameEligible: Wed Jul 23 10:39:27&lt;br /&gt;
SubmitTime: Wed Jul 23 10:38:22&lt;br /&gt;
  (Time Queued  Total: 00:01:47  Eligible: 00:01:05)&lt;br /&gt;
&lt;br /&gt;
Total Requested Tasks: 8&lt;br /&gt;
&lt;br /&gt;
Req[0]  TaskCount: 8  Partition: ALL  &lt;br /&gt;
Opsys: centos6computeA  Arch: ---  Features: ---&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Notification Events: JobFail&lt;br /&gt;
&lt;br /&gt;
IWD:            /scratch/x/xxxxxxxx/xxxxxxx/xxxxxxx&lt;br /&gt;
Partition List: torque,DDR&lt;br /&gt;
Flags:          RESTARTABLE&lt;br /&gt;
Attr:           checkpoint&lt;br /&gt;
StartPriority:  76&lt;br /&gt;
rejected for Opsys        - (null)&lt;br /&gt;
rejected for State        - (null)&lt;br /&gt;
rejected for Reserved     - (null)&lt;br /&gt;
NOTE:  job req cannot run in partition torque (available procs do not meet requirements : 0 of 8 procs found)&lt;br /&gt;
idle procs: 793  feasible procs:   0&lt;br /&gt;
&lt;br /&gt;
Node Rejection Summary: [Opsys: 117][State: 2895][Reserved: 19]&lt;br /&gt;
&lt;br /&gt;
NOTE:  job violates constraints for partition SANDY (partition SANDY not in job partition mask)&lt;br /&gt;
&lt;br /&gt;
NOTE:  job violates constraints for partition GRAVITY (partition GRAVITY not in job partition mask)&lt;br /&gt;
&lt;br /&gt;
rejected for State        - (null)&lt;br /&gt;
NOTE:  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The output from checkjob is a little cryptic in places, and if you are wondering why your job hasn't started yet, you might think that &amp;quot;rejection&amp;quot; and &amp;quot;JobFail&amp;quot; suggest that there's something wrong.  But the above message is actually normal; you can use the &amp;lt;tt&amp;gt;showstart&amp;lt;/tt&amp;gt; command on your job to get a (preliminary, subject to change) estimate as to when the job will start, and you'll find that it is in fact scheduled to start up in the near future.&lt;br /&gt;
&lt;br /&gt;
In the above message:&lt;br /&gt;
&lt;br /&gt;
* `Notification Events: JobFail` just means that, if notifications are enabled, you'll get a message if the job fails;&lt;br /&gt;
* `job req cannot run in partition torque` just means that the job cannot run just yet (that's why it's queued);&lt;br /&gt;
* `job req cannot run in dynamic partition DDR now (insufficient procs available: 0 &amp;lt; 8)` says why: there aren't processors available; and&lt;br /&gt;
* `job violates constraints for partition SANDY/GRAVITY` just means that the job isn't eligible to run in those particular (small) sections of the cluster.&lt;br /&gt;
&lt;br /&gt;
that is, the above output is the normal and expected (if somewhat cryptic) explanation as to why the job is waiting - nothing to worry about.&lt;br /&gt;
&lt;br /&gt;
===How can I monitor my running jobs on TCS?===&lt;br /&gt;
&lt;br /&gt;
How can I monitor the load of TCS jobs?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
You can get more information with the command &lt;br /&gt;
 /xcat/tools/tcs-scripts/LL/jobState.sh&lt;br /&gt;
which I alias as:&lt;br /&gt;
 alias llq1='/xcat/tools/tcs-scripts/LL/jobState.sh'&lt;br /&gt;
If you run &amp;quot;llq1 -n&amp;quot; you will see a listing of jobs together with a lot of information, including the load.&lt;br /&gt;
&lt;br /&gt;
==Errors in running jobs==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
===On GPC, `Job cannot be executed'===&lt;br /&gt;
&lt;br /&gt;
I get error messages like this trying to run on GPC:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PBS Job Id: 30414.gpc-sched&lt;br /&gt;
Job Name:   namd&lt;br /&gt;
Exec host:  gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0&lt;br /&gt;
Aborted by PBS Server &lt;br /&gt;
Job cannot be executed&lt;br /&gt;
See Administrator for help&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
PBS Job Id: 30414.gpc-sched&lt;br /&gt;
Job Name:   namd&lt;br /&gt;
Exec host:  gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0&lt;br /&gt;
An error has occurred processing your job, see below.&lt;br /&gt;
request to copy stageout files failed on node 'gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0' for job 30414.gpc-sched&lt;br /&gt;
&lt;br /&gt;
Unable to copy file 30414.gpc-sched.OU to USER@gpc-f101n084.scinet.local:/scratch/G/GROUP/USER/projects/sim-performance-test/runtime/l/namd/8/namd.o30414&lt;br /&gt;
*** error from copy&lt;br /&gt;
30414.gpc-sched.OU: No such file or directory&lt;br /&gt;
*** end error output&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Try doing the following:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir ${SCRATCH}/.pbs_spool&lt;br /&gt;
ln -s ${SCRATCH}/.pbs_spool ~/.pbs_spool&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is how all new accounts are setup on SciNet.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; on GPC for compute jobs is mounted as a read-only file system.   &lt;br /&gt;
PBS by default tries to spool its output  files to &amp;lt;tt&amp;gt;${HOME}/.pbs_spool&amp;lt;/tt&amp;gt;&lt;br /&gt;
which fails as it tries to write to a read-only file  &lt;br /&gt;
system.    New accounts at SciNet  get around this by having ${HOME}/.pbs_spool  &lt;br /&gt;
point to somewhere appropriate on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;, but if you've deleted that link&lt;br /&gt;
or directory, or had an old account, you will see errors like the above.&lt;br /&gt;
&lt;br /&gt;
'''On Feb 24, the input/output mechanism has been reconfigured to use a local ramdisk as the temporary location, which means that .pbs_spool is no longer needed and this error should not occur anymore.'''&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== I couldn't find the  .o output file in the .pbs_spool directory as I used to ===&lt;br /&gt;
&lt;br /&gt;
On Feb 24 2011, the temporary location of standard input and output files was moved from the shared file system ${SCRATCH}/.pbs_spool to the&lt;br /&gt;
node-local directory /var/spool/torque/spool (which resides in ram). The final location after a job has finished is unchanged,&lt;br /&gt;
but to check the output/error of running jobs, users will now have to ssh into the (first) node assigned to the job and look in&lt;br /&gt;
/var/spool/torque/spool.&lt;br /&gt;
&lt;br /&gt;
This alleviates access contention to the temporary directory, especially for those users that are running a lot of jobs, and  reduces the burden on the file system in general.&lt;br /&gt;
&lt;br /&gt;
Note that it is good practice to redirect output to a file rather than to count on the scheduler to do this for you.&lt;br /&gt;
&lt;br /&gt;
=== My GPC job died, telling me `Copy Stageout Files Failed' ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
When a job runs on GPC, the script's standard output and error are redirected to &lt;br /&gt;
&amp;lt;tt&amp;gt;$PBS_JOBID.gpc-sched.OU&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;$PBS_JOBID.gpc-sched.ER&amp;lt;/tt&amp;gt; in&lt;br /&gt;
/var/spool/torque/spool on the (first) node on which your job is running.  At the end of the job, those .OU and .ER files are copied to where the batch script tells them to be copied, by default &amp;lt;tt&amp;gt;$PBS_JOBNAME.o$PBS_JOBID&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;$PBS_JOBNAME.e$PBS_JOBID&amp;lt;/tt&amp;gt;.   (You can set those filenames to be something clearer with the -e and -o options in your PBS script.)&lt;br /&gt;
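&lt;br /&gt;
For example, a minimal sketch of the relevant directives (the filenames here are just placeholders; pick whatever suits your run):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -N my_job&lt;br /&gt;
#PBS -o my_job.out&lt;br /&gt;
#PBS -e my_job.err&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;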
&lt;br /&gt;
When you get errors like this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
An error has occurred processing your job, see below.&lt;br /&gt;
request to copy stageout files failed on node&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
it means that the copying-back process has failed in some way.  There could be a few reasons for this. The first thing to do is to '''make sure that your .bashrc does not produce any output''', as the output stageout is performed by bash and extra output can cause it to fail.&lt;br /&gt;
But it could also have just been a random filesystem error, or it could be that your job failed spectacularly enough to short-circuit the normal job-termination process, so that those files just never got copied.&lt;br /&gt;
&lt;br /&gt;
Write to [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;] if your input/output files got lost, as we will probably be able to retrieve them for you (please supply at least the jobid, and any other information that may be relevant). &lt;br /&gt;
&lt;br /&gt;
Note that it is good practice to redirect output to a file rather than depending on the job scheduler to do this for you.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&lt;br /&gt;
===Another transport will be used instead===&lt;br /&gt;
&lt;br /&gt;
I get error messages like the following when running on the GPC at the start of the run, although the job seems to proceed OK.   Is this a problem?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
[[45588,1],0]: A high-performance Open MPI point-to-point messaging module&lt;br /&gt;
was unable to find any relevant network interfaces:&lt;br /&gt;
&lt;br /&gt;
Module: OpenFabrics (openib)&lt;br /&gt;
  Host: gpc-f101n005&lt;br /&gt;
&lt;br /&gt;
Another transport will be used instead, although this may result in&lt;br /&gt;
lower performance.&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Everything's fine.   The two MPI libraries SciNet provides work for both the InfiniBand and the Gigabit Ethernet interconnects, and will always try to use the fastest interconnect available.   In this case, you ran on normal gigabit GPC nodes with no infiniband; but the MPI libraries have no way of knowing this, and try the infiniband first anyway.  This is just a harmless `failover' message; it tried to use the infiniband, which doesn't exist on this node, then fell back on using Gigabit ethernet (`another transport').&lt;br /&gt;
&lt;br /&gt;
With OpenMPI, this can be avoided by not looking for infiniband; eg, by using the option&lt;br /&gt;
&lt;br /&gt;
--mca btl ^openib&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===IB Memory Errors, eg &amp;lt;tt&amp;gt; reg_mr Cannot allocate memory &amp;lt;/tt&amp;gt;===&lt;br /&gt;
&lt;br /&gt;
Infiniband requires more memory than ethernet; it can use RDMA (remote direct memory access) transport for which it sets aside registered memory to transfer data.&lt;br /&gt;
&lt;br /&gt;
In our current network configuration, it requires a ''lot'' more memory, particularly as you go to larger process counts; unfortunately, that means you can't get around the &amp;quot;I need more memory&amp;quot; problem the usual way, by running on more nodes.   Machines with different memory or &lt;br /&gt;
network configurations may exhibit this problem at higher or lower MPI &lt;br /&gt;
task counts.&lt;br /&gt;
&lt;br /&gt;
Right now, the best workaround is to reduce the number and size of OpenIB queues, using XRC: with OpenMPI, add the following options to your mpirun command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32 -mca btl_openib_max_send_size 12288&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
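&lt;br /&gt;
For instance, a full command line might look like the following sketch (the executable name and process count are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mpirun -np 256 \&lt;br /&gt;
  -mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32 \&lt;br /&gt;
  -mca btl_openib_max_send_size 12288 ./mycode&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;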
&lt;br /&gt;
With Intel MPI, you should be able to do&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load intelmpi/4.0.3.008&lt;br /&gt;
mpirun -genv I_MPI_FABRICS=shm:ofa  -genv I_MPI_OFA_USE_XRC=1 -genv I_MPI_OFA_DYNAMIC_QPS=1 -genv I_MPI_DEBUG=5 -np XX ./mycode&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
to the same end.  &lt;br /&gt;
&lt;br /&gt;
For more information see [[GPC MPI Versions]].&lt;br /&gt;
&lt;br /&gt;
===My compute job fails, saying &amp;lt;tt&amp;gt;libpng12.so.0: cannot open shared object file&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;libjpeg.so.62: cannot open shared object file&amp;lt;/tt&amp;gt;===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
To maximize the amount of memory available for compute jobs, the compute nodes have a less complete system image than the development nodes.   In particular, since interactive graphics libraries like matplotlib and gnuplot are usually used interactively, the libraries for their use are included in the devel nodes' image but not the compute nodes.&lt;br /&gt;
&lt;br /&gt;
Many of these extra libraries are, however, available in the &amp;quot;extras&amp;quot; module.   So adding a &amp;quot;module load extras&amp;quot; to your job submission  script - or, for overkill, to your .bashrc - should enable these scripts to run on the compute nodes.&lt;br /&gt;
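&lt;br /&gt;
A minimal sketch of where that line would go in a submission script (the program name is just a placeholder):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
module load extras&lt;br /&gt;
./my_plotting_script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;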
&lt;br /&gt;
==Data on SciNet disks==&lt;br /&gt;
&lt;br /&gt;
===How do I find out my disk usage?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The standard unix/linux utilities for finding the amount of disk space used by a directory are very slow, and notoriously inefficient on the GPFS filesystems that we run on the SciNet systems.  There are utilities that very quickly report your disk usage:&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''diskUsage'''&amp;lt;/tt&amp;gt; command, available with the 'extras' module on the login nodes, datamovers and the GPC devel nodes, provides information in a number of ways on the home, scratch, and project file systems. For instance, how much disk space is being used by yourself and your group (with the -a option), or how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;) or you may generate plots of your usage over time.&lt;br /&gt;
This information is only updated hourly!&lt;br /&gt;
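&lt;br /&gt;
For example, on a devel node (the &amp;lt;tt&amp;gt;-a&amp;lt;/tt&amp;gt; option shows your group's usage as well as your own):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load extras&lt;br /&gt;
$ diskUsage&lt;br /&gt;
$ diskUsage -a&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;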
&lt;br /&gt;
More information about these filesystems is available on the [[Data_Management | Data Management]] page.&lt;br /&gt;
&lt;br /&gt;
===How do I transfer data to/from SciNet?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
All incoming connections to SciNet go through relatively low-speed connections to the &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; gateways, so using scp to copy files the same way you ssh in is not an effective way to move lots of data.  Better tools are described in our page on [[Data_Management#Data_Transfer | Data Transfer]].&lt;br /&gt;
&lt;br /&gt;
===My group works with data files of size 1-2 GB.  Is this too large to  transfer by scp to login.scinet.utoronto.ca ?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Generally, occasional transfers of data of less than 10GB are perfectly acceptable to go through the login nodes. See [[Data_Management#Data_Transfer | Data Transfer]].&lt;br /&gt;
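&lt;br /&gt;
For a transfer of that size, plain scp through the login nodes is fine; a minimal sketch (the filename, username and target path are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scp mydata.tar.gz USERNAME@login.scinet.utoronto.ca:/path/to/your/scratch/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;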
&lt;br /&gt;
===How can I check if I have files in /scratch that are scheduled for automatic deletion?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Storage_Quickstart#Scratch_Disk_Purging_Policy | Storage At SciNet]]&lt;br /&gt;
&lt;br /&gt;
===How to allow my supervisor to manage files for me using ACL-based commands?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Data_Management#File.2FOwnership_Management_.28ACL.29 | File/Ownership Management]]&lt;br /&gt;
&lt;br /&gt;
===Can we buy extra storage space on SciNet?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
Yes, please see [[Data_Management#Buying_storage_space_on_GPFS_or_HPSS | Buying storage space on GPFS or HPSS ]] for more details.&lt;br /&gt;
&lt;br /&gt;
===Can I transfer files between BGQ and HPSS?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
Yes, please see [https://support.scinet.utoronto.ca/wiki/index.php/BGQ#Bridge_to_HPSS Bridge to HPSS ]  for more details.&lt;br /&gt;
&lt;br /&gt;
==Keep 'em Coming!==&lt;br /&gt;
&lt;br /&gt;
===Next question, please===&lt;br /&gt;
&lt;br /&gt;
Send your question to [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;];  we'll answer it asap!&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=FAQ&amp;diff=7165</id>
		<title>FAQ</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=FAQ&amp;diff=7165"/>
		<updated>2014-08-20T15:45:38Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* I submit my GPC job, and I get an email saying it was rejected */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__TOC__&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==The Basics==&lt;br /&gt;
===Whom do I contact for support?===&lt;br /&gt;
&lt;br /&gt;
Whom do I contact if I have problems or questions about how to use the SciNet systems?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
E-mail [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;]  &lt;br /&gt;
&lt;br /&gt;
In your email, please include the following information:&lt;br /&gt;
&lt;br /&gt;
* your username on SciNet&lt;br /&gt;
* the cluster that your question pertains to (GPC or TCS; SciNet is not a cluster!),&lt;br /&gt;
* any relevant error messages&lt;br /&gt;
* the commands you typed before the errors occurred&lt;br /&gt;
* the path to your code (if applicable)&lt;br /&gt;
* the location of the job scripts (if applicable)&lt;br /&gt;
* the directory from which it was submitted (if applicable)&lt;br /&gt;
* a description of what it is supposed to do (if applicable)&lt;br /&gt;
* if your problem is about connecting to SciNet, the type of computer you are connecting from.&lt;br /&gt;
&lt;br /&gt;
Note that your password should never, never, never be sent to us, even if your question is about your account.&lt;br /&gt;
&lt;br /&gt;
Try to avoid sending email only to specific individuals at SciNet. Your chances of a quick reply increase significantly if you email our team!&lt;br /&gt;
&lt;br /&gt;
===What does ''code scaling'' mean?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Introduction_To_Performance#Parallel_Speedup|A Performance Primer]]&lt;br /&gt;
&lt;br /&gt;
===What do you mean by ''throughput''?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Introduction_To_Performance#Throughput|A Performance Primer]].&lt;br /&gt;
&lt;br /&gt;
Here is a simple example:&lt;br /&gt;
&lt;br /&gt;
Suppose you need to do 10 computations.  Say each of these runs for&lt;br /&gt;
1 day on 8 cores, but they take &amp;quot;only&amp;quot; 18 hours on 16 cores.  What is the&lt;br /&gt;
fastest way to get all 10 computations done - as 8-core jobs or as&lt;br /&gt;
16-core jobs?  Let us assume you have 2 nodes at your disposal.&lt;br /&gt;
The answer, after some simple arithmetic: with two 8-core nodes you can run two&lt;br /&gt;
8-core jobs at once, so the 10 jobs finish in 5 rounds of 1 day each, i.e. 5 days;&lt;br /&gt;
a 16-core job needs both nodes, so the 10 jobs run one after another,&lt;br /&gt;
10 x 18 hours = 180 hours = 7.5 days.  Draw your own conclusions...&lt;br /&gt;
&lt;br /&gt;
===I changed my .bashrc/.bash_profile and now nothing works===&lt;br /&gt;
&lt;br /&gt;
The default startup scripts provided by SciNet, and guidelines for them, can be found [[Important_.bashrc_guidelines|here]].  Certain things - like sourcing &amp;lt;tt&amp;gt;/etc/profile&amp;lt;/tt&amp;gt;&lt;br /&gt;
and &amp;lt;tt&amp;gt;/etc/bashrc&amp;lt;/tt&amp;gt; - are ''required'' for various SciNet routines to work!&lt;br /&gt;
&lt;br /&gt;
If the situation is so bad that you cannot even log in, please send email [mailto:support@scinet.utoronto.ca support].&lt;br /&gt;
&lt;br /&gt;
===Could I have my login shell changed to (t)csh?===&lt;br /&gt;
&lt;br /&gt;
The login shell used on our systems is bash. While the tcsh is available on the GPC and the TCS, we do not support it as the default login shell at present.  So &amp;quot;chsh&amp;quot; will not work, but you can always run tcsh interactively. Also, csh scripts will be executed correctly provided that they have the correct &amp;quot;shebang&amp;quot; &amp;lt;tt&amp;gt;#!/bin/tcsh&amp;lt;/tt&amp;gt; at the top.&lt;br /&gt;
&lt;br /&gt;
===How can I run Matlab / IDL / Gaussian / my favourite commercial software at SciNet?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Because SciNet serves such a disparate group of user communities, there is just no way we can buy licenses for everyone's commercial package.   The only commercial software we have purchased is that which in principle can benefit everyone -- fast compilers and math libraries (Intel's on GPC, and IBM's on TCS).&lt;br /&gt;
&lt;br /&gt;
If your research group requires a commercial package that you already have or are willing to buy licenses for, contact us at [mailto:support@scinet.utoronto.ca support@scinet] and we can work together to find out if it is feasible to implement the package's licensing arrangement on the SciNet clusters, and if so, what is the best way to do it.&lt;br /&gt;
&lt;br /&gt;
Note that it is important that you contact us before installing commercially licensed software on SciNet machines, even if you have a way to do it in your own directory without requiring sysadmin intervention.   It puts us in a very awkward position if someone is found to be running unlicensed or invalidly licensed software on our systems, so we need to be aware of what is being installed where.&lt;br /&gt;
&lt;br /&gt;
===Do you have a recommended ssh program that will allow scinet access from Windows machines?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The [[Ssh#SSH_for_Windows_Users | SSH for Windows users]] programs we recommend are:&lt;br /&gt;
&lt;br /&gt;
* [http://mobaxterm.mobatek.net/en/ MobaXterm] is a tabbed ssh client with some Cygwin tools, including ssh and X, all wrapped up into one executable.&lt;br /&gt;
* [http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY]  - this is a terminal for windows that connects via ssh.  It is a quick install and will get you up and running quickly.&amp;lt;br&amp;gt;To set up your passphrase protected ssh key with putty, see [http://the.earth.li/~sgtatham/putty/0.61/htmldoc/Chapter8.html#pubkey here].&lt;br /&gt;
* [http://www.cygwin.com/ CygWin] - this is a whole Linux-like environment for Windows, which also includes an X window server so that you can display remote windows on your desktop.  Make sure you include the openssh and X window system in the installation for full functionality.  This is recommended if you will be doing a lot of work on Linux machines, as it makes a very similar environment available on your computer.&amp;lt;br&amp;gt;To set up your ssh keys, follow the Linux instructions on the [[Ssh keys]] page.&lt;br /&gt;
&lt;br /&gt;
===My ssh key does not work! WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
[[Ssh_keys#Testing_Your_Key | Testing Your Key]]&lt;br /&gt;
&lt;br /&gt;
* If this doesn't work, you should be able to log in using your password and investigate the problem. For example, if during a login session you get a message similar to the one below, just follow the instructions and delete the offending key on line 3 (you can use vi to jump to that line with ESC plus : plus 3). That only means that you may have logged in from your home computer to SciNet in the past, and that key is obsolete.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh USERNAME@login.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @&lt;br /&gt;
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!&lt;br /&gt;
Someone could be eavesdropping on you right now (man-in-the-middle&lt;br /&gt;
attack)!&lt;br /&gt;
It is also possible that the RSA host key has just been changed.&lt;br /&gt;
The fingerprint for the RSA key sent by the remote host is&lt;br /&gt;
53:f9:60:71:a8:0b:5d:74:83:52:fe:ea:1a:9e:cc:d3.&lt;br /&gt;
Please contact your system administrator.&lt;br /&gt;
Add correct host key in /home/&amp;lt;user&amp;gt;/.ssh/known_hosts to get rid of&lt;br /&gt;
this message.&lt;br /&gt;
Offending key in /home/&amp;lt;user&amp;gt;/.ssh/known_hosts:3&lt;br /&gt;
RSA host key for login.scinet.utoronto.ca has&lt;br /&gt;
changed and you have requested&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* If you get the message below you may need to log out of your GNOME session and log back in, since the ssh-agent needs to be&lt;br /&gt;
restarted with the new passphrase-protected ssh key.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh USERNAME@login.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
Agent admitted failure to sign using the key.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Can't forward X:  &amp;quot;Warning: No xauth data; using fake authentication data&amp;quot;, or &amp;quot;X11 connection rejected because of wrong authentication.&amp;quot;===&lt;br /&gt;
&lt;br /&gt;
I used to be able to forward X11 windows from SciNet to my home machine, but now I'm getting these messages; what's wrong?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This very likely means that ssh/xauth can't update your ${HOME}/.Xauthority file. &lt;br /&gt;
&lt;br /&gt;
The simplest possible reason for this is that you've filled your 10GB /home quota and so can't write anything to your home directory.   Use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load extras&lt;br /&gt;
$ diskUsage&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
to check how close you are to your quota on ${HOME}.&lt;br /&gt;
&lt;br /&gt;
Alternatively, this could mean your .Xauthority file has somehow become broken/corrupted/confused, in which case you can delete that file; when you next log in you'll get a similar warning message about creating .Xauthority, but things should work.&lt;br /&gt;
&lt;br /&gt;
===How come I can not login to TCS?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
A SciNet account doesn't automatically entitle you to TCS access. At a minimum, TCS jobs need to run on at least 32 cores (64 preferred because of Simultaneous Multi Threading - [[TCS_Quickstart#Node_configuration|SMT]] - on these nodes) and need the large memory (4GB/core) and bandwidth on the system. Essentially you need to be able to explain why the work can't be done on the GPC.&lt;br /&gt;
&lt;br /&gt;
===How can I reset the password for my Compute Canada account?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
You can reset your password for your Compute Canada account here:&lt;br /&gt;
&lt;br /&gt;
https://ccdb.computecanada.org/security/forgot&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===How can I change or reset the password for my SciNet account?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
To reset your password at SciNet please go to [https://portal.scinet.utoronto.ca/password_resets Password reset page].&lt;br /&gt;
&lt;br /&gt;
If you know your old password and want to change it, that can be done here:&lt;br /&gt;
&lt;br /&gt;
https://portal.scinet.utoronto.ca/change_password&lt;br /&gt;
&lt;br /&gt;
===Why am I getting the error &amp;quot;Permission denied (publickey,gssapi-with-mic,password)&amp;quot;?===&lt;br /&gt;
&lt;br /&gt;
This error can pop up in a variety of situations: when trying to log in, or, after a job has finished, when the error and output files fail to be copied (there are other possible reasons for this failure as well -- see [[FAQ#My_GPC_job_died.2C_telling_me_.60Copy_Stageout_Files_Failed.27|My GPC job died, telling me: Copy Stageout Files Failed]]).&lt;br /&gt;
In most cases, the &amp;quot;Permission denied&amp;quot; error is caused by incorrect permissions on the (hidden) .ssh directory. Ssh is used for logging in as well as for copying the standard error and output files after a job.&lt;br /&gt;
&lt;br /&gt;
For security reasons, &lt;br /&gt;
the directory .ssh should only be writable and readable by you; if yours &lt;br /&gt;
has read permission for everybody, the authentication fails.  You can change &lt;br /&gt;
this by&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   chmod 700 ~/.ssh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
And to be sure, also do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   chmod 600 ~/.ssh/id_rsa ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
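&lt;br /&gt;
You can verify the result with &amp;lt;tt&amp;gt;ls -ld&amp;lt;/tt&amp;gt;; the directory should show permissions like the sketch below (username, group, size, date and path are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ls -ld ~/.ssh&lt;br /&gt;
drwx------ 2 USERNAME GROUP 4096 Jan  1 12:00 /home/USERNAME/.ssh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;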
&lt;br /&gt;
===ERROR:102: Tcl command execution failed? when loading modules ===&lt;br /&gt;
Modules sometimes require other modules to be loaded first.&lt;br /&gt;
The module command will let you know if you didn't load them.&lt;br /&gt;
For example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module purge&lt;br /&gt;
$ module load python&lt;br /&gt;
python/2.6.2(11):ERROR:151: Module ’python/2.6.2’ depends on one of the module(s) ’gcc/4.4.0’&lt;br /&gt;
python/2.6.2(11):ERROR:102: Tcl command execution failed: prereq gcc/4.4.0&lt;br /&gt;
$ module load gcc python&lt;br /&gt;
$&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Compiling your Code==&lt;br /&gt;
&lt;br /&gt;
===How can I get g77 to work?===&lt;br /&gt;
&lt;br /&gt;
The fortran 77 compilers on the GPC are ifort and gfortran. We have dropped support for g77.  This has been a conscious decision. g77 (and the associated library libg2c) were completely replaced six years ago (Apr 2005) by the gcc 4.x branch, and haven't undergone any updates at all, even bug fixes, for over five years.  &lt;br /&gt;
If we would install g77 and libg2c, we would have to deal with the inevitable confusion caused when users accidentally link against the old, broken, wrong versions of the gcc libraries instead of the correct current versions.   &lt;br /&gt;
&lt;br /&gt;
If your code for some reason specifically requires five-plus-year-old libraries,  availability, compatibility, and unfixed-known-bug problems are only going to get worse for you over time, and this might be as good an opportunity as any to address those issues. &lt;br /&gt;
&lt;br /&gt;
''A note on porting to gfortran or ifort:''&lt;br /&gt;
&lt;br /&gt;
While gfortran and ifort are rather compatible with g77, one &lt;br /&gt;
important difference is that by default, gfortran does not preserve &lt;br /&gt;
local variables between function calls, while g77 does.   Preserved &lt;br /&gt;
local variables are for instance often used in implementations of quasi-random number &lt;br /&gt;
generators.  Proper Fortran requires such variables to be declared as SAVE, &lt;br /&gt;
but not all old code does this.&lt;br /&gt;
Luckily, you can change gfortran's default behavior with the flag &lt;br /&gt;
&amp;lt;tt&amp;gt;-fno-automatic&amp;lt;/tt&amp;gt;.   For ifort, the corresponding flag is &amp;lt;tt&amp;gt;-noautomatic&amp;lt;/tt&amp;gt;.&lt;br /&gt;
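&lt;br /&gt;
For example, a minimal sketch of compile lines using these flags (the source and program names are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gfortran -fno-automatic -O2 -o mycode mycode.f&lt;br /&gt;
ifort -noautomatic -O2 -o mycode mycode.f&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;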
&lt;br /&gt;
===Where is libg2c.so?===&lt;br /&gt;
&lt;br /&gt;
libg2c.so is part of the g77 compiler, for which we dropped support. See [[#How can I get g77 to work?]] for our reasons.&lt;br /&gt;
&lt;br /&gt;
===Autoparallelization does not work!===&lt;br /&gt;
&lt;br /&gt;
I compiled my code with the &amp;lt;tt&amp;gt;-qsmp=omp,auto&amp;lt;/tt&amp;gt; option, and then I specified that it should be run with 64 threads - with &lt;br /&gt;
 export OMP_NUM_THREADS=64&lt;br /&gt;
&lt;br /&gt;
However, when I check the load using &amp;lt;tt&amp;gt;llq1 -n&amp;lt;/tt&amp;gt;, it shows a load on the node of 1.37.  Why?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Using the autoparallelization will only get you so far.  In fact, it usually does not do too much.  What is helpful is to run the compiler with the &amp;lt;tt&amp;gt;-qreport&amp;lt;/tt&amp;gt; option, and then read the output listing carefully to see where the compiler thought it could parallelize, where it could not, and the reasons for this.  Then you can go back to your code and carefully try to address each of the issues brought up by the compiler.&lt;br /&gt;
We ''emphasize'' that this is just a rough first guide, and that the compilers are still not magical!   For more sophisticated approaches to parallelizing your code, email us at [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;]  to set up an appointment with one&lt;br /&gt;
of our technical analysts.&lt;br /&gt;
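&lt;br /&gt;
As a starting point, here is a sketch of what the compile and run steps might look like on the TCS; the compiler invocation &amp;lt;tt&amp;gt;xlf90_r&amp;lt;/tt&amp;gt; and the file names are assumptions, so use whatever you normally compile with:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
xlf90_r -qsmp=omp,auto -qreport -O3 -o mycode mycode.f90&lt;br /&gt;
# read the compiler listing (typically mycode.lst) to see what was, and was not, parallelized; then:&lt;br /&gt;
export OMP_NUM_THREADS=64&lt;br /&gt;
./mycode&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;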
&lt;br /&gt;
===How do I link against the Intel Math Kernel Library?===&lt;br /&gt;
&lt;br /&gt;
If you need to link in the Intel Math Kernel Library (MKL) libraries, you are well advised to use the Intel(R) Math Kernel Library Link Line Advisor: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ for help in devising the list of libraries to link with your code.&lt;br /&gt;
&lt;br /&gt;
'''''Note that this gives the link line for the command line. When using this in Makefiles, replace $MKLPATH by ${MKLPATH}.'''''&lt;br /&gt;
&lt;br /&gt;
'''''Note too that, unless the integer arguments you will be passing to the MKL libraries are actually 64-bit integers, rather than the normal int or INTEGER types, you want to specify 32-bit integers (lp64) .'''''&lt;br /&gt;
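&lt;br /&gt;
As an illustration only, a sequential, 32-bit-integer (lp64) link line produced by the advisor typically looks something like the sketch below; always check the advisor for the exact line for your compiler and MKL version:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
icc myprog.c -o myprog -L${MKLPATH} -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;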
&lt;br /&gt;
===Can the compilers on the login nodes be disabled to prevent accidentally using them?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
You can accomplish this by modifying your .bashrc to not load the compiler modules. See [[Important .bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
===&amp;quot;relocation truncated to fit: R_X86_64_PC32&amp;quot;: Huh?===&lt;br /&gt;
&lt;br /&gt;
What does this mean, and why can't I compile this code?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Welcome to the joys of the x86 architecture!  You're probably having trouble building arrays larger than 2GB, individually or together.   Generally, you have to try to use the medium or large x86 `memory model'.   For the intel compilers, this is specified with the compile options&lt;br /&gt;
&lt;br /&gt;
  -mcmodel=medium -shared-intel&lt;br /&gt;
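&lt;br /&gt;
For example (the source and program names are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
icc -mcmodel=medium -shared-intel -O2 -o big_arrays big_arrays.c&lt;br /&gt;
ifort -mcmodel=medium -shared-intel -O2 -o big_arrays big_arrays.f90&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;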
&lt;br /&gt;
===&amp;quot;feupdateenv is not implemented and will always fail&amp;quot;===&lt;br /&gt;
&lt;br /&gt;
How do I get rid of this and what does it mean?&lt;br /&gt;
 &lt;br /&gt;
'''Answer:'''&lt;br /&gt;
First note that, as ominous as it sounds, this is really just a warning, and has to do with the Intel math library. You can ignore it (unless you really are trying to manually change the exception handlers for floating point exceptions such as divide by zero), or take the safe road and get rid of it by linking with the Intel math functions library:&amp;lt;pre&amp;gt;-limf&amp;lt;/pre&amp;gt;See also [[#How do I link against the Intel Math Kernel Library?]]&lt;br /&gt;
&lt;br /&gt;
===Cannot find rdmacm library when compiling on GPC===&lt;br /&gt;
&lt;br /&gt;
I get the following error building my code on GPC: &amp;quot;&amp;lt;tt&amp;gt;ld: cannot find -lrdmacm&amp;lt;/tt&amp;gt;&amp;quot;.  Where can I find this library?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This library is part of the MPI libraries; if your compiler is having problems picking it up, it probably means you are mistakenly trying to compile on the login nodes (scinet01..scinet04).  The login nodes aren't part of the GPC; they are for logging into the data centre only.  From there you must go to the GPC or TCS development nodes to do any real work.&lt;br /&gt;
&lt;br /&gt;
=== Why do I get this error when I try to compile: &amp;quot;icpc: error #10001: could not find directory in which /usr/bin/g++41 resides&amp;quot; ?===&lt;br /&gt;
&lt;br /&gt;
You are trying to compile on the login nodes.   As described in the wiki ( https://support.scinet.utoronto.ca/wiki/index.php/GPC_Quickstart#Login ) and in the user's guide you would have received with your account, SciNet supports two main clusters, with very different architectures.  Compilation must be done on the development nodes of the appropriate cluster (in this case, gpc01-04).   Thus, log into gpc01, gpc02, gpc03, or gpc04, and compile from there.&lt;br /&gt;
&lt;br /&gt;
==Testing your Code==&lt;br /&gt;
&lt;br /&gt;
=== Can I run a something for a short time on the development nodes? ===&lt;br /&gt;
&lt;br /&gt;
I am in the process of playing around with the mpi calls in my code to get it to work. I do a lot of tests and each of them takes a couple of seconds only.  Can I do this on the development nodes?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Yes, as long as it's very brief (a few minutes).   People use the development nodes&lt;br /&gt;
for their work, and you don't want to bog the nodes down for them; testing a real&lt;br /&gt;
code can chew up a lot more resources than compiling, etc.    The procedures differ&lt;br /&gt;
depending on which machine you're using.&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
On the TCS you can run small MPI jobs on the tcs02 node, which is meant for &lt;br /&gt;
development use.  But even for this test run on one node, you'll need a host file --&lt;br /&gt;
a list of hosts (in this case, all tcs-f11n06, which is the `real' name of tcs02)&lt;br /&gt;
that the job will run on.  Create a file called `hostfile' containing the following:&lt;br /&gt;
&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
&lt;br /&gt;
for a 4-task run.  When you invoke &amp;quot;poe&amp;quot; or &amp;quot;mpirun&amp;quot;, there are runtime&lt;br /&gt;
arguments that you specify pointing to this file.  You can also specify it&lt;br /&gt;
in an environment variable MP_HOSTFILE, so, if your file is in your /scratch directory, say &lt;br /&gt;
${SCRATCH}/hostfile, then you would do&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 export MP_HOSTFILE=${SCRATCH}/hostfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
in your shell.  You will also need to create a &amp;lt;tt&amp;gt;.rhosts&amp;lt;/tt&amp;gt; file in your &lt;br /&gt;
home directory, again listing &amp;lt;tt&amp;gt;tcs-f11n06&amp;lt;/tt&amp;gt; so that &amp;lt;tt&amp;gt;poe&amp;lt;/tt&amp;gt;&lt;br /&gt;
can start jobs.   After that you can simply run your program.  You can use&lt;br /&gt;
mpiexec:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 mpiexec -n 4 my_test_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
adding &amp;lt;tt&amp;gt; -hostfile /path/to/my/hostfile&amp;lt;/tt&amp;gt; if you did not set the environment&lt;br /&gt;
variable above.  Alternatively, you can run it with the poe command (do a &amp;quot;man poe&amp;quot; for details), or even by&lt;br /&gt;
just directly running it.  In this case the number of MPI processes will by default&lt;br /&gt;
be the number of entries in your hostfile.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
On the GPC one can run short test jobs on the GPC [[GPC_Quickstart#Compile.2FDevel_Nodes | development nodes ]]&amp;lt;tt&amp;gt;gpc01&amp;lt;/tt&amp;gt;..&amp;lt;tt&amp;gt;gpc04&amp;lt;/tt&amp;gt;;&lt;br /&gt;
if they are single-node jobs (which they should be) they don't need a hostfile.  Even better, though, is to request an [[ Moab#Interactive | interactive ]] job and run the tests either in the regular batch queue or in the short, high-availability [[ Moab#debug | debug ]] queue that is reserved for this purpose.&lt;br /&gt;
&lt;br /&gt;
=== How do I run a longer (but still shorter than an hour) test job quickly ? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer'''&lt;br /&gt;
&lt;br /&gt;
On the GPC there is a high turnover short queue called [[ Moab#debug | debug ]] that is designed for&lt;br /&gt;
this purpose.  You can use it by adding &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -q debug&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to your submission script.&lt;br /&gt;
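&lt;br /&gt;
Alternatively, you can request the queue on the command line when you submit; for example (the walltime here is just an illustration, keep debug jobs short):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -q debug -l nodes=1:ppn=8,walltime=0:30:00 script_name.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;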
&lt;br /&gt;
==Running your jobs==&lt;br /&gt;
&lt;br /&gt;
===My job can't write to /home===&lt;br /&gt;
&lt;br /&gt;
My code works fine when I test on the development nodes, but when I submit a job, or even run interactively in the development queue on GPC, it fails.  What's wrong?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
As [[Data_Management#Home_Disk_Space | discussed]] [https://support.scinet.utoronto.ca/wiki/images/5/54/SciNet_Tutorial.pdf elsewhere], &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is mounted read-only on the compute nodes; you can only write to &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; from the login nodes and devel nodes.  (The [[GPC_Quickstart#128Glargemem | largemem nodes]] on GPC, in this respect, are more like devel nodes than compute nodes).   In general, to run jobs you can read from &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; but you'll have to write to &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; (or, if you were allocated space through the RAC process, on &amp;lt;tt&amp;gt;/project&amp;lt;/tt&amp;gt;).  More information on SciNet filesytems can be found on our [[Data_Management | Data Management]] page.&lt;br /&gt;
&lt;br /&gt;
===Error Submitting My Job: qsub: Bad UID for job execution MSG=ruserok failed ===&lt;br /&gt;
&lt;br /&gt;
I write up a submission script as in the examples, but when I attempt to submit the job, I get the above error.  What's wrong?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This error will occur if you try to submit a job from the login nodes.   The login nodes are the gateway to all of SciNet's systems (GPC, TCS, P7, ARC), which have different hardware and queueing systems.  To submit a job, you must log into a development node for the particular cluster you are submitting to and submit from there.&lt;br /&gt;
&lt;br /&gt;
===OpenMP on the TCS===&lt;br /&gt;
&lt;br /&gt;
How do I run an OpenMP job on the TCS?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please look at the [[TCS_Quickstart#Submission_Script_for_an_OpenMP_Job | TCS Quickstart ]] page.&lt;br /&gt;
&lt;br /&gt;
===Can I use hybrid codes consisting of MPI and OpenMP on the GPC?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Yes. Please look at the [[GPC_Quickstart#Hybrid_MPI.2FOpenMP_jobs | GPC Quickstart ]] page.&lt;br /&gt;
&lt;br /&gt;
===How do I run serial jobs on GPC?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''':&lt;br /&gt;
&lt;br /&gt;
So it should be said first that SciNet is a parallel computing resource, &lt;br /&gt;
and our priority will always be parallel jobs.   Having said that, if &lt;br /&gt;
you can make efficient use of the resources using serial jobs and get &lt;br /&gt;
good science done, that's good too, and we're happy to help you.&lt;br /&gt;
&lt;br /&gt;
The GPC nodes each have 8 processing cores, and making efficient use of these &lt;br /&gt;
nodes means using all eight cores.  As a result, we'd like to have the &lt;br /&gt;
users take up whole nodes (eg, run multiples of 8 jobs) at a time.  &lt;br /&gt;
&lt;br /&gt;
It depends on the nature of your job what the best strategy is. Several approaches are presented on the [[User_Serial|serial run wiki page]].&lt;br /&gt;
&lt;br /&gt;
===Why can't I request only a single cpu for my job on GPC?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''':&lt;br /&gt;
&lt;br /&gt;
On GPC, resources are allocated by the node - that is, in chunks of 8 processors.   If you want to run jobs that each require only one processor, you need to bundle them into groups of 8, so as not to waste the other 7 cores for up to 48 hours; a minimal sketch is shown below. See the [[User_Serial|serial run wiki page]].&lt;br /&gt;
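&lt;br /&gt;
A minimal sketch of bundling 8 serial runs into one job (the program name and run directories are placeholders; the [[User_Serial|serial run wiki page]] has more robust approaches):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=24:00:00&lt;br /&gt;
#PBS -N serial_bundle&lt;br /&gt;
&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# start 8 serial runs at once, one per core, each in its own directory&lt;br /&gt;
for i in $(seq 1 8); do&lt;br /&gt;
  (cd run$i &amp;amp;&amp;amp; ./my_serial_code &amp;gt; output.txt) &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# wait for all 8 runs to finish before the job exits&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;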
&lt;br /&gt;
===How do I run serial jobs on TCS?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''': You don't.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===But in the queue I found a user who is running jobs on GPC, each of which is using only one processor, so why can't I?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''':&lt;br /&gt;
&lt;br /&gt;
The pradat* and atlaspt* jobs, amongst others, are jobs of the ATLAS high energy physics project. That they are reported as single cpu jobs is an artifact of the moab scheduler. They are in fact being automatically bundled into 8-job bundles but have to run individually to be compatible with their international grid-based systems.&lt;br /&gt;
&lt;br /&gt;
===How do I use the ramdisk on GPC?===&lt;br /&gt;
&lt;br /&gt;
To use the ramdisk, create, write to, and read from files in /dev/shm/.. just as one would in (eg) ${SCRATCH}. Only the amount of RAM needed to store the files will be taken up by the temporary file system; thus if you have 8 serial jobs each requiring 1 GB of RAM, and 1GB is taken up by various OS services, you would still have approximately 7GB available to use as ramdisk on a 16GB node. However, if you were to write 8 GB of data to the RAM disk, this would exceed available memory and your job would likely crash.&lt;br /&gt;
&lt;br /&gt;
It is very important to delete your files from ram disk at the end of your job. If you do not do this, the next user to use that node will have less RAM available than they might expect, and this might kill their jobs.&lt;br /&gt;
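&lt;br /&gt;
A minimal sketch of the pattern, assuming your job reads one input file and writes one output file (all file and program names are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
cp input.dat /dev/shm/                          # stage input into the ramdisk&lt;br /&gt;
./my_code /dev/shm/input.dat /dev/shm/output.dat&lt;br /&gt;
cp /dev/shm/output.dat .                        # copy results back to the submission directory&lt;br /&gt;
rm -f /dev/shm/input.dat /dev/shm/output.dat    # always clean up the ramdisk&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;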
&lt;br /&gt;
''More details on how to setup your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].''&lt;br /&gt;
&lt;br /&gt;
===How can I automatically resubmit a job?===&lt;br /&gt;
&lt;br /&gt;
Commonly you may have a job that you know will take longer to run than what is &lt;br /&gt;
permissible in the queue.  As long as your program contains [[Checkpoints|checkpoint]] or &lt;br /&gt;
restart capability, you can have one job automatically submit the next. In&lt;br /&gt;
the following example it is assumed that the program finishes before &lt;br /&gt;
the 48 hour limit and then resubmits itself by logging into one&lt;br /&gt;
of the development nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque example submission script for auto resubmission&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=48:00:00&lt;br /&gt;
#PBS -N my_job&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# YOUR CODE HERE&lt;br /&gt;
./run_my_code&lt;br /&gt;
&lt;br /&gt;
# RESUBMIT 10 TIMES HERE&lt;br /&gt;
num=${NUM:-0}     # NUM is passed in with qsub -v; default to 0 on the first submission&lt;br /&gt;
if [ $num -lt 10 ]; then&lt;br /&gt;
      num=$(($num+1))&lt;br /&gt;
      ssh gpc01 &amp;quot;cd $PBS_O_WORKDIR; qsub ./script_name.sh -v NUM=$num&amp;quot;;&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub script_name.sh -v NUM=0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can alternatively use [[ Moab#Job_Dependencies | Job dependencies ]] through the queuing system which will not start one job until another job has completed.&lt;br /&gt;
&lt;br /&gt;
If your job can't be made to automatically stop before the 48 hour queue window, but it does write out checkpoints, you can use the timeout command to stop the program while you still have time to resubmit; for instance&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
    timeout 2850m ./run_my_code argument1 argument2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will run the program for 47.5 hours (2850 minutes), and then send it a SIGTERM so that it exits with time left to resubmit.&lt;br /&gt;
&lt;br /&gt;
===How can I pass in arguments to my submission script?===&lt;br /&gt;
&lt;br /&gt;
If you wish to make your scripts more generic you can use qsub's ability &lt;br /&gt;
to pass in environment variables to pass in arguments to your script.&lt;br /&gt;
The following example shows a case where an input and an output &lt;br /&gt;
file are passed in on the qsub line. Multiple variables can be &lt;br /&gt;
passed in using the qsub &amp;quot;-v&amp;quot; option and comma delimited. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque example of passing in arguments&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
# &lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=48:00:00&lt;br /&gt;
#PBS -N my_job&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# YOUR CODE HERE&lt;br /&gt;
./run_my_code -f $INFILE -o $OUTFILE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub script_name.sh -v INFILE=input.txt,OUTFILE=outfile.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== How can I run a job longer than 48 hours? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The SciNet queues have a queue limit of 48 hours.   This is pretty typical for systems of its size in Canada and elsewhere, and larger systems commonly have shorter limits.   The limits are there to ensure that every user gets a fair share of the system (so that no one user ties up lots of nodes for a long time), and for safety (so that if one memory board in one node fails in the middle of a very long job, you haven't lost a month's worth of work).&lt;br /&gt;
&lt;br /&gt;
Since many of us have simulations that require more than that much time, most widely-used scientific applications have &amp;quot;checkpoint-restart&amp;quot; functionality, where every so often the complete state of the calculation is stored as a checkpoint file, and one can restart a simulation from one of these.   In fact, these restart files tend to be quite useful for a number of purposes.&lt;br /&gt;
&lt;br /&gt;
If your job will take longer, you will have to submit your job in multiple parts, restarting from a checkpoint each time.  In this way, one can run a simulation much longer than the queue limit.  In fact, one can even write job scripts which automatically re-submit themselves until a run is completed, using [[FAQ#How_can_I_automatically_resubmit_a_job.3F | automatic resubmission. ]]&lt;br /&gt;
&lt;br /&gt;
=== Why did showstart say it would take 3 hours for my job to start before, and now it says my job will start in 10 hours? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please look at the [[FAQ#How_do_priorities_work.2Fwhy_did_that_job_jump_ahead_of_mine_in_the_queue.3F | How do priorities work/why did that job jump ahead of mine in the queue? ]] page.&lt;br /&gt;
&lt;br /&gt;
===How do priorities work/why did that job jump ahead of mine in the queue?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The [[Moab | queueing system]] used on SciNet machines is a [http://en.wikipedia.org/wiki/Priority_queue Priority Queue].  Jobs enter the queue at the back of the queue, and slowly make their way to the front as those ahead of them are run; but a job that enters the queue with a higher priority can `cut in line'.&lt;br /&gt;
&lt;br /&gt;
The main factor which determines priority is whether or not the user (or their PI) has an [http://wiki.scinethpc.ca/wiki/index.php/Application_Process RAC allocation].  These are competitively allocated grants of computer time; there is a call for proposals towards the end of every calendar year.    Users with an allocation have high priorities in an attempt to make sure that they can use the amount of computer time the committees granted them.   Their priority decreases as they approach their allotted usage over the current window of time; by the time that they have exhausted that allotted usage, their priority is the same as users with no allocation (unallocated, or `default' users).    Unallocated users have a fixed, low, priority.&lt;br /&gt;
&lt;br /&gt;
This priority system is called `fairshare'; the scheduler attempts to make sure everyone has their fair share of the machines, where the share that's fair has been determined by the allocation committee.    The fairshare window is a rolling window of two weeks; that is, any time you have a job in the queue, the fairshare calculation of its priority is given by how much of your allocation of the machine has been used in the last 14 days.&lt;br /&gt;
&lt;br /&gt;
A particular allocation might have some fraction of GPC - say 4% of the machine (if the PI had been allocated 10 million CPU hours on GPC). The allocations have labels (called `Resource Allocation Proposal Identifiers', or RAPIs); they look something like&lt;br /&gt;
&lt;br /&gt;
  abc-123-ab&lt;br /&gt;
&lt;br /&gt;
where abc-123 is the PI's CCRI, and the suffix specifies which of the allocations granted to the PI is to be used.  These can be specified on a job-by-job basis.  On GPC, one adds the line&lt;br /&gt;
 #PBS -A RAPI&lt;br /&gt;
to your script; on TCS, one uses&lt;br /&gt;
 # @ account_no = RAPI&lt;br /&gt;
If the allocation to charge isn't specified, a default is used; each user has such a default, which can be changed at the same portal where one changes one's password, &lt;br /&gt;
&lt;br /&gt;
 https://portal.scinet.utoronto.ca/&lt;br /&gt;
&lt;br /&gt;
A job's priority is determined primarily by the fairshare priority of the allocation it is being charged to; the previous 14 days' worth of use under that allocation is calculated and compared to the allocated fraction (here, 4%) of the machine over that window (here, 14 days).   The fairshare priority is a decreasing function of the allocation left; if there is no allocation left (eg, jobs running under that allocation have already used 379,038 CPU hours in the past 14 days), the priority is the same as that of a user with no granted allocation.   (This last part has been the topic of some debate; as the machine gets more utilized, it will probably be the case that we allow RAC users who have greatly overused their quota to have their priorities drop below that of unallocated users, to give the unallocated users some chance to run on our increasingly crowded system; this would have no undue effect on our allocated users as they still would be able to use the amount of resources they had been allocated by the committees.)   Note that all jobs charging the same allocation get the same fairshare priority.&lt;br /&gt;
&lt;br /&gt;
There are other factors that go into calculating priority, but fairshare is the most significant.   Other factors include&lt;br /&gt;
* amount of time waiting in queue (measured in units of the requested runtime). A waiting queue job gains priority as it sits in the queue to avoid job starvation. &lt;br /&gt;
* User adjustment of priorities ( See below ).&lt;br /&gt;
&lt;br /&gt;
The major effect of these subdominant terms is to shuffle the order of jobs running under the same allocation.&lt;br /&gt;
&lt;br /&gt;
===How do we manage job priorities within our research group?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Obviously, managing shared resources within a large group - whether it &lt;br /&gt;
is conference funding or CPU time - takes some doing.   &lt;br /&gt;
&lt;br /&gt;
It's important to note that the fairshare periods are intentionally kept &lt;br /&gt;
quite short - just two weeks long. So, for example, let us say that in your resource &lt;br /&gt;
allocation you have about 10% of the machine.   Then for someone to use &lt;br /&gt;
up the whole two week amount of time in 2 days, they'd have to use 70% &lt;br /&gt;
of the machine in those two days - which is unlikely to happen by &lt;br /&gt;
accident.  If that does happen,  &lt;br /&gt;
those using the same allocation as the person who used 70% of the &lt;br /&gt;
machine over the two days will suffer by having much lower priority for &lt;br /&gt;
their jobs, but only for the next 12 days - and even then, if there are &lt;br /&gt;
idle cpus they'll still be able to compute.&lt;br /&gt;
&lt;br /&gt;
There will be online tools for seeing how the allocation is being used, &lt;br /&gt;
and those people who are in charge in your group will be able to use &lt;br /&gt;
that information to manage the users, telling them to dial it down or &lt;br /&gt;
up.   We know that managing a large research group is hard, and we want &lt;br /&gt;
to make sure we provide you the information you need to do your job &lt;br /&gt;
effectively.&lt;br /&gt;
&lt;br /&gt;
One way for users within a group to manage their priorities within the group&lt;br /&gt;
is with [[Moab#Adjusting_Job_Priority | user-adjusted priorities]]; this is&lt;br /&gt;
described in more detail on the [[Moab | Scheduling System]] page.&lt;br /&gt;
&lt;br /&gt;
=== How do I charge jobs to my RAC allocation? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see the [[Moab#Accounting|accounting section of Moab page]].&lt;br /&gt;
&lt;br /&gt;
=== How does one check the amount of used CPU-hours in a project, and how does one get statistics for each user in the project? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This information is available on the SciNet portal, https://portal.scinet.utoronto.ca. See also [[SciNet Usage Reports]].&lt;br /&gt;
&lt;br /&gt;
=== How does the Infiniband Upgrade affect my 2012 RAC allocation ?===&lt;br /&gt;
&lt;br /&gt;
The RAC allocations for the current (2012) year that were based on ethernet and infiniband will carry over; however, the allocation will be on the full GPC, not just the subsection.  So if you were allocated 500 hours on Infiniband, your fairshare allocation will still be 500 hours, just 500 out of 30,000 instead of 500 out of 7,000.  If you received two allocations, one on gigE and one on IB, they will simply be combined. This should benefit all users, as the desegregation of the GPC provides a greater pool of nodes, increasing the probability that your job will run.&lt;br /&gt;
&lt;br /&gt;
==Monitoring jobs in the queue==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Why hasn't my job started?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Use the moab command &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
checkjob -v jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and the last couple of lines should explain why a job hasn't started.  &lt;br /&gt;
&lt;br /&gt;
Please see [[Moab| Job Scheduling System (Moab) ]] for more detailed information&lt;br /&gt;
&lt;br /&gt;
===How do I figure out when my job will run?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Moab#Available_Resources| Job Scheduling System (Moab) ]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ===My GPC job is Held, and checkjob says &amp;quot;Batch:PolicyViolation&amp;quot; ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
When this happens, you'll see your job stuck in a BatchHold state.  &lt;br /&gt;
This happens because the job you've submitted breaks one of the rules of the queues, and is being held until you modify it or kill it and re-submit a conforming job.  The most common problems are:&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===I submit my GPC job, and I get an email saying it was rejected===&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This happens because the job you've submitted breaks one of the rules of the queues and is rejected. An email&lt;br /&gt;
is sent with the JOBID, JOBNAME, and the reason it was rejected.  The following is an example where a job&lt;br /&gt;
requests more than 48 hours and was rejected.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PBS Job Id: 3462493.gpc-sched&lt;br /&gt;
Job Name:   STDIN&lt;br /&gt;
job deleted&lt;br /&gt;
Job deleted at request of root@gpc-sched&lt;br /&gt;
MOAB_INFO:  job was rejected - job violates class configuration 'wclimit too high for class 'batch_ib' (345600 &amp;gt; 172800)'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Jobs on the TCS or GPC may only run for 48 hours at a time; this restriction greatly increases responsiveness of the queue and queue throughput for all our users.  If your computation requires longer than that, as many do, you will have to [[ Checkpoints | checkpoint ]] your job and restart it after each 48-hour queue window.   You can manually re-submit jobs, or if you can have your job cleanly exit before the 48 hour window, there are ways to [[ FAQ#How_can_I_automatically_resubmit_a_job.3F | automatically resubmit jobs ]].&lt;br /&gt;
&lt;br /&gt;
Other rejections return a more cryptic error saying &amp;quot;job violates class configuration&amp;quot; such as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PBS Job Id: 3462409.gpc-sched&lt;br /&gt;
Job Name:   STDIN&lt;br /&gt;
job deleted&lt;br /&gt;
Job deleted at request of root@gpc-sched&lt;br /&gt;
MOAB_INFO:  job was rejected - job violates class configuration 'user required by class 'batch''&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The most common problems that result in this error are:&lt;br /&gt;
&lt;br /&gt;
* '''Incorrect number of processors per node''': Jobs on the GPC are scheduled per-node not per-core and since each node has 8 processor cores (ppn=8) the smallest job allowed is one node with 8 cores (nodes=1:ppn=8).  For serial jobs users must bundle or batch them together in groups of 8. See [[ FAQ#How_do_I_run_serial_jobs_on_GPC.3F | How do I run serial jobs on GPC? ]]&lt;br /&gt;
* '''No number of nodes specified''': Jobs submitted to the main queue must request a specific number of nodes, either in the submission script (with a line like &amp;lt;tt&amp;gt;#PBS -l nodes=2:ppn=8&amp;lt;/tt&amp;gt;) or on the command line (eg, &amp;lt;tt&amp;gt;qsub -l nodes=2:ppn=8,walltime=5:00:00 script.pbs&amp;lt;/tt&amp;gt;).  Note that for the debug queue, you can get away without specifying a number of nodes and a default of one will be assigned; for both technical and policy reasons, we do not enforce such a default for the main (&amp;quot;batch&amp;quot;) queue.&lt;br /&gt;
* '''There is a 15 minute walltime minimum''' on all queues except debug; if you request a walltime shorter than this, your job will be rejected (see the example script below).&lt;br /&gt;
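&lt;br /&gt;
For reference, here is a minimal sketch of a submission script that satisfies the rules above (the job name and executable are hypothetical placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N myjob&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
./mycode&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;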
&lt;br /&gt;
&lt;br /&gt;
===Running checkjob on my job gives me messages about JobFail and rejected===&lt;br /&gt;
&lt;br /&gt;
Running checkjob on my job gives me messages that suggest my job has failed, as below: what did I do wrong?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
AName: test&lt;br /&gt;
State: Idle &lt;br /&gt;
Creds:  user:xxxxxx  group:xxxxxxxx  account:xxxxxxxx  class:batch_ib  qos:ibqos&lt;br /&gt;
WallTime:   00:00:00 of 8:00:00&lt;br /&gt;
BecameEligible: Wed Jul 23 10:39:27&lt;br /&gt;
SubmitTime: Wed Jul 23 10:38:22&lt;br /&gt;
  (Time Queued  Total: 00:01:47  Eligible: 00:01:05)&lt;br /&gt;
&lt;br /&gt;
Total Requested Tasks: 8&lt;br /&gt;
&lt;br /&gt;
Req[0]  TaskCount: 8  Partition: ALL  &lt;br /&gt;
Opsys: centos6computeA  Arch: ---  Features: ---&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Notification Events: JobFail&lt;br /&gt;
&lt;br /&gt;
IWD:            /scratch/x/xxxxxxxx/xxxxxxx/xxxxxxx&lt;br /&gt;
Partition List: torque,DDR&lt;br /&gt;
Flags:          RESTARTABLE&lt;br /&gt;
Attr:           checkpoint&lt;br /&gt;
StartPriority:  76&lt;br /&gt;
rejected for Opsys        - (null)&lt;br /&gt;
rejected for State        - (null)&lt;br /&gt;
rejected for Reserved     - (null)&lt;br /&gt;
NOTE:  job req cannot run in partition torque (available procs do not meet requirements : 0 of 8 procs found)&lt;br /&gt;
idle procs: 793  feasible procs:   0&lt;br /&gt;
&lt;br /&gt;
Node Rejection Summary: [Opsys: 117][State: 2895][Reserved: 19]&lt;br /&gt;
&lt;br /&gt;
NOTE:  job violates constraints for partition SANDY (partition SANDY not in job partition mask)&lt;br /&gt;
&lt;br /&gt;
NOTE:  job violates constraints for partition GRAVITY (partition GRAVITY not in job partition mask)&lt;br /&gt;
&lt;br /&gt;
rejected for State        - (null)&lt;br /&gt;
NOTE:  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The output from checkjob is a little cryptic in places, and if you are wondering why your job hasn't started yet, you might think that &amp;quot;rejection&amp;quot; and &amp;quot;JobFail&amp;quot; suggest that there's something wrong.  But the above message is actually normal; you can use the &amp;lt;tt&amp;gt;showstart&amp;lt;/tt&amp;gt; command on your job to get a (preliminary, subject to change) estimate of when the job will start, and you'll find that it is in fact scheduled to start in the near future.&lt;br /&gt;
&lt;br /&gt;
In the above message:&lt;br /&gt;
&lt;br /&gt;
* `Notification Events: JobFail` just means that, if notifications are enabled, you'll get a message if the job fails;&lt;br /&gt;
* `job req cannot run in partition torque` just means that the job cannot run just yet (that's why it's queued);&lt;br /&gt;
* `job req cannot run in dynamic partition DDR now (insufficient procs available: 0 &amp;lt; 8)` says why: there aren't processors available; and&lt;br /&gt;
* `job violates constraints for partition SANDY/GRAVITY` just means that the job isn't eligible to run in those particular (small) sections of the cluster.&lt;br /&gt;
&lt;br /&gt;
That is, the above output is the normal and expected (if somewhat cryptic) explanation of why the job is waiting - nothing to worry about.&lt;br /&gt;
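&lt;br /&gt;
For example, to ask the scheduler for its current start-time estimate (using the same job id you would pass to &amp;lt;tt&amp;gt;checkjob&amp;lt;/tt&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showstart jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;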
&lt;br /&gt;
===How can I monitor my running jobs on TCS?===&lt;br /&gt;
&lt;br /&gt;
How can I monitor the load of TCS jobs?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
You can get more information with the command &lt;br /&gt;
 /xcat/tools/tcs-scripts/LL/jobState.sh&lt;br /&gt;
which I alias as:&lt;br /&gt;
 alias llq1='/xcat/tools/tcs-scripts/LL/jobState.sh'&lt;br /&gt;
If you run &amp;quot;llq1 -n&amp;quot; you will see a listing of jobs together with a lot of information, including the load.&lt;br /&gt;
&lt;br /&gt;
==Errors in running jobs==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
===On GPC, `Job cannot be executed'===&lt;br /&gt;
&lt;br /&gt;
I get error messages like this trying to run on GPC:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PBS Job Id: 30414.gpc-sched&lt;br /&gt;
Job Name:   namd&lt;br /&gt;
Exec host:  gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0&lt;br /&gt;
Aborted by PBS Server &lt;br /&gt;
Job cannot be executed&lt;br /&gt;
See Administrator for help&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
PBS Job Id: 30414.gpc-sched&lt;br /&gt;
Job Name:   namd&lt;br /&gt;
Exec host:  gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0&lt;br /&gt;
An error has occurred processing your job, see below.&lt;br /&gt;
request to copy stageout files failed on node 'gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0' for job 30414.gpc-sched&lt;br /&gt;
&lt;br /&gt;
Unable to copy file 30414.gpc-sched.OU to USER@gpc-f101n084.scinet.local:/scratch/G/GROUP/USER/projects/sim-performance-test/runtime/l/namd/8/namd.o30414&lt;br /&gt;
*** error from copy&lt;br /&gt;
30414.gpc-sched.OU: No such file or directory&lt;br /&gt;
*** end error output&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Try doing the following:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir ${SCRATCH}/.pbs_spool&lt;br /&gt;
ln -s ${SCRATCH}/.pbs_spool ~/.pbs_spool&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is how all new accounts are setup on SciNet.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; on GPC for compute jobs is mounted as a read-only file system.   &lt;br /&gt;
PBS by default tries to spool its output  files to &amp;lt;tt&amp;gt;${HOME}/.pbs_spool&amp;lt;/tt&amp;gt;&lt;br /&gt;
which fails as it tries to write to a read-only file  &lt;br /&gt;
system.    New accounts at SciNet  get around this by having ${HOME}/.pbs_spool  &lt;br /&gt;
point to somewhere appropriate on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;, but if you've deleted that link&lt;br /&gt;
or directory, or had an old account, you will see errors like the above.&lt;br /&gt;
&lt;br /&gt;
'''On Feb 24, the input/output mechanism has been reconfigured to use a local ramdisk as the temporary location, which means that .pbs_spool is no longer needed and this error should not occur anymore.'''&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== I couldn't find the  .o output file in the .pbs_spool directory as I used to ===&lt;br /&gt;
&lt;br /&gt;
On Feb 24 2011, the temporary location of standard input and output files was moved from the shared file system ${SCRATCH}/.pbs_spool to the&lt;br /&gt;
node-local directory /var/spool/torque/spool (which resides in ram). The final location after a job has finished is unchanged,&lt;br /&gt;
but to check the output/error of running jobs, users will now have to ssh into the (first) node assigned to the job and look in&lt;br /&gt;
/var/spool/torque/spool.&lt;br /&gt;
&lt;br /&gt;
This alleviates access contention to the temporary directory, especially for those users that are running a lot of jobs, and  reduces the burden on the file system in general.&lt;br /&gt;
&lt;br /&gt;
Note that it is good practice to redirect output to a file rather than to count on the scheduler to do this for you.&lt;br /&gt;
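&lt;br /&gt;
As a concrete sketch of the procedure described above (the job id and node name are hypothetical placeholders; &amp;lt;tt&amp;gt;qstat -n&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;checkjob&amp;lt;/tt&amp;gt; will show the nodes actually assigned to your job):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qstat -n 1234567                  # list the nodes assigned to the job&lt;br /&gt;
ssh gpc-f101n005                  # log in to the first of those nodes&lt;br /&gt;
ls /var/spool/torque/spool/       # the temporary .OU and .ER files live here&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;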
&lt;br /&gt;
=== My GPC job died, telling me `Copy Stageout Files Failed' ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
When a job runs on GPC, the script's standard output and error are redirected to &lt;br /&gt;
&amp;lt;tt&amp;gt;$PBS_JOBID.gpc-sched.OU&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;$PBS_JOBID.gpc-sched.ER&amp;lt;/tt&amp;gt; in&lt;br /&gt;
/var/spool/torque/spool on the (first) node on which your job is running.  At the end of the job, those .OU and .ER files are copied to where the batch script tells them to be copied, by default &amp;lt;tt&amp;gt;$PBS_JOBNAME.o$PBS_JOBID&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;$PBS_JOBNAME.e$PBS_JOBID&amp;lt;/tt&amp;gt;.   (You can set those filenames to be something clearer with the -e and -o options in your PBS script.)&lt;br /&gt;
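&lt;br /&gt;
For example, your submission script could contain lines like the following (the file names are hypothetical placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -o myrun.out&lt;br /&gt;
#PBS -e myrun.err&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;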
&lt;br /&gt;
When you get errors like this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
An error has occurred processing your job, see below.&lt;br /&gt;
request to copy stageout files failed on node&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
it means that the copying-back process has failed in some way.  There could be a few reasons for this. The first thing to check is to '''make sure that your .bashrc does not produce any output''', as the output stageout is performed by bash, and extra output can cause it to fail.&lt;br /&gt;
But it could also have been a transient filesystem error, or it could be that your job failed spectacularly enough to short-circuit the normal job-termination process, so those files simply never got copied.&lt;br /&gt;
&lt;br /&gt;
Write to [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;] if your input/output files got lost, as we will probably be able to retrieve them for you (please supply at least the jobid, and any other information that may be relevant). &lt;br /&gt;
&lt;br /&gt;
Note that it is good practice to redirect output to a file rather than depending on the job scheduler to do this for you.&lt;br /&gt;
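&lt;br /&gt;
For example, inside your submission script you could redirect your program's standard output and error yourself (the program and file names are hypothetical placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./mycode &amp;gt; ${SCRATCH}/myrun/output.log 2&amp;gt;&amp;amp;1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;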
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&lt;br /&gt;
===Another transport will be used instead===&lt;br /&gt;
&lt;br /&gt;
I get error messages like the following when running on the GPC at the start of the run, although the job seems to proceed OK.   Is this a problem?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
[[45588,1],0]: A high-performance Open MPI point-to-point messaging module&lt;br /&gt;
was unable to find any relevant network interfaces:&lt;br /&gt;
&lt;br /&gt;
Module: OpenFabrics (openib)&lt;br /&gt;
  Host: gpc-f101n005&lt;br /&gt;
&lt;br /&gt;
Another transport will be used instead, although this may result in&lt;br /&gt;
lower performance.&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Everything's fine.   The two MPI libraries scinet provides work for both the InifiniBand and the Gigabit Ethernet interconnects, and will always try to use the fastest interconnect available.   In this case, you ran on normal gigabit GPC nodes with no infiniband; but the MPI libraries have no way of knowing this, and try the infiniband first anyway.  This is just a harmless `failover' message; it tried to use the infiniband, which doesn't exist on this node, then fell back on using Gigabit ethernet (`another transport').&lt;br /&gt;
&lt;br /&gt;
With OpenMPI, this can be avoided by not looking for infiniband; eg, by using the option&lt;br /&gt;
&lt;br /&gt;
--mca btl ^openib&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===IB Memory Errors, eg &amp;lt;tt&amp;gt; reg_mr Cannot allocate memory &amp;lt;/tt&amp;gt;===&lt;br /&gt;
&lt;br /&gt;
Infiniband requires more memory than ethernet; it can use RDMA (remote direct memory access) transport for which it sets aside registered memory to transfer data.&lt;br /&gt;
&lt;br /&gt;
In our current network configuration, it requires a _lot_ more memory, particularly as you go to larger process counts; unfortunately, that means you can't get around the &amp;quot;I need more memory&amp;quot; problem the usual way, by running on more nodes.   Machines with different memory or &lt;br /&gt;
network configurations may exhibit this problem at higher or lower MPI &lt;br /&gt;
task counts.&lt;br /&gt;
&lt;br /&gt;
Right now, the best workaround is to reduce the number and size of OpenIB queues, using XRC: with the OpenMPI, add the following options to your mpirun command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32 -mca btl_openib_max_send_size 12288&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With Intel MPI, you should be able to do&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load intelmpi/4.0.3.008&lt;br /&gt;
mpirun -genv I_MPI_FABRICS=shm:ofa  -genv I_MPI_OFA_USE_XRC=1 -genv I_MPI_OFA_DYNAMIC_QPS=1 -genv I_MPI_DEBUG=5 -np XX ./mycode&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
to the same end.  &lt;br /&gt;
&lt;br /&gt;
For more information see [[GPC MPI Versions]].&lt;br /&gt;
&lt;br /&gt;
===My compute job fails, saying &amp;lt;tt&amp;gt;libpng12.so.0: cannot open shared object file&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;libjpeg.so.62: cannot open shared object file&amp;lt;/tt&amp;gt;===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
To maximize the amount of memory available for compute jobs, the compute nodes have a less complete system image than the development nodes.   In particular, since graphics packages like matplotlib and gnuplot are usually used interactively, the libraries they need are included in the devel nodes' image but not in the compute nodes'.&lt;br /&gt;
&lt;br /&gt;
Many of these extra libraries are, however, available in the &amp;quot;extras&amp;quot; module.   So adding a &amp;quot;module load extras&amp;quot; to your job submission  script - or, for overkill, to your .bashrc - should enable these scripts to run on the compute nodes.&lt;br /&gt;
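&lt;br /&gt;
For example, adding the following line to your submission script before your program is launched should make those shared libraries available on the compute nodes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load extras&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;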
&lt;br /&gt;
==Data on SciNet disks==&lt;br /&gt;
&lt;br /&gt;
===How do I find out my disk usage?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The standard unix/linux utilities for finding the amount of disk space used by a directory are very slow, and notoriously inefficient on the GPFS filesystems that we run on the SciNet systems.  There are utilities that very quickly report your disk usage:&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''diskUsage'''&amp;lt;/tt&amp;gt; command, available with the 'extras' module on the login nodes, datamovers and the GPC devel nodes, reports usage on the home, scratch, and project file systems in a number of ways: for instance, how much disk space is being used by yourself and your group (with the -a option), how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), or plots of your usage over time.&lt;br /&gt;
This information is only updated hourly!&lt;br /&gt;
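&lt;br /&gt;
For example, to report usage for yourself and your group (as described above):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load extras&lt;br /&gt;
diskUsage -a&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;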
&lt;br /&gt;
More information about these filesystems is available on the [[Data_Management | Data Management]] page.&lt;br /&gt;
&lt;br /&gt;
===How do I transfer data to/from SciNet?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
All incoming connections to SciNet go through relatively low-speed connections to the &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; gateways, so using scp to copy files the same way you ssh in is not an effective way to move lots of data.  Better tools are described in our page on [[Data_Management#Data_Transfer | Data Transfer]].&lt;br /&gt;
&lt;br /&gt;
===My group works with data files of size 1-2 GB.  Is this too large to  transfer by scp to login.scinet.utoronto.ca ?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Generally, occasional transfers of data smaller than 10GB are perfectly acceptable to go through the login nodes. See [[Data_Management#Data_Transfer | Data Transfer]].&lt;br /&gt;
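&lt;br /&gt;
For example, a single file of that size can be copied with scp in the usual way (the user name and destination path are hypothetical placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scp mydata.tar.gz USER@login.scinet.utoronto.ca:/scratch/g/group/USER/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;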
&lt;br /&gt;
===How can I check if I have files in /scratch that are scheduled for automatic deletion?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Storage_Quickstart#Scratch_Disk_Purging_Policy | Storage At SciNet]]&lt;br /&gt;
&lt;br /&gt;
===How can I allow my supervisor to manage files for me using ACL-based commands?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Data_Management#File.2FOwnership_Management_.28ACL.29 | File/Ownership Management]]&lt;br /&gt;
&lt;br /&gt;
===Can we buy extra storage space on SciNet?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
Yes, please see [[Data_Management#Buying_storage_space_on_GPFS_or_HPSS | Buying storage space on GPFS or HPSS ]] for more details.&lt;br /&gt;
&lt;br /&gt;
===Can I transfer files between BGQ and HPSS?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
Yes, please see [https://support.scinet.utoronto.ca/wiki/index.php/BGQ#Bridge_to_HPSS Bridge to HPSS ]  for more details.&lt;br /&gt;
&lt;br /&gt;
==Keep 'em Coming!==&lt;br /&gt;
&lt;br /&gt;
===Next question, please===&lt;br /&gt;
&lt;br /&gt;
Send your question to [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;];  we'll answer it asap!&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Introduction_To_Performance&amp;diff=7156</id>
		<title>Introduction To Performance</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Introduction_To_Performance&amp;diff=7156"/>
		<updated>2014-08-13T15:54:36Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* Strong Scaling Tests */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==The Concepts of Parallel Performance==&lt;br /&gt;
&lt;br /&gt;
Parallel computing used to be a very specialized domain; but now even making the best use of your laptop, which almost certainly has multiple independent computing cores, requires understanding the basic concepts of performance in a parallel environment.&lt;br /&gt;
&lt;br /&gt;
Most fundamentally, parallel programming allows three possible ways of getting more and better science done:&lt;br /&gt;
;Running your computation many times&lt;br /&gt;
:If you have a program that works in serial, having many processors available to you allows you to run many copies of the same program at once, improving your [[#Throughput|throughput]].   This can be a somewhat trivial use of parallel computing and doesn't require very specialized hardware, but it can be extremely useful for running, for instance, parameter studies or sensitivity studies.   Best of all, this is essentially guaranteed to run efficiently if your serial code runs efficiently!  Because this doesn't require fancy hardware, it is a waste of resources to use the [[TCS_Quickstart|Tightly Coupled System]] for these sorts of tasks; instead, they must be run on the [[GPC_Quickstart|General Purpose Cluster]].&lt;br /&gt;
;Running your computation faster&lt;br /&gt;
:This is what most people think of as parallel computing.  It can take a lot of work to make an existing code run efficiently on many processors, or to design a new code to make use of these resources, but when it works, one can achieve a substantial [[#Parallel_Speedup|speedup]] of individual jobs.  This might mean the difference between a computation running in a feasible length of time for a research project or taking years to complete --- so while it may be a lot of work, it may be your only option.    To determine whether your code runs well on many processors, you need to measure [[#Parallel_Speedup|speedup]] and [[#Efficiency|efficiency]]; to see how many processors one should use for a given problem you must run [[#Strong_Scaling_Tests|strong scaling tests]].&lt;br /&gt;
;Running your computation on larger problems&lt;br /&gt;
:One achieves speedup by using more processors on the same problem.  But by running your job in parallel you may have access to more resources other than just processors --- for instance, more memory, or more disks.   In this case, you may be able to run problems that simply wouldn't be possible on a single processor or a single computer; one can achieve significant '''''sizeup'''''.  To find how large a problem one can efficiently run, one measures [[#Efficiency|efficiency]] and runs [[#Weak_Scaling_Tests|weak scaling tests]].&lt;br /&gt;
&lt;br /&gt;
Of course, these aren't exclusive; one can take advantage of any combination of the above.   It may be that your problem runs efficiently on 8 cores but no more; however, you may be able to make use of more processors by running many jobs to explore parameter space, and already on 8 cores you may be able to consider larger problems than you can with just one!&lt;br /&gt;
&lt;br /&gt;
===Throughput===&lt;br /&gt;
&lt;br /&gt;
Throughput is the most fundamental measure of performance, and the one that ultimately matters most to computational scientists -- if you have N computations that you need to have done for your research project, how quickly can you get them done?   Everything else we'll consider here is just a&lt;br /&gt;
way of increasing throughput T:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
T = \frac{\mathrm{Number}\,\mathrm{of}\,\mathrm{computations}}{\mathrm{Unit}\,\mathrm{time}} .&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you have many independent computations to perform (such as a parameter study or a sensitivity study) you can increase throughput almost arbitrarily by running them alongside each other at the same time, limited only by the number of processors available (or the wait time in the queue, or the disk space available, or some other external resource constraint).   This approach obviously doesn't work if you only have one computation to perform, or if later&lt;br /&gt;
computations require the output from previous ones.   In these cases, or when  the individual jobs take infeasibly long, or cannot be performed on only one processor,  one must resort to ''also'' using parallel programming techniques to parallelize the individual jobs.&lt;br /&gt;
&lt;br /&gt;
===Compute Time===&lt;br /&gt;
&lt;br /&gt;
Fundamental to everything else that follows is measuring the amount of time a computation takes on some problem size/amount of work &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; and some number of processors &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;.   We'll denote this by &amp;lt;math&amp;gt;t(N,P)&amp;lt;/math&amp;gt;.   The easiest way to measure this time is with the &amp;lt;tt&amp;gt;time&amp;lt;/tt&amp;gt; command that comes on most flavours of Unix in &amp;lt;tt&amp;gt;/bin/time&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;/usr/bin/time&amp;lt;/tt&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/bin/time myprogram&lt;br /&gt;
...normal program output...&lt;br /&gt;
658.44user 0.85system 10:59.41elapsed &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The format of the times output at the end may vary from system to system, but the basic information returned will be the same.  The ''real'' or ''elapsed'' time listed is the actual [[Wallclock time]] that elapsed during the run, ''user'' or ''cpu'' is the [[CPU time]] that was actually spent doing your computation, and the ''system'' time is the system time that was spent doing system-related things during the run, such as waiting for file input/output.   Our goal will be to reduce the real wallclock time that the simulation takes as much as possible while still making efficient use of the resources available.&lt;br /&gt;
&lt;br /&gt;
===Parallel Speedup===&lt;br /&gt;
&lt;br /&gt;
The speedup of an individual job with some amount of work &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; as you go from running it serially to running it on &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; processors is simply:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;S(N,P) = \frac{t(N,P=1)}{t(N,P)} .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
That is, the ratio of the time the computation takes on one processor to the time it takes on P.   The way this is usually done is to run the parallel code on &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and&lt;br /&gt;
on &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; processor and take the ratio of the two times; but this is a form of cheating, as the parallel version of the code will generally have&lt;br /&gt;
overheads (even in the one-processor case) compared to the best available serial-only version of the code.   The best thing to do in considering the efficiency of the parallelization is to compare the parallel code to the best available serial code that does the same job.&lt;br /&gt;
&lt;br /&gt;
If you are considering the speedup of a problem that doesn't fit onto one processor, of course, the concept of speedup can be generalized; one needn't start at &amp;lt;math&amp;gt;P=1&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
It should go without saying that, while developing your parallel code and during performance tuning, you should check that you get the same results with multiple processors as with some `known good' serial test case; it is even easier to introduce bugs in parallel code than in serial code!&lt;br /&gt;
&lt;br /&gt;
===Efficiency===&lt;br /&gt;
&lt;br /&gt;
Once you have a parallel code and some timing results, you can look at how efficiently you are making use of the resources as you use more and more processors.&lt;br /&gt;
The parallel efficiency of a computation of some fixed work size running on &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; processors as compared to the &amp;lt;math&amp;gt;P=1&amp;lt;/math&amp;gt; case is &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E = \frac{S(N,P)}{P}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
That is, if you get a speedup of &amp;lt;math&amp;gt;8 \times&amp;lt;/math&amp;gt; in going from one to eight processors, you are at 1.00 or 100% efficiency; anything less and you are at lower efficiency.  It isn't uncommon to achieve greater than 100% parallel efficiency for small numbers of processors for some types of problems; as you go to more processors, you also have more processor cache, and thus more of the problem's data can fit into fast cache.  This is called ''super-linear speedup'' and sadly seldom extends out to very many processors.  &lt;br /&gt;
&lt;br /&gt;
===Strong Scaling Tests===&lt;br /&gt;
&lt;br /&gt;
[[Image:scaling-example.png|thumb|right|320px|An example of a strong scaling test]]&lt;br /&gt;
&lt;br /&gt;
The figure to the right and data below shows an example of a result of a small strong scaling test --- running a fixed-size problem on a varying number of processors to see how the timing of the computation scales with the number of processors.   The code was an OpenMP code run on a node of the GPC.  The quantitative results follow below; the times were measured and then speedups and efficiencies were calculated as above.   &lt;br /&gt;
&lt;br /&gt;
{| &lt;br /&gt;
! P&lt;br /&gt;
! t(N,P)&lt;br /&gt;
! S(N,P)&lt;br /&gt;
! E(N,P)&lt;br /&gt;
|-&lt;br /&gt;
| 1 || 3:50 ||  -  ||  -   &lt;br /&gt;
|-&lt;br /&gt;
| 2 || 2:02 || 1.87x || 94 % &lt;br /&gt;
|-&lt;br /&gt;
| 4 || 1:05 || 3.52x || 88 %&lt;br /&gt;
|-&lt;br /&gt;
| 6 || 47.8 || 4.81x || 80 %&lt;br /&gt;
|-&lt;br /&gt;
| 8 || 43.6 || 5.28x || 66%&lt;br /&gt;
|}&lt;br /&gt;
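&lt;br /&gt;
As a worked example, take the last row of the table above: 3:50 is 230 seconds, so&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
S(N,P=8) = \frac{230\,\mathrm{s}}{43.6\,\mathrm{s}} \approx 5.28, \qquad E = \frac{5.28}{8} \approx 0.66 = 66\% .&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;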
&lt;br /&gt;
The plot shows the compute time &amp;lt;math&amp;gt;t(N,P)&amp;lt;/math&amp;gt; as a function of P; if the code maintained 100% parallel efficiency, we would expect the scaling to be&lt;br /&gt;
as 1/P, so we plot it on a log-log scale.  Also shown is the ideal scaling case -- what the times would be if, using the &amp;lt;math&amp;gt;P=1&amp;lt;/math&amp;gt; timing as a normalization, we did get 100% efficiency.   We can see that past 4 cores the measured case starts to significantly deviate from the ideal, and it looks like things would only get worse past 8 cores.&lt;br /&gt;
&lt;br /&gt;
It's important to note here that scaling tests should be done on realistic problem sizes and for realistic lengths of time.   Generally, for either serial or parallel programs there will be some overhead both at initialization time and during the course of the computation; if the problem size is too small, the overhead during the course of the run might be a significant fraction of the real work, and the program will behave needlessly poorly.  Similarly, if the number of timesteps or iterations is too small, the initialization overhead will play a spuriously large role in the performance.&lt;br /&gt;
&lt;br /&gt;
The above behaviour is typical for a small computation; it won't scale to too many cores, and the efficiency becomes monotonically worse as one increases the number of cores in use.   The rate at which this happens will depend on the problem size and the type of computation.   How is one to tell where to stop;&lt;br /&gt;
how good an efficiency is good enough?    Certainly there are rules of thumb --- one shudders to see efficiencies below 50% --- but one can arrive at more meaningful and quantitative results by considering throughput.   Let's imagine we had 64 cores at our disposal, and we wanted to run 96 jobs as quickly as possible.   Our total time to completion of the 96 jobs would vary with the number of cores we ran per job as follows:&lt;br /&gt;
&lt;br /&gt;
{| &lt;br /&gt;
! P&lt;br /&gt;
! Time for one job&lt;br /&gt;
! Time for all 96 jobs&lt;br /&gt;
|-&lt;br /&gt;
| 1 || 3:50 || 7:40  (2 batches, 64 jobs then 32)&lt;br /&gt;
|- &lt;br /&gt;
| 2 || 2:02 || 7:08 (3 batches, 32,32,32)&lt;br /&gt;
|-&lt;br /&gt;
| 4 || 1:05 || 6:30 (6 batches, 6x16)&lt;br /&gt;
|-&lt;br /&gt;
| 6 || 47.8 || 7:58 (10 batches, 9x10, 6)&lt;br /&gt;
|-&lt;br /&gt;
| 8 || 43.6 || 8:43 (12 batches)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
If we use more than 4 processes per job in this case, it will actually take us longer to do all our runs!  For jobs that scale better with the number of processes (this could be a different program, or the same program with different problem size), we will find this turnover point to be at higher &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;; for jobs that scale worse, lower &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Weak Scaling Tests===&lt;br /&gt;
&lt;br /&gt;
[[Image:weak-scaling-example.png|thumb|right|320px|An example of a weak scaling test]]&lt;br /&gt;
&lt;br /&gt;
The strong scaling test described above considers the performance of a parallel code with a fixed work size as the number of processors varies; this tells us how the parallel overhead behaves as you go to more and more processors.   A weak scaling test fixes the amount of work '''per processor''' and compares the execution time over number of processors.   Since each processor has the same amount to do, in the ideal case the execution time should remain constant.   While the strong scaling test tells you how the parallel overhead scales with &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;, the weak scaling test tells you something weaker -- whether the parallel overhead varies faster or slower than the amount of work.   &lt;br /&gt;
&lt;br /&gt;
Nonetheless, the weak scaling test can be the relevant one for determining how large a problem size one can efficiently compute with a given parallel code and system.    An example of results for a weak scaling test on the GPC and TCS up to 256 processors (8 nodes of the TCS, 32 of the GPC) is shown to the right.   In this case we are maintaining extremely good efficiency up to at least 128 processors with constant work per process on both architectures.  It is possible to see different behaviour when first filling up a node (eg, for less than 8 processes for the GPC, or 64 for TCS) than when one starts crossing  nodes; one should understand this but it doesn't necessarily indicate problems.&lt;br /&gt;
&lt;br /&gt;
==Performance Tuning==&lt;br /&gt;
&lt;br /&gt;
'''You cannot improve what you cannot measure.'''   Performance tuning is an iterative process of running an '''instrumented''' version of your code, getting data on performance throughout the code, and attempting to make changes to the code that will make it run more efficiently.&lt;br /&gt;
&lt;br /&gt;
There are three main ways of instrumenting a code to find its performance.  The first is '''manually adding timers''' around important parts of the code to find out how much time is spent in each part.   This is worth thinking about doing when putting together a new code, as it means that you'll have a very robust way of finding out how well the different parts of the code perform on different platforms and with different compiler options, etc..  The results are, however, necessarily very coarse-grained; they are very useful for comparing performance under different situations, but give very little information about whether or not there are performance problems or what they might be.&lt;br /&gt;
&lt;br /&gt;
The second technique is '''sampling''', sometimes called `program counter sampling' or `statistical sampling'.   In this case, the program is run in an environment where it is interrupted briefly at some set frequency (typically something like 100 times per second) and the location of the program counter is jotted down before the program is resumed.  At the end of the program, these locations are translated into locations in the source code, and one has a statistical profile of where the program has spent its time.  &lt;br /&gt;
&lt;br /&gt;
Statistical sampling has several advantages.  It has a very low overhead --- the sampling procedure for instance takes much less time than a function call to a timer routine --- so that the program runs much as it would without the measurement process.  If the samples are taken often enough, the result is a very accurate picture of where your program is spending its time, allowing you to very quickly identify `hotspots' in the code and focus your attention on the most costly areas of the program.   This combination of relevant information and low-overhead makes statistical sampling the first resort for serious performance measurement.&lt;br /&gt;
&lt;br /&gt;
Sampling, however, has drawbacks.  While it lets you know where the program is spending its time, it doesn't tell you why, or how it got there in the first place.   For instance, in a parallel program you may be spending too much time in barriers of one sort or another (perhaps at &amp;lt;tt&amp;gt;MPI_WAITALL&amp;lt;/tt&amp;gt; calls in MPI, or implicit barriers at the end of &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; sections in OpenMP) but unless you know where in the code that routine was called from, you can't address the problem.   In this case you need some sort of '''trace''' through the program which keeps track of which routine called what.   This is generally a much heavier-weight process, which can substantially increase the runtime of the code, running the risk of 'the Heisenberg effect' - measurement changing the system under observation.  On the other hand, sometimes you just need that level of information, so tracing packages or libraries must be used.&lt;br /&gt;
&lt;br /&gt;
A related method is the use of '''hardware counters''' --- counters within the CPU itself which keep track of performance-related information, such as the number of cache misses or branch mis-predictions within your code.   Using this information, either regularly throughout the code or once for the entire code run can give very specific information about performance problems.   Right now these counters are available on the TCS system but not on the GPC system, as the mainstream Linux kernel does not provide access to these counters.&lt;br /&gt;
&lt;br /&gt;
===Simple Timer Wrappers: C, FORTRAN===&lt;br /&gt;
&lt;br /&gt;
Below are some simple examples of timers in C and FORTRAN which can be called before and after blocks of code to give wallclock times (in seconds), providing coarse-grained timings for sections of your code.   Other approaches are possible.&lt;br /&gt;
&lt;br /&gt;
Simple timers in C:&lt;br /&gt;
&amp;lt;source lang=c&amp;gt;&lt;br /&gt;
#include &amp;lt;sys/time.h&amp;gt;  /* for struct timeval and gettimeofday() */&lt;br /&gt;
&lt;br /&gt;
/* record the current time in *t */&lt;br /&gt;
void tick(struct timeval *t) {&lt;br /&gt;
    gettimeofday(t, NULL);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
/* returns time in seconds from now to time described by t */&lt;br /&gt;
double tock(struct timeval *t) {&lt;br /&gt;
    struct timeval now;&lt;br /&gt;
    gettimeofday(&amp;amp;now, NULL);&lt;br /&gt;
    return (double)(now.tv_sec - t-&amp;gt;tv_sec) + ((double)(now.tv_usec - t-&amp;gt;tv_usec)/1000000.);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and how to use them:&lt;br /&gt;
&amp;lt;source lang=c&amp;gt;&lt;br /&gt;
#include &amp;lt;sys/time.h&amp;gt;&lt;br /&gt;
struct timeval init, calc, io;&lt;br /&gt;
double inittime, calctime, iotime;&lt;br /&gt;
&lt;br /&gt;
    /*... */&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
tick(&amp;amp;init);&lt;br /&gt;
/* do initialization */&lt;br /&gt;
inittime = tock(&amp;amp;init);&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
tick(&amp;amp;calc);&lt;br /&gt;
/* do big computation */&lt;br /&gt;
calctime = tock(&amp;amp;calc);&lt;br /&gt;
&lt;br /&gt;
tick(&amp;amp;io);&lt;br /&gt;
/* do IO */&lt;br /&gt;
iotime = tock(&amp;amp;io);&lt;br /&gt;
&lt;br /&gt;
printf(&amp;quot;Timing summary:\n\tInit: %8.5f sec\n\tCalc: %8.5f sec\n\tI/O : %8.5f sec\n&amp;quot;,&lt;br /&gt;
        inittime, calctime, iotime);&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Simple timers in FORTRAN:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=fortran&amp;gt;&lt;br /&gt;
subroutine tick(t)&lt;br /&gt;
    integer, intent(OUT) :: t&lt;br /&gt;
&lt;br /&gt;
    call system_clock(t)&lt;br /&gt;
end subroutine tick&lt;br /&gt;
&lt;br /&gt;
! returns time in seconds from now to time described by t &lt;br /&gt;
real function tock(t)&lt;br /&gt;
    integer, intent(in) :: t&lt;br /&gt;
    integer :: now, clock_rate&lt;br /&gt;
&lt;br /&gt;
    call system_clock(now,clock_rate)&lt;br /&gt;
&lt;br /&gt;
    tock = real(now - t)/real(clock_rate)&lt;br /&gt;
end function tock&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And using them:&lt;br /&gt;
&amp;lt;source lang=fortran&amp;gt;&lt;br /&gt;
 call tick(calc)&lt;br /&gt;
!  do big calculation&lt;br /&gt;
 calctime = tock(calc)&lt;br /&gt;
&lt;br /&gt;
 print *,'Timing summary'&lt;br /&gt;
 print *,'Calc: ', calctime&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Command-line Performance Tools==&lt;br /&gt;
&lt;br /&gt;
Many of the tools below can be used to examine both serial and parallel performance problems with a code.  We'd like to encourage you to tune serial performance first.  Worrying about parallel performance before the code performs well with a single task doesn't make much sense!  Profiling your code when running with one task allows you to spot serial `hot spots' for optimization, as well as giving you a more detailed understanding of where your program spends its time.   Further, any performance improvements you make in the serial code will automatically speed up your parallel code.&lt;br /&gt;
&lt;br /&gt;
We've already talked about coarse-grained measurements such as timers within the code and using tools such as &amp;lt;tt&amp;gt;/bin/time&amp;lt;/tt&amp;gt;.  These are very useful for comparing overall performance between different platforms/parameters, but we won't need to discuss them further here.&lt;br /&gt;
&lt;br /&gt;
===gprof (profiling: everywhere)===&lt;br /&gt;
&lt;br /&gt;
A statistical sampling workhorse is &amp;lt;tt&amp;gt;gprof&amp;lt;/tt&amp;gt;, the GNU version of an old common Unix utility called prof.  To use this, the code must be re-compiled with both source-code symbols intact (&amp;lt;tt&amp;gt;-g&amp;lt;/tt&amp;gt;) and with profiling information available (for most compilers, this is &amp;lt;tt&amp;gt;-pg&amp;lt;/tt&amp;gt;; for the IBM compilers (xlf, xlc, xlC) it is &amp;lt;tt&amp;gt;-p&amp;lt;/tt&amp;gt;).  It is worth knowing because of its ubiquity, and because it contains much of the functionality of newer tools, so the same concepts occur in other tools.&lt;br /&gt;
&lt;br /&gt;
So let's consider the following trivial program &amp;lt;tt&amp;gt;pi.c&amp;lt;/tt&amp;gt;:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;c&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stdlib.h&amp;gt;&lt;br /&gt;
#include &amp;lt;time.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
double calc_pi(long n) {&lt;br /&gt;
    long in = 0;&lt;br /&gt;
    long out = 0;&lt;br /&gt;
    long i;&lt;br /&gt;
    double x,y;&lt;br /&gt;
&lt;br /&gt;
    for (i=0; i&amp;lt;n; i++) {&lt;br /&gt;
        x = drand48();&lt;br /&gt;
        y = drand48();&lt;br /&gt;
        if (x*x+y*y &amp;lt; 1) {&lt;br /&gt;
            in++;&lt;br /&gt;
        } else {&lt;br /&gt;
            out++;&lt;br /&gt;
        }&lt;br /&gt;
    }&lt;br /&gt;
&lt;br /&gt;
    return 4.*(double)in/(double)(in+out);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main(int argc, char **argv) {&lt;br /&gt;
    long n, defaultn=100000;&lt;br /&gt;
    double pi;&lt;br /&gt;
    time_t t;&lt;br /&gt;
&lt;br /&gt;
    /* seed random number generator */&lt;br /&gt;
    srand48(time(&amp;amp;t));&lt;br /&gt;
&lt;br /&gt;
    /* get number of tries */&lt;br /&gt;
    if (argc &amp;lt; 2 || (n=atoi(argv[1]))&amp;lt;1) {&lt;br /&gt;
        n = defaultn;&lt;br /&gt;
        printf(&amp;quot;Using default n = %ld\n&amp;quot;, n);&lt;br /&gt;
    }&lt;br /&gt;
&lt;br /&gt;
    pi = calc_pi(n);&lt;br /&gt;
    printf(&amp;quot;Pi = %lf\n&amp;quot;, pi);&lt;br /&gt;
&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;   &lt;br /&gt;
&lt;br /&gt;
We can compile this with profiling on and run it:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ gcc -g -pg -o pi pi.c&lt;br /&gt;
$ ./pi 100000000&lt;br /&gt;
Pi = 3.141804&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(Note that this isn't a very good way of calculating pi!).  On exit, this program creates a file called &amp;lt;tt&amp;gt;gmon.out&amp;lt;/tt&amp;gt;; this contains the profiling information about the run of the code.  We can take a look at this by using &amp;lt;tt&amp;gt;gprof&amp;lt;/tt&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ gprof pi gmon.out &lt;br /&gt;
Flat profile:&lt;br /&gt;
&lt;br /&gt;
Each sample counts as 0.01 seconds.&lt;br /&gt;
  %   cumulative   self              self     total           &lt;br /&gt;
 time   seconds   seconds    calls  ms/call  ms/call  name    &lt;br /&gt;
100.88      1.00     1.00        1   998.76   998.76  calc_pi&lt;br /&gt;
&lt;br /&gt;
index % time    self  children    called     name&lt;br /&gt;
                1.00    0.00       1/1           main [2]&lt;br /&gt;
[1]    100.0    1.00    0.00       1         calc_pi [1]&lt;br /&gt;
-----------------------------------------------&lt;br /&gt;
                                                 &amp;lt;spontaneous&amp;gt;&lt;br /&gt;
[2]    100.0    0.00    1.00                 main [2]&lt;br /&gt;
                1.00    0.00       1/1           calc_pi [1]&lt;br /&gt;
-----------------------------------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first part tells us that essentially all of the time spent running was in the &amp;lt;tt&amp;gt;calc_pi()&amp;lt;/tt&amp;gt; routine (of course), and the second part attempts to be a call graph, showing that &amp;lt;tt&amp;gt;main&amp;lt;/tt&amp;gt; called &amp;lt;tt&amp;gt;calc_pi()&amp;lt;/tt&amp;gt; once.    An important concept in the timing is the `self' and `children' times for each routine, sometimes called the exclusive and inclusive times.   Because most routines call many other routines, it's often useful to distinguish between the total amount of time spent between starting and ending the routine (the `inclusive' time) and that same time excluding the time spent in child routines (the `exclusive' time).  &lt;br /&gt;
&lt;br /&gt;
The above results are fairly trivial and not very useful for this simple program, but in more complicated routines it can be very valuable to narrow down hotspots to particular regions of code.&lt;br /&gt;
&lt;br /&gt;
[[Image:Xprofiler.png|thumb|300px|The AIX tool &amp;lt;tt&amp;gt;Xprof&amp;lt;/tt&amp;gt; gives a visual representation of the &amp;lt;tt&amp;gt;gprof&amp;lt;/tt&amp;gt; output.]]&lt;br /&gt;
&lt;br /&gt;
In fact, gprof also allows you to view the time spent in the code by lines of code.   As you chop the program up finer, the statistical sampling gets less accurate; thus to look at the results by line of code you must be sure that your sample run was long enough to get meaningful data.  But the results can be extremely useful:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ gprof --line pi gmon.out&lt;br /&gt;
Flat profile:&lt;br /&gt;
&lt;br /&gt;
Each sample counts as 0.01 seconds.&lt;br /&gt;
  %   cumulative   self              self     total           &lt;br /&gt;
 time   seconds   seconds    calls  Ts/call  Ts/call  name    &lt;br /&gt;
 70.31      0.70     0.70                             calc_pi (pi.c:14 @ 40078b)&lt;br /&gt;
 14.27      0.84     0.14                             calc_pi (pi.c:17 @ 4007bc)&lt;br /&gt;
  5.10      0.89     0.05                             calc_pi (pi.c:11 @ 4007c1)&lt;br /&gt;
  4.08      0.93     0.04                             calc_pi (pi.c:15 @ 4007b5)&lt;br /&gt;
  3.06      0.96     0.03                             calc_pi (pi.c:13 @ 400781)&lt;br /&gt;
  2.55      0.98     0.03                             calc_pi (pi.c:12 @ 400777)&lt;br /&gt;
  1.53      1.00     0.02                             calc_pi (pi.c:11 @ 40076d)&lt;br /&gt;
  0.00      1.00     0.00        1     0.00     0.00  calc_pi (pi.c:5 @ 40074c)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where now we can see that the single line containing the radius calculation (&amp;lt;tt&amp;gt;if (x*x+y*y &amp;lt; 1)&amp;lt;/tt&amp;gt;) is 70% of the work for the entire program.  This tells you where you should spend your time to optimize the code.   Other tools exist for this sort of line-by-line analysis; &amp;lt;tt&amp;gt;gcov&amp;lt;/tt&amp;gt; in the gcc compiler suite counts the number of times a given source line is executed - the idea was for coverage analysis for test suites, but it certainly can be used for profiling as well; however, usually the amount of time spent at a line is more important than the number of executions.&lt;br /&gt;
&lt;br /&gt;
For parallel programs, &amp;lt;tt&amp;gt;gprof&amp;lt;/tt&amp;gt; will generally output a separate &amp;lt;tt&amp;gt;gmon.out&amp;lt;/tt&amp;gt; file for each process; for threaded applications, output for all threads will be summed into the same &amp;lt;tt&amp;gt;gmon.out&amp;lt;/tt&amp;gt;.   It may be useful to sum up all the results and view them with gprof, or to look at them individually.&lt;br /&gt;
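&lt;br /&gt;
As a sketch of the summing approach (assuming the per-process profiles have been renamed to gmon.out.0, gmon.out.1, and so on, which is not automatic): gprof's &amp;lt;tt&amp;gt;-s&amp;lt;/tt&amp;gt; option merges profile data into a file called gmon.sum, which can then be viewed as usual:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gprof -s ./myprogram gmon.out.*&lt;br /&gt;
gprof ./myprogram gmon.sum&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;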
&lt;br /&gt;
There are other tools for looking at the same data.   For instance, on the TCS system, the command &amp;lt;tt&amp;gt;Xprof&amp;lt;/tt&amp;gt; &lt;br /&gt;
(run the same way as &amp;lt;tt&amp;gt;gprof&amp;lt;/tt&amp;gt;; &amp;lt;tt&amp;gt;Xprof program_name gmon.out&amp;lt;/tt&amp;gt;) lets you look at the call tree as a graphical tree.  Each routine is shown by a block with a size proportional to the time spent in each routine; the width is the inclusive time, and the height is the exclusive time.&lt;br /&gt;
&lt;br /&gt;
===hpmcount (performance counters: TCS)===&lt;br /&gt;
&lt;br /&gt;
On the TCS, &amp;lt;tt&amp;gt;hpmcount&amp;lt;/tt&amp;gt; allows the querying of the performance counter values over the course of a run.  Since here we are simply asking the CPU to report values it obtains during the run of a program, the code does not need to be instrumented; simply typing&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpmcount hpmcount_args program_name program_args&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will run the program and output the results from the hardware performance counters at the end.  So for instance, with our trivial pi program above,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tcs-f11n05-$ hpmcount ./pi&lt;br /&gt;
Using default n = 100000&lt;br /&gt;
Pi = 3.144240&lt;br /&gt;
 Execution time (wall clock time): 0.020325 seconds&lt;br /&gt;
&lt;br /&gt;
 ########  Resource Usage Statistics  ########  &lt;br /&gt;
&lt;br /&gt;
 Total amount of time in user mode            : 0.012754 seconds&lt;br /&gt;
 Total amount of time in system mode          : 0.001486 seconds&lt;br /&gt;
 Maximum resident set size                    : 440 Kbytes&lt;br /&gt;
 Average shared memory use in text segment    : 0 Kbytes*sec&lt;br /&gt;
 Average unshared memory use in data segment  : 0 Kbytes*sec&lt;br /&gt;
 Number of page faults without I/O activity   : 53&lt;br /&gt;
 Number of page faults with I/O activity      : 1&lt;br /&gt;
 Number of times process was swapped out      : 0&lt;br /&gt;
 Number of times file system performed INPUT  : 0&lt;br /&gt;
 Number of times file system performed OUTPUT : 0&lt;br /&gt;
 Number of IPC messages sent                  : 0&lt;br /&gt;
 Number of IPC messages received              : 0&lt;br /&gt;
 Number of signals delivered                  : 0&lt;br /&gt;
 Number of voluntary context switches         : 6&lt;br /&gt;
 Number of involuntary context switches       : 0&lt;br /&gt;
&lt;br /&gt;
 #######  End of Resource Statistics  ########&lt;br /&gt;
&lt;br /&gt;
 Set: 1&lt;br /&gt;
 Counting duration: 0.014947083 seconds&lt;br /&gt;
  PM_FPU_1FLOP (FPU executed one flop instruction )          :          400093&lt;br /&gt;
  PM_FPU_FMA (FPU executed multiply-add instruction)         :          500030&lt;br /&gt;
  PM_FPU_FSQRT_FDIV (FPU executed FSQRT or FDIV instruction) :               1&lt;br /&gt;
  PM_CYC (Processor cycles)                                  :        58485795&lt;br /&gt;
  PM_RUN_INST_CMPL (Run instructions completed)              :        24238152&lt;br /&gt;
  PM_RUN_CYC (Run cycles)                                    :        70307511&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  Utilization rate                                 :          61.172 %&lt;br /&gt;
  Flop                                             :           1.400 Mflop&lt;br /&gt;
  Flop rate (flops / WCT)                          :          68.888 Mflop/s&lt;br /&gt;
  Flops / user time                                :         112.614 Mflop/s&lt;br /&gt;
  FMA percentage                                   :         111.103 %&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
There are a variety of sets of performance counters that can be reported; the default set isn't especially helpful for HPC-type computations; sets of performance counters can be specified on the commandline in the format  &amp;lt;tt&amp;gt;-d -s item,item,item&amp;lt;/tt&amp;gt;.  Sets 5 and 12 are very useful for showing memory performance (showing L1 and L2 cache misses) and set 6 is especially useful for shared memory profiling, giving statistics about how often off-processor memory had to be accessed.&lt;br /&gt;
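&lt;br /&gt;
For example, following the option format described above, one might request the memory-related counter sets for the toy pi program (this is a sketch; check the hpmcount documentation on the TCS for the exact syntax):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpmcount -s 5,12 ./pi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;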
&lt;br /&gt;
Showing the counters for the entire program will often tell you if there's a problem or not, but won't tell you where it is.  For more detailed information, one can [http://www.ncsa.uiuc.edu/UserInfo/Resources/Software/Tools/HPMToolkit/HPM_2_5_2.AIX.html  use the hpm library] to manually instrument different regions of your code, and get similar outputs to above for several different, smaller, regions of code.&lt;br /&gt;
&lt;br /&gt;
On the linux side, &amp;lt;tt&amp;gt;oprofile&amp;lt;/tt&amp;gt; allows the reporting of similar information, but to use it one must have root access to the linux machine.&lt;br /&gt;
&lt;br /&gt;
===cachegrind (Memory use analysis: GPC)===&lt;br /&gt;
&lt;br /&gt;
[[Image:Kcachegrind.png|thumb|kcachegrind, part of the KDE development package, can give graphical overviews of the output from cachegrind]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;tt&amp;gt;[http://valgrind.org/ valgrind]&amp;lt;/tt&amp;gt; is a memory tool that is usually thought of in terms of finding memory-access bugs in large programs.  Rather than instrumenting a code or measuring counters, valgrind takes a fairly extreme approach -- it emulates your program running on a computer, essentially running a simulation of your program running on the same kind of computer valgrind is running on.   This has enormous overhead (runtimes can be up to 20x as long as normal) but the result is exquisitely detailed information about what your program is doing.&lt;br /&gt;
&lt;br /&gt;
Memory access is often a bottleneck for HPC codes, and cachegrind is a tool for valgrind which simulates the use of cache in your program, giving you line-by-line information on which parts of the code have cache performance issues.  Your code does not need to be recompiled, although compiling with &amp;lt;tt&amp;gt;-g&amp;lt;/tt&amp;gt; is necessary for the output to be useful.   Cachegrind is run as shown:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
valgrind --tool=cachegrind myprogram myprogram_arguments&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Overall results for the whole program are given at the end of the program's normal output, and more detailed information is saved in a file whose name begins with &amp;lt;tt&amp;gt;cachegrind.out&amp;lt;/tt&amp;gt;.   These output files are plain text - readable in principle by humans, but it is much easier to see what is going on with visual tools like kcachegrind (shown to the right) or, eventually, valkyrie (which can also be used for &amp;lt;tt&amp;gt;memcheck&amp;lt;/tt&amp;gt; output).&lt;br /&gt;
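&lt;br /&gt;
If a graphical viewer isn't available, valgrind's &amp;lt;tt&amp;gt;cg_annotate&amp;lt;/tt&amp;gt; script will print a per-function summary of a cachegrind output file on the terminal; for example (the process id in the file name is a placeholder):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cg_annotate cachegrind.out.12345&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;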
&lt;br /&gt;
===IPM (MPI Tracing and hardware counters: GPC, TCS)===&lt;br /&gt;
&lt;br /&gt;
[[Image:IPM.png|thumb|IPM generates a series of webpages and graphs summarizing performance of your code which can then be viewed in a web browser]]&lt;br /&gt;
&lt;br /&gt;
IPM is the [http://ipm-hpc.sourceforge.net/ Integrated Performance Monitoring] framework, which monitors a variety of MPI and hardware performance information.   There are a number of IPM modules&lt;br /&gt;
on the TCS and GPC depending on which machine and compilers/MPI you use.&lt;br /&gt;
&lt;br /&gt;
These can be linked in to your MPI executable at link- or run-time, and generate &lt;br /&gt;
detailed output at the end of your run which can be parsed and produce&lt;br /&gt;
a nice set of HTML + graphics.&lt;br /&gt;
&lt;br /&gt;
Running IPM varies slightly on the GPC and the TCS.   In the GPC, because the default for the MPI libraries is to be&lt;br /&gt;
compiled in as dynamic, shared libraries, it is easiest just to compile your program as normal and link in these&lt;br /&gt;
libraries only when you are about to run your program.   In your submission script, run your program as so:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load ipm/[appropriate-version]&lt;br /&gt;
export LD_PRELOAD=${SCINET_IPM_LIB}/libipm.so&lt;br /&gt;
mpirun [...]&lt;br /&gt;
export LD_PRELOAD=&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
On the other hand, on TCS, it is easiest to link in the IPM libraries at link time, with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
...-L${SCINET_IPM_LIB} -lipm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Either way, once the program is finished, in the directory it was run in&lt;br /&gt;
there will be a large XML file with an ugly name like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;[username].[longnumber]&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To turn this into useful data, do:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load ipm/[appropriate-version]&lt;br /&gt;
$ ipm_parse -html [username].[longnumber]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
That will produce a directory named something like &lt;br /&gt;
&amp;lt;pre&amp;gt;[executable].[username].[number].[node]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
which you can copy back to your &lt;br /&gt;
computer and view the index.html file within for lots of detailed info about your run.&lt;br /&gt;
&lt;br /&gt;
Note that on the GPC, there are IPM modules for &amp;quot;posix&amp;quot; and &amp;quot;mpiio&amp;quot;.  Which of these you use only matters if you want to do I/O profiling &lt;br /&gt;
as well as MPI profiling.   If you do want to do I/O tracing, you will have to be sure to choose the right module variant (mpiio if you do parallel I/O, either manually with MPI-I/O or through a parallel HDF5 or NetCDF library, and posix otherwise) and statically link the library into your executable at compile time:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
...-L${SCINET_IPM_LIB} -lipm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Graphical Performance Tools==&lt;br /&gt;
&lt;br /&gt;
While graphical performance tools typically measure the same things as their command-line counterparts, a graphical display opens up the possibility of aggregating much more information and displaying it flexibly in a variety of ways; this can be very helpful, especially in the initial stages of finding performance problems.&lt;br /&gt;
&lt;br /&gt;
===OpenSpeedShop (profiling, MPI tracing: GPC)===&lt;br /&gt;
&lt;br /&gt;
[[Image:Speedshop2.png|thumb|OpenSpeedShop, like gprof, will tell you where the hotspots are in the code, by function]]&lt;br /&gt;
[[Image:Speedshop1.png|thumb|...or by line of code]]&lt;br /&gt;
&lt;br /&gt;
[http://www.openspeedshop.org OpenSpeedShop] is a tool that is installed on the GPC; it is currently compiled with support only for gcc and OpenMPI.   To use it, &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc python intel openmpi openspeedshop &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
It provides the functionality of gprof, with the addition of hardware counter measurements (not currently supported on GPC machines) and options for both lightweight and more detailed, heavier-weight profiling.  OpenSpeedShop also contains enhanced support for dealing with parallel runs, and for tracing MPI or I/O calls to find performance problems in those areas.   The parallel support goes considerably beyond what &amp;lt;tt&amp;gt;gprof&amp;lt;/tt&amp;gt; offers; bundling the data from thousands of tasks into one set of results is a significant algorithmic challenge in itself.  &lt;br /&gt;
&lt;br /&gt;
Another important addition, shared by many of the other graphical tools, is the idea of bundling results into different `experiments' --- bundles of an executable, measurement type, and resulting data --- which makes the iterative process of performance tuning much easier.  OpenSpeedShop, as with some other tools, has the ability to directly compare the results of different experiments, so one can more easily see if a particular change made things better or worse, and if so where.&lt;br /&gt;
&lt;br /&gt;
OpenSpeedShop does not require re-compilation of the executable (although, as with all these tools, for the correlation with the source code to be useful, the code should be compiled with debugging symbols; the option for this is almost universally &amp;lt;tt&amp;gt;-g&amp;lt;/tt&amp;gt; to the compiler and linker).   The code is then either instrumented, or run in an instrumented environment.   Shown to the right are two of the views available for examining the timing results of an OpenMP code.&lt;br /&gt;
&lt;br /&gt;
OpenSpeedShop can be launched from the command line and then used entirely through the GUI; there are a variety of `wizards' which guide you through choosing how to instrument and run your experiment:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ openss&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This should be done from a directory containing the source code and the executable.   This is an excellent way to get started with the tool.  Once one is more familiar with it, one can run a variety of experiments from the command line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
openss -f program_name pcsamp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the above runs the &amp;lt;tt&amp;gt;pcsamp&amp;lt;/tt&amp;gt; (program counter sampling, as in gprof) measurement on the executable &amp;lt;tt&amp;gt;program_name&amp;lt;/tt&amp;gt;.  Then one can launch the GUI to view the results.   There are options for instrumenting the executable in a variety of ways, and for taking different measurements; the [http://www.openspeedshop.org OpenSpeedShop] web page contains links to documentation and tutorials.&lt;br /&gt;
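&lt;br /&gt;
Other experiments follow the same pattern.  For example, a sketch of a more detailed call-stack-sampling run -- assuming the &amp;lt;tt&amp;gt;usertime&amp;lt;/tt&amp;gt; experiment is enabled in the installed version -- would be:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
openss -f program_name usertime&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;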
&lt;br /&gt;
===PeekPerf (profiling, TCS)===&lt;br /&gt;
&lt;br /&gt;
[[Image:PeekPerf.png|thumb|An example of using PeekPerf]]&lt;br /&gt;
&lt;br /&gt;
[http://domino.research.ibm.com/comm/research_projects.nsf/pages/hpct.index.html Peekperf] is IBM's single graphical `dashboard' providing access to many performance measurement tools for examining hardware counter data, threads, message passing, I/O, and memory access, several of which are available separately as command-line tools.  Like OpenSpeedShop, it does not require re-compilation of the executable; an instrumented version of the code is generated at run time, and this instrumented version is executed with whatever options you care to pass to it.   It does not have the same support for comparing experiments that OpenSpeedShop does; however, it allows running several different types of measurements at once and seeing how they correlate in a given run, which is something OpenSpeedShop doesn't offer.&lt;br /&gt;
&lt;br /&gt;
One starts peekperf at the command line&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ peekperf&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and tell peekperf which executable you wish to run measurements on.   You then highlight which sorts of measurements you wish to make (which sorts are available depends on the type of program - threaded, OpenMP, etc.), select `generate an instrumented executable', and then `run the instrumented executable', giving it the name of either the instrumented executable or a script that runs it; peekperf will then display the resulting data as soon as the run has completed.  &lt;br /&gt;
&lt;br /&gt;
Understanding the interface and resulting data takes some practice, and the documentation is quite sparse; however, the flexibility in the range of measurements it can take makes this an excellent source of performance information for programs running on the TCS system.&lt;br /&gt;
&lt;br /&gt;
===Scalasca (profiling, tracing: TCS, GPC)===&lt;br /&gt;
[[Image:Scalasca.png|thumb|An example of using Scalasca]]&lt;br /&gt;
&lt;br /&gt;
[http://www.scalasca.org  Scalasca] is a sophisticated tool which takes the aggregation of data shown in the above graphical tools one step further and analyzes the results to pinpoint and display common performance problems; it scales extremely well and the graphical display makes it very easy for the user to find out where the performance issues are.   To use scalasca, on either the TCS or the GPC load the scalasca module:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load scalasca&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Scalasca requires the code to be recompiled, and it provides wrapper scripts to choose the right options for you.   If, for instance, your code is normally compiled with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ifort -c myprog.f &lt;br /&gt;
ifort -o myprog myprog.o -lm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
then one can instead use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scalasca -instrument ifort -c myprog.f&lt;br /&gt;
scalasca -instrument ifort -o myprog myprog.o -lm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that with the Intel compilers, if you're using OpenMP, you'll have to add the -pomp flag, e.g. &amp;lt;tt&amp;gt;scalasca -instrument -pomp&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Scalasca then parses the rest of the command line and adds the necessary flags.   (If you are curious, &amp;lt;tt&amp;gt;scalasca -instrument -v&amp;lt;/tt&amp;gt; will show you what the resulting command line actually is.)   There is also a shortcut, &amp;lt;tt&amp;gt;skin&amp;lt;/tt&amp;gt;, which is equivalent to &amp;lt;tt&amp;gt;scalasca -instrument&amp;lt;/tt&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
When the new executable is generated, it's run in a similar way; if you normally run your program as&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./myprog&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mpirun -np 5 ./myprog&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you'd instead do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scalasca -analyze ./myprog&lt;br /&gt;
scalasca -analyze mpirun -np 5 ./myprog&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The program will run as usual, with only a few additional lines of output about the measurement files being written.   (Again, there is a shortcut available; &amp;lt;tt&amp;gt;scan&amp;lt;/tt&amp;gt; is equivalent to &amp;lt;tt&amp;gt;scalasca -analyze&amp;lt;/tt&amp;gt;.)  To then look at the results with a graphical user interface, one uses &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scalasca -examine [epik directory name]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where the directory name is the one created by the analysis step.   This tries to pop up an X window; from an xterm on a Linux or Mac machine, you will have to log in via&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -Y login.scinet.utoronto.ca&lt;br /&gt;
scinet01$ ssh -Y [whichever devel node you're using]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and you will have to run this from one of the devel nodes, rather than one of the compute nodes.&lt;br /&gt;
&lt;br /&gt;
A screenshot of the results is shown to the right for an OpenMP program, where wait times at implicit barriers at the end of parallel sections are selected as the metric to show on the left; the middle panel shows the call tree indicating the context in which the delays occurred, and the panel on the right gives the breakdown for each thread.&lt;br /&gt;
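&lt;br /&gt;
Putting the pieces together, a minimal end-to-end sketch using the &amp;lt;tt&amp;gt;skin&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;scan&amp;lt;/tt&amp;gt; shortcuts might look like the following; the compiler wrapper and the epik directory name are illustrative, so check the name your run actually produces:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load scalasca&lt;br /&gt;
skin mpif77 -o myprog myprog.f        # instrument at compile/link time&lt;br /&gt;
scan mpirun -np 5 ./myprog            # run the instrumented executable&lt;br /&gt;
scalasca -examine epik_myprog_5_sum   # examine the resulting epik directory&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;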
&lt;br /&gt;
&amp;lt;!-- ===VTune/Thread Profiler (GPC)=== don't have license up for this yet --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Common Serial Performance Problems==&lt;br /&gt;
&lt;br /&gt;
===Poor use of cache===&lt;br /&gt;
A classic problem for scientific codes is memory bandwidth; the capacity to do on-chip floating-point or integer operations has grown much faster than the ability to get numbers onto the chip in the first place.   One way around this is to use various levels of memory cache: when one number is needed from memory, a whole line of data is brought in from (slow) external memory to fast on-chip cache.  This makes that first memory access modestly slower, but tends to greatly speed up overall performance, since if you are going to do something to data in one part of memory you are typically also going to do something to the neighboring values.&lt;br /&gt;
&lt;br /&gt;
If you take advantage of data locality --- accessing memory in some kind of order rather than jumping around in memory --- cache can greatly increase the performance of your code.  On the other hand, if you '''do''' jump around in memory a lot, cache will actually hurt your performance.    &lt;br /&gt;
&lt;br /&gt;
The classic way this comes up is in accessing multidimensional arrays.  The example below is simplified; most cases aren't this extreme (or obvious!) but the idea is the same.  Let's consider the following FORTRAN code, which simply iterates a few times through a modestly sized multidimensional array:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;f&amp;quot;&amp;gt;  &lt;br /&gt;
      program memaccess&lt;br /&gt;
&lt;br /&gt;
      integer, parameter :: li=32,lj=32,lk=32,ll=32,lm=32&lt;br /&gt;
      real, dimension(li,lj,lk,ll,lm) :: a&lt;br /&gt;
      integer :: i,j,k,l,m&lt;br /&gt;
      integer :: iter&lt;br /&gt;
      &lt;br /&gt;
      a = 0.&lt;br /&gt;
&lt;br /&gt;
      do iter=1,10&lt;br /&gt;
      do m=1,lm &lt;br /&gt;
         do l=1,ll&lt;br /&gt;
             do k=1,lk&lt;br /&gt;
                 do j=1,lj&lt;br /&gt;
                     do i=1,li&lt;br /&gt;
                        a(i,j,k,l,m) = a(i,j,k,l,m)+ i+j+k+l+m&lt;br /&gt;
                     enddo&lt;br /&gt;
                 enddo&lt;br /&gt;
             enddo&lt;br /&gt;
         enddo&lt;br /&gt;
      enddo&lt;br /&gt;
      enddo&lt;br /&gt;
&lt;br /&gt;
      end program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
The above program, which we'll suggestively call &amp;lt;tt&amp;gt;memaccess-good.f&amp;lt;/tt&amp;gt;, accesses array elements in the order that FORTRAN places them in the computer's memory; FORTRAN lays out this array in memory as &amp;lt;tt&amp;gt;[a(1,1,1,1,1), a(2,1,1,1,1),... a(32,1,1,1,1), a(1,2,1,1,1)...]&amp;lt;/tt&amp;gt; and so on.  So by ordering our loops that way we are marching through memory in order, making maximum use of cache.    The resulting code can be timed:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ gfortran -O3 -o memaccess-good memaccess-good.f &lt;br /&gt;
$ time ./memaccess-good&lt;br /&gt;
&lt;br /&gt;
real    0m2.478s&lt;br /&gt;
user    0m2.337s&lt;br /&gt;
sys     0m0.094s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If we reverse the order of the loops, so that they go &amp;lt;tt&amp;gt;do i=1,li ... do j=1,lj ... do m=1,lm&amp;lt;/tt&amp;gt; from outermost to innermost, however, we get&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ gfortran -O3 -o memaccess-bad memaccess-bad.f&lt;br /&gt;
$ time ./memaccess-bad&lt;br /&gt;
&lt;br /&gt;
real    0m19.622s&lt;br /&gt;
user    0m19.101s&lt;br /&gt;
sys     0m0.098s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
A factor of 8 worse!   Thus tools such as cachegrind can be extremely important for finding significant performance problems in memory-heavy codes.  &lt;br /&gt;
&lt;br /&gt;
C-based languages arrange their arrays the opposite way in memory, so that the equivalent array in C would go as &amp;lt;tt&amp;gt;[a[0][0][0][0][0], a[0][0][0][0][1], ... a[0][0][0][0][31], a[0][0][0][1][0], ... ]&amp;lt;/tt&amp;gt;; thus `bad' array access in FORTRAN looks like `good' array access in C, and vice versa.&lt;br /&gt;
&lt;br /&gt;
==Common OpenMP Performance Problems==&lt;br /&gt;
&lt;br /&gt;
==Common MPI Performance Problems==&lt;br /&gt;
===Overuse of MPI_BARRIER===&lt;br /&gt;
===Many Small Messages===&lt;br /&gt;
Typically, the time it takes for a message of size ''n'' to get from one node to another can be expressed in terms of a [[latency]] ''l'' and a [[bandwidth]] ''b'',&lt;br /&gt;
&amp;lt;math&amp;gt;t_c = l + \frac{n}{b} .&amp;lt;/math&amp;gt;&lt;br /&gt;
For small messages, the latency can dominate the cost of sending (and processing!) the message.  By&lt;br /&gt;
bundling many small messages into one, you can amortize that cost over many messages,  reducing&lt;br /&gt;
the time spent communicating.&lt;br /&gt;
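For instance, with an illustrative latency of &amp;lt;math&amp;gt;l = 1\,\mu\mathrm{s}&amp;lt;/math&amp;gt; and bandwidth of &amp;lt;math&amp;gt;b = 1&amp;lt;/math&amp;gt; GB/s, one 100-byte message takes about &amp;lt;math&amp;gt;1.1\,\mu\mathrm{s}&amp;lt;/math&amp;gt;, so 100 such messages sent separately cost roughly &amp;lt;math&amp;gt;110\,\mu\mathrm{s}&amp;lt;/math&amp;gt;, almost all of it latency; bundled into a single 10 kB message they cost only about &amp;lt;math&amp;gt;1\,\mu\mathrm{s} + 10\,\mu\mathrm{s} = 11\,\mu\mathrm{s}&amp;lt;/math&amp;gt;.  (These numbers are purely illustrative; the actual latency and bandwidth depend on the interconnect.)&lt;br /&gt;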
===Not overlapping computation and communications===&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=User_Serial&amp;diff=7135</id>
		<title>User Serial</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=User_Serial&amp;diff=7135"/>
		<updated>2014-07-30T20:10:35Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* Use a whole node... */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===General considerations===&lt;br /&gt;
&lt;br /&gt;
====Use whole nodes...====&lt;br /&gt;
&lt;br /&gt;
When you submit a job on a SciNet system, it is run on one (or more than one) entire node - meaning that your job is occupying at least 8 processors for the duration of its run.  The SciNet systems are usually busy, with many researchers waiting in the queue for computational resources, so we require that you make full use of the nodes that your job is allocated, so that other researchers don't have to wait unnecessarily, and so that your jobs get as much work done for you while they run as possible.&lt;br /&gt;
&lt;br /&gt;
Often, the best way to make full use of the node is to run one large parallel computation; but sometimes it is beneficial to run several serial codes at the same time.  On this page, we discuss ways to run suites of serial computations at once, as efficiently as possible, using the full resources of the node.&lt;br /&gt;
&lt;br /&gt;
====...but not more.====&lt;br /&gt;
&lt;br /&gt;
When running multiple jobs on the same node, it is essential to have a good idea of how much memory the jobs will require. The GPC compute nodes have about 14GB in total available &lt;br /&gt;
to user jobs running on the 8 cores (a bit less, say 13GB, on the devel nodes &amp;lt;tt&amp;gt;gpc01..04&amp;lt;/tt&amp;gt;, and [[GPC_Quickstart#Memory_Configuration|somewhat more for some compute nodes]]).&lt;br /&gt;
So the jobs also have to be bunched in ways that will fit into 14GB.  If they use more than this, they will crash the node, inconveniencing you and other researchers waiting for that node.&lt;br /&gt;
&lt;br /&gt;
If that's not possible -- each individual job requires significantly in excess of ~1.75GB -- then it's possible to just run fewer jobs so that they do fit; but then, again there is an under-utilization problem.   In that case, the jobs are likely candidates for parallelization, and you can contact us at [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;] and arrange a meeting with one of the technical analysts to help you do just that.&lt;br /&gt;
&lt;br /&gt;
If the memory requirements allow it, you could actually run more than 8 jobs at the same time, up to 16, exploiting the [[GPC_Quickstart#HyperThreading | HyperThreading]] feature of the Intel Nehalem cores.  It may seem counterintuitive, but running 16 jobs on 8 cores has, for certain types of tasks, increased some users' overall throughput by 10 to 30 percent.&lt;br /&gt;
&lt;br /&gt;
====Is your job really serial?====&lt;br /&gt;
&lt;br /&gt;
While your program may not be explicitly parallel, it may use some of SciNet's threaded libraries for numerical computations, which can make use of multiple processors.  In particular, SciNet's [[Python]] and [[R_Statistical_Package | R]] modules are compiled with aggressive optimization and use threaded numerical libraries which by default will make use of multiple cores for computations such as large matrix operations.  This can greatly speed up individual runs, but by less (usually much less) than a factor of 8.  If you do have many such computations to do, your [[Introduction_To_Performance#Throughput | throughput]] will be better - you will get more calculations done per unit time - if you turn off the threading and run multiple such computations at once.  Threading is turned off with the shell script line &amp;lt;tt&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/tt&amp;gt;; that line will be included in the scripts below.  &lt;br /&gt;
&lt;br /&gt;
If your calculations do implicitly use threading, you may want to experiment to see what gives you the best performance - you may find that running 4 (or even 8) jobs with 2 threads each (&amp;lt;tt&amp;gt;OMP_NUM_THREADS=2&amp;lt;/tt&amp;gt;), or 2 jobs with 4 threads, gives better performance than 8 jobs with 1 thread (and almost certainly better than 1 job with 8 threads).  We'd encourage you to perform exactly such a [[Introduction_To_Performance#Strong_Scaling_Tests | scaling test]]; for a small up-front investment in time you may significantly speed up all the computations you need to do.&lt;br /&gt;
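&lt;br /&gt;
As a concrete illustration, a minimal sketch of the 4-jobs-with-2-threads-each variant (assuming job scripts &amp;lt;tt&amp;gt;dojob1&amp;lt;/tt&amp;gt;..&amp;lt;tt&amp;gt;dojob4&amp;lt;/tt&amp;gt; in directories &amp;lt;tt&amp;gt;jobdir1&amp;lt;/tt&amp;gt;..&amp;lt;tt&amp;gt;jobdir4&amp;lt;/tt&amp;gt;, as in the script in the next section) would replace the execution commands with:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# give each job 2 threads, and run 4 jobs at once to fill the 8 cores&lt;br /&gt;
export OMP_NUM_THREADS=2&lt;br /&gt;
(cd jobdir1; ./dojob1) &amp;amp;&lt;br /&gt;
(cd jobdir2; ./dojob2) &amp;amp;&lt;br /&gt;
(cd jobdir3; ./dojob3) &amp;amp;&lt;br /&gt;
(cd jobdir4; ./dojob4) &amp;amp;&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;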
&lt;br /&gt;
===Serial jobs of similar duration===&lt;br /&gt;
&lt;br /&gt;
The most straightforward way to run multiple serial jobs is to bunch the jobs in groups of 8 or more that will take roughly the same amount of time, and create a job that looks a &lt;br /&gt;
bit like this&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple serial jobs on&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialx8&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; ampersand off 8 jobs and wait&lt;br /&gt;
(cd jobdir1; ./dojob1) &amp;amp;&lt;br /&gt;
(cd jobdir2; ./dojob2) &amp;amp;&lt;br /&gt;
(cd jobdir3; ./dojob3) &amp;amp;&lt;br /&gt;
(cd jobdir4; ./dojob4) &amp;amp;&lt;br /&gt;
(cd jobdir5; ./dojob5) &amp;amp;&lt;br /&gt;
(cd jobdir6; ./dojob6) &amp;amp;&lt;br /&gt;
(cd jobdir7; ./dojob7) &amp;amp;&lt;br /&gt;
(cd jobdir8; ./dojob8) &amp;amp;&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are a few important things to take note of here.  First, the &amp;lt;tt&amp;gt;'''wait'''&amp;lt;/tt&amp;gt;&lt;br /&gt;
command at the end is crucial; without it the job will terminate &lt;br /&gt;
immediately, killing the 8 programs you just started.&lt;br /&gt;
&lt;br /&gt;
Second is that it is important to group the programs by how long they &lt;br /&gt;
will take.   If (say) &amp;lt;tt&amp;gt;dojob8&amp;lt;/tt&amp;gt; takes 2 hours and the rest only take 1, &lt;br /&gt;
then for one hour 7 of the 8 cores on the GPC node are wasted; they are &lt;br /&gt;
sitting idle but are unavailable for other users, and the utilization of &lt;br /&gt;
this node over the whole run is only 56%.   This is the sort of thing &lt;br /&gt;
we'll notice, and users who don't make efficient use of the machine will &lt;br /&gt;
have their ability to use SciNet resources reduced.  If you have many serial jobs of varying length, &lt;br /&gt;
use the submission script to balance the computational load, as explained [[ #Serial jobs of varying duration | below]].&lt;br /&gt;
&lt;br /&gt;
Third, we reiterate that if memory requirements allow it, you should try to run more than 8 jobs at once, with a maximum of 16 jobs.&lt;br /&gt;
&lt;br /&gt;
===GNU Parallel===&lt;br /&gt;
&lt;br /&gt;
GNU parallel is a really nice tool written by Ole Tange to run multiple serial jobs in&lt;br /&gt;
parallel. It allows you to keep the processors on each 8-core node busy, if you provide enough jobs to do.&lt;br /&gt;
&lt;br /&gt;
GNU parallel is accessible on the GPC in the module&lt;br /&gt;
&amp;lt;tt&amp;gt;gnu-parallel&amp;lt;/tt&amp;gt;:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load gnu-parallel/20140622&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Note that there are several versions of gnu-parallel installed on the GPC; we recommend using the newest version. &lt;br /&gt;
&lt;br /&gt;
The citation for GNU Parallel is: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, '';login: The USENIX Magazine,'' February 2011:42-47.&lt;br /&gt;
&lt;br /&gt;
It is easiest to demonstrate the usage of GNU parallel by&lt;br /&gt;
example. Suppose you have 16 jobs to do, that the durations of these jobs vary quite a bit, but that the average job duration is around 10 hours. You could use the following script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=24:00:00&lt;br /&gt;
#PBS -N gnu-parallel-example&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20140622  &lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND&lt;br /&gt;
parallel -j 8 &amp;lt;&amp;lt;EOF&lt;br /&gt;
  cd jobdir1; ./dojob1; echo &amp;quot;job 1 finished&amp;quot;&lt;br /&gt;
  cd jobdir2; ./dojob2; echo &amp;quot;job 2 finished&amp;quot;&lt;br /&gt;
  cd jobdir3; ./dojob3; echo &amp;quot;job 3 finished&amp;quot;&lt;br /&gt;
  cd jobdir4; ./dojob4; echo &amp;quot;job 4 finished&amp;quot;&lt;br /&gt;
  cd jobdir5; ./dojob5; echo &amp;quot;job 5 finished&amp;quot;&lt;br /&gt;
  cd jobdir6; ./dojob6; echo &amp;quot;job 6 finished&amp;quot;&lt;br /&gt;
  cd jobdir7; ./dojob7; echo &amp;quot;job 7 finished&amp;quot;&lt;br /&gt;
  cd jobdir8; ./dojob8; echo &amp;quot;job 8 finished&amp;quot;&lt;br /&gt;
  cd jobdir9; ./dojob9; echo &amp;quot;job 9 finished&amp;quot;&lt;br /&gt;
  cd jobdir10; ./dojob10; echo &amp;quot;job 10 finished&amp;quot;&lt;br /&gt;
  cd jobdir11; ./dojob11; echo &amp;quot;job 11 finished&amp;quot;&lt;br /&gt;
  cd jobdir12; ./dojob12; echo &amp;quot;job 12 finished&amp;quot;&lt;br /&gt;
  cd jobdir13; ./dojob13; echo &amp;quot;job 13 finished&amp;quot;&lt;br /&gt;
  cd jobdir14; ./dojob14; echo &amp;quot;job 14 finished&amp;quot;&lt;br /&gt;
  cd jobdir15; ./dojob15; echo &amp;quot;job 15 finished&amp;quot;&lt;br /&gt;
  cd jobdir16; ./dojob16; echo &amp;quot;job 16 finished&amp;quot;&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; parameter sets the number of jobs to run at the same time, but 16 jobs are lined up. Initially, 8 jobs are given to the 8 processors on the node. When one of the processors is done with its assigned job, it will get the next job instead of sitting idle until the other processors are done. While you would expect that on average this script should take 20 hours (each processor on average has to complete two jobs of 10 hours), there's a good chance that one of the processors gets two jobs that take more than 10 hours, so the job script requests 24 hours. How much more time you should ask for in practice depends on the spread in run times of the separate jobs.&lt;br /&gt;
&lt;br /&gt;
===Serial jobs of varying duration===&lt;br /&gt;
&lt;br /&gt;
If you have a lot (50+) of relatively short serial runs to do, '''of which the walltime varies''', and if you know that eight jobs fit in memory without issues, then writing all the commands explicitly in the jobscript can get tedious. If you follow the convention that the jobs are all started by auxiliary scripts called job&amp;lt;something&amp;gt;.sh, the following strategy in your submission script will maximize the CPU utilization. &lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple, dynamically-run &lt;br /&gt;
# serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialdynamic&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20140622  &lt;br /&gt;
&lt;br /&gt;
# COMMANDS ARE ASSUMED TO BE SCRIPTS CALLED job*.sh&lt;br /&gt;
echo job*.sh | tr ' ' '\n' | parallel -j 8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notes:&lt;br /&gt;
* As before, GNU Parallel keeps 8 jobs running at a time, and if one finishes, starts the next. This is an easy way to do ''load balancing''.&lt;br /&gt;
* You can in fact run more or fewer than 8 processes per node by modifying &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;'s &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; argument.&lt;br /&gt;
* Doing many serial jobs often entails doing many disk reads and writes, which can be detrimental to the performance. In that case, running from the ramdisk may be an option (see the sketch after these notes).  &lt;br /&gt;
* When using a ramdisk, make sure you copy your results from the ramdisk back to the scratch after the runs, or when the job is killed because time has run out.&lt;br /&gt;
* More details on how to setup your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].&lt;br /&gt;
* This script optimizes resource utility, but can only use 1 node (8 cores) at a time. The next section addresses how to use more nodes.&lt;br /&gt;
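&lt;br /&gt;
A minimal sketch of the ramdisk pattern mentioned above -- assuming the ramdisk is mounted at &amp;lt;tt&amp;gt;/dev/shm&amp;lt;/tt&amp;gt; (see the [[User_Ramdisk|Ramdisk wiki page]] for the actual setup); the directory and script names are illustrative:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# stage the run directory to the ramdisk, run there, then copy results back to scratch&lt;br /&gt;
mkdir -p /dev/shm/$USER&lt;br /&gt;
cp -r $SCRATCH/run1 /dev/shm/$USER/&lt;br /&gt;
(cd /dev/shm/$USER/run1; ./dojob1)&lt;br /&gt;
cp -r /dev/shm/$USER/run1 $SCRATCH/run1-done&lt;br /&gt;
rm -rf /dev/shm/$USER/run1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;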
&lt;br /&gt;
===Version for more than 8 cores at once (still serial)===&lt;br /&gt;
&lt;br /&gt;
If you have hundreds of serial jobs that you want to run concurrently and the nodes are available, then the approach above, while useful, would require tens of scripts to be submitted separately. It is possible for you to request more than one node and to use the following routine to distribute your processes amongst the cores. In this case, it is important to use the newer version of GNU parallel installed on the GPC.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple, dynamically-run &lt;br /&gt;
# serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=25:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialdynamicMulti&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20140622&lt;br /&gt;
&lt;br /&gt;
# START PARALLEL JOBS USING NODE LIST IN $PBS_NODEFILE&lt;br /&gt;
seq 800 | parallel -j8 --sshloginfile $PBS_NODEFILE --workdir $PWD ./myrun {}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation:&lt;br /&gt;
* &amp;lt;tt&amp;gt;seq 800&amp;lt;/tt&amp;gt; outputs the numbers 1 through 800 on separate lines. This output is piped to (ie becomes the input of) the &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; command.&lt;br /&gt;
* The point of the &amp;quot;seq 800&amp;quot; is that each line that you give to &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; defines a new job. So here, there are 800 jobs.&lt;br /&gt;
* Each job runs a command, but because the numbers generated by seq are not commands, a real command is constructed, in this case, by the argument &amp;lt;tt&amp;gt;./myrun {}&amp;lt;/tt&amp;gt;. Here &amp;lt;tt&amp;gt;myrun&amp;lt;/tt&amp;gt; is supposed to be the name of the application to run. The two curly brackets &amp;lt;tt&amp;gt;{}&amp;lt;/tt&amp;gt; get replaced by the line from the input, that is, by one of the numbers.&lt;br /&gt;
* So parallel will run the 800 commands:&amp;lt;br/&amp;gt;./myrun 1&amp;lt;br/&amp;gt;./myrun 2&amp;lt;br/&amp;gt;...&amp;lt;br/&amp;gt;./myrun 800&lt;br /&gt;
* The parameter &amp;lt;tt&amp;gt;--sshloginfile $PBS_NODEFILE&amp;lt;/tt&amp;gt; tells &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; to look for the file named $PBS_NODEFILE which contains the host names of the nodes assigned to the current job (as stated above, it is automatically generated).&lt;br /&gt;
* The parameter &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; tells &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; to run 8 of these at a time on each of the hosts.&lt;br /&gt;
* The &amp;lt;tt&amp;gt;--workdir $PWD&amp;lt;/tt&amp;gt; sets the working directory on the other nodes to the working directory on the first node. Without this, the run tries to start from the wrong place and will most likely fail (unless using the latest gnu parallel module, 20130422, which by default puts you in $PWD on the remote node).&lt;br /&gt;
* Loaded modules should get automatically loaded on the remote nodes too for the latest gnu parallel module, but not for earlier ones.&lt;br /&gt;
* If you need an environment variable to be transferred from the job script to the remotely running subjobs, use &amp;lt;tt&amp;gt;--env ENVIRONMENTVARIABLE&amp;lt;/tt&amp;gt;.  SciNet's gnu-parallel modules automatically transfer &amp;lt;tt&amp;gt;OMP_NUM_THREADS&amp;lt;/tt&amp;gt;, and typical environment variables set by most modules.&lt;br /&gt;
&lt;br /&gt;
Notes:&lt;br /&gt;
* Of course, this is just an example of what you could do with GNU parallel. How you set up your specific run depends on how each of the runs would be started. One could, for instance, also prepare a file of commands to run and make that the input to parallel (see the sketch after these notes).&lt;br /&gt;
* Note that submitting several bunches to single nodes, as in the section above, is a more failsafe way of proceeding, since a node failure would only affect one of these bunches, rather than all runs. &lt;br /&gt;
* GNU Parallel can be passed a file with the list of nodes to which to ssh, using &amp;lt;tt&amp;gt;--sshloginfile&amp;lt;/tt&amp;gt; (thanks to Ole Tange for pointing this out). This list is automatically generated by the scheduler and its name is made available in the environment variable $PBS_NODEFILE.&lt;br /&gt;
* Alternatively, GNU Parallel can take a comma separated list of nodes given to its -S argument, but this would need to be constructed from the file $PBS_NODEFILE which contains all nodes assigned to the job, with each node duplicated 8x for the number of cores on each node.&lt;br /&gt;
* GNU Parallel reads lines of input and converts them to arguments in the execution command. The execution command is the last argument given to &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;, with &amp;lt;tt&amp;gt;{}&amp;lt;/tt&amp;gt; replaced by the lines of input.&lt;br /&gt;
* &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;The --workdir argument is essential: it sets the working directory on the other nodes, which would default to your home directory if omitted. Since /home is read-only on the compute nodes, you would likely not get any output at all!&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt;This is no longer true for the latest GNU Parallel modules (20130422), which put you in the current directory on the remote nodes.&lt;br /&gt;
* We reiterate that if memory requirements allow it, you should try to run more than 8 jobs at once, with a maximum of 16 jobs. You can run more or fewer than 8 processes per node by modifying the -j8 parameter to the parallel command.&lt;br /&gt;
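&lt;br /&gt;
A minimal sketch of the commands-file approach mentioned in the notes above; the file name &amp;lt;tt&amp;gt;commands.txt&amp;lt;/tt&amp;gt; and the commands in it are purely illustrative:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# one command per line; parallel runs up to 8 of them at a time on each node&lt;br /&gt;
cat &amp;gt; commands.txt &amp;lt;&amp;lt;EOF&lt;br /&gt;
./myrun input1.dat&lt;br /&gt;
./myrun input2.dat&lt;br /&gt;
./myrun input3.dat&lt;br /&gt;
EOF&lt;br /&gt;
parallel -j8 --sshloginfile $PBS_NODEFILE --workdir $PWD &amp;lt; commands.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;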
&lt;br /&gt;
===More on GNU parallel=== &lt;br /&gt;
* [[Media:Tech-talk-gnu-parallel.pdf|Slides of the SciNet TechTalk on Gnu Parallel (14 Nov 2012)]]&lt;br /&gt;
* The documentation for GNU parallel can be found at http://www.gnu.org/software/parallel/&lt;br /&gt;
* Its man page can be found here http://www.gnu.org/software/parallel/man.html&lt;br /&gt;
* The man page is also available on the GPC when the gnu-parallel module is loaded, with the command &amp;lt;code&amp;gt;$ man parallel&amp;lt;/code&amp;gt;. The man page contains options, such as how to make sure the output is not all scrambled, and examples.&lt;br /&gt;
&lt;br /&gt;
===GNU Parallel Reference===&lt;br /&gt;
* O. Tange (2011): GNU Parallel - The Command-Line Power Tool, '';login: The USENIX Magazine,'' February 2011:42-47.&lt;br /&gt;
&lt;br /&gt;
===Older scripts===&lt;br /&gt;
&lt;br /&gt;
Older scripts, which mimicked some of GNU parallel functionality, can be found on the [[Deprecated scripts]] page.&lt;br /&gt;
&lt;br /&gt;
--[[User:Rzon|Rzon]] 02:22, 14 Nov 2010 (UTC)&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=User_Serial&amp;diff=7134</id>
		<title>User Serial</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=User_Serial&amp;diff=7134"/>
		<updated>2014-07-30T20:10:10Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* ...but not more. */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===General considerations===&lt;br /&gt;
&lt;br /&gt;
====Use a whole node...====&lt;br /&gt;
&lt;br /&gt;
When you submit a job on a SciNet system, it is run on one (or more than one) entire node - meaning that your job is occupying at least 8 processors for the duration of its run.  The SciNet systems are usually busy, with many researchers waiting in the queue for computational resources, so we require that you make full use of the nodes that your job is allocated, so that other researchers don't have to wait unnecessarily, and so that your jobs get as much work done for you while they run as possible.&lt;br /&gt;
&lt;br /&gt;
Often, the best way to make full use of the node is to run one large parallel computation; but sometimes it is beneficial to run several serial codes at the same time.  On this page, we discuss ways to run suites of serial computations at once, as efficiently as possible, using the full resources of the node.&lt;br /&gt;
&lt;br /&gt;
====...but not more.====&lt;br /&gt;
&lt;br /&gt;
When running multiple jobs on the same node, it is essential to have a good idea of how much memory the jobs will require. The GPC compute nodes have about 14GB in total available &lt;br /&gt;
to user jobs running on the 8 cores (a bit less, say 13GB, on the devel nodes &amp;lt;tt&amp;gt;gpc01..04&amp;lt;/tt&amp;gt;, and [[GPC_Quickstart#Memory_Configuration|somewhat more for some compute nodes]]).&lt;br /&gt;
So the jobs also have to be bunched in ways that will fit into 14GB.  If they use more than this, they will crash the node, inconveniencing you and other researchers waiting for that node.&lt;br /&gt;
&lt;br /&gt;
If that's not possible -- each individual job requires significantly in excess of ~1.75GB -- then it's possible to just run fewer jobs so that they do fit; but then, again there is an under-utilization problem.   In that case, the jobs are likely candidates for parallelization, and you can contact us at [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;] and arrange a meeting with one of the technical analysts to help you do just that.&lt;br /&gt;
&lt;br /&gt;
If the memory requirements allow it, you could actually run more than 8 jobs at the same time, up to 16, exploiting the [[GPC_Quickstart#HyperThreading | HyperThreading]] feature of the Intel Nehalem cores.  It may seem counterintuitive, but running 16 jobs on 8 cores for certain types of tasks has increased some users overall throughput by 10 to 30 percent.&lt;br /&gt;
&lt;br /&gt;
====Is your job really serial?====&lt;br /&gt;
&lt;br /&gt;
While your program may not be explicitly parallel, it may use some of SciNet's threaded libraries for numerical computations, which can make use of multiple processors.  In particular, SciNet's [[Python]] and [[R_Statistical_Package | R]] modules are compiled with aggressive optimization and using threaded numerical libraries which by default will make use of multiple cores for computations such as large matrix operations.  This can greatly speed up individual runs, but by less (usually much less) than a factor of 8.  If you do have many such computations to do, your [[Introduction_To_Performance#Throughput | throughput]] will be better - you will get more calculations done per unit time -if you turn off the threading and run multiple such computations at once.  Threading is turned off with the shell script line &amp;lt;tt&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/tt&amp;gt;; that line will be included in the scripts below.  &lt;br /&gt;
&lt;br /&gt;
If your calculations do implicitly use threading, you may want to experiment to see what gives you the best performance - you may find that running 4 (or even 8) jobs with 2 threads each (&amp;lt;tt&amp;gt;OMP_NUM_THREADS=2&amp;lt;/tt&amp;gt;), or 2 jobs with 4 threads, gives better performance than 8 jobs with 1 thread (and almost certainly better than 1 job with 8 threads).  We'd encourage you to perform exactly such a [[Introduction_To_Performance#Strong_Scaling_Tests | scaling test]]; for a small up-front investment in time you may significantly speed up all the computations you need to do.&lt;br /&gt;
&lt;br /&gt;
===Serial jobs of similar duration===&lt;br /&gt;
&lt;br /&gt;
The most straightforward way to run multiple serial jobs is to bunch the jobs in groups of 8 or more that will take roughly the same amount of time, and create a job that looks a &lt;br /&gt;
bit like this&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple serial jobs on&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialx8&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; ampersand off 8 jobs and wait&lt;br /&gt;
(cd jobdir1; ./dojob1) &amp;amp;&lt;br /&gt;
(cd jobdir2; ./dojob2) &amp;amp;&lt;br /&gt;
(cd jobdir3; ./dojob3) &amp;amp;&lt;br /&gt;
(cd jobdir4; ./dojob4) &amp;amp;&lt;br /&gt;
(cd jobdir5; ./dojob5) &amp;amp;&lt;br /&gt;
(cd jobdir6; ./dojob6) &amp;amp;&lt;br /&gt;
(cd jobdir7; ./dojob7) &amp;amp;&lt;br /&gt;
(cd jobdir8; ./dojob8) &amp;amp;&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are a few important things to take note of here.  First, the &amp;lt;tt&amp;gt;'''wait'''&amp;lt;/tt&amp;gt;&lt;br /&gt;
command at the end is crucial; without it the job will terminate &lt;br /&gt;
immediately, killing the 8 programs you just started.&lt;br /&gt;
&lt;br /&gt;
Second is that it is important to group the programs by how long they &lt;br /&gt;
will take.   If (say) &amp;lt;tt&amp;gt;dojob8&amp;lt;/tt&amp;gt; takes 2 hours and the rest only take 1, &lt;br /&gt;
then for one hour 7 of the 8 cores on the GPC node are wasted; they are &lt;br /&gt;
sitting idle but are unavailable for other users, and the utilization of &lt;br /&gt;
this node over the whole run is only 56%.   This is the sort of thing &lt;br /&gt;
we'll notice, and users who don't make efficient use of the machine will &lt;br /&gt;
have their ability to use scinet resources reduced.  If you have many serial jobs of varying length, &lt;br /&gt;
use the submission script to balance the computational load, as explained [[ #Serial jobs of varying duration | below]].&lt;br /&gt;
&lt;br /&gt;
Third, we reiterate that if memory requirements allow it, you should try to run more than 8 jobs at once, with a maximum of 16 jobs.&lt;br /&gt;
&lt;br /&gt;
===GNU Parallel===&lt;br /&gt;
&lt;br /&gt;
GNU parallel is a really nice tool written by Ole Tange to run multiple serial jobs in&lt;br /&gt;
parallel. It allows you to keep the processors on each 8core node busy, if you provide enough jobs to do.&lt;br /&gt;
&lt;br /&gt;
GNU parallel is accessible on the GPC in the module&lt;br /&gt;
&amp;lt;tt&amp;gt;gnu-parallel&amp;lt;/tt&amp;gt;:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load gnu-parallel/20140622&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Note that there are several versions of gnu-parallel installed on the GPC; we recommend using the newer version. &lt;br /&gt;
&lt;br /&gt;
The citation for GNU Parallel is: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, '';login: The USENIX Magazine,'' February 2011:42-47.&lt;br /&gt;
&lt;br /&gt;
It is easiest to demonstrate the usage of GNU parallel by&lt;br /&gt;
examples. Suppose you have 16 jobs to do, that these jobs duration varies quite a bit, but that the average job duration is around 10 hours. You could use the following script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=24:00:00&lt;br /&gt;
#PBS -N gnu-parallel-example&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20140622  &lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND&lt;br /&gt;
parallel -j 8 &amp;lt;&amp;lt;EOF&lt;br /&gt;
  cd jobdir1; ./dojob1; echo &amp;quot;job 1 finished&amp;quot;&lt;br /&gt;
  cd jobdir2; ./dojob2; echo &amp;quot;job 2 finished&amp;quot;&lt;br /&gt;
  cd jobdir3; ./dojob3; echo &amp;quot;job 3 finished&amp;quot;&lt;br /&gt;
  cd jobdir4; ./dojob4; echo &amp;quot;job 4 finished&amp;quot;&lt;br /&gt;
  cd jobdir5; ./dojob5; echo &amp;quot;job 5 finished&amp;quot;&lt;br /&gt;
  cd jobdir6; ./dojob6; echo &amp;quot;job 6 finished&amp;quot;&lt;br /&gt;
  cd jobdir7; ./dojob7; echo &amp;quot;job 7 finished&amp;quot;&lt;br /&gt;
  cd jobdir8; ./dojob8; echo &amp;quot;job 8 finished&amp;quot;&lt;br /&gt;
  cd jobdir9; ./dojob9; echo &amp;quot;job 9 finished&amp;quot;&lt;br /&gt;
  cd jobdir10; ./dojob10; echo &amp;quot;job 10 finished&amp;quot;&lt;br /&gt;
  cd jobdir11; ./dojob11; echo &amp;quot;job 11 finished&amp;quot;&lt;br /&gt;
  cd jobdir12; ./dojob12; echo &amp;quot;job 12 finished&amp;quot;&lt;br /&gt;
  cd jobdir13; ./dojob13; echo &amp;quot;job 13 finished&amp;quot;&lt;br /&gt;
  cd jobdir14; ./dojob14; echo &amp;quot;job 14 finished&amp;quot;&lt;br /&gt;
  cd jobdir15; ./dojob15; echo &amp;quot;job 15 finished&amp;quot;&lt;br /&gt;
  cd jobdir16; ./dojob16; echo &amp;quot;job 16 finished&amp;quot;&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; parameter sets the number of jobs to run at the same time, but 16 jobs are lined up. Initially, 8 jobs are given to the 8 processors on the node. When one of the processors is done with its assigned job, it will get a next job instead of sitting idle until the other processors are done. While you would expect that on average this script should take 20 hours (each processor on average has to complete two jobs of 10hours), there's a good chance that one of the processors gets two jobs that take more than 10 hours, so the job script requests 24 hours. How much more time you should ask for in practice depends on the spread in run times of the separate jobs.&lt;br /&gt;
&lt;br /&gt;
===Serial jobs of varying duration===&lt;br /&gt;
&lt;br /&gt;
If you have a lot (50+) of relatively short serial runs to do, '''of which the walltime varies''', and if you know that eight jobs fit in memory without issues, then writing all the commands explicitly in the jobscript can get tedious. If you follow the convention that the jobs are all started by auxiliary scripts called job&amp;lt;something&amp;gt;.sh, the following strategy in your submission script will maximize the CPU utilization. &lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple, dynamically-run &lt;br /&gt;
# serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialdynamic&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20140622  &lt;br /&gt;
&lt;br /&gt;
# COMMANDS ARE ASSUMED TO BE SCRIPTS CALLED job*.sh&lt;br /&gt;
echo job*.sh | tr ' ' '\n' | parallel -j 8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notes:&lt;br /&gt;
* As before, GNU Parallel keeps 8 jobs running at a time, and if one finishes, starts the next. This is an easy way to do ''load balancing''.&lt;br /&gt;
* You can in fact run more or less than 8 processes per node by modifying &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;'s &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; argument.&lt;br /&gt;
* Doing many serial jobs often entails doing many disk reads and writes, which can be detrimental to the performance. In that case, running from the ramdisk may be an option.  &lt;br /&gt;
* When using a ramdisk, make sure you copy your results from the ramdisk back to the scratch after the runs, or when the job is killed because time has run out.&lt;br /&gt;
* More details on how to setup your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].&lt;br /&gt;
* This script optimizes resource utility, but can only use 1 node (8 cores) at a time. The next section addresses how to use more nodes.&lt;br /&gt;
&lt;br /&gt;
===Version for more than 8 cores at once (still serial)===&lt;br /&gt;
&lt;br /&gt;
If you have hundreds of serial jobs that you want to run concurrently and the nodes are available, then the approach above, while useful, would require tens of scripts to be submitted separately. It is possible for you to request more than one node and to use the following routine to distribute your processes amongst the cores. In this case, it is important to use the newer version of GNU parallel installed on the GPC.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple, dynamically-run &lt;br /&gt;
# serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=25:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialdynamicMulti&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20140622&lt;br /&gt;
&lt;br /&gt;
# START PARALLEL JOBS USING NODE LIST IN $PBS_NODEFILE&lt;br /&gt;
seq 800 | parallel -j8 --sshloginfile $PBS_NODEFILE --workdir $PWD ./myrun {}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation:&lt;br /&gt;
* &amp;lt;tt&amp;gt;seq 800&amp;lt;/tt&amp;gt; outputs the numbers 1 through 800 on separate lines. This output is piped to (ie becomes the input of) the &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; command.&lt;br /&gt;
* The use of the &amp;quot;seq 800&amp;quot; is that each line that you give to &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; defines a new job. So here, there are 800 jobs.&lt;br /&gt;
* Each job runs a command, but because the numbers generated by seq are not commands, a real command is constructed, in this case, by the argument &amp;lt;tt&amp;gt;./myrun {}&amp;lt;/tt&amp;gt;. Here &amp;lt;tt&amp;gt;myrun&amp;lt;/tt&amp;gt; is supposed to be the name of the application to run. The two curly brackets &amp;lt;tt&amp;gt;{}&amp;lt;/tt&amp;gt; get replaced by the line from the input, that is, by one of the numbers.&lt;br /&gt;
* So parallel will run the 800 commands:&amp;lt;br/&amp;gt;./myrun 1&amp;lt;br/&amp;gt;./myrun 2&amp;lt;br/&amp;gt;...&amp;lt;br/&amp;gt;./myrun 800&lt;br /&gt;
* The parameter &amp;lt;tt&amp;gt;--sshloginfile $PBS_NODEFILE&amp;lt;/tt&amp;gt; tells &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; to look for the file named $PBS_NODEFILE which contains the host names of the nodes assigned to the current job (as stated above, it is automatically generated).&lt;br /&gt;
* The parameter &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; tells &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; to run 8 of these at a time on each of the hosts.&lt;br /&gt;
* The &amp;lt;tt&amp;gt;--workdir $PWD&amp;lt;/tt&amp;gt; sets the working directory on the other nodes to the working directory on the first node. Without this, the run tries to start from the wrong place and will most likely fail (unless using the latest gnu parallel module, 20130422, which by default puts you in $PWD on the remote node).&lt;br /&gt;
* Loaded modules should get automatically loaded on the remote nodes too for the latest gnu parallel module, but not for earlier ones.&lt;br /&gt;
* If you need an environment variable to be transfered from the job script to the remotely running subjobs, use &amp;lt;tt&amp;gt;--env ENVIRONMENTVARIABLE&amp;lt;/tt&amp;gt;.  SciNet's gnu-parallel modules automatically transfer &amp;lt;tt&amp;gt;OMP_NUM_THREADS&amp;lt;/tt&amp;gt;, and typical environment variables set by most modules.&lt;br /&gt;
&lt;br /&gt;
Notes:&lt;br /&gt;
* Of course, this is just an example of what you could do with gnu parallel. How you set up your specific run depends on how each of the runs would be started. One could for instance also prepare a file of commands to run and make that the input to parallel as well.&lt;br /&gt;
* Note that submitting several bunches to single nodes, as in the section above, is a more failsafe way of proceeding, since a node failure would only affect one of these bunches, rather than all runs. &lt;br /&gt;
* GNU Parallel can be passed a file with the list of nodes to which to ssh, using &amp;lt;tt&amp;gt;--sshloginfile&amp;lt;/tt&amp;gt; (thanks to Ole Tange for pointing this out). This list is automatically generated by the scheduler and its name is made available in the environment variable $PBS_NODEFILE.&lt;br /&gt;
* Alternatively, GNU Parallel can take a comma separated list of nodes given to its -S argument, but this would need to be constructed from the file $PBS_NODEFILE which contains all nodes assigned to the job, with each node duplicated 8x for the number of cores on each node.&lt;br /&gt;
* GNU Parallel reads lines of input and converts them to arguments in the execution command. The execution command is the last argument given to &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;, with &amp;lt;tt&amp;gt;{}&amp;lt;/tt&amp;gt; replaced by the lines of input.&lt;br /&gt;
* &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;The --workdir argument is essential: it sets the working directory on the other nodes, which would default to your home directory if omitted. Since /home is read-only on the compute nodes, you would likely not get any output at all!&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt;This is no longer true for the latest GNU Parallel modules (20130422), which put you in the current directory on the remote nodes.&lt;br /&gt;
* We reiterate that if memory requirements allow it, you should try to run more than 8 jobs at once, with a maximum of 16 jobs. You can run more or fewer than 8 processes per node by modifying the -j8 parameter to the parallel command.&lt;br /&gt;
&lt;br /&gt;
===More on GNU parallel=== &lt;br /&gt;
* [[Media:Tech-talk-gnu-parallel.pdf|Slides of the SciNet TechTalk on Gnu Parallel (14 Nov 2012)]]&lt;br /&gt;
* The documentation for GNU parallel can be found at http://www.gnu.org/software/parallel/&lt;br /&gt;
* Its man page can be found here http://www.gnu.org/software/parallel/man.html&lt;br /&gt;
* The man page is also available on the GPC when the gnu-parallel module is loaded, with the command &amp;lt;code&amp;gt;$ man parallel&amp;lt;/code&amp;gt;. The man page contains options, such as how to make sure the output is not all scrambled, and examples.&lt;br /&gt;
&lt;br /&gt;
===GNU Parallel Reference===&lt;br /&gt;
* O. Tange (2011): GNU Parallel - The Command-Line Power Tool, '';login: The USENIX Magazine,'' February 2011:42-47.&lt;br /&gt;
&lt;br /&gt;
===Older scripts===&lt;br /&gt;
&lt;br /&gt;
Older scripts, which mimicked some of GNU parallel functionality, can be found on the [[Deprecated scripts]] page.&lt;br /&gt;
&lt;br /&gt;
--[[User:Rzon|Rzon]] 02:22, 14 Nov 2010 (UTC)&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=User_Serial&amp;diff=7132</id>
		<title>User Serial</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=User_Serial&amp;diff=7132"/>
		<updated>2014-07-30T19:25:50Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* Version for more than 8 cores at once (still serial) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===General considerations===&lt;br /&gt;
&lt;br /&gt;
====Use a whole node...====&lt;br /&gt;
&lt;br /&gt;
When you submit a job on a SciNet system, it is run on one (or more than one) entire node - meaning that your job is occupying at least 8 processors for the duration of its run.  The SciNet systems are usually busy, with many researchers waiting in the queue for computational resources, so we require that you make full use of the nodes that your job is allocated, so that other researchers don't have to wait unnecessarily, and so that your jobs get as much work done for you while they run as possible.&lt;br /&gt;
&lt;br /&gt;
Often, the best way to make full use of the node is to run one large parallel computation; but sometimes it is beneficial to run several serial codes at the same time.  On this page, we discuss ways to run suites of serial computations at once, as efficiently as possible, using the full resources of the node.&lt;br /&gt;
&lt;br /&gt;
====...but not more.====&lt;br /&gt;
&lt;br /&gt;
When running multiple jobs on the same node, it is essential to have a good idea of how much memory the jobs will require. The GPC compute nodes have about 14GB in total available &lt;br /&gt;
to user jobs running on the 8 cores (a bit less, say 13GB, on the devel nodes &amp;lt;tt&amp;gt;gpc01..04&amp;lt;/tt&amp;gt;, and [[GPC_Quickstart#Memory_Configuration|somewhat more for some compute nodes]]).&lt;br /&gt;
So the jobs also have to be bunched in ways that will fit into 14GB.  If they use more than this, they will crash the node, inconveniencing you and other researchers waiting for that node.&lt;br /&gt;
&lt;br /&gt;
If that's not possible -- each individual job requires significantly in excess of ~1.75GB -- then it's possible to just run fewer jobs so that they do fit; but then, again there is an under-utilization problem.   In that case, the jobs are likely candidates for parallelization, and you can contact us at [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;] and arrange a meeting with one of the technical analysts to help you do just that.&lt;br /&gt;
&lt;br /&gt;
If the memory requirements allow it, you could actually run more than 8 jobs at the same time, up to 16, exploiting the [[GPC_Quickstart#HyperThreading | HyperThreading]] feature of the Intel Nehalem cores.  It may seem counterintuitive, but running 16 jobs on 8 cores for certain types of tasks has increased some users' overall throughput by 10 to 30 percent.&lt;br /&gt;
&lt;br /&gt;
====Is your job really serial?====&lt;br /&gt;
&lt;br /&gt;
While your program may not be explicitly parallel, it may use some of SciNet's threaded libraries for numerical computations, which can make use of multiple processors.  In particular, SciNet's [[Python]] and [[R_Statistical_Package | R]] modules are compiled with aggressive optimization and use threaded numerical libraries which by default will make use of multiple cores for computations such as large matrix operations.  This can greatly speed up individual runs, but by less (usually much less) than a factor of 8.  If you do have many such computations to do, your [[Introduction_To_Performance#Throughput | throughput]] will be better - you will get more calculations done per unit time - if you turn off the threading and run multiple such computations at once.  Threading is turned off with the shell script line &amp;lt;tt&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/tt&amp;gt;; that line will be included in the scripts below.  &lt;br /&gt;
&lt;br /&gt;
If your calculations do implicitly use threading, you may want to experiment to see what gives you the best performance - you may find that running 4 (or even 8) jobs with 2 threads each (&amp;lt;tt&amp;gt;OMP_NUM_THREADS=2&amp;lt;/tt&amp;gt;), or 2 jobs with 4 threads, gives better performance than 8 jobs with 1 thread (and almost certainly better than 1 job with 8 threads).  We'd encourage you to perform exactly such a [[Introduction_To_Performance#Strong_Scaling_Tests | scaling test]]; for a small up-front investment in time you may significantly speed up all the computations you need to do.&lt;br /&gt;
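&lt;br /&gt;
As an illustration, a minimal sketch of one point in such a scaling test - 4 simultaneous jobs with 2 threads each - could look like the following (the &amp;lt;tt&amp;gt;jobdir&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;dojob&amp;lt;/tt&amp;gt; names are placeholders, as in the example scripts below):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Allow each job to use 2 threads ...&lt;br /&gt;
export OMP_NUM_THREADS=2&lt;br /&gt;
&lt;br /&gt;
# ... and start only 4 jobs at a time: 4 jobs x 2 threads = 8 cores&lt;br /&gt;
(cd jobdir1; ./dojob1) &amp;amp;&lt;br /&gt;
(cd jobdir2; ./dojob2) &amp;amp;&lt;br /&gt;
(cd jobdir3; ./dojob3) &amp;amp;&lt;br /&gt;
(cd jobdir4; ./dojob4) &amp;amp;&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;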
&lt;br /&gt;
===Serial jobs of similar duration===&lt;br /&gt;
&lt;br /&gt;
The most straightforward way to run multiple serial jobs is to bunch the jobs in groups of 8 or more that will take roughly the same amount of time, and create a job that looks a &lt;br /&gt;
bit like this&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple serial jobs on&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialx8&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; ampersand off 8 jobs and wait&lt;br /&gt;
(cd jobdir1; ./dojob1) &amp;amp;&lt;br /&gt;
(cd jobdir2; ./dojob2) &amp;amp;&lt;br /&gt;
(cd jobdir3; ./dojob3) &amp;amp;&lt;br /&gt;
(cd jobdir4; ./dojob4) &amp;amp;&lt;br /&gt;
(cd jobdir5; ./dojob5) &amp;amp;&lt;br /&gt;
(cd jobdir6; ./dojob6) &amp;amp;&lt;br /&gt;
(cd jobdir7; ./dojob7) &amp;amp;&lt;br /&gt;
(cd jobdir8; ./dojob8) &amp;amp;&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are four important things to take note of here.  First, the &amp;lt;tt&amp;gt;'''wait'''&amp;lt;/tt&amp;gt;&lt;br /&gt;
command at the end is crucial; without it the job will terminate &lt;br /&gt;
immediately, killing the 8 programs you just started.&lt;br /&gt;
&lt;br /&gt;
Second is that it is important to group the programs by how long they &lt;br /&gt;
will take.   If (say) &amp;lt;tt&amp;gt;dojob8&amp;lt;/tt&amp;gt; takes 2 hours and the rest only take 1, &lt;br /&gt;
then for one hour 7 of the 8 cores on the GPC node are wasted; they are &lt;br /&gt;
sitting idle but are unavailable for other users, and the utilization of &lt;br /&gt;
this node over the whole run is only 56% (9 of the 16 core-hours the job occupies are actually used).   This is the sort of thing &lt;br /&gt;
we'll notice, and users who don't make efficient use of the machine will &lt;br /&gt;
have their ability to use SciNet resources reduced.  If you have many serial jobs of varying length, &lt;br /&gt;
use the submission script to balance the computational load, as explained [[ #Serial jobs of varying duration | below]].&lt;br /&gt;
&lt;br /&gt;
Third, we reiterate that if memory requirements allow it, you should try to run more than 8 jobs at once, with a maximum of 16 jobs.&lt;br /&gt;
&lt;br /&gt;
===GNU Parallel===&lt;br /&gt;
&lt;br /&gt;
GNU parallel is a really nice tool written by Ole Tange to run multiple serial jobs in&lt;br /&gt;
parallel. It allows you to keep the processors on each 8-core node busy, provided you give it enough jobs to do.&lt;br /&gt;
&lt;br /&gt;
GNU parallel is accessible on the GPC in the module&lt;br /&gt;
&amp;lt;tt&amp;gt;gnu-parallel&amp;lt;/tt&amp;gt;:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load gnu-parallel/20140622&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Note that there are several versions of gnu-parallel installed on the GPC; we recommend using the most recent version. &lt;br /&gt;
&lt;br /&gt;
The citation for GNU Parallel is: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, '';login: The USENIX Magazine,'' February 2011:42-47.&lt;br /&gt;
&lt;br /&gt;
It is easiest to demonstrate the usage of GNU parallel by&lt;br /&gt;
examples. Suppose you have 16 jobs to do, that these jobs' durations vary quite a bit, but that the average job duration is around 10 hours. You could use the following script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=24:00:00&lt;br /&gt;
#PBS -N gnu-parallel-example&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20140622  &lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND&lt;br /&gt;
parallel -j 8 &amp;lt;&amp;lt;EOF&lt;br /&gt;
  cd jobdir1; ./dojob1; echo &amp;quot;job 1 finished&amp;quot;&lt;br /&gt;
  cd jobdir2; ./dojob2; echo &amp;quot;job 2 finished&amp;quot;&lt;br /&gt;
  cd jobdir3; ./dojob3; echo &amp;quot;job 3 finished&amp;quot;&lt;br /&gt;
  cd jobdir4; ./dojob4; echo &amp;quot;job 4 finished&amp;quot;&lt;br /&gt;
  cd jobdir5; ./dojob5; echo &amp;quot;job 5 finished&amp;quot;&lt;br /&gt;
  cd jobdir6; ./dojob6; echo &amp;quot;job 6 finished&amp;quot;&lt;br /&gt;
  cd jobdir7; ./dojob7; echo &amp;quot;job 7 finished&amp;quot;&lt;br /&gt;
  cd jobdir8; ./dojob8; echo &amp;quot;job 8 finished&amp;quot;&lt;br /&gt;
  cd jobdir9; ./dojob9; echo &amp;quot;job 9 finished&amp;quot;&lt;br /&gt;
  cd jobdir10; ./dojob10; echo &amp;quot;job 10 finished&amp;quot;&lt;br /&gt;
  cd jobdir11; ./dojob11; echo &amp;quot;job 11 finished&amp;quot;&lt;br /&gt;
  cd jobdir12; ./dojob12; echo &amp;quot;job 12 finished&amp;quot;&lt;br /&gt;
  cd jobdir13; ./dojob13; echo &amp;quot;job 13 finished&amp;quot;&lt;br /&gt;
  cd jobdir14; ./dojob14; echo &amp;quot;job 14 finished&amp;quot;&lt;br /&gt;
  cd jobdir15; ./dojob15; echo &amp;quot;job 15 finished&amp;quot;&lt;br /&gt;
  cd jobdir16; ./dojob16; echo &amp;quot;job 16 finished&amp;quot;&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; parameter sets the number of jobs to run at the same time, but 16 jobs are lined up. Initially, 8 jobs are given to the 8 processors on the node. When one of the processors is done with its assigned job, it will get the next job instead of sitting idle until the other processors are done. While you would expect that on average this script should take 20 hours (each processor on average has to complete two jobs of 10 hours each), there's a good chance that one of the processors gets two jobs that take more than 10 hours, so the job script requests 24 hours. How much more time you should ask for in practice depends on the spread in run times of the separate jobs.&lt;br /&gt;
&lt;br /&gt;
===Serial jobs of varying duration===&lt;br /&gt;
&lt;br /&gt;
If you have a lot (50+) of relatively short serial runs to do, '''whose walltimes vary''', and if you know that eight jobs fit in memory without issues, then writing all the commands explicitly in the job script can get tedious. If you follow a convention in which the jobs are all started by auxiliary scripts called job&amp;lt;something&amp;gt;.sh, the following strategy in your submission script will maximize the CPU utilization. &lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple, dynamically-run &lt;br /&gt;
# serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialdynamic&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20140622  &lt;br /&gt;
&lt;br /&gt;
# COMMANDS ARE ASSUMED TO BE SCRIPTS CALLED job*.sh&lt;br /&gt;
echo job*.sh | tr ' ' '\n' | parallel -j 8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notes:&lt;br /&gt;
* As before, GNU Parallel keeps 8 jobs running at a time, and if one finishes, starts the next. This is an easy way to do ''load balancing''.&lt;br /&gt;
* You can in fact run more or fewer than 8 processes per node by modifying &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;'s &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; argument.&lt;br /&gt;
* Doing many serial jobs often entails doing many disk reads and writes, which can be detrimental to the performance. In that case, running from the ramdisk may be an option.  &lt;br /&gt;
* When using a ramdisk, make sure you copy your results from the ramdisk back to the scratch after the runs, or when the job is killed because time has run out.&lt;br /&gt;
* More details on how to set up your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].&lt;br /&gt;
* This script optimizes resource utilization, but can only use 1 node (8 cores) at a time. The next section addresses how to use more nodes.&lt;br /&gt;
&lt;br /&gt;
===Version for more than 8 cores at once (still serial)===&lt;br /&gt;
&lt;br /&gt;
If you have hundreds of serial jobs that you want to run concurrently and the nodes are available, then the approach above, while useful, would require tens of scripts to be submitted separately. It is possible for you to request more than one node and to use the following routine to distribute your processes amongst the cores. In this case, it is important to use the newer version of GNU parallel installed on the GPC.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple, dynamically-run &lt;br /&gt;
# serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=25:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialdynamicMulti&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20140622&lt;br /&gt;
&lt;br /&gt;
# START PARALLEL JOBS USING NODE LIST IN $PBS_NODEFILE&lt;br /&gt;
seq 800 | parallel -j8 --sshloginfile $PBS_NODEFILE --workdir $PWD ./myrun {}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation:&lt;br /&gt;
* &amp;lt;tt&amp;gt;seq 800&amp;lt;/tt&amp;gt; outputs the numbers 1 through 800 on separate lines. This output is piped to (ie becomes the input of) the &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; command.&lt;br /&gt;
* The point of the &amp;quot;seq 800&amp;quot; is that each line that you give to &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; defines a new job. So here, there are 800 jobs.&lt;br /&gt;
* Each job runs a command, but because the numbers generated by seq are not commands, a real command is constructed, in this case, by the argument &amp;lt;tt&amp;gt;./myrun {}&amp;lt;/tt&amp;gt;. Here &amp;lt;tt&amp;gt;myrun&amp;lt;/tt&amp;gt; is supposed to be the name of the application to run. The two curly brackets &amp;lt;tt&amp;gt;{}&amp;lt;/tt&amp;gt; get replaced by the line from the input, that is, by one of the numbers.&lt;br /&gt;
* So parallel will run the 800 commands:&amp;lt;br/&amp;gt;./myrun 1&amp;lt;br/&amp;gt;./myrun 2&amp;lt;br/&amp;gt;...&amp;lt;br/&amp;gt;./myrun 800&lt;br /&gt;
* The parameter &amp;lt;tt&amp;gt;--sshloginfile $PBS_NODEFILE&amp;lt;/tt&amp;gt; tells &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; to look for the file named $PBS_NODEFILE which contains the host names of the nodes assigned to the current job (as stated above, it is automatically generated).&lt;br /&gt;
* The parameter &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; tells &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; to run 8 of these at a time on each of the hosts.&lt;br /&gt;
* The &amp;lt;tt&amp;gt;--workdir $PWD&amp;lt;/tt&amp;gt; sets the working directory on the other nodes to the working directory on the first node. Without this, the run tries to start from the wrong place and will most likely fail (unless using the latest gnu parallel module, 20130422, which by default puts you in $PWD on the remote node).&lt;br /&gt;
* Loaded modules should get automatically loaded on the remote nodes too for the latest gnu parallel module, but not for earlier ones.&lt;br /&gt;
* If you need an environment variable to be transferred from the job script to the remotely running subjobs, use &amp;lt;tt&amp;gt;--env ENVIRONMENTVARIABLE&amp;lt;/tt&amp;gt;.  SciNet's gnu-parallel modules automatically transfer &amp;lt;tt&amp;gt;OMP_NUM_THREADS&amp;lt;/tt&amp;gt;, and typical environment variables set by most modules.&lt;br /&gt;
&lt;br /&gt;
Notes:&lt;br /&gt;
* Of course, this is just an example of what you could do with gnu parallel. How you set up your specific run depends on how each of the runs would be started. One could for instance also prepare a file of commands to run and make that the input to parallel, as in the sketch after these notes.&lt;br /&gt;
* Note that submitting several bunches to single nodes, as in the section above, is a more failsafe way of proceeding, since a node failure would only affect one of these bunches, rather than all runs. &lt;br /&gt;
* GNU Parallel can be passed a file with the list of nodes to which to ssh, using &amp;lt;tt&amp;gt;--sshloginfile&amp;lt;/tt&amp;gt; (thanks to Ole Tange for pointing this out). This list is automatically generated by the scheduler and its name is made available in the environment variable $PBS_NODEFILE.&lt;br /&gt;
* Alternatively, GNU Parallel can take a comma-separated list of nodes given to its -S argument, but this list would need to be constructed from the file $PBS_NODEFILE, which contains all nodes assigned to the job, with each node duplicated 8x for the number of cores on each node.&lt;br /&gt;
* GNU Parallel reads lines of input and converts them into arguments for the execution command. The execution command is the last argument given to &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;, with &amp;lt;tt&amp;gt;{}&amp;lt;/tt&amp;gt; replaced by the lines of input.&lt;br /&gt;
* &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;The --workdir argument is essential: it sets the working directory on the other nodes, which would default to your home directory if omitted. Since /home is read-only on the compute nodes, you would likely not get any output at all!&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt;This is no longer true for the latest GNU Parallel modules (20130422), which put you in the current directory on the remote nodes.&lt;br /&gt;
* We reiterate that if memory requirements allow it, you should try to run more than 8 jobs at once, with a maximum of 16 jobs. You can run more or fewer than 8 processes per node by modifying the -j8 parameter to the parallel command.&lt;br /&gt;
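&lt;br /&gt;
A minimal sketch of the file-of-commands variant mentioned in the notes above (where &amp;lt;tt&amp;gt;commands.txt&amp;lt;/tt&amp;gt; is a hypothetical file you would prepare yourself, with one complete command per line) could look like this:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# commands.txt contains lines such as:&lt;br /&gt;
#   cd jobdir1 &amp;amp;&amp;amp; ./dojob1&lt;br /&gt;
#   cd jobdir2 &amp;amp;&amp;amp; ./dojob2&lt;br /&gt;
# With no command argument, parallel runs each input line as a command,&lt;br /&gt;
# 8 at a time on each node listed in $PBS_NODEFILE.&lt;br /&gt;
parallel -j8 --sshloginfile $PBS_NODEFILE --workdir $PWD &amp;lt; commands.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;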
&lt;br /&gt;
===More on GNU parallel=== &lt;br /&gt;
* [[Media:Tech-talk-gnu-parallel.pdf|Slides of the SciNet TechTalk on Gnu Parallel (14 Nov 2012)]]&lt;br /&gt;
* The documentation for GNU parallel can be found at http://www.gnu.org/software/parallel/&lt;br /&gt;
* Its man page can be found here http://www.gnu.org/software/parallel/man.html&lt;br /&gt;
* The man page is also available on the GPC when the gnu-parallel module is loaded, with the command &amp;lt;code&amp;gt;$ man parallel&amp;lt;/code&amp;gt;. The man page contains options, such as how to make sure the output is not all scrambled, and examples.&lt;br /&gt;
&lt;br /&gt;
===GNU Parallel Reference===&lt;br /&gt;
* O. Tange (2011): GNU Parallel - The Command-Line Power Tool, '';login: The USENIX Magazine,'' February 2011:42-47.&lt;br /&gt;
&lt;br /&gt;
===Older scripts===&lt;br /&gt;
&lt;br /&gt;
Older scripts, which mimicked some of GNU parallel functionality, can be found on the [[Deprecated scripts]] page.&lt;br /&gt;
&lt;br /&gt;
--[[User:Rzon|Rzon]] 02:22, 14 Nov 2010 (UTC)&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=User_Serial&amp;diff=7131</id>
		<title>User Serial</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=User_Serial&amp;diff=7131"/>
		<updated>2014-07-30T19:23:25Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* Version for more than 8 cores at once (still serial) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===General considerations===&lt;br /&gt;
&lt;br /&gt;
====Use a whole node...====&lt;br /&gt;
&lt;br /&gt;
When you submit a job on a SciNet system, it is run on one (or more than one) entire node - meaning that your job is occupying at least 8 processors for the duration of its run.  The SciNet systems are usually busy, with many researchers waiting in the queue for computational resources, so we require that you make full use of the nodes that your job is allocated, so other researchers don't have to wait unnecessarily, and so that your jobs get as much work done for you as possible while they run.&lt;br /&gt;
&lt;br /&gt;
Often, the best way to make full use of the node is to run one large parallel computation; but sometimes it is beneficial to run several serial codes at the same time.  On this page, we discuss ways to run suites of serial computations at once, as efficiently as possible, using the full resources of the node.&lt;br /&gt;
&lt;br /&gt;
====...but not more.====&lt;br /&gt;
&lt;br /&gt;
When running multiple jobs on the same node, it is essential to have a good idea of how much memory the jobs will require. The GPC compute nodes have about 14GB in total available &lt;br /&gt;
to user jobs running on the 8 cores (a bit less, say 13GB, on the devel nodes &amp;lt;tt&amp;gt;gpc01..04&amp;lt;/tt&amp;gt;, and [[GPC_Quickstart#Memory_Configuration|somewhat more for some compute nodes]]).&lt;br /&gt;
So the jobs also have to be bunched in ways that will fit into 14GB.  If they use more than this, they will crash the node, inconveniencing you and other researchers waiting for that node.&lt;br /&gt;
&lt;br /&gt;
If that's not possible -- each individual job requires significantly in excess of ~1.75GB -- then it's possible to just run fewer jobs so that they do fit; but then, again there is an under-utilization problem.   In that case, the jobs are likely candidates for parallelization, and you can contact us at [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;] and arrange a meeting with one of the technical analysts to help you do just that.&lt;br /&gt;
&lt;br /&gt;
If the memory requirements allow it, you could actually run more than 8 jobs at the same time, up to 16, exploiting the [[GPC_Quickstart#HyperThreading | HyperThreading]] feature of the Intel Nehalem cores.  It may seem counterintuitive, but running 16 jobs on 8 cores for certain types of tasks has increased some users' overall throughput by 10 to 30 percent.&lt;br /&gt;
&lt;br /&gt;
====Is your job really serial?====&lt;br /&gt;
&lt;br /&gt;
While your program may not be explicitly parallel, it may use some of SciNet's threaded libraries for numerical computations, which can make use of multiple processors.  In particular, SciNet's [[Python]] and [[R_Statistical_Package | R]] modules are compiled with aggressive optimization and using threaded numerical libraries which by default will make use of multiple cores for computations such as large matrix operations.  This can greatly speed up individual runs, but by less (usually much less) than a factor of 8.  If you do have many such computations to do, your [[Introduction_To_Performance#Throughput | throughput]] will be better - you will get more calculations done per unit time -if you turn off the threading and run multiple such computations at once.  Threading is turned off with the shell script line &amp;lt;tt&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/tt&amp;gt;; that line will be included in the scripts below.  &lt;br /&gt;
&lt;br /&gt;
If your calculations do implicitly use threading, you may want to experiment to see what gives you the best performance - you may find that running 4 (or even 8) jobs with 2 threads each (&amp;lt;tt&amp;gt;OMP_NUM_THREADS=2&amp;lt;/tt&amp;gt;), or 2 jobs with 4 threads, gives better performance than 8 jobs with 1 thread (and almost certainly better than 1 job with 8 threads).  We'd encourage you to perform exactly such a [[Introduction_To_Performance#Strong_Scaling_Tests | scaling test]]; for a small up-front investment in time you may significantly speed up all the computations you need to do.&lt;br /&gt;
&lt;br /&gt;
===Serial jobs of similar duration===&lt;br /&gt;
&lt;br /&gt;
The most straightforward way to run multiple serial jobs is to bunch the jobs in groups of 8 or more that will take roughly the same amount of time, and create a job that looks a &lt;br /&gt;
bit like this&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple serial jobs on&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialx8&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; ampersand off 8 jobs and wait&lt;br /&gt;
(cd jobdir1; ./dojob1) &amp;amp;&lt;br /&gt;
(cd jobdir2; ./dojob2) &amp;amp;&lt;br /&gt;
(cd jobdir3; ./dojob3) &amp;amp;&lt;br /&gt;
(cd jobdir4; ./dojob4) &amp;amp;&lt;br /&gt;
(cd jobdir5; ./dojob5) &amp;amp;&lt;br /&gt;
(cd jobdir6; ./dojob6) &amp;amp;&lt;br /&gt;
(cd jobdir7; ./dojob7) &amp;amp;&lt;br /&gt;
(cd jobdir8; ./dojob8) &amp;amp;&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are four important things to take note of here.  First, the &amp;lt;tt&amp;gt;'''wait'''&amp;lt;/tt&amp;gt;&lt;br /&gt;
command at the end is crucial; without it the job will terminate &lt;br /&gt;
immediately, killing the 8 programs you just started.&lt;br /&gt;
&lt;br /&gt;
Second is that it is important to group the programs by how long they &lt;br /&gt;
will take.   If (say) &amp;lt;tt&amp;gt;dojob8&amp;lt;/tt&amp;gt; takes 2 hours and the rest only take 1, &lt;br /&gt;
then for one hour 7 of the 8 cores on the GPC node are wasted; they are &lt;br /&gt;
sitting idle but are unavailable for other users, and the utilization of &lt;br /&gt;
this node over the whole run is only 56%.   This is the sort of thing &lt;br /&gt;
we'll notice, and users who don't make efficient use of the machine will &lt;br /&gt;
have their ability to use scinet resources reduced.  If you have many serial jobs of varying length, &lt;br /&gt;
use the submission script to balance the computational load, as explained [[ #Serial jobs of varying duration | below]].&lt;br /&gt;
&lt;br /&gt;
Third, we reiterate that if memory requirements allow it, you should try to run more than 8 jobs at once, with a maximum of 16 jobs.&lt;br /&gt;
&lt;br /&gt;
===GNU Parallel===&lt;br /&gt;
&lt;br /&gt;
GNU parallel is a really nice tool written by Ole Tange to run multiple serial jobs in&lt;br /&gt;
parallel. It allows you to keep the processors on each 8-core node busy, provided you give it enough jobs to do.&lt;br /&gt;
&lt;br /&gt;
GNU parallel is accessible on the GPC in the module&lt;br /&gt;
&amp;lt;tt&amp;gt;gnu-parallel&amp;lt;/tt&amp;gt;:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load gnu-parallel/20140622&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Note that there are several versions of gnu-parallel installed on the GPC; we recommend using the newer version. &lt;br /&gt;
&lt;br /&gt;
The citation for GNU Parallel is: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, '';login: The USENIX Magazine,'' February 2011:42-47.&lt;br /&gt;
&lt;br /&gt;
It is easiest to demonstrate the usage of GNU parallel by&lt;br /&gt;
examples. Suppose you have 16 jobs to do, that these jobs' durations vary quite a bit, but that the average job duration is around 10 hours. You could use the following script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=24:00:00&lt;br /&gt;
#PBS -N gnu-parallel-example&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20140622  &lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND&lt;br /&gt;
parallel -j 8 &amp;lt;&amp;lt;EOF&lt;br /&gt;
  cd jobdir1; ./dojob1; echo &amp;quot;job 1 finished&amp;quot;&lt;br /&gt;
  cd jobdir2; ./dojob2; echo &amp;quot;job 2 finished&amp;quot;&lt;br /&gt;
  cd jobdir3; ./dojob3; echo &amp;quot;job 3 finished&amp;quot;&lt;br /&gt;
  cd jobdir4; ./dojob4; echo &amp;quot;job 4 finished&amp;quot;&lt;br /&gt;
  cd jobdir5; ./dojob5; echo &amp;quot;job 5 finished&amp;quot;&lt;br /&gt;
  cd jobdir6; ./dojob6; echo &amp;quot;job 6 finished&amp;quot;&lt;br /&gt;
  cd jobdir7; ./dojob7; echo &amp;quot;job 7 finished&amp;quot;&lt;br /&gt;
  cd jobdir8; ./dojob8; echo &amp;quot;job 8 finished&amp;quot;&lt;br /&gt;
  cd jobdir9; ./dojob9; echo &amp;quot;job 9 finished&amp;quot;&lt;br /&gt;
  cd jobdir10; ./dojob10; echo &amp;quot;job 10 finished&amp;quot;&lt;br /&gt;
  cd jobdir11; ./dojob11; echo &amp;quot;job 11 finished&amp;quot;&lt;br /&gt;
  cd jobdir12; ./dojob12; echo &amp;quot;job 12 finished&amp;quot;&lt;br /&gt;
  cd jobdir13; ./dojob13; echo &amp;quot;job 13 finished&amp;quot;&lt;br /&gt;
  cd jobdir14; ./dojob14; echo &amp;quot;job 14 finished&amp;quot;&lt;br /&gt;
  cd jobdir15; ./dojob15; echo &amp;quot;job 15 finished&amp;quot;&lt;br /&gt;
  cd jobdir16; ./dojob16; echo &amp;quot;job 16 finished&amp;quot;&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; parameter sets the number of jobs to run at the same time, but 16 jobs are lined up. Initially, 8 jobs are given to the 8 processors on the node. When one of the processors is done with its assigned job, it will get the next job instead of sitting idle until the other processors are done. While you would expect that on average this script should take 20 hours (each processor on average has to complete two jobs of 10 hours each), there's a good chance that one of the processors gets two jobs that take more than 10 hours, so the job script requests 24 hours. How much more time you should ask for in practice depends on the spread in run times of the separate jobs.&lt;br /&gt;
&lt;br /&gt;
===Serial jobs of varying duration===&lt;br /&gt;
&lt;br /&gt;
If you have a lot (50+) of relatively short serial runs to do, '''whose walltimes vary''', and if you know that eight jobs fit in memory without issues, then writing all the commands explicitly in the job script can get tedious. If you follow a convention in which the jobs are all started by auxiliary scripts called job&amp;lt;something&amp;gt;.sh, the following strategy in your submission script will maximize the CPU utilization. &lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple, dynamically-run &lt;br /&gt;
# serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialdynamic&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20140622  &lt;br /&gt;
&lt;br /&gt;
# COMMANDS ARE ASSUMED TO BE SCRIPTS CALLED job*.sh&lt;br /&gt;
echo job*.sh | tr ' ' '\n' | parallel -j 8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notes:&lt;br /&gt;
* As before, GNU Parallel keeps 8 jobs running at a time, and if one finishes, starts the next. This is an easy way to do ''load balancing''.&lt;br /&gt;
* You can in fact run more or fewer than 8 processes per node by modifying &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;'s &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; argument.&lt;br /&gt;
* Doing many serial jobs often entails doing many disk reads and writes, which can be detrimental to the performance. In that case, running from the ramdisk may be an option.  &lt;br /&gt;
* When using a ramdisk, make sure you copy your results from the ramdisk back to the scratch after the runs, or when the job is killed because time has run out.&lt;br /&gt;
* More details on how to set up your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].&lt;br /&gt;
* This script optimizes resource utilization, but can only use 1 node (8 cores) at a time. The next section addresses how to use more nodes.&lt;br /&gt;
&lt;br /&gt;
===Version for more than 8 cores at once (still serial)===&lt;br /&gt;
&lt;br /&gt;
If you have hundreds of serial jobs that you want to run concurrently and the nodes are available, then the approach above, while useful, would require tens of scripts to be submitted separately. It is possible for you to request more than one node and to use the following routine to distribute your processes amongst the cores. In this case, it is important to use the newer version of GNU parallel installed on the GPC.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple, dynamically-run &lt;br /&gt;
# serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=25:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialdynamicMulti&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20140622&lt;br /&gt;
&lt;br /&gt;
# START PARALLEL JOBS USING NODE LIST IN $PBS_NODEFILE&lt;br /&gt;
seq 800 | parallel -j8 --sshloginfile $PBS_NODEFILE --workdir $PWD ./myrun {}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation:&lt;br /&gt;
* &amp;lt;tt&amp;gt;seq 800&amp;lt;/tt&amp;gt; outputs the numbers 1 through 800 on separate lines. This output is piped to (ie becomes the input of) the &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; command.&lt;br /&gt;
* The point of the &amp;quot;seq 800&amp;quot; is that each line that you give to &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; defines a new job. So here, there are 800 jobs.&lt;br /&gt;
* Each job runs a command, but because the numbers generated by seq are not commands, a real command is constructed, in this case, by the argument &amp;lt;tt&amp;gt;./myrun {}&amp;lt;/tt&amp;gt;. Here &amp;lt;tt&amp;gt;myrun&amp;lt;/tt&amp;gt; is supposed to be the name of the application to run. The two curly brackets &amp;lt;tt&amp;gt;{}&amp;lt;/tt&amp;gt; get replaced by the line from the input, that is, by one of the numbers.&lt;br /&gt;
* So parallel will run the 800 commands:&amp;lt;br/&amp;gt;./myrun 1&amp;lt;br/&amp;gt;./myrun 2&amp;lt;br/&amp;gt;...&amp;lt;br/&amp;gt;./myrun 800&lt;br /&gt;
* The parameter &amp;lt;tt&amp;gt;--sshloginfile $PBS_NODEFILE&amp;lt;/tt&amp;gt; tells &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; to look for the file named $PBS_NODEFILE which contains the host names of the nodes assigned to the current job (as stated above, it is automatically generated).&lt;br /&gt;
* The parameter &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; tells &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; to run 8 of these at a time on each of the hosts.&lt;br /&gt;
* The &amp;lt;tt&amp;gt;--workdir $PWD&amp;lt;/tt&amp;gt; sets the working directory on the other nodes to the working directory on the first node. Without this, the run tries to start from the wrong place and will most likely fail (unless using the latest gnu parallel module, 20130422, which by default puts you in $PWD on the remote node).&lt;br /&gt;
* Loaded modules should get automatically loaded on the remote nodes too for the latest gnu parallel module, but not for earlier ones.&lt;br /&gt;
* If you need an environment variable to be transferred from the job script to the remotely running subjobs, use &amp;lt;tt&amp;gt;--env ENVIRONMENTVARIABLE&amp;lt;/tt&amp;gt;.&lt;br /&gt;
Notes:&lt;br /&gt;
* Of course, this is just an example of what you could do with gnu parallel. How you set up your specific run depends on how each of the runs would be started. One could for instance also prepare a file of commands to run and make that the input to parallel as well.&lt;br /&gt;
* Note that submitting several bunches to single nodes, as in the section above, is a more failsafe way of proceeding, since a node failure would only affect one of these bunches, rather than all runs. &lt;br /&gt;
* GNU Parallel can be passed a file with the list of nodes to which to ssh, using &amp;lt;tt&amp;gt;--sshloginfile&amp;lt;/tt&amp;gt; (thanks to Ole Tange for pointing this out). This list is automatically generated by the scheduler and its name is made available in the environment variable $PBS_NODEFILE.&lt;br /&gt;
* Alternatively, GNU Parallel can take a comma-separated list of nodes given to its -S argument, but this list would need to be constructed from the file $PBS_NODEFILE, which contains all nodes assigned to the job, with each node duplicated 8x for the number of cores on each node.&lt;br /&gt;
* GNU Parallel reads lines of input and converts them into arguments for the execution command. The execution command is the last argument given to &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;, with &amp;lt;tt&amp;gt;{}&amp;lt;/tt&amp;gt; replaced by the lines of input.&lt;br /&gt;
* &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;The --workdir argument is essential: it sets the working directory on the other nodes, which would default to your home directory if omitted. Since /home is read-only on the compute nodes, you would likely not get any output at all!&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt;This is no longer true for the latest GNU Parallel modules (20130422), which put you in the current directory on the remote nodes.&lt;br /&gt;
* We reiterate that if memory requirements allow it, you should try to run more than 8 jobs at once, with a maximum of 16 jobs. You can run more or fewer than 8 processes per node by modifying the -j8 parameter to the parallel command.&lt;br /&gt;
&lt;br /&gt;
===More on GNU parallel=== &lt;br /&gt;
* [[Media:Tech-talk-gnu-parallel.pdf|Slides of the SciNet TechTalk on Gnu Parallel (14 Nov 2012)]]&lt;br /&gt;
* The documentation for GNU parallel can be found at http://www.gnu.org/software/parallel/&lt;br /&gt;
* Its man page can be found here http://www.gnu.org/software/parallel/man.html&lt;br /&gt;
* The man page is also available on the GPC when the gnu-parallel module is loaded, with the command &amp;lt;code&amp;gt;$ man parallel&amp;lt;/code&amp;gt;. The man page contains options, such as how to make sure the output is not all scrambled, and examples.&lt;br /&gt;
&lt;br /&gt;
===GNU Parallel Reference===&lt;br /&gt;
* O. Tange (2011): GNU Parallel - The Command-Line Power Tool, '';login: The USENIX Magazine,'' February 2011:42-47.&lt;br /&gt;
&lt;br /&gt;
===Older scripts===&lt;br /&gt;
&lt;br /&gt;
Older scripts, which mimicked some of GNU parallel functionality, can be found on the [[Deprecated scripts]] page.&lt;br /&gt;
&lt;br /&gt;
--[[User:Rzon|Rzon]] 02:22, 14 Nov 2010 (UTC)&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=User_Serial&amp;diff=7130</id>
		<title>User Serial</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=User_Serial&amp;diff=7130"/>
		<updated>2014-07-30T19:21:48Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* Serial jobs of varying duration */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===General considerations===&lt;br /&gt;
&lt;br /&gt;
====Use a whole node...====&lt;br /&gt;
&lt;br /&gt;
When you submit a job on a SciNet system, it is run on one (or more than one) entire node - meaning that your job is occupying at least 8 processors for the duration of its run.  The SciNet systems are usually busy, with many researchers waiting in the queue for computational resources, so we require that you make full use of the nodes that your job is allocated, so other researchers don't have to wait unnecessarily, and so that your jobs get as much work done for you as possible while they run.&lt;br /&gt;
&lt;br /&gt;
Often, the best way to make full use of the node is to run one large parallel computation; but sometimes it is beneficial to run several serial codes at the same time.  On this page, we discuss ways to run suites of serial computations at once, as efficiently as possible, using the full resources of the node.&lt;br /&gt;
&lt;br /&gt;
====...but not more.====&lt;br /&gt;
&lt;br /&gt;
When running multiple jobs on the same node, it is essential to have a good idea of how much memory the jobs will require. The GPC compute nodes have about 14GB in total available &lt;br /&gt;
to user jobs running on the 8 cores (a bit less, say 13GB, on the devel nodes &amp;lt;tt&amp;gt;gpc01..04&amp;lt;/tt&amp;gt;, and [[GPC_Quickstart#Memory_Configuration|somewhat more for some compute nodes]]).&lt;br /&gt;
So the jobs also have to be bunched in ways that will fit into 14GB.  If they use more than this, they will crash the node, inconveniencing you and other researchers waiting for that node.&lt;br /&gt;
&lt;br /&gt;
If that's not possible -- each individual job requires significantly in excess of ~1.75GB -- then it's possible to just run fewer jobs so that they do fit; but then, again there is an under-utilization problem.   In that case, the jobs are likely candidates for parallelization, and you can contact us at [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;] and arrange a meeting with one of the technical analysts to help you do just that.&lt;br /&gt;
&lt;br /&gt;
If the memory requirements allow it, you could actually run more than 8 jobs at the same time, up to 16, exploiting the [[GPC_Quickstart#HyperThreading | HyperThreading]] feature of the Intel Nehalem cores.  It may seem counterintuitive, but running 16 jobs on 8 cores for certain types of tasks has increased some users' overall throughput by 10 to 30 percent.&lt;br /&gt;
&lt;br /&gt;
====Is your job really serial?====&lt;br /&gt;
&lt;br /&gt;
While your program may not be explicitly parallel, it may use some of SciNet's threaded libraries for numerical computations, which can make use of multiple processors.  In particular, SciNet's [[Python]] and [[R_Statistical_Package | R]] modules are compiled with aggressive optimization and using threaded numerical libraries which by default will make use of multiple cores for computations such as large matrix operations.  This can greatly speed up individual runs, but by less (usually much less) than a factor of 8.  If you do have many such computations to do, your [[Introduction_To_Performance#Throughput | throughput]] will be better - you will get more calculations done per unit time -if you turn off the threading and run multiple such computations at once.  Threading is turned off with the shell script line &amp;lt;tt&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/tt&amp;gt;; that line will be included in the scripts below.  &lt;br /&gt;
&lt;br /&gt;
If your calculations do implicitly use threading, you may want to experiment to see what gives you the best performance - you may find that running 4 (or even 8) jobs with 2 threads each (&amp;lt;tt&amp;gt;OMP_NUM_THREADS=2&amp;lt;/tt&amp;gt;), or 2 jobs with 4 threads, gives better performance than 8 jobs with 1 thread (and almost certainly better than 1 job with 8 threads).  We'd encourage you to perform exactly such a [[Introduction_To_Performance#Strong_Scaling_Tests | scaling test]]; for a small up-front investment in time you may significantly speed up all the computations you need to do.&lt;br /&gt;
&lt;br /&gt;
===Serial jobs of similar duration===&lt;br /&gt;
&lt;br /&gt;
The most straightforward way to run multiple serial jobs is to bunch the jobs in groups of 8 or more that will take roughly the same amount of time, and create a job that looks a &lt;br /&gt;
bit like this&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple serial jobs on&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialx8&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; ampersand off 8 jobs and wait&lt;br /&gt;
(cd jobdir1; ./dojob1) &amp;amp;&lt;br /&gt;
(cd jobdir2; ./dojob2) &amp;amp;&lt;br /&gt;
(cd jobdir3; ./dojob3) &amp;amp;&lt;br /&gt;
(cd jobdir4; ./dojob4) &amp;amp;&lt;br /&gt;
(cd jobdir5; ./dojob5) &amp;amp;&lt;br /&gt;
(cd jobdir6; ./dojob6) &amp;amp;&lt;br /&gt;
(cd jobdir7; ./dojob7) &amp;amp;&lt;br /&gt;
(cd jobdir8; ./dojob8) &amp;amp;&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are four important things to take note of here.  First, the &amp;lt;tt&amp;gt;'''wait'''&amp;lt;/tt&amp;gt;&lt;br /&gt;
command at the end is crucial; without it the job will terminate &lt;br /&gt;
immediately, killing the 8 programs you just started.&lt;br /&gt;
&lt;br /&gt;
Second is that it is important to group the programs by how long they &lt;br /&gt;
will take.   If (say) &amp;lt;tt&amp;gt;dojob8&amp;lt;/tt&amp;gt; takes 2 hours and the rest only take 1, &lt;br /&gt;
then for one hour 7 of the 8 cores on the GPC node are wasted; they are &lt;br /&gt;
sitting idle but are unavailable for other users, and the utilization of &lt;br /&gt;
this node over the whole run is only 56%.   This is the sort of thing &lt;br /&gt;
we'll notice, and users who don't make efficient use of the machine will &lt;br /&gt;
have their ability to use scinet resources reduced.  If you have many serial jobs of varying length, &lt;br /&gt;
use the submission script to balance the computational load, as explained [[ #Serial jobs of varying duration | below]].&lt;br /&gt;
&lt;br /&gt;
Third, we reiterate that if memory requirements allow it, you should try to run more than 8 jobs at once, with a maximum of 16 jobs.&lt;br /&gt;
&lt;br /&gt;
===GNU Parallel===&lt;br /&gt;
&lt;br /&gt;
GNU parallel is a really nice tool written by Ole Tange to run multiple serial jobs in&lt;br /&gt;
parallel. It allows you to keep the processors on each 8-core node busy, provided you give it enough jobs to do.&lt;br /&gt;
&lt;br /&gt;
GNU parallel is accessible on the GPC in the module&lt;br /&gt;
&amp;lt;tt&amp;gt;gnu-parallel&amp;lt;/tt&amp;gt;:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load gnu-parallel/20140622&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Note that there are several versions of gnu-parallel installed on the GPC; we recommend using the newer version. &lt;br /&gt;
&lt;br /&gt;
The citation for GNU Parallel is: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, '';login: The USENIX Magazine,'' February 2011:42-47.&lt;br /&gt;
&lt;br /&gt;
It is easiest to demonstrate the usage of GNU parallel by&lt;br /&gt;
examples. Suppose you have 16 jobs to do, that these jobs' durations vary quite a bit, but that the average job duration is around 10 hours. You could use the following script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=24:00:00&lt;br /&gt;
#PBS -N gnu-parallel-example&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20140622  &lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND&lt;br /&gt;
parallel -j 8 &amp;lt;&amp;lt;EOF&lt;br /&gt;
  cd jobdir1; ./dojob1; echo &amp;quot;job 1 finished&amp;quot;&lt;br /&gt;
  cd jobdir2; ./dojob2; echo &amp;quot;job 2 finished&amp;quot;&lt;br /&gt;
  cd jobdir3; ./dojob3; echo &amp;quot;job 3 finished&amp;quot;&lt;br /&gt;
  cd jobdir4; ./dojob4; echo &amp;quot;job 4 finished&amp;quot;&lt;br /&gt;
  cd jobdir5; ./dojob5; echo &amp;quot;job 5 finished&amp;quot;&lt;br /&gt;
  cd jobdir6; ./dojob6; echo &amp;quot;job 6 finished&amp;quot;&lt;br /&gt;
  cd jobdir7; ./dojob7; echo &amp;quot;job 7 finished&amp;quot;&lt;br /&gt;
  cd jobdir8; ./dojob8; echo &amp;quot;job 8 finished&amp;quot;&lt;br /&gt;
  cd jobdir9; ./dojob9; echo &amp;quot;job 9 finished&amp;quot;&lt;br /&gt;
  cd jobdir10; ./dojob10; echo &amp;quot;job 10 finished&amp;quot;&lt;br /&gt;
  cd jobdir11; ./dojob11; echo &amp;quot;job 11 finished&amp;quot;&lt;br /&gt;
  cd jobdir12; ./dojob12; echo &amp;quot;job 12 finished&amp;quot;&lt;br /&gt;
  cd jobdir13; ./dojob13; echo &amp;quot;job 13 finished&amp;quot;&lt;br /&gt;
  cd jobdir14; ./dojob14; echo &amp;quot;job 14 finished&amp;quot;&lt;br /&gt;
  cd jobdir15; ./dojob15; echo &amp;quot;job 15 finished&amp;quot;&lt;br /&gt;
  cd jobdir16; ./dojob16; echo &amp;quot;job 16 finished&amp;quot;&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; parameter sets the number of jobs to run at the same time, but 16 jobs are lined up. Initially, 8 jobs are given to the 8 processors on the node. When one of the processors is done with its assigned job, it will get the next job instead of sitting idle until the other processors are done. While you would expect that on average this script should take 20 hours (each processor on average has to complete two jobs of 10 hours each), there's a good chance that one of the processors gets two jobs that take more than 10 hours, so the job script requests 24 hours. How much more time you should ask for in practice depends on the spread in run times of the separate jobs.&lt;br /&gt;
&lt;br /&gt;
===Serial jobs of varying duration===&lt;br /&gt;
&lt;br /&gt;
If you have a lot (50+) of relatively short serial runs to do, '''whose walltimes vary''', and if you know that eight jobs fit in memory without issues, then writing all the commands explicitly in the job script can get tedious. If you follow a convention in which the jobs are all started by auxiliary scripts called job&amp;lt;something&amp;gt;.sh, the following strategy in your submission script will maximize the CPU utilization. &lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple, dynamically-run &lt;br /&gt;
# serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialdynamic&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20140622  &lt;br /&gt;
&lt;br /&gt;
# COMMANDS ARE ASSUMED TO BE SCRIPTS CALLED job*.sh&lt;br /&gt;
echo job*.sh | tr ' ' '\n' | parallel -j 8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notes:&lt;br /&gt;
* As before, GNU Parallel keeps 8 jobs running at a time, and if one finishes, starts the next. This is an easy way to do ''load balancing''.&lt;br /&gt;
* You can in fact run more or fewer than 8 processes per node by modifying &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;'s &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; argument.&lt;br /&gt;
* Doing many serial jobs often entails doing many disk reads and writes, which can be detrimental to the performance. In that case, running from the ramdisk may be an option.  &lt;br /&gt;
* When using a ramdisk, make sure you copy your results from the ramdisk back to the scratch after the runs, or when the job is killed because time has run out.&lt;br /&gt;
* More details on how to set up your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].&lt;br /&gt;
* This script optimizes resource utilization, but can only use 1 node (8 cores) at a time. The next section addresses how to use more nodes.&lt;br /&gt;
&lt;br /&gt;
===Version for more than 8 cores at once (still serial)===&lt;br /&gt;
&lt;br /&gt;
If you have hundreds of serial jobs that you want to run concurrently and the nodes are available, then the approach above, while useful, would require tens of scripts to be submitted separately. It is possible for you to request more than one node and to use the following routine to distribute your processes amongst the cores. In this case, it is important to use the newer version of GNU parallel installed on the GPC.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple, dynamically-run &lt;br /&gt;
# serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=25:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialdynamicMulti&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20130422&lt;br /&gt;
&lt;br /&gt;
# START PARALLEL JOBS USING NODE LIST IN $PBS_NODEFILE&lt;br /&gt;
seq 800 | parallel -j8 --sshloginfile $PBS_NODEFILE --workdir $PWD ./myrun {}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation:&lt;br /&gt;
* &amp;lt;tt&amp;gt;seq 800&amp;lt;/tt&amp;gt; outputs the numbers 1 through 800 on separate lines. This output is piped to (ie becomes the input of) the &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; command.&lt;br /&gt;
* The point of the &amp;quot;seq 800&amp;quot; is that each line that you give to &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; defines a new job. So here, there are 800 jobs.&lt;br /&gt;
* Each job runs a command, but because the numbers generated by seq are not commands, a real command is constructed, in this case, by the argument &amp;lt;tt&amp;gt;./myrun {}&amp;lt;/tt&amp;gt;. Here &amp;lt;tt&amp;gt;myrun&amp;lt;/tt&amp;gt; is supposed to be the name of the application to run. The two curly brackets &amp;lt;tt&amp;gt;{}&amp;lt;/tt&amp;gt; get replaced by the line from the input, that is, by one of the numbers.&lt;br /&gt;
* So parallel will run the 800 commands:&amp;lt;br/&amp;gt;./myrun 1&amp;lt;br/&amp;gt;./myrun 2&amp;lt;br/&amp;gt;...&amp;lt;br/&amp;gt;./myrun 800&lt;br /&gt;
* The parameter &amp;lt;tt&amp;gt;--sshloginfile $PBS_NODEFILE&amp;lt;/tt&amp;gt; tells &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; to look for the file named $PBS_NODEFILE which contains the host names of the nodes assigned to the current job (as stated above, it is automatically generated).&lt;br /&gt;
* The parameter &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; tells &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; to run 8 of these at a time on each of the hosts.&lt;br /&gt;
* The &amp;lt;tt&amp;gt;--workdir $PWD&amp;lt;/tt&amp;gt; sets the working directory on the other nodes to the working directory on the first node. Without this, the run tries to start from the wrong place and will most likely fail (unless using the latest gnu parallel module, 20130422, which by default puts you in $PWD on the remote node).&lt;br /&gt;
* Loaded modules should get automatically loaded on the remote nodes too for the latest gnu parallel module, but not for earlier ones.&lt;br /&gt;
* If you need an environment variable to be transferred from the job script to the remotely running subjobs, use &amp;lt;tt&amp;gt;--env ENVIRONMENTVARIABLE&amp;lt;/tt&amp;gt;.&lt;br /&gt;
Notes:&lt;br /&gt;
* Of course, this is just an example of what you could do with gnu parallel. How you set up your specific run depends on how each of the runs would be started. One could for instance also prepare a file of commands to run and make that the input to parallel as well.&lt;br /&gt;
* Note that submitting several bunches to single nodes, as in the section above, is a more failsafe way of proceeding, since a node failure would only affect one of these bunches, rather than all runs. &lt;br /&gt;
* GNU Parallel can be passed a file with the list of nodes to which to ssh, using &amp;lt;tt&amp;gt;--sshloginfile&amp;lt;/tt&amp;gt; (thanks to Ole Tange for pointing this out). This list is automatically generated by the scheduler and its name is made available in the environment variable $PBS_NODEFILE.&lt;br /&gt;
* Alternatively, GNU Parallel can take a comma separated list of nodes given to its -S argument, but this would need to be constructed from the file $PBS_NODEFILE which contains all nodes assigned to the job, with each node duplicated 8x for the number of cores on each node.&lt;br /&gt;
* GNU Parallel can reads lines of input and convert those to arguments in the execution command. The execution command is the last argument given to &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;, with &amp;lt;tt&amp;gt;{}&amp;lt;/tt&amp;gt; replaces by the lines on input.&lt;br /&gt;
* &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;The --workdir argument is essential: it sets the working directory on the other nodes, which would default to your home directory if omitted. Since /home is read-only on the compute nodes, you would likely not get any output at all!&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt;This is no longer true for the latest GNU Parallel module (20130422), which puts you in the current directory on the remote nodes.&lt;br /&gt;
* We reiterate that if memory requirements allow it, you should try to run more than 8 jobs at once, with a maximum of 16 jobs. You can run more or fewer than 8 processes per node by modifying the -j8 parameter to the parallel command.&lt;br /&gt;
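&lt;br /&gt;
As a minimal sketch of the commands-file approach mentioned in the notes above (the file name &amp;lt;tt&amp;gt;mycommands.txt&amp;lt;/tt&amp;gt; and the job commands are hypothetical, not part of any SciNet setup):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Hypothetical sketch: one full command per line in a plain text file&lt;br /&gt;
cat &amp;gt; mycommands.txt &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd jobdir1; ./dojob1&lt;br /&gt;
cd jobdir2; ./dojob2&lt;br /&gt;
cd jobdir3; ./dojob3&lt;br /&gt;
EOF&lt;br /&gt;
# Feed the file to parallel; each line becomes one job&lt;br /&gt;
parallel -j8 --sshloginfile $PBS_NODEFILE --workdir $PWD &amp;lt; mycommands.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;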
&lt;br /&gt;
===More on GNU parallel=== &lt;br /&gt;
* [[Media:Tech-talk-gnu-parallel.pdf|Slides of the SciNet TechTalk on Gnu Parallel (14 Nov 2012)]]&lt;br /&gt;
* The documentation for GNU parallel can be found at http://www.gnu.org/software/parallel/&lt;br /&gt;
* Its man page can be found here http://www.gnu.org/software/parallel/man.html&lt;br /&gt;
* The man page is also available on the GPC when the gnu-parallel module is loaded, with the command &amp;lt;code&amp;gt;$ man parallel&amp;lt;/code&amp;gt;. The man page describes all the options, including how to make sure the output of different jobs does not get scrambled together, and gives examples.&lt;br /&gt;
&lt;br /&gt;
===GNU Parallel Reference===&lt;br /&gt;
* O. Tange (2011): GNU Parallel - The Command-Line Power Tool, '';login: The USENIX Magazine,'' February 2011:42-47.&lt;br /&gt;
&lt;br /&gt;
===Older scripts===&lt;br /&gt;
&lt;br /&gt;
Older scripts, which mimicked some of GNU parallel functionality, can be found on the [[Deprecated scripts]] page.&lt;br /&gt;
&lt;br /&gt;
--[[User:Rzon|Rzon]] 02:22, 14 Nov 2010 (UTC)&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=User_Serial&amp;diff=7129</id>
		<title>User Serial</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=User_Serial&amp;diff=7129"/>
		<updated>2014-07-30T19:20:47Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* GNU Parallel */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===General considerations===&lt;br /&gt;
&lt;br /&gt;
====Use a whole node...====&lt;br /&gt;
&lt;br /&gt;
When you submit a job on a SciNet system, it is run on one (or more than one) entire node - meaning that your job is occupying at least 8 processors for the duration of its run.  The SciNet systems are usually busy, with many researchers waiting in the queue for computational resources, so we require that you make full use of the nodes that your job is allocated, so other researchers don't have to wait unnecessarily, and so that your jobs get as much work done for you as possible while they run.&lt;br /&gt;
&lt;br /&gt;
Often, the best way to make full use of the node is to run one large parallel computation; but sometimes it is beneficial to run several serial codes at the same time.  On this page, we discuss ways to run suites of serial computations at once, as efficiently as possible, using the full resources of the node.&lt;br /&gt;
&lt;br /&gt;
====...but not more.====&lt;br /&gt;
&lt;br /&gt;
When running multiple jobs on the same node, it is essential to have a good idea of how much memory the jobs will require. The GPC compute nodes have about 14GB in total available &lt;br /&gt;
to user jobs running on the 8 cores (a bit less, say 13GB, on the devel nodes &amp;lt;tt&amp;gt;gpc01..04&amp;lt;/tt&amp;gt;, and [[GPC_Quickstart#Memory_Configuration|somewhat more for some compute nodes]]).&lt;br /&gt;
So the jobs also have to be bunched in ways that will fit into 14GB.  If they use more than this, they will crash the node, inconveniencing you and other researchers waiting for that node.&lt;br /&gt;
&lt;br /&gt;
If that's not possible -- that is, if each individual job requires significantly more than ~1.75GB -- then you can just run fewer jobs at a time so that they do fit; but then there is again an under-utilization problem.   In that case, the jobs are likely candidates for parallelization, and you can contact us at [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;] and arrange a meeting with one of the technical analysts to help you do just that.&lt;br /&gt;
&lt;br /&gt;
If the memory requirements allow it, you could actually run more than 8 jobs at the same time, up to 16, exploiting the [[GPC_Quickstart#HyperThreading | HyperThreading]] feature of the Intel Nehalem cores.  It may seem counterintuitive, but running 16 jobs on 8 cores for certain types of tasks has increased some users' overall throughput by 10 to 30 percent.&lt;br /&gt;
&lt;br /&gt;
====Is your job really serial?====&lt;br /&gt;
&lt;br /&gt;
While your program may not be explicitly parallel, it may use some of SciNet's threaded libraries for numerical computations, which can make use of multiple processors.  In particular, SciNet's [[Python]] and [[R_Statistical_Package | R]] modules are compiled with aggressive optimization and with threaded numerical libraries which by default will make use of multiple cores for computations such as large matrix operations.  This can greatly speed up individual runs, but by less (usually much less) than a factor of 8.  If you do have many such computations to do, your [[Introduction_To_Performance#Throughput | throughput]] will be better - you will get more calculations done per unit time - if you turn off the threading and run multiple such computations at once.  Threading is turned off with the shell script line &amp;lt;tt&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/tt&amp;gt;; that line will be included in the scripts below.  &lt;br /&gt;
&lt;br /&gt;
If your calculations do implicitly use threading, you may want to experiment to see what gives you the best performance - you may find that running 4 (or even 8) jobs with 2 threads each (&amp;lt;tt&amp;gt;OMP_NUM_THREADS=2&amp;lt;/tt&amp;gt;), or 2 jobs with 4 threads, gives better performance than 8 jobs with 1 thread (and almost certainly better than 1 job with 8 threads).  We'd encourage you to perform exactly such a [[Introduction_To_Performance#Strong_Scaling_Tests | scaling test]]; for a small up-front investment in time you may significantly speed up all the computations you need to do.&lt;br /&gt;
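&lt;br /&gt;
A minimal sketch of such an experiment, using the same hypothetical job directories and executables as in the script below: four 2-threaded runs at once instead of eight 1-threaded ones.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Hypothetical sketch: time this against 8 single-threaded runs to see which gives better throughput&lt;br /&gt;
export OMP_NUM_THREADS=2&lt;br /&gt;
(cd jobdir1; ./dojob1) &amp;amp;&lt;br /&gt;
(cd jobdir2; ./dojob2) &amp;amp;&lt;br /&gt;
(cd jobdir3; ./dojob3) &amp;amp;&lt;br /&gt;
(cd jobdir4; ./dojob4) &amp;amp;&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;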
&lt;br /&gt;
===Serial jobs of similar duration===&lt;br /&gt;
&lt;br /&gt;
The most straightforward way to run multiple serial jobs is to bunch the jobs in groups of 8 or more that will take roughly the same amount of time, and create a job that looks a &lt;br /&gt;
bit like this&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple serial jobs on&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialx8&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; ampersand off 8 jobs and wait&lt;br /&gt;
(cd jobdir1; ./dojob1) &amp;amp;&lt;br /&gt;
(cd jobdir2; ./dojob2) &amp;amp;&lt;br /&gt;
(cd jobdir3; ./dojob3) &amp;amp;&lt;br /&gt;
(cd jobdir4; ./dojob4) &amp;amp;&lt;br /&gt;
(cd jobdir5; ./dojob5) &amp;amp;&lt;br /&gt;
(cd jobdir6; ./dojob6) &amp;amp;&lt;br /&gt;
(cd jobdir7; ./dojob7) &amp;amp;&lt;br /&gt;
(cd jobdir8; ./dojob8) &amp;amp;&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are three important things to take note of here.  First, the &amp;lt;tt&amp;gt;'''wait'''&amp;lt;/tt&amp;gt;&lt;br /&gt;
command at the end is crucial; without it the job will terminate &lt;br /&gt;
immediately, killing the 8 programs you just started.&lt;br /&gt;
&lt;br /&gt;
Second is that it is important to group the programs by how long they &lt;br /&gt;
will take.   If (say) &amp;lt;tt&amp;gt;dojob8&amp;lt;/tt&amp;gt; takes 2 hours and the rest only take 1, &lt;br /&gt;
then for one hour 7 of the 8 cores on the GPC node are wasted; they are &lt;br /&gt;
sitting idle but are unavailable for other users, and the utilization of &lt;br /&gt;
this node over the whole run is only 56%.   This is the sort of thing &lt;br /&gt;
we'll notice, and users who don't make efficient use of the machine will &lt;br /&gt;
have their ability to use SciNet resources reduced.  If you have many serial jobs of varying length, &lt;br /&gt;
use the submission script to balance the computational load, as explained [[ #Serial jobs of varying duration | below]].&lt;br /&gt;
&lt;br /&gt;
Third, we reiterate that if memory requirements allow it, you should try to run more than 8 jobs at once, with a maximum of 16 jobs.&lt;br /&gt;
&lt;br /&gt;
===GNU Parallel===&lt;br /&gt;
&lt;br /&gt;
GNU parallel is a really nice tool written by Ole Tange to run multiple serial jobs in&lt;br /&gt;
parallel. It allows you to keep the processors on each 8-core node busy, provided you give it enough jobs to do.&lt;br /&gt;
&lt;br /&gt;
GNU parallel is accessible on the GPC in the module&lt;br /&gt;
&amp;lt;tt&amp;gt;gnu-parallel&amp;lt;/tt&amp;gt;:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load gnu-parallel/20140622&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Note that there are several versions of gnu-parallel installed on the GPC; we recommend using the newer version. &lt;br /&gt;
&lt;br /&gt;
The citation for GNU Parallel is: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, '';login: The USENIX Magazine,'' February 2011:42-47.&lt;br /&gt;
&lt;br /&gt;
It is easiest to demonstrate the usage of GNU parallel by&lt;br /&gt;
examples. Suppose you have 16 jobs to do, that the duration of these jobs varies quite a bit, but that the average job duration is around 10 hours. You could use the following script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=24:00:00&lt;br /&gt;
#PBS -N gnu-parallel-example&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20140622  &lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND&lt;br /&gt;
parallel -j 8 &amp;lt;&amp;lt;EOF&lt;br /&gt;
  cd jobdir1; ./dojob1; echo &amp;quot;job 1 finished&amp;quot;&lt;br /&gt;
  cd jobdir2; ./dojob2; echo &amp;quot;job 2 finished&amp;quot;&lt;br /&gt;
  cd jobdir3; ./dojob3; echo &amp;quot;job 3 finished&amp;quot;&lt;br /&gt;
  cd jobdir4; ./dojob4; echo &amp;quot;job 4 finished&amp;quot;&lt;br /&gt;
  cd jobdir5; ./dojob5; echo &amp;quot;job 5 finished&amp;quot;&lt;br /&gt;
  cd jobdir6; ./dojob6; echo &amp;quot;job 6 finished&amp;quot;&lt;br /&gt;
  cd jobdir7; ./dojob7; echo &amp;quot;job 7 finished&amp;quot;&lt;br /&gt;
  cd jobdir8; ./dojob8; echo &amp;quot;job 8 finished&amp;quot;&lt;br /&gt;
  cd jobdir9; ./dojob9; echo &amp;quot;job 9 finished&amp;quot;&lt;br /&gt;
  cd jobdir10; ./dojob10; echo &amp;quot;job 10 finished&amp;quot;&lt;br /&gt;
  cd jobdir11; ./dojob11; echo &amp;quot;job 11 finished&amp;quot;&lt;br /&gt;
  cd jobdir12; ./dojob12; echo &amp;quot;job 12 finished&amp;quot;&lt;br /&gt;
  cd jobdir13; ./dojob13; echo &amp;quot;job 13 finished&amp;quot;&lt;br /&gt;
  cd jobdir14; ./dojob14; echo &amp;quot;job 14 finished&amp;quot;&lt;br /&gt;
  cd jobdir15; ./dojob15; echo &amp;quot;job 15 finished&amp;quot;&lt;br /&gt;
  cd jobdir16; ./dojob16; echo &amp;quot;job 16 finished&amp;quot;&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; parameter sets the number of jobs to run at the same time, but 16 jobs are lined up. Initially, 8 jobs are given to the 8 processors on the node. When one of the processors is done with its assigned job, it will get the next job instead of sitting idle until the other processors are done. While you would expect that on average this script should take 20 hours (each processor on average has to complete two jobs of 10 hours), there's a good chance that one of the processors gets two jobs that take more than 10 hours, so the job script requests 24 hours. How much more time you should ask for in practice depends on the spread in run times of the separate jobs.&lt;br /&gt;
&lt;br /&gt;
===Serial jobs of varying duration===&lt;br /&gt;
&lt;br /&gt;
If you have a lot (50+) of relatively short serial runs to do, '''of which the walltime varies''', and if you know that eight jobs fit in memory without issues, then writing all the commands explicitly in the jobscript can get tedious. If you follow the convention that the jobs are all started by auxiliary scripts called job&amp;lt;something&amp;gt;.sh, the following strategy in your submission script maximizes the cpu utilization. &lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple, dynamically-run &lt;br /&gt;
# serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialdynamic&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20130422  &lt;br /&gt;
&lt;br /&gt;
# COMMANDS ARE ASSUMED TO BE SCRIPTS CALLED job*.sh&lt;br /&gt;
echo job*.sh | tr ' ' '\n' | parallel -j 8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notes:&lt;br /&gt;
* As before, GNU Parallel keeps 8 jobs running at a time, and if one finishes, starts the next. This is an easy way to do ''load balancing''.&lt;br /&gt;
* You can in fact run more or fewer than 8 processes per node by modifying &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;'s &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; argument.&lt;br /&gt;
* Doing many serial jobs often entails doing many disk reads and writes, which can be detrimental to the performance. In that case, running from the ramdisk may be an option.  &lt;br /&gt;
* When using a ramdisk, make sure you copy your results from the ramdisk back to scratch after the runs, or when the job is killed because time has run out (a minimal sketch follows this list).&lt;br /&gt;
* More details on how to setup your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].&lt;br /&gt;
* This script optimizes resource utilization, but can only use 1 node (8 cores) at a time. The next section addresses how to use more nodes.&lt;br /&gt;
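&lt;br /&gt;
A minimal sketch of the ramdisk copy-back mentioned above, assuming (as described on the [[User_Ramdisk|Ramdisk wiki page]]) that the ramdisk is mounted at /dev/shm; the directory and file names are hypothetical:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Hypothetical sketch: stage one run on the ramdisk, then copy results back before the job ends&lt;br /&gt;
RAMDIR=/dev/shm/$USER/run1&lt;br /&gt;
mkdir -p $RAMDIR&lt;br /&gt;
cp -r $PBS_O_WORKDIR/jobdir1/. $RAMDIR/&lt;br /&gt;
cd $RAMDIR&lt;br /&gt;
./dojob1&lt;br /&gt;
# Copy the (hypothetical) output directory back to the submission directory on scratch&lt;br /&gt;
cp -r $RAMDIR/output $PBS_O_WORKDIR/jobdir1/&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;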
&lt;br /&gt;
===Version for more than 8 cores at once (still serial)===&lt;br /&gt;
&lt;br /&gt;
If you have hundreds of serial jobs that you want to run concurrently and the nodes are available, then the approach above, while useful, would require tens of scripts to be submitted separately. It is possible for you to request more than one node and to use the following routine to distribute your processes amongst the cores. In this case, it is important to use the newer version of GNU parallel installed on the GPC.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple, dynamically-run &lt;br /&gt;
# serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=25:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialdynamicMulti&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20130422&lt;br /&gt;
&lt;br /&gt;
# START PARALLEL JOBS USING NODE LIST IN $PBS_NODEFILE&lt;br /&gt;
seq 800 | parallel -j8 --sshloginfile $PBS_NODEFILE --workdir $PWD ./myrun {}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation:&lt;br /&gt;
* &amp;lt;tt&amp;gt;seq 800&amp;lt;/tt&amp;gt; outputs the numbers 1 through 800 on separate lines. This output is piped to (ie becomes the input of) the &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; command.&lt;br /&gt;
* The point of the &amp;quot;seq 800&amp;quot; is that each line given to &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; defines a new job; so here, there are 800 jobs.&lt;br /&gt;
* Each job runs a command, but because the numbers generated by seq are not commands, a real command is constructed, in this case, by the argument &amp;lt;tt&amp;gt;./myrun {}&amp;lt;/tt&amp;gt;. Here &amp;lt;tt&amp;gt;myrun&amp;lt;/tt&amp;gt; is supposed to be the name of the application to run. The two curly brackets &amp;lt;tt&amp;gt;{}&amp;lt;/tt&amp;gt; get replaced by the line from the input, that is, by one of the numbers.&lt;br /&gt;
* So parallel will run the 800 commands:&amp;lt;br/&amp;gt;./myrun 1&amp;lt;br/&amp;gt;./myrun 2&amp;lt;br/&amp;gt;...&amp;lt;br/&amp;gt;./myrun 800&lt;br /&gt;
* The parameter &amp;lt;tt&amp;gt;--sshloginfile $PBS_NODEFILE&amp;lt;/tt&amp;gt; tells &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; to look for the file named $PBS_NODEFILE which contains the host names of the nodes assigned to the current job (as stated above, it is automatically generated).&lt;br /&gt;
* The parameter &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; tells &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; to run 8 of these at a time on each of the hosts.&lt;br /&gt;
* The &amp;lt;tt&amp;gt;--workdir $PWD&amp;lt;/tt&amp;gt; sets the working directory on the other nodes to the working directory on the first node. Without this, the run tries to start from the wrong place and will most likely fail (unless using the latest gnu parallel module, 20130422, which by default puts you in $PWD on the remote node).&lt;br /&gt;
* With the latest gnu parallel module, loaded modules should automatically get loaded on the remote nodes too; this is not the case for earlier versions.&lt;br /&gt;
* If you need an environment variable to be transferred from the job script to the remotely running subjobs, use &amp;lt;tt&amp;gt;--env ENVIRONMENTVARIABLE&amp;lt;/tt&amp;gt; (a minimal sketch follows this list).&lt;br /&gt;
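&lt;br /&gt;
For instance, a minimal sketch of the &amp;lt;tt&amp;gt;--env&amp;lt;/tt&amp;gt; option (the variable name MYPARAM is hypothetical):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Hypothetical sketch: MYPARAM is set in the job script and passed on to the remote subjobs&lt;br /&gt;
export MYPARAM=42&lt;br /&gt;
seq 800 | parallel -j8 --env MYPARAM --sshloginfile $PBS_NODEFILE --workdir $PWD ./myrun {}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;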
Notes:&lt;br /&gt;
* Of course, this is just an example of what you could do with gnu parallel. How you set up your specific run depends on how each of the runs would be started. One could for instance also prepare a file of commands to run and make that the input to parallel as well.&lt;br /&gt;
* Note that submitting several bunches to single nodes, as in the section above, is a more failsafe way of proceeding, since a node failure would only affect one of these bunches, rather than all runs. &lt;br /&gt;
* GNU Parallel can be passed a file with the list of nodes to which to ssh, using &amp;lt;tt&amp;gt;--sshloginfile&amp;lt;/tt&amp;gt; (thanks to Ole Tange for pointing this out). This list is automatically generated by the scheduler and its name is made available in the environment variable $PBS_NODEFILE.&lt;br /&gt;
* Alternatively, GNU Parallel can take a comma-separated list of nodes given to its -S argument, but this would need to be constructed from the file $PBS_NODEFILE, which contains all nodes assigned to the job, with each node duplicated 8x for the number of cores on each node (see the sketch after this list).&lt;br /&gt;
* GNU Parallel reads lines of input and converts them into arguments of the execution command. The execution command is the last argument given to &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;, with &amp;lt;tt&amp;gt;{}&amp;lt;/tt&amp;gt; replaced by the line of input.&lt;br /&gt;
* &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;The --workdir argument is essential: it sets the working directory on the other nodes, which would default to your home directory if omitted. Since /home is read-only on the compute nodes, you would likely not get any output at all!&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt;This is no longer true for the latest GNU Parallel module (20130422), which puts you in the current directory on the remote nodes.&lt;br /&gt;
* We reiterate that if memory requirements allow it, you should try to run more than 8 jobs at once, with a maximum of 16 jobs. You can run more or fewer than 8 processes per node by modifying the -j8 parameter to the parallel command.&lt;br /&gt;
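&lt;br /&gt;
A minimal sketch of the alternative &amp;lt;tt&amp;gt;-S&amp;lt;/tt&amp;gt; form mentioned in the notes, using standard shell tools to build the host list (this is an illustration, not the recommended recipe):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Hypothetical sketch: build a comma-separated list of the unique host names&lt;br /&gt;
# ($PBS_NODEFILE lists each node 8 times, once per core)&lt;br /&gt;
HOSTS=$(sort -u $PBS_NODEFILE | paste -s -d,)&lt;br /&gt;
seq 800 | parallel -j8 -S $HOSTS --workdir $PWD ./myrun {}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;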
&lt;br /&gt;
===More on GNU parallel=== &lt;br /&gt;
* [[Media:Tech-talk-gnu-parallel.pdf|Slides of the SciNet TechTalk on Gnu Parallel (14 Nov 2012)]]&lt;br /&gt;
* The documentation for GNU parallel can be found at http://www.gnu.org/software/parallel/&lt;br /&gt;
* Its man page can be found here http://www.gnu.org/software/parallel/man.html&lt;br /&gt;
* The man page is also available on the GPC when the gnu-parallel module is loaded, with the command &amp;lt;code&amp;gt;$ man parallel&amp;lt;/code&amp;gt;. The man page describes all the options, including how to make sure the output of different jobs does not get scrambled together, and gives examples (a small sketch of one such option follows below).&lt;br /&gt;
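&lt;br /&gt;
For instance, a small sketch of one such option (a hypothetical local example, not taken from the man page verbatim): &amp;lt;tt&amp;gt;-k&amp;lt;/tt&amp;gt; (&amp;lt;tt&amp;gt;--keep-order&amp;lt;/tt&amp;gt;) prints the output of the jobs in the order of the input lines rather than in the order of completion.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Hypothetical sketch: output appears as for run 1, then run 2, ... regardless of which finishes first&lt;br /&gt;
seq 8 | parallel -j8 -k ./myrun {}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;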
&lt;br /&gt;
===GNU Parallel Reference===&lt;br /&gt;
* O. Tange (2011): GNU Parallel - The Command-Line Power Tool, '';login: The USENIX Magazine,'' February 2011:42-47.&lt;br /&gt;
&lt;br /&gt;
===Older scripts===&lt;br /&gt;
&lt;br /&gt;
Older scripts, which mimicked some of GNU parallel functionality, can be found on the [[Deprecated scripts]] page.&lt;br /&gt;
&lt;br /&gt;
--[[User:Rzon|Rzon]] 02:22, 14 Nov 2010 (UTC)&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=User_Serial&amp;diff=7128</id>
		<title>User Serial</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=User_Serial&amp;diff=7128"/>
		<updated>2014-07-30T19:19:12Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* Serial jobs of similar duration */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===General considerations===&lt;br /&gt;
&lt;br /&gt;
====Use a whole node...====&lt;br /&gt;
&lt;br /&gt;
When you submit a job on a SciNet system, it is run on one (or more than one) entire node - meaning that your job is occupying at least 8 processors for the duration of its run.  The SciNet systems are usually busy, with many researchers waiting in the queue for computational resources, so we require that you make full use of the nodes that your job is allocated, so other researchers don't have to wait unnecessarily, and so that your jobs get as much work done for you as possible while they run.&lt;br /&gt;
&lt;br /&gt;
Often, the best way to make full use of the node is to run one large parallel computation; but sometimes it is beneficial to run several serial codes at the same time.  On this page, we discuss ways to run suites of serial computations at once, as efficiently as possible, using the full resources of the node.&lt;br /&gt;
&lt;br /&gt;
====...but not more.====&lt;br /&gt;
&lt;br /&gt;
When running multiple jobs on the same node, it is essential to have a good idea of how much memory the jobs will require. The GPC compute nodes have about 14GB in total available &lt;br /&gt;
to user jobs running on the 8 cores (a bit less, say 13GB, on the devel nodes &amp;lt;tt&amp;gt;gpc01..04&amp;lt;/tt&amp;gt;, and [[GPC_Quickstart#Memory_Configuration|somewhat more for some compute nodes]]).&lt;br /&gt;
So the jobs also have to be bunched in ways that will fit into 14GB.  If they use more than this, they will crash the node, inconveniencing you and other researchers waiting for that node.&lt;br /&gt;
&lt;br /&gt;
If that's not possible -- that is, if each individual job requires significantly more than ~1.75GB -- then you can just run fewer jobs at a time so that they do fit; but then there is again an under-utilization problem.   In that case, the jobs are likely candidates for parallelization, and you can contact us at [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;] and arrange a meeting with one of the technical analysts to help you do just that.&lt;br /&gt;
&lt;br /&gt;
If the memory requirements allow it, you could actually run more than 8 jobs at the same time, up to 16, exploiting the [[GPC_Quickstart#HyperThreading | HyperThreading]] feature of the Intel Nehalem cores.  It may seem counterintuitive, but running 16 jobs on 8 cores for certain types of tasks has increased some users' overall throughput by 10 to 30 percent.&lt;br /&gt;
&lt;br /&gt;
====Is your job really serial?====&lt;br /&gt;
&lt;br /&gt;
While your program may not be explicitly parallel, it may use some of SciNet's threaded libraries for numerical computations, which can make use of multiple processors.  In particular, SciNet's [[Python]] and [[R_Statistical_Package | R]] modules are compiled with aggressive optimization and with threaded numerical libraries which by default will make use of multiple cores for computations such as large matrix operations.  This can greatly speed up individual runs, but by less (usually much less) than a factor of 8.  If you do have many such computations to do, your [[Introduction_To_Performance#Throughput | throughput]] will be better - you will get more calculations done per unit time - if you turn off the threading and run multiple such computations at once.  Threading is turned off with the shell script line &amp;lt;tt&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/tt&amp;gt;; that line will be included in the scripts below.  &lt;br /&gt;
&lt;br /&gt;
If your calculations do implicitly use threading, you may want to experiment to see what gives you the best performance - you may find that running 4 (or even 8) jobs with 2 threads each (&amp;lt;tt&amp;gt;OMP_NUM_THREADS=2&amp;lt;/tt&amp;gt;), or 2 jobs with 4 threads, gives better performance than 8 jobs with 1 thread (and almost certainly better than 1 job with 8 threads).  We'd encourage you to perform exactly such a [[Introduction_To_Performance#Strong_Scaling_Tests | scaling test]]; for a small up-front investment in time you may significantly speed up all the computations you need to do.&lt;br /&gt;
&lt;br /&gt;
===Serial jobs of similar duration===&lt;br /&gt;
&lt;br /&gt;
The most straightforward way to run multiple serial jobs is to bunch the jobs in groups of 8 or more that will take roughly the same amount of time, and create a job that looks a &lt;br /&gt;
bit like this&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple serial jobs on&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialx8&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Turn off implicit threading in Python, R&lt;br /&gt;
export OMP_NUM_THREADS=1&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; ampersand off 8 jobs and wait&lt;br /&gt;
(cd jobdir1; ./dojob1) &amp;amp;&lt;br /&gt;
(cd jobdir2; ./dojob2) &amp;amp;&lt;br /&gt;
(cd jobdir3; ./dojob3) &amp;amp;&lt;br /&gt;
(cd jobdir4; ./dojob4) &amp;amp;&lt;br /&gt;
(cd jobdir5; ./dojob5) &amp;amp;&lt;br /&gt;
(cd jobdir6; ./dojob6) &amp;amp;&lt;br /&gt;
(cd jobdir7; ./dojob7) &amp;amp;&lt;br /&gt;
(cd jobdir8; ./dojob8) &amp;amp;&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are three important things to take note of here.  First, the &amp;lt;tt&amp;gt;'''wait'''&amp;lt;/tt&amp;gt;&lt;br /&gt;
command at the end is crucial; without it the job will terminate &lt;br /&gt;
immediately, killing the 8 programs you just started.&lt;br /&gt;
&lt;br /&gt;
Second is that it is important to group the programs by how long they &lt;br /&gt;
will take.   If (say) &amp;lt;tt&amp;gt;dojob8&amp;lt;/tt&amp;gt; takes 2 hours and the rest only take 1, &lt;br /&gt;
then for one hour 7 of the 8 cores on the GPC node are wasted; they are &lt;br /&gt;
sitting idle but are unavailable for other users, and the utilization of &lt;br /&gt;
this node over the whole run is only 56%.   This is the sort of thing &lt;br /&gt;
we'll notice, and users who don't make efficient use of the machine will &lt;br /&gt;
have their ability to use SciNet resources reduced.  If you have many serial jobs of varying length, &lt;br /&gt;
use the submission script to balance the computational load, as explained [[ #Serial jobs of varying duration | below]].&lt;br /&gt;
&lt;br /&gt;
Third, we reiterate that if memory requirements allow it, you should try to run more than 8 jobs at once, with a maximum of 16 jobs.&lt;br /&gt;
&lt;br /&gt;
===GNU Parallel===&lt;br /&gt;
&lt;br /&gt;
GNU parallel is a really nice tool written by Ole Tange to run multiple serial jobs in&lt;br /&gt;
parallel. It allows you to keep the processors on each 8-core node busy, provided you give it enough jobs to do.&lt;br /&gt;
&lt;br /&gt;
GNU parallel is accessible on the GPC in the module&lt;br /&gt;
&amp;lt;tt&amp;gt;gnu-parallel&amp;lt;/tt&amp;gt;:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load gnu-parallel/20130422&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Note that there are currently (May 2013) four versions of gnu-parallel installed on the GPC, with the oldest version, gnu-parallel/2010, as the default, although we'd recommend using a newer version. &lt;br /&gt;
&lt;br /&gt;
Note that the citation for GNU Parallel is: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, '';login: The USENIX Magazine,'' February 2011:42-47.&lt;br /&gt;
&lt;br /&gt;
It is easiest to demonstrate the usage of GNU parallel by&lt;br /&gt;
examples. Suppose you have 16 jobs to do, that the duration of these jobs varies quite a bit, but that the average job duration is around 10 hours. You could use the following script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=24:00:00&lt;br /&gt;
#PBS -N gnu-parallel-example&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20130422  &lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND&lt;br /&gt;
parallel -j 8 &amp;lt;&amp;lt;EOF&lt;br /&gt;
  cd jobdir1; ./dojob1; echo &amp;quot;job 1 finished&amp;quot;&lt;br /&gt;
  cd jobdir2; ./dojob2; echo &amp;quot;job 2 finished&amp;quot;&lt;br /&gt;
  cd jobdir3; ./dojob3; echo &amp;quot;job 3 finished&amp;quot;&lt;br /&gt;
  cd jobdir4; ./dojob4; echo &amp;quot;job 4 finished&amp;quot;&lt;br /&gt;
  cd jobdir5; ./dojob5; echo &amp;quot;job 5 finished&amp;quot;&lt;br /&gt;
  cd jobdir6; ./dojob6; echo &amp;quot;job 6 finished&amp;quot;&lt;br /&gt;
  cd jobdir7; ./dojob7; echo &amp;quot;job 7 finished&amp;quot;&lt;br /&gt;
  cd jobdir8; ./dojob8; echo &amp;quot;job 8 finished&amp;quot;&lt;br /&gt;
  cd jobdir9; ./dojob9; echo &amp;quot;job 9 finished&amp;quot;&lt;br /&gt;
  cd jobdir10; ./dojob10; echo &amp;quot;job 10 finished&amp;quot;&lt;br /&gt;
  cd jobdir11; ./dojob11; echo &amp;quot;job 11 finished&amp;quot;&lt;br /&gt;
  cd jobdir12; ./dojob12; echo &amp;quot;job 12 finished&amp;quot;&lt;br /&gt;
  cd jobdir13; ./dojob13; echo &amp;quot;job 13 finished&amp;quot;&lt;br /&gt;
  cd jobdir14; ./dojob14; echo &amp;quot;job 14 finished&amp;quot;&lt;br /&gt;
  cd jobdir15; ./dojob15; echo &amp;quot;job 15 finished&amp;quot;&lt;br /&gt;
  cd jobdir16; ./dojob16; echo &amp;quot;job 16 finished&amp;quot;&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; parameter sets the number of jobs to run at the same time, but 16 jobs are lined up. Initially, 8 jobs are given to the 8 processors on the node. When one of the processors is done with its assigned job, it will get the next job instead of sitting idle until the other processors are done. While you would expect that on average this script should take 20 hours (each processor on average has to complete two jobs of 10 hours), there's a good chance that one of the processors gets two jobs that take more than 10 hours, so the job script requests 24 hours. How much more time you should ask for in practice depends on the spread in run times of the separate jobs.&lt;br /&gt;
&lt;br /&gt;
===Serial jobs of varying duration===&lt;br /&gt;
&lt;br /&gt;
If you have a lot (50+) of relatively short serial runs to do, '''of which the walltime varies''', and if you know that eight jobs fit in memory without issues, then writing all the commands explicitly in the jobscript can get tedious. If you follow the convention that the jobs are all started by auxiliary scripts called job&amp;lt;something&amp;gt;.sh, the following strategy in your submission script maximizes the cpu utilization. &lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple, dynamically-run &lt;br /&gt;
# serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialdynamic&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20130422  &lt;br /&gt;
&lt;br /&gt;
# COMMANDS ARE ASSUMED TO BE SCRIPTS CALLED job*.sh&lt;br /&gt;
echo job*.sh | tr ' ' '\n' | parallel -j 8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notes:&lt;br /&gt;
* As before, GNU Parallel keeps 8 jobs running at a time, and if one finishes, starts the next. This is an easy way to do ''load balancing'' (a sketch after this list shows how to also keep a log of the completed jobs).&lt;br /&gt;
* You can in fact run more or fewer than 8 processes per node by modifying &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;'s &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; argument.&lt;br /&gt;
* Doing many serial jobs often entails doing many disk reads and writes, which can be detrimental to the performance. In that case, running from the ramdisk may be an option.  &lt;br /&gt;
* When using a ramdisk, make sure you copy your results from the ramdisk back to the scratch after the runs, or when the job is killed because time has run out.&lt;br /&gt;
* More details on how to setup your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].&lt;br /&gt;
* This script optimizes resource utilization, but can only use 1 node (8 cores) at a time. The next section addresses how to use more nodes.&lt;br /&gt;
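&lt;br /&gt;
A minimal sketch of keeping a record of the completed jobs with &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;'s &amp;lt;tt&amp;gt;--joblog&amp;lt;/tt&amp;gt; option (the log file name is arbitrary):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Hypothetical sketch: progress.log will list the start time, runtime, exit status and command of each job&lt;br /&gt;
echo job*.sh | tr ' ' '\n' | parallel -j 8 --joblog progress.log&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;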
&lt;br /&gt;
===Version for more than 8 cores at once (still serial)===&lt;br /&gt;
&lt;br /&gt;
If you have hundreds of serial jobs that you want to run concurrently and the nodes are available, then the approach above, while useful, would require tens of scripts to be submitted separately. It is possible for you to request more than one node and to use the following routine to distribute your processes amongst the cores. In this case, it is important to use the newer version of GNU parallel installed on the GPC.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple, dynamically-run &lt;br /&gt;
# serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=25:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialdynamicMulti&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20130422&lt;br /&gt;
&lt;br /&gt;
# START PARALLEL JOBS USING NODE LIST IN $PBS_NODEFILE&lt;br /&gt;
seq 800 | parallel -j8 --sshloginfile $PBS_NODEFILE --workdir $PWD ./myrun {}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation:&lt;br /&gt;
* &amp;lt;tt&amp;gt;seq 800&amp;lt;/tt&amp;gt; outputs the numbers 1 through 800 on separate lines. This output is piped to (ie becomes the input of) the &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; command.&lt;br /&gt;
* The point of the &amp;quot;seq 800&amp;quot; is that each line given to &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; defines a new job; so here, there are 800 jobs.&lt;br /&gt;
* Each job runs a command, but because the numbers generated by seq are not commands, a real command is constructed, in this case, by the argument &amp;lt;tt&amp;gt;./myrun {}&amp;lt;/tt&amp;gt;. Here &amp;lt;tt&amp;gt;myrun&amp;lt;/tt&amp;gt; is supposed to be the name of the application to run. The two curly brackets &amp;lt;tt&amp;gt;{}&amp;lt;/tt&amp;gt; get replaced by the line from the input, that is, by one of the numbers.&lt;br /&gt;
* So parallel will run the 800 commands:&amp;lt;br/&amp;gt;./myrun 1&amp;lt;br/&amp;gt;./myrun 2&amp;lt;br/&amp;gt;...&amp;lt;br/&amp;gt;./myrun 800&lt;br /&gt;
* The parameter &amp;lt;tt&amp;gt;--sshloginfile $PBS_NODEFILE&amp;lt;/tt&amp;gt; tells &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; to look for the file named $PBS_NODEFILE which contains the host names of the nodes assigned to the current job (as stated above, it is automatically generated).&lt;br /&gt;
* The parameter &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; tells &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; to run 8 of these at a time on each of the hosts.&lt;br /&gt;
* The &amp;lt;tt&amp;gt;--workdir $PWD&amp;lt;/tt&amp;gt; sets the working directory on the other nodes to the working directory on the first node. Without this, the run tries to start from the wrong place and will most likely fail (unless using the latest gnu parallel module, 20130422, which by default puts you in $PWD on the remote node).&lt;br /&gt;
* With the latest gnu parallel module, loaded modules should automatically get loaded on the remote nodes too; this is not the case for earlier versions.&lt;br /&gt;
* If you need an environment variable to be transferred from the job script to the remotely running subjobs, use &amp;lt;tt&amp;gt;--env ENVIRONMENTVARIABLE&amp;lt;/tt&amp;gt;.&lt;br /&gt;
Notes:&lt;br /&gt;
* Of course, this is just an example of what you could do with gnu parallel. How you set up your specific run depends on how each of the runs would be started. One could for instance also prepare a file of commands to run and make that the input to parallel as well.&lt;br /&gt;
* Note that submitting several bunches to single nodes, as in the section above, is a more failsafe way of proceeding, since a node failure would only affect one of these bunches, rather than all runs. &lt;br /&gt;
* GNU Parallel can be passed a file with the list of nodes to which to ssh, using &amp;lt;tt&amp;gt;--sshloginfile&amp;lt;/tt&amp;gt; (thanks to Ole Tange for pointing this out). This list is automatically generated by the scheduler and its name is made available in the environment variable $PBS_NODEFILE.&lt;br /&gt;
* Alternatively, GNU Parallel can take a comma separated list of nodes given to its -S argument, but this would need to be constructed from the file $PBS_NODEFILE which contains all nodes assigned to the job, with each node duplicated 8x for the number of cores on each node.&lt;br /&gt;
* GNU Parallel reads lines of input and converts them into arguments of the execution command. The execution command is the last argument given to &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;, with &amp;lt;tt&amp;gt;{}&amp;lt;/tt&amp;gt; replaced by the line of input.&lt;br /&gt;
* &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;The --workdir argument is essential: it sets the working directory on the other nodes, which would default to your home directory if omitted. Since /home is read-only on the compute nodes, you would likely not get any output at all!&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt;This is no longer true for the latest GNU Parallel module (20130422), which puts you in the current directory on the remote nodes.&lt;br /&gt;
* We reiterate that if memory requirements allow it, you should try to run more than 8 jobs at once, with a maximum of 16 jobs. You can run more or fewer than 8 processes per node by modifying the -j8 parameter to the parallel command.&lt;br /&gt;
&lt;br /&gt;
===More on GNU parallel=== &lt;br /&gt;
* [[Media:Tech-talk-gnu-parallel.pdf|Slides of the SciNet TechTalk on Gnu Parallel (14 Nov 2012)]]&lt;br /&gt;
* The documentation for GNU parallel can be found at http://www.gnu.org/software/parallel/&lt;br /&gt;
* Its man page can be found here http://www.gnu.org/software/parallel/man.html&lt;br /&gt;
* The man page is also available on the GPC when the gnu-parallel module is loaded, with the command &amp;lt;code&amp;gt;$ man parallel&amp;lt;/code&amp;gt;. The man page describes all the options, including how to make sure the output of different jobs does not get scrambled together, and gives examples.&lt;br /&gt;
&lt;br /&gt;
===GNU Parallel Reference===&lt;br /&gt;
* O. Tange (2011): GNU Parallel - The Command-Line Power Tool, '';login: The USENIX Magazine,'' February 2011:42-47.&lt;br /&gt;
&lt;br /&gt;
===Older scripts===&lt;br /&gt;
&lt;br /&gt;
Older scripts, which mimicked some of GNU parallel functionality, can be found on the [[Deprecated scripts]] page.&lt;br /&gt;
&lt;br /&gt;
--[[User:Rzon|Rzon]] 02:22, 14 Nov 2010 (UTC)&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=User_Serial&amp;diff=7127</id>
		<title>User Serial</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=User_Serial&amp;diff=7127"/>
		<updated>2014-07-30T19:18:37Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* General considerations */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===General considerations===&lt;br /&gt;
&lt;br /&gt;
====Use a whole node...====&lt;br /&gt;
&lt;br /&gt;
When you submit a job on a SciNet system, it is run on one (or more than one) entire node - meaning that your job is occupying at least 8 processors for the duration of its run.  The SciNet systems are usually busy, with many researchers waiting in the queue for computational resources, so we require that you make full use of the nodes that your job is allocated, so other researchers don't have to wait unnecessarily, and so that your jobs get as much work done for you as possible while they run.&lt;br /&gt;
&lt;br /&gt;
Often, the best way to make full use of the node is to run one large parallel computation; but sometimes it is beneficial to run several serial codes at the same time.  On this page, we discuss ways to run suites of serial computations at once, as efficiently as possible, using the full resources of the node.&lt;br /&gt;
&lt;br /&gt;
====...but not more.====&lt;br /&gt;
&lt;br /&gt;
When running multiple jobs on the same node, it is essential to have a good idea of how much memory the jobs will require. The GPC compute nodes have about 14GB in total available &lt;br /&gt;
to user jobs running on the 8 cores (a bit less, say 13GB, on the devel nodes &amp;lt;tt&amp;gt;gpc01..04&amp;lt;/tt&amp;gt;, and [[GPC_Quickstart#Memory_Configuration|somewhat more for some compute nodes]]).&lt;br /&gt;
So the jobs also have to be bunched in ways that will fit into 14GB.  If they use more than this, they will crash the node, inconveniencing you and other researchers waiting for that node.&lt;br /&gt;
&lt;br /&gt;
If that's not possible -- that is, if each individual job requires significantly more than ~1.75GB -- then you can just run fewer jobs at a time so that they do fit; but then there is again an under-utilization problem.   In that case, the jobs are likely candidates for parallelization, and you can contact us at [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;] and arrange a meeting with one of the technical analysts to help you do just that.&lt;br /&gt;
&lt;br /&gt;
If the memory requirements allow it, you could actually run more than 8 jobs at the same time, up to 16, exploiting the [[GPC_Quickstart#HyperThreading | HyperThreading]] feature of the Intel Nehalem cores.  It may seem counterintuitive, but running 16 jobs on 8 cores for certain types of tasks has increased some users' overall throughput by 10 to 30 percent.&lt;br /&gt;
&lt;br /&gt;
====Is your job really serial?====&lt;br /&gt;
&lt;br /&gt;
While your program may not be explicitly parallel, it may use some of SciNet's threaded libraries for numerical computations, which can make use of multiple processors.  In particular, SciNet's [[Python]] and [[R_Statistical_Package | R]] modules are compiled with aggressive optimization and with threaded numerical libraries which by default will make use of multiple cores for computations such as large matrix operations.  This can greatly speed up individual runs, but by less (usually much less) than a factor of 8.  If you do have many such computations to do, your [[Introduction_To_Performance#Throughput | throughput]] will be better - you will get more calculations done per unit time - if you turn off the threading and run multiple such computations at once.  Threading is turned off with the shell script line &amp;lt;tt&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/tt&amp;gt;; that line will be included in the scripts below.  &lt;br /&gt;
&lt;br /&gt;
If your calculations do implicitly use threading, you may want to experiment to see what gives you the best performance - you may find that running 4 (or even 8) jobs with 2 threads each (&amp;lt;tt&amp;gt;OMP_NUM_THREADS=2&amp;lt;/tt&amp;gt;), or 2 jobs with 4 threads, gives better performance than 8 jobs with 1 thread (and almost certainly better than 1 job with 8 threads).  We'd encourage you to perform exactly such a [[Introduction_To_Performance#Strong_Scaling_Tests | scaling test]]; for a small up-front investment in time you may significantly speed up all the computations you need to do.&lt;br /&gt;
&lt;br /&gt;
===Serial jobs of similar duration===&lt;br /&gt;
&lt;br /&gt;
The most straightforward way to run multiple serial jobs is to bunch the jobs in groups of 8 or more that will take roughly the same amount of time, and create a job that looks a &lt;br /&gt;
bit like this&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple serial jobs on&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialx8&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; ampersand off 8 jobs and wait&lt;br /&gt;
(cd jobdir1; ./dojob1) &amp;amp;&lt;br /&gt;
(cd jobdir2; ./dojob2) &amp;amp;&lt;br /&gt;
(cd jobdir3; ./dojob3) &amp;amp;&lt;br /&gt;
(cd jobdir4; ./dojob4) &amp;amp;&lt;br /&gt;
(cd jobdir5; ./dojob5) &amp;amp;&lt;br /&gt;
(cd jobdir6; ./dojob6) &amp;amp;&lt;br /&gt;
(cd jobdir7; ./dojob7) &amp;amp;&lt;br /&gt;
(cd jobdir8; ./dojob8) &amp;amp;&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are three important things to take note of here.  First, the &amp;lt;tt&amp;gt;'''wait'''&amp;lt;/tt&amp;gt;&lt;br /&gt;
command at the end is crucial; without it the job will terminate &lt;br /&gt;
immediately, killing the 8 programs you just started.&lt;br /&gt;
&lt;br /&gt;
Second is that it is important to group the programs by how long they &lt;br /&gt;
will take.   If (say) &amp;lt;tt&amp;gt;dojob8&amp;lt;/tt&amp;gt; takes 2 hours and the rest only take 1, &lt;br /&gt;
then for one hour 7 of the 8 cores on the GPC node are wasted; they are &lt;br /&gt;
sitting idle but are unavailable for other users, and the utilization of &lt;br /&gt;
this node over the whole run is only 56%.   This is the sort of thing &lt;br /&gt;
we'll notice, and users who don't make efficient use of the machine will &lt;br /&gt;
have their ability to use SciNet resources reduced.  If you have many serial jobs of varying length, &lt;br /&gt;
use the submission script to balance the computational load, as explained [[ #Serial jobs of varying duration | below]].&lt;br /&gt;
&lt;br /&gt;
Third, we reiterate that if memory requirements allow it, you should try to run more than 8 jobs at once, with a maximum of 16 jobs.&lt;br /&gt;
&lt;br /&gt;
===GNU Parallel===&lt;br /&gt;
&lt;br /&gt;
GNU parallel is a really nice tool written by Ole Tange to run multiple serial jobs in&lt;br /&gt;
parallel. It allows you to keep the processors on each 8-core node busy, provided you give it enough jobs to do.&lt;br /&gt;
&lt;br /&gt;
GNU parallel is accessible on the GPC in the module&lt;br /&gt;
&amp;lt;tt&amp;gt;gnu-parallel&amp;lt;/tt&amp;gt;:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load gnu-parallel/20130422&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Note that there are currently (May 2013) four versions of gnu-parallel installed on the GPC, with the oldest version, gnu-parallel/2010, as the default, although we'd recommend using a newer version. &lt;br /&gt;
&lt;br /&gt;
Note that the citation for GNU Parallel is: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, '';login: The USENIX Magazine,'' February 2011:42-47.&lt;br /&gt;
&lt;br /&gt;
It is easiest to demonstrate the usage of GNU parallel by&lt;br /&gt;
examples. Suppose you have 16 jobs to do, that the duration of these jobs varies quite a bit, but that the average job duration is around 10 hours. You could use the following script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=24:00:00&lt;br /&gt;
#PBS -N gnu-parallel-example&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20130422  &lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND&lt;br /&gt;
parallel -j 8 &amp;lt;&amp;lt;EOF&lt;br /&gt;
  cd jobdir1; ./dojob1; echo &amp;quot;job 1 finished&amp;quot;&lt;br /&gt;
  cd jobdir2; ./dojob2; echo &amp;quot;job 2 finished&amp;quot;&lt;br /&gt;
  cd jobdir3; ./dojob3; echo &amp;quot;job 3 finished&amp;quot;&lt;br /&gt;
  cd jobdir4; ./dojob4; echo &amp;quot;job 4 finished&amp;quot;&lt;br /&gt;
  cd jobdir5; ./dojob5; echo &amp;quot;job 5 finished&amp;quot;&lt;br /&gt;
  cd jobdir6; ./dojob6; echo &amp;quot;job 6 finished&amp;quot;&lt;br /&gt;
  cd jobdir7; ./dojob7; echo &amp;quot;job 7 finished&amp;quot;&lt;br /&gt;
  cd jobdir8; ./dojob8; echo &amp;quot;job 8 finished&amp;quot;&lt;br /&gt;
  cd jobdir9; ./dojob9; echo &amp;quot;job 9 finished&amp;quot;&lt;br /&gt;
  cd jobdir10; ./dojob10; echo &amp;quot;job 10 finished&amp;quot;&lt;br /&gt;
  cd jobdir11; ./dojob11; echo &amp;quot;job 11 finished&amp;quot;&lt;br /&gt;
  cd jobdir12; ./dojob12; echo &amp;quot;job 12 finished&amp;quot;&lt;br /&gt;
  cd jobdir13; ./dojob13; echo &amp;quot;job 13 finished&amp;quot;&lt;br /&gt;
  cd jobdir14; ./dojob14; echo &amp;quot;job 14 finished&amp;quot;&lt;br /&gt;
  cd jobdir15; ./dojob15; echo &amp;quot;job 15 finished&amp;quot;&lt;br /&gt;
  cd jobdir16; ./dojob16; echo &amp;quot;job 16 finished&amp;quot;&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; parameter sets the number of jobs to run at the same time, but 16 jobs are lined up. Initially, 8 jobs are given to the 8 processors on the node. When one of the processors is done with its assigned job, it will get the next job instead of sitting idle until the other processors are done. While you would expect that on average this script should take 20 hours (each processor on average has to complete two jobs of 10 hours), there's a good chance that one of the processors gets two jobs that take more than 10 hours, so the job script requests 24 hours. How much more time you should ask for in practice depends on the spread in run times of the separate jobs.&lt;br /&gt;
&lt;br /&gt;
===Serial jobs of varying duration===&lt;br /&gt;
&lt;br /&gt;
If you have a lot (50+) of relatively short serial runs to do, '''of which the walltime varies''', and if you know that eight jobs fit in memory without issues, then writing all the commands explicitly in the jobscript can get tedious. If you follow the convention that the jobs are all started by auxiliary scripts called job&amp;lt;something&amp;gt;.sh, the following strategy in your submission script maximizes the cpu utilization. &lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple, dynamically-run &lt;br /&gt;
# serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialdynamic&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20130422  &lt;br /&gt;
&lt;br /&gt;
# COMMANDS ARE ASSUMED TO BE SCRIPTS CALLED job*.sh&lt;br /&gt;
echo job*.sh | tr ' ' '\n' | parallel -j 8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notes:&lt;br /&gt;
* As before, GNU Parallel keeps 8 jobs running at a time, and if one finishes, starts the next. This is an easy way to do ''load balancing''.&lt;br /&gt;
* You can in fact run more or fewer than 8 processes per node by modifying &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;'s &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; argument.&lt;br /&gt;
* Doing many serial jobs often entails doing many disk reads and writes, which can be detrimental to the performance. In that case, running from the ramdisk may be an option.  &lt;br /&gt;
* When using a ramdisk, make sure you copy your results from the ramdisk back to the scratch after the runs, or when the job is killed because time has run out.&lt;br /&gt;
* More details on how to setup your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].&lt;br /&gt;
* This script optimizes resource utilization, but can only use 1 node (8 cores) at a time. The next section addresses how to use more nodes.&lt;br /&gt;
&lt;br /&gt;
===Version for more than 8 cores at once (still serial)===&lt;br /&gt;
&lt;br /&gt;
If you have hundreds of serial jobs that you want to run concurrently and the nodes are available, then the approach above, while useful, would require tens of scripts to be submitted separately. It is possible for you to request more than one node and to use the following routine to distribute your processes amongst the cores. In this case, it is important to use the newer version of GNU parallel installed on the GPC.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple, dynamically-run &lt;br /&gt;
# serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=25:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialdynamicMulti&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20130422&lt;br /&gt;
&lt;br /&gt;
# START PARALLEL JOBS USING NODE LIST IN $PBS_NODEFILE&lt;br /&gt;
seq 800 | parallel -j8 --sshloginfile $PBS_NODEFILE --workdir $PWD ./myrun {}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation:&lt;br /&gt;
* &amp;lt;tt&amp;gt;seq 800&amp;lt;/tt&amp;gt; outputs the numbers 1 through 800 on separate lines. This output is piped to (ie becomes the input of) the &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; command.&lt;br /&gt;
* The point of the &amp;quot;seq 800&amp;quot; is that each line given to &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; defines a new job; so here, there are 800 jobs.&lt;br /&gt;
* Each job runs a command, but because the numbers generated by seq are not commands, a real command is constructed, in this case, by the argument &amp;lt;tt&amp;gt;./myrun {}&amp;lt;/tt&amp;gt;. Here &amp;lt;tt&amp;gt;myrun&amp;lt;/tt&amp;gt; is supposed to be the name of the application to run. The two curly brackets &amp;lt;tt&amp;gt;{}&amp;lt;/tt&amp;gt; get replaced by the line from the input, that is, by one of the numbers.&lt;br /&gt;
* So parallel will run the 800 commands:&amp;lt;br/&amp;gt;./myrun 1&amp;lt;br/&amp;gt;./myrun 2&amp;lt;br/&amp;gt;...&amp;lt;br/&amp;gt;./myrun 800&lt;br /&gt;
* The parameter &amp;lt;tt&amp;gt;--sshloginfile $PBS_NODEFILE&amp;lt;/tt&amp;gt; tells &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; to look for the file named $PBS_NODEFILE which contains the host names of the nodes assigned to the current job (as stated above, it is automatically generated).&lt;br /&gt;
* The parameter &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; tells &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; to run 8 of these at a time on each of the hosts.&lt;br /&gt;
* The &amp;lt;tt&amp;gt;--workdir $PWD&amp;lt;/tt&amp;gt; sets the working directory on the other nodes to the working directory on the first node. Without this, the run tries to start from the wrong place and will most likely fail (unless using the latest gnu parallel module, 20130422, which by default puts you in $PWD on the remote node).&lt;br /&gt;
* Loaded modules should get automatically loaded on the remote nodes too for the latest gnu parallel module, but not for earlier ones.&lt;br /&gt;
* If you need an environment variable to be transferred from the job script to the remotely running subjobs, use &amp;lt;tt&amp;gt;--env ENVIRONMENTVARIABLE&amp;lt;/tt&amp;gt;.&lt;br /&gt;
Notes:&lt;br /&gt;
* Of course, this is just an example of what you could do with GNU parallel. How you set up your specific run depends on how each of the runs would be started. One could for instance also prepare a file of commands to run and make that the input to parallel (as sketched below).&lt;br /&gt;
* Note that submitting several bunches to single nodes, as in the section above, is a more fail-safe way of proceeding, since a node failure would then only affect one of these bunches, rather than all runs. &lt;br /&gt;
* GNU Parallel can be passed a file with the list of nodes to which to ssh, using &amp;lt;tt&amp;gt;--sshloginfile&amp;lt;/tt&amp;gt; (thanks to Ole Tange for pointing this out). This list is automatically generated by the scheduler and its name is made available in the environment variable $PBS_NODEFILE.&lt;br /&gt;
* Alternatively, GNU Parallel can take a comma separated list of nodes given to its -S argument, but this would need to be constructed from the file $PBS_NODEFILE which contains all nodes assigned to the job, with each node duplicated 8x for the number of cores on each node.&lt;br /&gt;
* GNU Parallel reads lines of input and converts them into arguments of the execution command. The execution command is the last argument given to &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;, with &amp;lt;tt&amp;gt;{}&amp;lt;/tt&amp;gt; replaced by the input lines.&lt;br /&gt;
* &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;The --workdir argument is essential: it sets the working directory on the other nodes, which would default to your home directory if omitted. Since /home is read-only on the compute nodes, you would likely not get any output at all!&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt;This is no longer true for the latest GNU Parallel module (20130422), which puts you in the current directory on the remote nodes.&lt;br /&gt;
* We reiterate that if memory requirements allow it, you should try to run more than 8 jobs at once, with a maximum of 16 jobs. You can run more or fewer than 8 processes per node by modifying the -j8 parameter to the parallel command.&lt;br /&gt;
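As an illustration of the file-of-commands approach mentioned above, a minimal sketch could replace the &amp;lt;tt&amp;gt;seq&amp;lt;/tt&amp;gt; line by the following; the file name &amp;lt;tt&amp;gt;commands.txt&amp;lt;/tt&amp;gt; and the commands in it are hypothetical:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# commands.txt contains one complete command per line, e.g.&lt;br /&gt;
#   ./myrun --input case001.dat&lt;br /&gt;
#   ./myrun --input case002.dat&lt;br /&gt;
# Each line becomes one job; parallel spreads them over the nodes.&lt;br /&gt;
parallel -j8 --sshloginfile $PBS_NODEFILE --workdir $PWD &amp;lt; commands.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;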
&lt;br /&gt;
===More on GNU parallel=== &lt;br /&gt;
* [[Media:Tech-talk-gnu-parallel.pdf|Slides of the SciNet TechTalk on Gnu Parallel (14 Nov 2012)]]&lt;br /&gt;
* The documentation for GNU parallel can be found at http://www.gnu.org/software/parallel/&lt;br /&gt;
* Its man page can be found here http://www.gnu.org/software/parallel/man.html&lt;br /&gt;
* The man page is also available on the GPC when the gnu-parallel module is loaded, using the command &amp;lt;code&amp;gt;$ man parallel&amp;lt;/code&amp;gt;. It describes all the options (such as how to make sure the output is not scrambled) and contains examples.&lt;br /&gt;
&lt;br /&gt;
===GNU Parallel Reference===&lt;br /&gt;
* O. Tange (2011): GNU Parallel - The Command-Line Power Tool, '';login: The USENIX Magazine,'' February 2011:42-47.&lt;br /&gt;
&lt;br /&gt;
===Older scripts===&lt;br /&gt;
&lt;br /&gt;
Older scripts, which mimicked some of GNU parallel functionality, can be found on the [[Deprecated scripts]] page.&lt;br /&gt;
&lt;br /&gt;
--[[User:Rzon|Rzon]] 02:22, 14 Nov 2010 (UTC)&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=User_Serial&amp;diff=7126</id>
		<title>User Serial</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=User_Serial&amp;diff=7126"/>
		<updated>2014-07-30T19:05:56Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* General considerations */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===General considerations===&lt;br /&gt;
&lt;br /&gt;
====Use a whole node...====&lt;br /&gt;
&lt;br /&gt;
When you submit a job on a SciNet system, it is run on one (or more than one) entire node - meaning that your job is occupying at least 8 processors for the duration of its run.  The SciNet systems are usually busy, with many researchers waiting in the queue for computational resources, so we require that you make full use of the nodes that your job is allocated, so other researchers don't have to wait unnecessarily, and so that your jobs get as much work done for you while they run as possible.&lt;br /&gt;
&lt;br /&gt;
Often, the best way to make full use of the node is to run one large parallel computation; but sometimes it is beneficial to run several serial codes at the same time.  On this page, we discuss ways to run suites of serial computations at once, as efficiently as possible, using the full resources of the node.&lt;br /&gt;
&lt;br /&gt;
====...but not more.====&lt;br /&gt;
&lt;br /&gt;
When running multiple jobs on the same node, it is essential to have a good idea of how much memory the jobs will require. The GPC compute nodes have about 14GB in total available &lt;br /&gt;
to user jobs running on the 8 cores (a bit less, say 13GB, on the devel nodes &amp;lt;tt&amp;gt;gpc01..04&amp;lt;/tt&amp;gt;, and [[GPC_Quickstart#Memory_Configuration|somewhat more for some compute nodes]]).&lt;br /&gt;
So the jobs also have to be bunched in ways that will fit into 14GB.   If that's not possible -- &lt;br /&gt;
because each individual job requires significantly in excess of ~1.75GB -- then &lt;br /&gt;
it is possible in principle to just run fewer jobs so that they do fit; &lt;br /&gt;
but then, again, there is an under-utilization problem.   In that case, &lt;br /&gt;
the jobs are likely candidates for parallelization, and you can contact &lt;br /&gt;
us at [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;] and arrange a meeting with one of the &lt;br /&gt;
technical analysts to help you do just that.&lt;br /&gt;
&lt;br /&gt;
If the memory requirements allow it, you could actually run more than 8 jobs at the same time, up to 16, exploiting the [[GPC_Quickstart#HyperThreading | HyperThreading]] feature of the Intel Nehalem cores.  It may seem counterintuitive, but running 16 jobs on 8 cores has increased some users' overall throughput by 10 to 30 percent.&lt;br /&gt;
&lt;br /&gt;
====Is your job really serial?====&lt;br /&gt;
&lt;br /&gt;
===Serial jobs of similar duration===&lt;br /&gt;
&lt;br /&gt;
The most straightforward way to run multiple serial jobs is to bunch the jobs in groups of 8 or more that will take roughly the same amount of time, and create a job that looks a &lt;br /&gt;
bit like this:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple serial jobs on&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialx8&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; ampersand off 8 jobs and wait&lt;br /&gt;
(cd jobdir1; ./dojob1) &amp;amp;&lt;br /&gt;
(cd jobdir2; ./dojob2) &amp;amp;&lt;br /&gt;
(cd jobdir3; ./dojob3) &amp;amp;&lt;br /&gt;
(cd jobdir4; ./dojob4) &amp;amp;&lt;br /&gt;
(cd jobdir5; ./dojob5) &amp;amp;&lt;br /&gt;
(cd jobdir6; ./dojob6) &amp;amp;&lt;br /&gt;
(cd jobdir7; ./dojob7) &amp;amp;&lt;br /&gt;
(cd jobdir8; ./dojob8) &amp;amp;&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are three important things to take note of here.  First, the &amp;lt;tt&amp;gt;'''wait'''&amp;lt;/tt&amp;gt;&lt;br /&gt;
command at the end is crucial; without it the job will terminate &lt;br /&gt;
immediately, killing the 8 programs you just started.&lt;br /&gt;
&lt;br /&gt;
Second, it is important to group the programs by how long they &lt;br /&gt;
will take.   If (say) &amp;lt;tt&amp;gt;dojob8&amp;lt;/tt&amp;gt; takes 2 hours and the rest only take 1, &lt;br /&gt;
then for one hour 7 of the 8 cores on the GPC node are wasted; they are &lt;br /&gt;
sitting idle but are unavailable for other users, and the utilization of &lt;br /&gt;
this node over the whole run is only 56%.   This is the sort of thing &lt;br /&gt;
we'll notice, and users who don't make efficient use of the machine will &lt;br /&gt;
have their ability to use SciNet resources reduced.  If you have many serial jobs of varying length, &lt;br /&gt;
use the submission script to balance the computational load, as explained [[ #Serial jobs of varying duration | below]].&lt;br /&gt;
&lt;br /&gt;
Third, we reiterate that if memory requirements allow it, you should try to run more than 8 jobs at once, with a maximum of 16 jobs.&lt;br /&gt;
&lt;br /&gt;
===GNU Parallel===&lt;br /&gt;
&lt;br /&gt;
GNU parallel is a really nice tool written by Ole Tange to run multiple serial jobs in&lt;br /&gt;
parallel. It allows you to keep the processors on each 8-core node busy, provided you give it enough jobs to do.&lt;br /&gt;
&lt;br /&gt;
GNU parallel is accessible on the GPC in the module&lt;br /&gt;
&amp;lt;tt&amp;gt;gnu-parallel&amp;lt;/tt&amp;gt;:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load gnu-parallel/20130422&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Note that there are currently (May 2013) four versions of gnu-parallel installed on the GPC, with the oldest version, gnu-parallel/2010, as the default, although we'd recommend using one of the newer versions (such as gnu-parallel/20130422). &lt;br /&gt;
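A quick way to see which versions are installed and to pick a specific one (assuming the standard module commands on the GPC):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# list the gnu-parallel modules that are installed&lt;br /&gt;
module avail gnu-parallel&lt;br /&gt;
# load a specific (newer) version rather than the default&lt;br /&gt;
module load gnu-parallel/20130422&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;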
&lt;br /&gt;
Note that the citation for GNU Parallel is: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, '';login: The USENIX Magazine,'' February 2011:42-47.&lt;br /&gt;
&lt;br /&gt;
It is easiest to demonstrate the usage of GNU parallel by&lt;br /&gt;
examples. Suppose you have 16 jobs to do, that the durations of these jobs vary quite a bit, but that the average job duration is around 10 hours. You could use the following script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=24:00:00&lt;br /&gt;
#PBS -N gnu-parallel-example&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20130422  &lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND&lt;br /&gt;
parallel -j 8 &amp;lt;&amp;lt;EOF&lt;br /&gt;
  cd jobdir1; ./dojob1; echo &amp;quot;job 1 finished&amp;quot;&lt;br /&gt;
  cd jobdir2; ./dojob2; echo &amp;quot;job 2 finished&amp;quot;&lt;br /&gt;
  cd jobdir3; ./dojob3; echo &amp;quot;job 3 finished&amp;quot;&lt;br /&gt;
  cd jobdir4; ./dojob4; echo &amp;quot;job 4 finished&amp;quot;&lt;br /&gt;
  cd jobdir5; ./dojob5; echo &amp;quot;job 5 finished&amp;quot;&lt;br /&gt;
  cd jobdir6; ./dojob6; echo &amp;quot;job 6 finished&amp;quot;&lt;br /&gt;
  cd jobdir7; ./dojob7; echo &amp;quot;job 7 finished&amp;quot;&lt;br /&gt;
  cd jobdir8; ./dojob8; echo &amp;quot;job 8 finished&amp;quot;&lt;br /&gt;
  cd jobdir9; ./dojob9; echo &amp;quot;job 9 finished&amp;quot;&lt;br /&gt;
  cd jobdir10; ./dojob10; echo &amp;quot;job 10 finished&amp;quot;&lt;br /&gt;
  cd jobdir11; ./dojob11; echo &amp;quot;job 11 finished&amp;quot;&lt;br /&gt;
  cd jobdir12; ./dojob12; echo &amp;quot;job 12 finished&amp;quot;&lt;br /&gt;
  cd jobdir13; ./dojob13; echo &amp;quot;job 13 finished&amp;quot;&lt;br /&gt;
  cd jobdir14; ./dojob14; echo &amp;quot;job 14 finished&amp;quot;&lt;br /&gt;
  cd jobdir15; ./dojob15; echo &amp;quot;job 15 finished&amp;quot;&lt;br /&gt;
  cd jobdir16; ./dojob16; echo &amp;quot;job 16 finished&amp;quot;&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; parameter sets the number of jobs to run at the same time, but 16 jobs are lined up. Initially, 8 jobs are given to the 8 processors on the node. When one of the processors is done with its assigned job, it will get the next job instead of sitting idle until the other processors are done. While you would expect that on average this script should take 20 hours (each processor on average has to complete two jobs of 10 hours), there's a good chance that one of the processors gets two jobs that take more than 10 hours, so the job script requests 24 hours. How much more time you should ask for in practice depends on the spread in run times of the separate jobs.&lt;br /&gt;
&lt;br /&gt;
===Serial jobs of varying duration===&lt;br /&gt;
&lt;br /&gt;
If you have a lot (50+) of relatively short serial runs to do, '''whose walltimes vary''', and if you know that eight jobs fit in memory without issues, then writing all the commands explicitly in the job script can get tedious. If you follow the convention that the jobs are all started by auxiliary scripts called job&amp;lt;something&amp;gt;.sh, the following strategy in your submission script maximizes the CPU utilization. &lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple, dynamically-run &lt;br /&gt;
# serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialdynamic&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20130422  &lt;br /&gt;
&lt;br /&gt;
# COMMANDS ARE ASSUMED TO BE SCRIPTS CALLED job*.sh&lt;br /&gt;
echo job*.sh | tr ' ' '\n' | parallel -j 8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notes:&lt;br /&gt;
* As before, GNU Parallel keeps 8 jobs running at a time, and if one finishes, starts the next. This is an easy way to do ''load balancing''.&lt;br /&gt;
* You can in fact run more or fewer than 8 processes per node by modifying &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;'s &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; argument.&lt;br /&gt;
* Doing many serial jobs often entails doing many disk reads and writes, which can be detrimental to performance. In that case, running from the ramdisk may be an option.&lt;br /&gt;
* When using a ramdisk, make sure you copy your results from the ramdisk back to the scratch file system after the runs, or when the job is killed because its time has run out.&lt;br /&gt;
* More details on how to set up your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].&lt;br /&gt;
* This script optimizes resource utilization, but can only use 1 node (8 cores) at a time. The next section addresses how to use more nodes.&lt;br /&gt;
&lt;br /&gt;
===Version for more than 8 cores at once (still serial)===&lt;br /&gt;
&lt;br /&gt;
If you have hundreds of serial jobs that you want to run concurrently and the nodes are available, then the approach above, while useful, would require tens of scripts to be submitted separately. It is possible for you to request more than one node and to use the following routine to distribute your processes amongst the cores. In this case, it is important to use the newer version of GNU parallel installed on the GPC.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for multiple, dynamically-run &lt;br /&gt;
# serial jobs on SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=25:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N serialdynamicMulti&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
module load gnu-parallel/20130422&lt;br /&gt;
&lt;br /&gt;
# START PARALLEL JOBS USING NODE LIST IN $PBS_NODEFILE&lt;br /&gt;
seq 800 | parallel -j8 --sshloginfile $PBS_NODEFILE --workdir $PWD ./myrun {}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation:&lt;br /&gt;
* &amp;lt;tt&amp;gt;seq 800&amp;lt;/tt&amp;gt; outputs the numbers 1 through 800 on separate lines. This output is piped to (i.e., becomes the input of) the &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; command.&lt;br /&gt;
* The point of using &amp;quot;seq 800&amp;quot; is that each line given to &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; defines a new job. So here, there are 800 jobs.&lt;br /&gt;
* Each job runs a command, but because the numbers generated by seq are not commands, a real command is constructed, in this case, from the argument &amp;lt;tt&amp;gt;./myrun {}&amp;lt;/tt&amp;gt;. Here &amp;lt;tt&amp;gt;myrun&amp;lt;/tt&amp;gt; stands for the name of the application to run. The curly brackets &amp;lt;tt&amp;gt;{}&amp;lt;/tt&amp;gt; get replaced by the line from the input, that is, by one of the numbers.&lt;br /&gt;
* So parallel will run the 800 commands:&amp;lt;br/&amp;gt;./myrun 1&amp;lt;br/&amp;gt;./myrun 2&amp;lt;br/&amp;gt;...&amp;lt;br/&amp;gt;./myrun 800&lt;br /&gt;
* The parameter &amp;lt;tt&amp;gt;--sshloginfile $PBS_NODEFILE&amp;lt;/tt&amp;gt; tells &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; to look in the file $PBS_NODEFILE, which contains the host names of the nodes assigned to the current job (it is automatically generated by the scheduler).&lt;br /&gt;
* The parameter &amp;lt;tt&amp;gt;-j8&amp;lt;/tt&amp;gt; tells &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt; to run 8 of these at a time on each of the hosts.&lt;br /&gt;
* The &amp;lt;tt&amp;gt;--workdir $PWD&amp;lt;/tt&amp;gt; sets the working directory on the other nodes to the working directory on the first node. Without this, the run tries to start from the wrong place and will most likely fail (unless using the latest gnu parallel module, 20130422, which by default puts you in $PWD on the remote node).&lt;br /&gt;
* Loaded modules should get automatically loaded on the remote nodes too for the latest gnu parallel module, but not for earlier ones.&lt;br /&gt;
* If you need an environment variable to be transferred from the job script to the remotely running subjobs, use &amp;lt;tt&amp;gt;--env ENVIRONMENTVARIABLE&amp;lt;/tt&amp;gt;.&lt;br /&gt;
Notes:&lt;br /&gt;
* Of course, this is just an example of what you could do with GNU parallel. How you set up your specific run depends on how each of the runs would be started. One could for instance also prepare a file of commands to run and make that the input to parallel as well.&lt;br /&gt;
* Note that submitting several bunches to single nodes, as in the section above, is a more fail-safe way of proceeding, since a node failure would then only affect one of these bunches, rather than all runs. &lt;br /&gt;
* GNU Parallel can be passed a file with the list of nodes to which to ssh, using &amp;lt;tt&amp;gt;--sshloginfile&amp;lt;/tt&amp;gt; (thanks to Ole Tange for pointing this out). This list is automatically generated by the scheduler and its name is made available in the environment variable $PBS_NODEFILE.&lt;br /&gt;
* Alternatively, GNU Parallel can take a comma separated list of nodes given to its -S argument, but this would need to be constructed from the file $PBS_NODEFILE which contains all nodes assigned to the job, with each node duplicated 8x for the number of cores on each node.&lt;br /&gt;
* GNU Parallel reads lines of input and converts them into arguments of the execution command. The execution command is the last argument given to &amp;lt;tt&amp;gt;parallel&amp;lt;/tt&amp;gt;, with &amp;lt;tt&amp;gt;{}&amp;lt;/tt&amp;gt; replaced by the input lines.&lt;br /&gt;
* &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;The --workdir argument is essential: it sets the working directory on the other nodes, which would default to your home directory if omitted. Since /home is read-only on the compute nodes, you would likely not get any output at all!&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt;This is no longer true for the latest GNU Parallel module (20130422), which puts you in the current directory on the remote nodes.&lt;br /&gt;
* We reiterate that if memory requirements allow it, you should try to run more than 8 jobs at once, with a maximum of 16 jobs. You can run more or fewer than 8 processes per node by modifying the -j8 parameter to the parallel command.&lt;br /&gt;
&lt;br /&gt;
===More on GNU parallel=== &lt;br /&gt;
* [[Media:Tech-talk-gnu-parallel.pdf|Slides of the SciNet TechTalk on Gnu Parallel (14 Nov 2012)]]&lt;br /&gt;
* The documentation for GNU parallel can be found at http://www.gnu.org/software/parallel/&lt;br /&gt;
* Its man page can be found here http://www.gnu.org/software/parallel/man.html&lt;br /&gt;
* The man page is also available on the GPC when the gnu-parallel module is loaded, using the command &amp;lt;code&amp;gt;$ man parallel&amp;lt;/code&amp;gt;. It describes all the options (such as how to make sure the output is not scrambled) and contains examples.&lt;br /&gt;
&lt;br /&gt;
===GNU Parallel Reference===&lt;br /&gt;
* O. Tange (2011): GNU Parallel - The Command-Line Power Tool, '';login: The USENIX Magazine,'' February 2011:42-47.&lt;br /&gt;
&lt;br /&gt;
===Older scripts===&lt;br /&gt;
&lt;br /&gt;
Older scripts, which mimicked some of GNU parallel functionality, can be found on the [[Deprecated scripts]] page.&lt;br /&gt;
&lt;br /&gt;
--[[User:Rzon|Rzon]] 02:22, 14 Nov 2010 (UTC)&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Knowledge_Base:_Tutorials_and_Manuals&amp;diff=6956</id>
		<title>Knowledge Base: Tutorials and Manuals</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Knowledge_Base:_Tutorials_and_Manuals&amp;diff=6956"/>
		<updated>2014-04-07T21:23:09Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* Programming */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__TOC__&lt;br /&gt;
=Training material=&lt;br /&gt;
&lt;br /&gt;
For upcoming classes, see our [https://support.scinet.utoronto.ca/education/ Training and Education website]!&lt;br /&gt;
==SciNet Basics==&lt;br /&gt;
* [[Media:SciNet_Tutorial.pdf|SciNet User Tutorial]]&lt;br /&gt;
* Intro to SciNet: [http://support.scinet.utoronto.ca/CourseVideo/scinetintro/scinetintro.html Video]/[[Media:Introscinet.pdf|Slides]], SciNet, November 2012&lt;br /&gt;
* SciNet Resources: [http://support.scinet.utoronto.ca/CourseVideo/PPPcourse/Monday_Morning_SciNet_Resources/Monday_Morning_SciNet_Resources.mp4 Video]/ [[Media:Monday_Morning_SciNet_Resources.pdf|Slides]] &lt;br /&gt;
* [[Essentials]]&lt;br /&gt;
* [[FAQ|Frequently asked questions]]&lt;br /&gt;
* [[Ssh]]&lt;br /&gt;
* [[GPC_Quickstart|GPC quickstart]]&lt;br /&gt;
* [[TCS_Quickstart|TCS quickstart]]&lt;br /&gt;
* [[GPU_Devel_Nodes|ARC/GPU quickstart]]&lt;br /&gt;
* [[Cell_Devel_Nodes|ARC/Cell quickstart]]&lt;br /&gt;
* [[Important .bashrc guidelines]]&lt;br /&gt;
* [[Media:LargeScaleBio.pdf‎|Workflow Optimization (w/focus on Large Scale BioInformatics)]]&lt;br /&gt;
* [[Software_and_Libraries | Software and libraries]]&lt;br /&gt;
* [[Installing your own modules]]&lt;br /&gt;
* [[Media:SNUGlocalsetup.pdf|User-space modules and packages (April 2011 SNUG TechTalk)]]&lt;br /&gt;
* [[Media:HPSS_rationale.pdf|HPSS - SciNet's storage capacity expansion]]&lt;br /&gt;
* BGQ Hardware Overview [https://support.scinet.utoronto.ca/~northrup/bgqhardware.pdf Slides ]/ [https://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqhardware/bgqhardware.mp4 Video Recording ]&lt;br /&gt;
* Intro to Using the BGQ [[Media:Bgqintro.pdf|Slides ]]/[https://support.scinet.utoronto.ca/CourseVideo/BGQ/bgqintro/bgqintro.mp4 Video Recording ]&lt;br /&gt;
&lt;br /&gt;
==Linux==&lt;br /&gt;
* Linux Command Line: A Primer (June 2012) [[Media:SS_IntroToShell.pdf|Slides,]] [[Media:SS_IntroToShell.tgz|Files]]&lt;br /&gt;
* Introduction to the Linux Shell, SciNet, Mar 2012: [[Media:IntroToShell.pdf|Slides]] and [[Media:Shell-data.tgz|Data files]]&lt;br /&gt;
&lt;br /&gt;
==Batch job management==&lt;br /&gt;
* [[Media:LargeScaleBio.pdf‎|Workflow Optimization (w/focus on Large Scale BioInformatics)]]&lt;br /&gt;
* [[Media:Tech-talk-gnu-parallel.pdf|GNU Parallel (Techtalk Nov 14, 2012)]]&lt;br /&gt;
* [[Media:TechTalkJobMonitoring.pdf|Job Monitoring on SciNet and Job Efficiency]]&lt;br /&gt;
&amp;lt;!-- * [[Media:Snugtrackjob.pdf|Job Monitoring on SciNet and Job Efficiency]] --&amp;gt;&lt;br /&gt;
* [[Wallclock time]]&lt;br /&gt;
* [[Checkpoints]]&lt;br /&gt;
* [[Using_Signals|Signals]]&lt;br /&gt;
* [[Moab]]&lt;br /&gt;
* [[User_Serial|Serial Jobs (including GNU Parallel)]]&lt;br /&gt;
* [[User_Ramdisk|Ramdisk]]&lt;br /&gt;
* [http://www.clusterresources.com/products/mwm/docs/index.shtml Moab workload manager]&lt;br /&gt;
* [http://www.clusterresources.com/products/mwm/docs/a.gcommandoverview.shtml Moab commands]&lt;br /&gt;
* [http://www.clusterresources.com/products/torque/docs/ Torque resource manager] &lt;br /&gt;
* [http://www.clusterresources.com/products/torque/docs/a.acommands.shtml Torque PBS commands]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/Manuals/PE5.1-operationanduse.pdf Parallel environment]&lt;br /&gt;
* [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp Cluster information center] (with error codes)&lt;br /&gt;
* [http://support.scinet.utoronto.ca/Manuals/LL-usingandadministering.pdf LoadLeveler: using &amp;amp; administering]&lt;br /&gt;
&lt;br /&gt;
==Programming==&lt;br /&gt;
===General===&lt;br /&gt;
* [[Media:SciDev-XLCompilers.pdf|Performance Tuning with the IBM XL Compilers]]: Slides from the SciNet Developer Seminar by Kit Barton, Sep 17, 2012.&lt;br /&gt;
* [[Media:Remotescinet.pdf‎|Remote Development]], slides from TechTalk Jun 13, 2012&lt;br /&gt;
* [[Scientific Software Development Course]], part I of the SciNet's Scientific Computing Course&lt;br /&gt;
* [http://software-carpentry.org Software Carpentry Resources]&lt;br /&gt;
* Version Control: [http://support.scinet.utoronto.ca/CourseVideo/PPPcourse/Thursday_Morning_BP_Revision_Control/Thursday_Morning_BP_Revision_Control.mp4 Video]/ [[Media:Snug_techtalk_revcontrol.pdf | Slides]]&lt;br /&gt;
* [[IBM_Nov_Workshop | IBM AIX Workshop, SciNet, Nov 2008 ]] &lt;br /&gt;
* [[IBM_Compiler_Workshop | IBM Compiler Workshop, SciNet, Feb 2009]]&lt;br /&gt;
* SNUG Techtalk Dec 2011 [[Media:Snug_techtalk_compiler.pdf | Intel Compiler Optimizations]]&lt;br /&gt;
&lt;br /&gt;
===Fortran===&lt;br /&gt;
* Modern Fortran Course (1 day), SciNet, 19 Apr 2011&lt;br /&gt;
** [[Media:ModernFortran.pdf | Slides]]&lt;br /&gt;
** [[Media:ModernFortran.tgz | Source Code]]&lt;br /&gt;
* [http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/lin/compiler_f/index.htm Intel Fortran compiler]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/Manuals/xlf-compiler.pdf IBM Fortran compiler], [http://support.scinet.utoronto.ca/Manuals/xlf-langref.pdf language], [http://support.scinet.utoronto.ca/Manuals/xlf-proguide.pdf optimization]&lt;br /&gt;
&lt;br /&gt;
===C++===&lt;br /&gt;
* [[Media:Cpp11.pdf|Slides]] and [http://support.scinet.utoronto.ca/CourseVideo/Cpp11/cpp11.html recording] of the SciNet Developer Seminar on C++11, March 20, 2013&lt;br /&gt;
* Scientific C++ Course (1 day), SciNet, 15 March 2011 &lt;br /&gt;
** [[Media:Scientific-c%2B%2B.pdf|Slides]] (updated on Apr 26, 2012)&lt;br /&gt;
** [[Media:Scinetcppexamples.tgz|Example source code]]&lt;br /&gt;
** [[Videos_of_the_One-Day_Scientific_C%2B%2B_Class | Videos of the Scientific C++ class]] &lt;br /&gt;
* [http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/lin/compiler_c/index.htm Intel C &amp;amp; C++ compiler]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/Manuals/xlC++-compiler.pdf IBM C++ compiler], [http://support.scinet.utoronto.ca/Manuals/xlC++-langref.pdf language], [http://support.scinet.utoronto.ca/Manuals/xlC++-proguide.pdf optimization]&lt;br /&gt;
&lt;br /&gt;
===C===&lt;br /&gt;
* C refresher: [http://support.scinet.utoronto.ca/CourseVideo/PPPcourse/Monday_Morning_C_Review/Monday_Morning_C_Review.mp4 Video]/ [[Media:Monday_Morning_C_Review.pdf| Slides]]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/Manuals/xlc-compiler.pdf IBM C compiler], [http://support.scinet.utoronto.ca/Manuals/xlc-langref.pdf language], [http://support.scinet.utoronto.ca/Manuals/xlc-proguide.pdf optimization]&lt;br /&gt;
&lt;br /&gt;
===Hadoop===&lt;br /&gt;
* Introduction to Hadoop for HPCers, Part I - MapReduce: [[Media:Hadoop-PartI.pdf|Slides]], [[Media:HadoopPart1examples.tgz|Source Code]], [http://support.scinet.utoronto.ca/~ljdursi/SciNetHadoopVM.zip Virtual Machine]&lt;br /&gt;
&lt;br /&gt;
===Perl===&lt;br /&gt;
* [[Perl]]&lt;br /&gt;
===Python===&lt;br /&gt;
* [[Python]]&lt;br /&gt;
* [[IPython Notebook on GPC]] (January 2014 TechTalk)&lt;br /&gt;
* [[Research Computing with Python]] (Modular Course, Fall 2013)&lt;br /&gt;
* [http://support.scinet.utoronto.ca/Snug/scinet-f2py/scinet-f2py.html f2py: Fortran and Python] (June 2011 TechTalk by Pierre de Buyl)&lt;br /&gt;
&lt;br /&gt;
===R===&lt;br /&gt;
* [[R Statistical Package]]&lt;br /&gt;
===Lua===&lt;br /&gt;
* [[Media:PeterColberg_Lua_scinet.pdf | Scripting HALMD with Lua and Luabind]] (May 2011 TechTalk by Peter Colberg)&lt;br /&gt;
&lt;br /&gt;
==Parallel Programming==&lt;br /&gt;
* [[Ontario Summerschool on High Performance Computing Central]]&lt;br /&gt;
* [[High Performance Scientific Computing]], part 3 of SciNet's Scientific Computing Course (Winter 2012)&lt;br /&gt;
* Parallel Programming Course (5 days), SciNet, May 2011&lt;br /&gt;
** [[Parallel_Scientific_Computing_-_May_2011 | Videos, slides and code]]&lt;br /&gt;
* Parallel Computing for Computational Fluid Dynamics (CFD), SciNet, 23 March 2011&lt;br /&gt;
** [[Media:parCFD-mpi.pdf | Slides]]&lt;br /&gt;
** [[Media:parCFD.tgz | Source Code]]&lt;br /&gt;
* Intro to Practical Parallel Programming (1 day), SciNet, 22 Sept 2010: &lt;br /&gt;
**[[Media:PPP-Intro-Morning.pdf|Morning Slides, Intro and OpenMP ]]&lt;br /&gt;
**[[Media:PPP-Intro-Afternoon.pdf|Afternoon Slides, MPI]]&lt;br /&gt;
**[[Media:Intro-ppp.tgz|Example source code]]&lt;br /&gt;
* Parallel Scientific Computing Workshop (5 days), SciNet, Aug 2009: &lt;br /&gt;
**[[ Parallel_Scientific_Computing_-_Aug_09 | Slides ]]&lt;br /&gt;
**[http://www.cita.utoronto.ca/~ljdursi/PSP/ Video]&lt;br /&gt;
* [http://www.vscse.org/  Virtual School for CSE] Web courses (Jul/Aug 2010):&lt;br /&gt;
** Petascale programming environments and tools&lt;br /&gt;
** Big data for science&lt;br /&gt;
** Proven algorithmic techniques for many-core processors&lt;br /&gt;
* [https://computing.llnl.gov/tutorials/mpi/ LLNL MPI Tutorial]: This was the basis for the MPI workshop at SciNet. &lt;br /&gt;
* [http://software.intel.com/sites/products/documentation/hpc/mpi/linux/reference_manual.pdf Intel MPI library]&lt;br /&gt;
* [[GPC MPI Versions]]&lt;br /&gt;
* [[Co-array Fortran on the GPC]]&lt;br /&gt;
* [[IBM_Feb_Workshop | IBM MPI Workshop, SciNet, Feb 2009]]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/Manuals/UPC/compiler.pdf IBM UPC compiler], [http://support.scinet.utoronto.ca/Manuals/UPC/langref.pdf language], [http://support.scinet.utoronto.ca/Manuals/UPC/upcopt.pdf optimization], [http://support.scinet.utoronto.ca/Manuals/UPC/standlib.pdf library], [http://support.scinet.utoronto.ca/Manuals/UPC/upcusersguide.pdf user's guide], [http://support.scinet.utoronto.ca/Manuals/UPC/proguide.pdf programmer's guide]&lt;br /&gt;
&lt;br /&gt;
==GPU Computing==&lt;br /&gt;
&lt;br /&gt;
* [[Media:Westgrid_CUDA.pdf | Intro to GPU Computing Using CUDA]] (WestGrid Spring 2014 Seminar Series)&lt;br /&gt;
* 1.5 hour intro to CUDA, March 2013: [[Media:CUDA-Graphics-Intro-2013.pdf | Slides]] and [[Media:CUDA-Graphics-Intro-2013.tgz | Source Code]]&lt;br /&gt;
* [[CUDA_Minicourse_Fall_2012 | CITA/SciNet CUDA Minicourse, Fall 2012]]&lt;br /&gt;
* [[SciNet GPU Workshop July 2010]]&lt;br /&gt;
* Intro to GPGPU Programming: [http://support.scinet.utoronto.ca/CourseVideo/PPPcourse/Friday_Morning_GPGPU/Friday_Morning_GPGPU.mp4 Video]/ [[Media:Gpgpu.pdf | Slides]]&amp;lt;br /&amp;gt;(from 5 day parallel programming course at SciNet, May 2011)&lt;br /&gt;
* 1-day intro to GPGPU using CUDA Course (Aug 2011): [[Media:Intro-gpu.tgz | Source Code]], [[Media:IntroGPGPU-Aug2011.pdf | Slides]].&lt;br /&gt;
* [http://developer.nvidia.com/object/cuda_training.html  NVidia archived courses for GPGPU Programming]&lt;br /&gt;
* [http://www.pgroup.com/doc/pgiug.pdf PGI Compiler User's Guide]&lt;br /&gt;
* [http://www.pgroup.com/doc/pgiref.pdf PGI Compiler Reference Manual]&lt;br /&gt;
* [http://www.pgroup.com/doc/pgifortref.pdf PGI Fortran reference]&lt;br /&gt;
* [http://www.pgroup.com/doc/pgicudaforug.pdf PGI CUDA Fortran Programming Guide and Reference]&lt;br /&gt;
* [http://www.pgroup.com/doc/openACC_gs.pdf PGI OpenACC Getting Started Guide]&lt;br /&gt;
&lt;br /&gt;
==Performance Tuning==&lt;br /&gt;
* [[Performance and Profiling Course, April 2013]]&lt;br /&gt;
* [[Introduction To Performance]]&lt;br /&gt;
* Performance tools for [[Performance_And_Debugging_Tools:_GPC | GPC ]] and [[Performance_And_Debugging_Tools:_TCS | TCS ]]&lt;br /&gt;
* Dec 2010 SNUG TechTalk: [[Media:ProfillingTechTalk-Dec2010.pdf | Profiling Tools on GPC]]&lt;br /&gt;
* [http://cnx.org/content/col11136/latest/  High Performance Computing Book]&amp;lt;br /&amp;gt;Online version of an older O'Reilly book which covers the basics of (mostly serial) programming for performance.  Covers the most important issues today very clearly.&lt;br /&gt;
* [http://www.ece.cmu.edu/~franzf/papers/gttse07.pdf  How to Write Fast Numerical Code ]&amp;lt;br /&amp;gt;Good introduction to thinking about performance.&lt;br /&gt;
* [http://support.scinet.utoronto.ca/Manuals/JUMP-AIX-POWER6-AppsPerformanceTuning-wp032008.pdf Performance tuning]&lt;br /&gt;
* [[Media:Mpi-tuning-parameters.pdf‎ | MPI Tuning Parameters]] - SNUG TechTalk, Feb 2012&lt;br /&gt;
&lt;br /&gt;
==Debugging==&lt;br /&gt;
* [[Media:SS_Debug.pdf|Debugging with GDB and DDT, half-day session at the Ontario HPC Summerschool 2012 Central&amp;lt;br&amp;gt;Slides]], [[Media:SS_Debug.tgz|Code]].&lt;br /&gt;
* [[Media:Snugdebug.pdf|TechTalk: Debuggers &amp;amp; Parallel Debugging on SciNet - gdb, ddd, padb]], SciNet User Group Meeting, Nov 2010&amp;lt;br/&amp;gt;[http://support.scinet.utoronto.ca/CourseVideo/PPPcourse/Thursday_Morning_Debugging/Thursday_Morning_Debugging.mp4 Video]&lt;br /&gt;
* [http://www.allinea.com/downloads/userguide.pdf Allinea DDT (Distributed Debugging Tool) User Guide]&lt;br /&gt;
&lt;br /&gt;
==Math libraries (BLAS, LAPACK, FFT)==&lt;br /&gt;
* [[Media:MKLTechTalkMarch2012.pdf|Intel Math Kernel Library (MKL): An overview]] (TechTalk, March, 2012)&lt;br /&gt;
* [[Numerical Tools for Physical Scientists]], part 2 of SciNet's Scientific Computing Course; covers random numbers, BLAS, LAPACK, FFT, ...&lt;br /&gt;
* [[Media:FP_Consistency.pdf|Intel Compiler Floating Point Consistency]]&lt;br /&gt;
* [http://software.intel.com/sites/products/documentation/hpc/mkl/lin/index.htm Math Kernel Library (MKL)] &lt;br /&gt;
* [http://software.intel.com/sites/products/documentation/hpc/mkl/vsl/vslnotes.pdf Math Kernel Library's Vector Statistical Library]&lt;br /&gt;
* [http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor Math Kernel Library link line advisor]&amp;lt;br/&amp;gt;($MKLPATH &amp;amp;rarr; ${MKLPATH} in makefiles)&lt;br /&gt;
* [http://publib.boulder.ibm.com/epubs/pdf/am501405.pdf ESSL high performance math library V4] ([http://publib.boulder.ibm.com/epubs/pdf/am601305.pdf V3])&lt;br /&gt;
* [http://publib.boulder.ibm.com/epubs/pdf/am601305.pdf Parallel ESSL high performance math library V3.3]&lt;br /&gt;
* [http://hal.inria.fr/inria-00576469 Linear Algebra Libraries] by Claire Mouton. 2009 INRIA Technical Report on existing linear algebra libraries for C++ (also here: [http://arxiv.org/abs/1103.3020])&lt;br /&gt;
&lt;br /&gt;
==I/O==&lt;br /&gt;
&lt;br /&gt;
* [[Media:NetCDF.pdf|Introduction to NetCDF4 binary files with Python, C++ and R (TechTalk March 2014)]]&lt;br /&gt;
* [[Media:SCIENCEDATA.pdf‎|Sep 2012 SNUG TechTalk: Science=Data]]&lt;br /&gt;
* [[Data_Management|Data management]]&lt;br /&gt;
* Intro to Parallel I/O, SciNet, Oct 6th, 2010: &lt;br /&gt;
**[[Media:Parallel_io_course.pdf‎|Morning &amp;amp; MPI-IO Slides]]&lt;br /&gt;
**[[Media:Netcdfhdf5.pdf|NetCDF/HDF5 Slides]]&lt;br /&gt;
**[[Media:ParIO.tgz|Source Code]].  &lt;br /&gt;
* Half-day HPCS2012 Parallel I/O tutorial, covering MPI-IO, HDF5, NetCDF, based on the above:  [[Media:ParIO-HPCS2012.pdf|slides (pdf)]] and [[Media:ParIO-HPCS2012.tgz|source code]].&lt;br /&gt;
* [[Media:Snugio.pdf|Sept 2010 SNUG TechTalk: Parallel File System and IO]] &amp;lt;br/ &amp;gt;[http://support.scinet.utoronto.ca/CourseVideo/PPPcourse/Friday_Morning_IO/Friday_Morning_IO.mp4 Video]&lt;br /&gt;
* [[File System and I/O dos and don'ts]]&lt;br /&gt;
* [[Media:40TB.pdf|So you have 40TB of Data]] -- an overview of things to consider with large data sets.&lt;br /&gt;
* [[Media:Adios-techtalk-may2012.pdf|May 2012 SNUG TechTalk: ADIOS for Parallel IO slides]] and [[Media:Adios-techtalk-may2012-src.tgz|source code]]&lt;br /&gt;
* [[hdf5_table|Writing / Reading a table in hdf5]]&lt;br /&gt;
* [[NetCDF_table|Writing / Reading a table in NetCDF]]&lt;br /&gt;
&lt;br /&gt;
==Infiniband Networking==&lt;br /&gt;
* [[Media:Snug_techtalk_Infiniband.pdf | TechTalk on SciNet's Infiniband Network &amp;amp; MPI options ]] &lt;br /&gt;
&lt;br /&gt;
==Visualization==&lt;br /&gt;
* [[Using Paraview]]&lt;br /&gt;
* [[Media:Ttvnc.pdf|TechTalk on VNC (slides)]]&lt;br /&gt;
* [[Software_and_Libraries#anchor_viz|Visualization Software on the GPC]]&lt;br /&gt;
* [http://scienceillustrated.ca Science Illustrated:] Two-day symposium on Visualizing Science, Feb 2011&lt;br /&gt;
* [http://www.kmdi.utoronto.ca/story/2011/03/si-science-illustrated-symposium-success Videos of the talks given at Science Illustrated] (recorded by [http://www.kmdi.utoronto.ca KMDI] at [http://www.utoronto.ca UoT]):&lt;br /&gt;
** [http://itube.ischool.utoronto.ca/Panopto/Pages/Viewer/Default.aspx?id=94ff5cd5-be6e-4fc6-9be6-dd2222342bcd Opening remarks] by Paul Young&lt;br /&gt;
** [http://itube.ischool.utoronto.ca/Panopto/Pages/Viewer/Default.aspx?id=4255c34e-15e7-4b24-ba99-78f5c8fa4381 Information Visualization and the Myth of Information Overload] by Christopher Collins&lt;br /&gt;
** [http://itube.ischool.utoronto.ca/Panopto/Pages/Viewer/Default.aspx?id=adcf02bf-16cb-46cc-8cdc-1a65e9071d6b Beyond Basic Visualization] by Ramses Van Zon&lt;br /&gt;
** [http://itube.ischool.utoronto.ca/Panopto/Pages/Viewer/Default.aspx?id=47baf346-3599-4fa3-9b10-6a58faa6b33c Network Visualization &amp;amp; Analysis] by Igor Jurisica&lt;br /&gt;
** [http://itube.ischool.utoronto.ca/Panopto/Pages/Viewer/Default.aspx?id=7e754a2e-7be5-476e-bb54-37def37bc07e Simulation and Visualization of Blood Flow] by David Steinman&lt;br /&gt;
** [http://itube.ischool.utoronto.ca/Panopto/Pages/Viewer/Default.aspx?id=d7622587-2c31-49c1-99d7-f0f16c078801 Scientific Visualizations: Does the Science Matter?] by Thomas Lucas&lt;br /&gt;
** [http://itube.ischool.utoronto.ca/Panopto/Pages/Viewer/Default.aspx?id=9963c637-6840-454f-a57f-9a1be6456616 How can visualization impact public perception of science?] Panelists: Jay Ingram, Peter Calamai, Reni Barlow, Hooley McLaughlin&lt;br /&gt;
** [http://itube.ischool.utoronto.ca/Panopto/Pages/Viewer/Default.aspx?id=88e50cff-db9b-4c71-b10d-781fec60a2c0 How Info Graphics are Created for the Mainstream Media] by Peter Calamai&lt;br /&gt;
** [http://itube.ischool.utoronto.ca/Panopto/Pages/Viewer/Default.aspx?id=3897e2a3-1fda-42be-ab78-edbab090fd9e Design Boot Camp] by Graham Huber&lt;br /&gt;
** [http://itube.ischool.utoronto.ca/Panopto/Pages/Viewer/Default.aspx?id=4c27ed76-7292-407e-83a6-814e1461eccd Visualization Large Datasets] by Jonathan Dursi&lt;br /&gt;
** [http://itube.ischool.utoronto.ca/Panopto/Pages/Viewer/Default.aspx?id=7d49c845-3937-44e2-a300-9b8ffe57a857 Visualizing Colliding Black Holes] by Harald Pfeiffer&lt;br /&gt;
** [http://itube.ischool.utoronto.ca/Panopto/Pages/Viewer/Default.aspx?id=7d7e4803-39cd-4c8e-a443-bc2a7b1b3c28 Closing remarks] by Mubdi Rahman&lt;br /&gt;
&lt;br /&gt;
==Applications==&lt;br /&gt;
{{:Knowledge Base: Applications}}&lt;br /&gt;
* See also [[User Codes]]&lt;br /&gt;
&lt;br /&gt;
=Manuals=&lt;br /&gt;
&lt;br /&gt;
==Intel compilers and libraries (GPC)==&lt;br /&gt;
* [http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-lin/index.htm C &amp;amp; C++ compiler]&lt;br /&gt;
* [http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/fortran-lin/index.htm Fortran compiler]&lt;br /&gt;
* [[Media:FP_Consistency.pdf|Intel Compiler Floating Point Consistency]]&lt;br /&gt;
* [[Media:Compiler_qrg12.pdf‎|Intel Compiler Optimization Guide]]&lt;br /&gt;
* [http://software.intel.com/sites/products/documentation/hpc/mkl/lin/index.htm Math Kernel Library (MKL)] &lt;br /&gt;
* [http://software.intel.com/sites/products/documentation/hpc/mkl/vsl/vslnotes.pdf Math Kernel Library's Vector Statistical Library]&lt;br /&gt;
* [http://software.intel.com/sites/products/documentation/hpc/mpi/linux/reference_manual.pdf Intel MPI library]&lt;br /&gt;
* [http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor Math Kernel Library link line advisor]&amp;lt;br/&amp;gt;($MKLPATH &amp;amp;rarr; ${MKLPATH} in makefiles)&lt;br /&gt;
&lt;br /&gt;
==IBM compilers and libraries (TCS/P7)==&lt;br /&gt;
* [http://support.scinet.utoronto.ca/Manuals/xlc-compiler.pdf C compiler], [http://support.scinet.utoronto.ca/Manuals/xlc-langref.pdf language], [http://support.scinet.utoronto.ca/Manuals/xlc-proguide.pdf optimization]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/Manuals/xlC++-compiler.pdf C++ compiler], [http://support.scinet.utoronto.ca/Manuals/xlC++-langref.pdf language], [http://support.scinet.utoronto.ca/Manuals/xlC++-proguide.pdf optimization]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/Manuals/xlf-compiler.pdf Fortran compiler] [http://support.scinet.utoronto.ca/Manuals/xlf-langref.pdf language], [http://support.scinet.utoronto.ca/Manuals/xlf-proguide.pdf optimization]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/Manuals/UPC/compiler.pdf UPC compiler], [http://support.scinet.utoronto.ca/Manuals/UPC/langref.pdf language], [http://support.scinet.utoronto.ca/Manuals/UPC/upcopt.pdf optimization], [http://support.scinet.utoronto.ca/Manuals/UPC/standlib.pdf library], [http://support.scinet.utoronto.ca/Manuals/UPC/upcusersguide.pdf user's guide], [http://support.scinet.utoronto.ca/Manuals/UPC/proguide.pdf programmer's guide]&lt;br /&gt;
* [http://publib.boulder.ibm.com/epubs/pdf/am501405.pdf ESSL high performance math library V4] ([http://publib.boulder.ibm.com/epubs/pdf/am601305.pdf V3])&lt;br /&gt;
* [[Media:essl51.pdf|ESSL high performance math library V5.1 for Linux on Power]]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/Manuals/JUMP-AIX-POWER6-AppsPerformanceTuning-wp032008.pdf Performance tuning]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/Manuals/PE5.1-operationanduse.pdf Parallel environment]&lt;br /&gt;
* [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp Cluster information center] (with error codes)&lt;br /&gt;
* [http://support.scinet.utoronto.ca/Manuals/LL-usingandadministering.pdf LoadLeveler: using &amp;amp; administering]&lt;br /&gt;
&lt;br /&gt;
==PGI compilers (ARC)==&lt;br /&gt;
* [http://www.pgroup.com/doc/pgiug.pdf Compiler User's Guide]&lt;br /&gt;
* [http://www.pgroup.com/doc/pgiref.pdf Compiler Reference Manual]&lt;br /&gt;
* [http://www.pgroup.com/doc/pgifortref.pdf Fortran reference]&lt;br /&gt;
* [http://www.pgroup.com/doc/pgicudaforug.pdf CUDA Fortran Programming Guide and Reference]&lt;br /&gt;
* [http://www.pgroup.com/doc/openACC_gs.pdf OpenACC Getting Started Guide]&amp;lt;br&amp;gt;(Note: $PGI/linux86-64/12.5/doc contains a newer version.)&lt;br /&gt;
&lt;br /&gt;
==Scheduler (Adaptive Computing/Cluster Resources)==&lt;br /&gt;
* [http://www.clusterresources.com/products/mwm/docs/index.shtml Moab workload manager]&lt;br /&gt;
* [http://www.clusterresources.com/products/mwm/docs/a.gcommandoverview.shtml Moab commands]&lt;br /&gt;
* [http://www.clusterresources.com/products/torque/docs/ Torque resource manager] &lt;br /&gt;
* [http://www.clusterresources.com/products/torque/docs/a.acommands.shtml Torque PBS commands]&lt;br /&gt;
&lt;br /&gt;
==DDT Debugger (Allinea)==&lt;br /&gt;
* [http://www.allinea.com/downloads/userguide.pdf Distributed Debugging Tool User Guide]&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=File:HadoopPart1examples.tgz&amp;diff=6955</id>
		<title>File:HadoopPart1examples.tgz</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=File:HadoopPart1examples.tgz&amp;diff=6955"/>
		<updated>2014-04-07T21:19:03Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: Source code examples for Hadoop part 1&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Source code examples for Hadoop part 1&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=File:Hadoop-PartI.pdf&amp;diff=6954</id>
		<title>File:Hadoop-PartI.pdf</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=File:Hadoop-PartI.pdf&amp;diff=6954"/>
		<updated>2014-04-07T21:18:05Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: Slides for Introduction to Hadoop for HPCers, Part I&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Slides for Introduction to Hadoop for HPCers, Part I&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Ssh_keys&amp;diff=6278</id>
		<title>Ssh keys</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Ssh_keys&amp;diff=6278"/>
		<updated>2013-07-16T17:14:59Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* SSH tunnel */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Ssh | SSH]] has an alternative to passwords to authenticate your login; you can generate a key file on a trusted machine and tell a remote machine to trust logins from a machine that presents that key.   This can be both convenient and secure, and may be necessary for some tasks (such as connecting directly to compute nodes to use [[Using_Paraview | some visualization packages]]).  Here we describe how to set up keys for logging into SciNet.&lt;br /&gt;
&lt;br /&gt;
==SSH Keys and SciNet==&lt;br /&gt;
&lt;br /&gt;
[[Ssh | SSH]] is a secure protocol for logging into or copying data to/from remote machines.  In addition to using passwords to [http://en.wikipedia.org/wiki/Authentication authenticate] users, one can use cryptographically secure keys to guarantee that a login request is coming from a trusted account on a remote machine, and automatically allow such requests.   Done properly, this is as secure as requiring a password, but can be more convenient, and is necessary for some operations.&lt;br /&gt;
&lt;br /&gt;
On this page, we will assume you are using Linux, Mac OS X, or a similar environment such as [http://www.cygwin.com/ Cygwin] under Windows.  If not, the steps will be the same, but how they are done (for instance, generating keys) may differ; look up the documentation for your ssh package for details.&lt;br /&gt;
&lt;br /&gt;
==Using SSH keys==&lt;br /&gt;
===How SSH keys work===&lt;br /&gt;
&lt;br /&gt;
SSH relies on [http://en.wikipedia.org/wiki/Public-key_cryptography public key cryptography] for its encryption.  These cryptosystems have a private key, which must be kept secret, and a public key, which may be disseminated freely.   In these systems, anyone may use the public key to encode a message; but only the owner of the private key can decode the message.  This can also be used to verify identities; if someone is claiming to be Alice, the owner of some private key, Bob can send Alice a message encoded with Alice's well-known public key.  If the person claiming to be Alice can then tell Bob what the message really was, then that person at the very least has access to Alice's private key.&lt;br /&gt;
&lt;br /&gt;
To use keys for authentication, we:&lt;br /&gt;
* Generate a key pair (Private and Public)&lt;br /&gt;
* Copy the public key to the remote sites we wish to be able to log in to, and mark it as an authorized key for each system&lt;br /&gt;
* Ensure permissions are set properly&lt;br /&gt;
* Test.&lt;br /&gt;
&lt;br /&gt;
===Generating an SSH key pair===&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Note: This describes creating ssh key pairs on '''your''' machine, not on SciNet.  On SciNet, you already have key pairs generated, sitting in &amp;lt;tt&amp;gt;${HOME}/.ssh/&amp;lt;/tt&amp;gt;, and modifying them is likely to cause problems.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The first stage is to create an SSH key pair.   On most systems, this is done using the command&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen&lt;br /&gt;
&lt;br /&gt;
This will prompt you for two pieces of information: where to save the key, and a passphrase for the key.  The passphrase is like a password, but rather than letting you in to some particular account, it allows you to use the key you've generated to log into other systems.  &lt;br /&gt;
&lt;br /&gt;
There is a series of options to &amp;lt;tt&amp;gt;ssh-keygen&amp;lt;/tt&amp;gt; which allow for more cryptographically secure keys (by increasing the number of bits used in the key) or for different encryption systems.  The defaults are fine, and we won't discuss other options here.&lt;br /&gt;
&lt;br /&gt;
The default location to save the private key is in &amp;lt;tt&amp;gt;${HOME}/.ssh/id_rsa&amp;lt;/tt&amp;gt; (for an RSA key); unless you have some specific reason for placing it elsewhere, use this option.  The public key will be &amp;lt;tt&amp;gt;id_rsa.pub&amp;lt;/tt&amp;gt; in the same directory.&lt;br /&gt;
&lt;br /&gt;
Your passphrase can be any string, and of any length.   It is best not to make it the same as any of your passwords.&lt;br /&gt;
&lt;br /&gt;
A sample session of generating a key would go like this:&lt;br /&gt;
&lt;br /&gt;
 $ ssh-keygen&lt;br /&gt;
 Generating public/private rsa key pair.&lt;br /&gt;
 Enter file in which to save the key (${HOME}/.ssh/id_rsa): &lt;br /&gt;
 Enter passphrase (empty for no passphrase): &lt;br /&gt;
 Enter same passphrase again: &lt;br /&gt;
 Your identification has been saved in ${HOME}/.ssh/id_rsa.&lt;br /&gt;
 Your public key has been saved in ${HOME}/.ssh/id_rsa.pub.&lt;br /&gt;
 The key fingerprint is:&lt;br /&gt;
 79:8e:36:6a:78:7d:cf:80:94:90:92:0e:74:0b:f1:b7 USERNAME@YOURMACHINE&lt;br /&gt;
&lt;br /&gt;
====Don't Use Passphraseless Keys!====&lt;br /&gt;
&lt;br /&gt;
If you do not specify a passphrase, you will have a completely &amp;quot;exposed&amp;quot; private key.  '''This is a terrible idea.'''   If you then use this key for anything, it means that anyone who sits down at your desk, or anyone who borrows or steals your laptop, can log in to anywhere you use that key (good guesses could come from just looking at your history) without needing any password, and could do anything they wanted with your account or data.  Don't use passphraseless keys.&lt;br /&gt;
&lt;br /&gt;
We should note that we do, in fact, have one necessary and reasonable exception here -- the keys used within SciNet itself.  The SciNet key used for within-SciNet operations (you already have one in your account in &amp;lt;tt&amp;gt;~/.ssh/id_rsa&amp;lt;/tt&amp;gt;) is passphraseless, for two good reasons.  One is that, once you are on one SciNet machine (like the login node), you already have read/write access to all your data; all the nodes mount the same file systems.  So there is little to be gained in protecting the SciNet nodes from each other.   The second is practical; ssh is used to log in to compute nodes to start your compute jobs.  You obviously can't be asked to type in a passphrase every time one of your jobs starts; you may not be at your computer at that moment.  So passphraseless keys are OK ''within'' a controlled environment; but don't use them for remote access.&lt;br /&gt;
&lt;br /&gt;
===Copying the Public Key to SciNet (and elsewhere)===&lt;br /&gt;
&lt;br /&gt;
Now that you have this SSH &amp;quot;identity&amp;quot;, you use the public (''not'' the private) key for access to remote machines.  The public key must be put as one line in the file &amp;lt;tt&amp;gt;/home/USERNAME/.ssh/authorized_keys&amp;lt;/tt&amp;gt;.  Do not delete the lines already there, or you may end up with strange problems using SciNet machines.&lt;br /&gt;
&lt;br /&gt;
You can copy your new public key to the SciNet systems with:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scp /home/LOCAL_USERNAME/.ssh/id_rsa.pub SCINET_USERNAME@login.scinet.utoronto.ca:newkey&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Then log in to SciNet and append the contents of &amp;lt;tt&amp;gt;~/newkey&amp;lt;/tt&amp;gt; to &amp;lt;tt&amp;gt;~/.ssh/authorized_keys&amp;lt;/tt&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat ~/newkey &amp;gt;&amp;gt; ~/.ssh/authorized_keys&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
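Alternatively, on many systems these two steps can be combined with &amp;lt;tt&amp;gt;ssh-copy-id&amp;lt;/tt&amp;gt;, if your local ssh installation provides it; it appends the public key to the remote &amp;lt;tt&amp;gt;~/.ssh/authorized_keys&amp;lt;/tt&amp;gt; for you:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh-copy-id -i ~/.ssh/id_rsa.pub SCINET_USERNAME@login.scinet.utoronto.ca&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;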
&lt;br /&gt;
===&amp;lt;tt&amp;gt;.ssh&amp;lt;/tt&amp;gt; Permissions===&lt;br /&gt;
&lt;br /&gt;
Note that &amp;lt;tt&amp;gt;SSH&amp;lt;/tt&amp;gt; is very fussy about file permissions; your &amp;lt;tt&amp;gt;~/.ssh&amp;lt;/tt&amp;gt; directory must only be accessible by you, and your various key files must not be writable (or in some cases, readable) by anyone else.  Sometimes users accidentally reset these file permissions while editing the files, and run into problems as a result.   If you look at the &amp;lt;tt&amp;gt;~/.ssh&amp;lt;/tt&amp;gt; directory itself, it should not be readable at all by anyone else:&lt;br /&gt;
&lt;br /&gt;
 ls -ld ~/.ssh&lt;br /&gt;
 drwx------ 2 USERNAME GROUPNAME 7 Aug  9 15:43 /home/USERNAME/.ssh&lt;br /&gt;
&lt;br /&gt;
and &amp;lt;tt&amp;gt;authorized_keys&amp;lt;/tt&amp;gt; must not be writable by anyone else:&lt;br /&gt;
&lt;br /&gt;
 $ ls -l ~/.ssh/authorized_keys &lt;br /&gt;
 -rw-r--r-- 1 USERNAME GROUPNAME 1213 May 29  2009 /home/USERNAME/.ssh/authorized_keys&lt;br /&gt;
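If the permissions have ended up wrong, they can typically be restored along these lines (a minimal sketch; apply it to whichever key files you actually have):&lt;br /&gt;
&lt;br /&gt;
 chmod 700 ~/.ssh&lt;br /&gt;
 chmod 600 ~/.ssh/id_rsa&lt;br /&gt;
 chmod 644 ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys&lt;br /&gt;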
&lt;br /&gt;
===Testing Your Key===&lt;br /&gt;
&lt;br /&gt;
Now you should be able to login to the remote system (say, SciNet):&lt;br /&gt;
&lt;br /&gt;
 $ ssh USERNAME@login.scinet.utoronto.ca&lt;br /&gt;
 Enter passphrase for key '/home/USERNAME/.ssh/id_rsa': &lt;br /&gt;
 Last login: Tue Aug 17 11:24:48 2010 from HOMEMACHINE&lt;br /&gt;
 &lt;br /&gt;
 ===================================================&lt;br /&gt;
 &lt;br /&gt;
 This SciNet login node is to be used only as a&lt;br /&gt;
 gateway to the GPC and TCS.&lt;br /&gt;
 &lt;br /&gt;
 [...]&lt;br /&gt;
 scinet04-$&lt;br /&gt;
&lt;br /&gt;
If this doesn't work, you should still be able to log in using your password and investigate the problem. For example, if during a login session you get a message similar to the one below, just follow the instructions and delete the offending key on the indicated line (line 3 in this example; in vi you can jump to that line by typing ESC, then :3). This only means that you may have logged in to SciNet from your home computer in the past, and that key is now obsolete.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh USERNAME@login.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @&lt;br /&gt;
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!&lt;br /&gt;
Someone could be eavesdropping on you right now (man-in-the-middle&lt;br /&gt;
attack)!&lt;br /&gt;
It is also possible that the RSA host key has just been changed.&lt;br /&gt;
The fingerprint for the RSA key sent by the remote host is&lt;br /&gt;
53:f9:60:71:a8:0b:5d:74:83:52:fe:ea:1a:9e:cc:d3.&lt;br /&gt;
Please contact your system administrator.&lt;br /&gt;
Add correct host key in /home/&amp;lt;user&amp;gt;/.ssh/known_hosts to get rid of&lt;br /&gt;
this message.&lt;br /&gt;
Offending key in /home/&amp;lt;user&amp;gt;/.ssh/known_hosts:3&lt;br /&gt;
RSA host key for login.scinet.utoronto.ca has changed and you have&lt;br /&gt;
requested strict checking.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* If you get the message below, you may need to log out of your GNOME session and log back in, since the ssh-agent needs to be&lt;br /&gt;
restarted before it will pick up the new passphrase-protected ssh key.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh USERNAME@login.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Agent admitted failure to sign using the key.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
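&lt;br /&gt;
Depending on your setup, it may also be enough to simply re-add the key to the already-running agent:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# you will be asked for the new passphrase once&lt;br /&gt;
ssh-add ~/.ssh/id_rsa&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;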
&lt;br /&gt;
===(Optional) Using &amp;lt;tt&amp;gt;ssh-agent&amp;lt;/tt&amp;gt; to Remember Your Passphrase===&lt;br /&gt;
&lt;br /&gt;
But now you've just replaced having to type a password for login with having to type a passphrase for your key; what have you gained?  &lt;br /&gt;
&lt;br /&gt;
It turns out that there's an automated way to manage ssh &amp;quot;identities&amp;quot;, using the &amp;lt;tt&amp;gt;ssh-agent&amp;lt;/tt&amp;gt; command, which should automatically be running on newer Linux or Mac&amp;amp;nbsp;OS&amp;amp;nbsp;X machines.   You can add keys to this agent for the duration of your login using the &amp;lt;tt&amp;gt;ssh-add&amp;lt;/tt&amp;gt; command:&lt;br /&gt;
&lt;br /&gt;
 $ ssh-add&lt;br /&gt;
 Enter passphrase for /home/USERNAME/.ssh/id_rsa: &lt;br /&gt;
 Identity added: /home/USERNAME/.ssh/id_rsa (/home/USERNAME/.ssh/id_rsa)&lt;br /&gt;
&lt;br /&gt;
and then logins will not require the passphrase, as &amp;lt;tt&amp;gt;ssh-agent&amp;lt;/tt&amp;gt; will provide access to the key.&lt;br /&gt;
&lt;br /&gt;
When you log out of your home computer, the ssh agent will close, and next time you log in, you will have to &amp;lt;tt&amp;gt;ssh-add&amp;lt;/tt&amp;gt; your key.  You can also set a timeout of (say) an hour by using &amp;lt;tt&amp;gt;ssh-add -t 3600&amp;lt;/tt&amp;gt;.  This minimizes the number of times you have to type your passphrase, while still maintaining some degree of key security.&lt;br /&gt;
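&lt;br /&gt;
If no agent is running (for instance, in a plain console session), you can start one yourself for the current shell; a minimal sketch:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
eval &amp;quot;$(ssh-agent -s)&amp;quot;       # start an agent and set its environment variables in this shell&lt;br /&gt;
ssh-add -t 3600 ~/.ssh/id_rsa    # add the key for one hour only&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;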
&lt;br /&gt;
&lt;br /&gt;
=== Multiple ssh private keys ===&lt;br /&gt;
In quite a few situations it is preferable to have a dedicated ssh key for each service, role, or domain.  For example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh-keygen -t rsa -f ~/.ssh/id_rsa.SciNet   -C &amp;quot;Key for SciNet&amp;quot;&lt;br /&gt;
ssh-keygen -t rsa -f ~/.ssh/id_rsa.SHARCNET -C &amp;quot;Key for SHARCNET&amp;quot;&lt;br /&gt;
ssh-keygen -t rsa -f ~/.ssh/id_rsa.DCS      -C &amp;quot;Key for Dept. Of Computer Science&amp;quot;&lt;br /&gt;
ssh-keygen -t rsa -f ~/.ssh/id_rsa.CITA     -C &amp;quot;Key for CITA&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use a different file name for each key. Let's assume that there are two keys, ~/.ssh/id_rsa.SciNet and ~/.ssh/id_rsa.SHARCNET. The simplest way of making sure each of the keys works all the time is to create a config file for ssh:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
touch ~/.ssh/config&lt;br /&gt;
chmod 600 ~/.ssh/config&lt;br /&gt;
echo &amp;quot;IdentityFile ~/.ssh/id_rsa.SciNet&amp;quot;   &amp;gt;&amp;gt; ~/.ssh/config&lt;br /&gt;
echo &amp;quot;IdentityFile ~/.ssh/id_rsa.SHARCNET&amp;quot; &amp;gt;&amp;gt; ~/.ssh/config&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This makes sure that both keys are tried whenever ssh makes a connection. However, the ssh config file gives you much finer control over keys and other per-connection settings. The recommended approach is to select the key based on the hostname. For example, with a ~/.ssh/config that looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Host SciNet&lt;br /&gt;
  Hostname login.scinet.utoronto.ca&lt;br /&gt;
  IdentityFile ~/.ssh/id_rsa.SciNet&lt;br /&gt;
  User pinto&lt;br /&gt;
&lt;br /&gt;
Host SHARCNET&lt;br /&gt;
  Hostname sharcnet.ca&lt;br /&gt;
  IdentityFile ~/.ssh/id_rsa.SHARCNET&lt;br /&gt;
  User jchong&lt;br /&gt;
  Port 44787&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you can then just log in with the shortcut:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh SciNet&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
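&lt;br /&gt;
The same shortcut works for other tools that run over ssh, such as &amp;lt;tt&amp;gt;scp&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;rsync&amp;lt;/tt&amp;gt;; for example (the file and destination path are hypothetical):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scp results.tar.gz SciNet:/scratch/USERNAME/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;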
&lt;br /&gt;
= SSH tunnel =&lt;br /&gt;
A lesser-known use of ssh is to create a communication tunnel. As an example, assume you want to access a website running on a remote host from your local host, but there is a firewall between the two systems blocking every port except incoming ssh.&lt;br /&gt;
&lt;br /&gt;
The basic syntax of the ssh command for such a purpose is: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh -f -N -L localport:localhost:remoteport user@remotehost&lt;br /&gt;
# -f puts ssh in background&lt;br /&gt;
# -N makes it not execute a remote command&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If the remote website listens on the default port 80, you could do the following:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh -f -N -L 8080:localhost:80 tunneluser@remotehost&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
... and point your local browser to http://localhost:8080&lt;br /&gt;
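&lt;br /&gt;
You can also check from the command line that the tunnel is up; for example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# fetch the remote site's front page through the local end of the tunnel&lt;br /&gt;
curl http://localhost:8080/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;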
&lt;br /&gt;
If you don't want to remember the above sequence of flags all the time, you can add an entry to your ~/.ssh/config:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Host tunnel&lt;br /&gt;
    HostName remotehost&lt;br /&gt;
    IdentityFile ~/.ssh/id_rsa.tunnel&lt;br /&gt;
    LocalForward 8080 127.0.0.1:80&lt;br /&gt;
    User tunneluser&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To open the tunnel just issue the command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -f -N tunnel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=FAQ&amp;diff=6277</id>
		<title>FAQ</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=FAQ&amp;diff=6277"/>
		<updated>2013-07-16T17:07:34Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* How do we manage job priorities within our research group? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__TOC__&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==The Basics==&lt;br /&gt;
===Whom do I contact for support?===&lt;br /&gt;
&lt;br /&gt;
Whom do I contact if I have problems or questions about how to use the SciNet systems?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
E-mail [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;]  &lt;br /&gt;
&lt;br /&gt;
In your email, please include the following information:&lt;br /&gt;
&lt;br /&gt;
* your username on SciNet&lt;br /&gt;
* the cluster that your question pertains to (GPC or TCS; SciNet is not a cluster!),&lt;br /&gt;
* any relevant error messages&lt;br /&gt;
* the commands you typed before the errors occurred&lt;br /&gt;
* the path to your code (if applicable)&lt;br /&gt;
* the location of the job scripts (if applicable)&lt;br /&gt;
* the directory from which it was submitted (if applicable)&lt;br /&gt;
* a description of what it is supposed to do (if applicable)&lt;br /&gt;
* if your problem is about connecting to SciNet, the type of computer you are connecting from.&lt;br /&gt;
&lt;br /&gt;
Note that your password should never, never, never be sent to us, even if your question is about your account.&lt;br /&gt;
&lt;br /&gt;
Try to avoid sending email only to specific individuals at SciNet. Your chances of a quick reply increase significantly if you email our team!&lt;br /&gt;
&lt;br /&gt;
===What does ''code scaling'' mean?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Introduction_To_Performance#Parallel_Speedup|A Performance Primer]]&lt;br /&gt;
&lt;br /&gt;
===What do you mean by ''throughput''?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Introduction_To_Performance#Throughput|A Performance Primer]].&lt;br /&gt;
&lt;br /&gt;
Here is a simple example:&lt;br /&gt;
&lt;br /&gt;
Suppose you need to do 10 computations.  Say each of these runs for&lt;br /&gt;
1 day on 8 cores, but they take &amp;quot;only&amp;quot; 18 hours on 16 cores.  What is the&lt;br /&gt;
fastest way to get all 10 computations done - as 8-core jobs or as&lt;br /&gt;
16-core jobs?  Let us assume you have 2 nodes at your disposal.&lt;br /&gt;
The answer, after some simple arithmetic, is that running your 10&lt;br /&gt;
jobs as 8-core jobs will take 5 days, whereas running them&lt;br /&gt;
as 16-core jobs would take 7.5 days: with two 8-core nodes you can run two&lt;br /&gt;
8-core jobs at once (5 rounds of 24 hours = 5 days), but only one 16-core job&lt;br /&gt;
at a time (10 x 18 hours = 7.5 days).  Draw your own conclusions...&lt;br /&gt;
&lt;br /&gt;
===I changed my .bashrc/.bash_profile and now nothing works===&lt;br /&gt;
&lt;br /&gt;
The default startup scripts provided by SciNet, and guidelines for them, can be found [[Important_.bashrc_guidelines|here]].  Certain things - like sourcing &amp;lt;tt&amp;gt;/etc/profile&amp;lt;/tt&amp;gt;&lt;br /&gt;
and &amp;lt;tt&amp;gt;/etc/bashrc&amp;lt;/tt&amp;gt; - are ''required'' for various SciNet routines to work!   &lt;br /&gt;
&lt;br /&gt;
If the situation is so bad that you cannot even log in, please send an email to [mailto:support@scinet.utoronto.ca support].&lt;br /&gt;
&lt;br /&gt;
===Could I have my login shell changed to (t)csh?===&lt;br /&gt;
&lt;br /&gt;
The login shell used on our systems is bash. While tcsh is available on the GPC and the TCS, we do not support it as the default login shell at present.  So &amp;quot;chsh&amp;quot; will not work, but you can always run tcsh interactively. Also, csh scripts will be executed correctly provided that they have the correct &amp;quot;shebang&amp;quot; &amp;lt;tt&amp;gt;#!/bin/tcsh&amp;lt;/tt&amp;gt; at the top.&lt;br /&gt;
&lt;br /&gt;
===How can I run Matlab / IDL / Gaussian / my favourite commercial software at SciNet?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Because SciNet serves such a disparate group of user communities, there is just no way we can buy licenses for everyone's commercial package.   The only commercial software we have purchased is that which in principle can benefit everyone -- fast compilers and math libraries (Intel's on GPC, and IBM's on TCS).&lt;br /&gt;
&lt;br /&gt;
If your research group requires a commercial package that you already have or are willing to buy licenses for, contact us at [mailto:support@scinet.utoronto.ca support@scinet] and we can work together to find out if it is feasible to implement the package's licensing arrangement on the SciNet clusters, and if so, what is the best way to do it.&lt;br /&gt;
&lt;br /&gt;
Note that it is important that you contact us before installing commercially licensed software on SciNet machines, even if you have a way to do it in your own directory without requiring sysadmin intervention.   It puts us in a very awkward position if someone is found to be running unlicensed or invalidly licensed software on our systems, so we need to be aware of what is being installed where.&lt;br /&gt;
&lt;br /&gt;
===Do you have a recommended ssh program that will allow scinet access from Windows machines?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The programs we recommend for [[Ssh#SSH_for_Windows_Users | SSH for Windows users]] are:&lt;br /&gt;
&lt;br /&gt;
* [http://mobaxterm.mobatek.net/en/ MobaXterm] is a tabbed ssh client with some Cygwin tools, including ssh and X, all wrapped up into one executable.&lt;br /&gt;
* [http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY]  - this is a terminal for Windows that connects via ssh.  It is a quick install and will get you up and running quickly.&amp;lt;br&amp;gt;To set up your passphrase-protected ssh key with PuTTY, see [http://the.earth.li/~sgtatham/putty/0.61/htmldoc/Chapter8.html#pubkey here].&lt;br /&gt;
* [http://www.cygwin.com/ CygWin] - this is a whole Linux-like environment for Windows, which also includes an X window server so that you can display remote windows on your desktop.  Make sure you include openssh and the X window system in the installation for full functionality.  This is recommended if you will be doing a lot of work on Linux machines, as it makes a very similar environment available on your computer.&amp;lt;br&amp;gt;To set up your ssh keys, follow the Linux instructions on the [[Ssh keys]] page.&lt;br /&gt;
&lt;br /&gt;
===My ssh key does not work! WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
[[Ssh_keys#Testing_Your_Key | Testing Your Key]]&lt;br /&gt;
&lt;br /&gt;
* If this doesn't work, you should still be able to log in using your password and investigate the problem. For example, if during a login session you get a message similar to the one below, just follow the instructions and delete the offending key on line 3 of &amp;lt;tt&amp;gt;known_hosts&amp;lt;/tt&amp;gt; (in vi you can jump to that line by typing ESC and then :3). It only means that you have logged in from your home computer to SciNet in the past, and that host key is now obsolete.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh USERNAME@login.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @&lt;br /&gt;
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!&lt;br /&gt;
Someone could be eavesdropping on you right now (man-in-the-middle&lt;br /&gt;
attack)!&lt;br /&gt;
It is also possible that the RSA host key has just been changed.&lt;br /&gt;
The fingerprint for the RSA key sent by the remote host is&lt;br /&gt;
53:f9:60:71:a8:0b:5d:74:83:52:fe:ea:1a:9e:cc:d3.&lt;br /&gt;
Please contact your system administrator.&lt;br /&gt;
Add correct host key in /home/&amp;lt;user&amp;gt;/.ssh/known_hosts to get rid of&lt;br /&gt;
this message.&lt;br /&gt;
Offending key in /home/&amp;lt;user&amp;gt;/.ssh/known_hosts:3&lt;br /&gt;
RSA host key for login.scinet.utoronto.ca has changed and you have&lt;br /&gt;
requested strict checking.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* If you get the message below, you may need to log out of your GNOME session and log back in, since the ssh-agent needs to be&lt;br /&gt;
restarted before it will pick up the new passphrase-protected ssh key.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh USERNAME@login.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
Agent admitted failure to sign using the key.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Can't forward X:  &amp;quot;Warning: No xauth data; using fake authentication data&amp;quot;, or &amp;quot;X11 connection rejected because of wrong authentication.&amp;quot;===&lt;br /&gt;
&lt;br /&gt;
I used to be able to forward X11 windows from SciNet to my home machine, but now I'm getting these messages; what's wrong?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This very likely means that ssh/xauth can't update your ${HOME}/.Xauthority file. &lt;br /&gt;
&lt;br /&gt;
The simplest possible reason for this is that you've filled your 10GB /home quota and so can't write anything to your home directory.   Use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load extras&lt;br /&gt;
$ diskUsage&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
to check how close you are to your disk quota on ${HOME}.&lt;br /&gt;
&lt;br /&gt;
Alternatively, this could mean your .Xauthority file has become broken/corrupted/confused somehow, in which case you can delete that file; when you next log in you'll get a similar warning message about creating .Xauthority, but things should work.&lt;br /&gt;
&lt;br /&gt;
===How come I can not login to TCS?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
A SciNet account doesn't automatically entitle you to TCS access. At a minimum, TCS jobs need to run on at least 32 cores (64 preferred because of Simultaneous Multi Threading - [[TCS_Quickstart#Node_configuration|SMT]] - on these nodes) and need the large memory (4GB/core) and bandwidth on the system. Essentially you need to be able to explain why the work can't be done on the GPC.&lt;br /&gt;
&lt;br /&gt;
===How can I reset the password for my Compute Canada account?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
You can reset your password for your Compute Canada account here:&lt;br /&gt;
&lt;br /&gt;
https://ccdb.computecanada.org/security/forgot&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===How can I change or reset the password for my SciNet account?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
To reset your password at SciNet please e-mail [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
If you know your old password and want to change it, that can be done here:&lt;br /&gt;
&lt;br /&gt;
https://portal.scinet.utoronto.ca/&lt;br /&gt;
&lt;br /&gt;
===Why am I getting the error &amp;quot;Permission denied (publickey,gssapi-with-mic,password)&amp;quot;?===&lt;br /&gt;
&lt;br /&gt;
This error can pop up in a variety of situations: when trying to log in, or after a job has finished, when the error and output files fail to be copied (there are other possible reasons for this failure as well -- see [[FAQ#My_GPC_job_died.2C_telling_me_.60Copy_Stageout_Files_Failed.27|My GPC job died, telling me:Copy Stageout Files Failed]]).&lt;br /&gt;
In most cases, the &amp;quot;Permission denied&amp;quot; error is caused by incorrect permissions on the (hidden) .ssh directory. Ssh is used for logging in as well as for copying the standard error and output files after a job. &lt;br /&gt;
&lt;br /&gt;
For security reasons, &lt;br /&gt;
the directory .ssh should be readable and writable only by you; if it &lt;br /&gt;
has read permission for everybody, ssh refuses to use it and fails.  You can fix &lt;br /&gt;
this with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   chmod 700 ~/.ssh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
And to be sure, also do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   chmod 600 ~/.ssh/id_rsa ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===ERROR:102: Tcl command execution failed? when loading modules ===&lt;br /&gt;
Modules sometimes require other modules to be loaded first.&lt;br /&gt;
The module command will let you know if you didn't load them first.&lt;br /&gt;
For example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module purge&lt;br /&gt;
$ module load python&lt;br /&gt;
python/2.6.2(11):ERROR:151: Module ’python/2.6.2’ depends on one of the module(s) ’gcc/4.4.0’&lt;br /&gt;
python/2.6.2(11):ERROR:102: Tcl command execution failed: prereq gcc/4.4.0&lt;br /&gt;
$ module load gcc python&lt;br /&gt;
$&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Compiling your Code==&lt;br /&gt;
&lt;br /&gt;
===How can I get g77 to work?===&lt;br /&gt;
&lt;br /&gt;
The Fortran 77 compilers on the GPC are ifort and gfortran. We have dropped support for g77.  This has been a conscious decision: g77 (and the associated library libg2c) were completely replaced six years ago (Apr 2005) by the gcc 4.x branch, and haven't undergone any updates at all, even bug fixes, for over five years.  &lt;br /&gt;
If we were to install g77 and libg2c, we would have to deal with the inevitable confusion caused when users accidentally link against the old, broken, wrong versions of the gcc libraries instead of the correct current versions.   &lt;br /&gt;
&lt;br /&gt;
If your code for some reason specifically requires five-plus-year-old libraries,  availability, compatibility, and unfixed-known-bug problems are only going to get worse for you over time, and this might be as good an opportunity as any to address those issues. &lt;br /&gt;
&lt;br /&gt;
''A note on porting to gfortran or ifort:''&lt;br /&gt;
&lt;br /&gt;
While gfortran and ifort are rather compatible with g77, one &lt;br /&gt;
important difference is that by default, gfortran does not preserve &lt;br /&gt;
local variables between function calls, while g77 does.   Preserved &lt;br /&gt;
local variables are for instance often used in implementations of quasi-random number &lt;br /&gt;
generators.  Proper Fortran requires such variables to be declared as SAVE, &lt;br /&gt;
but not all old code does this.&lt;br /&gt;
Luckily, you can change gfortran's default behavior with the flag &lt;br /&gt;
&amp;lt;tt&amp;gt;-fno-automatic&amp;lt;/tt&amp;gt;.   For ifort, the corresponding flag is &amp;lt;tt&amp;gt;-noautomatic&amp;lt;/tt&amp;gt;.&lt;br /&gt;
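&lt;br /&gt;
For example (the source file name is hypothetical):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# keep local variables static between calls, as g77 did by default&lt;br /&gt;
gfortran -O2 -fno-automatic -o mycode mycode.f&lt;br /&gt;
ifort    -O2 -noautomatic   -o mycode mycode.f&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;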
&lt;br /&gt;
===Where is libg2c.so?===&lt;br /&gt;
&lt;br /&gt;
libg2c.so is part of the g77 compiler, for which we dropped support. See [[#How can I get g77 to work?]] for our reasons.&lt;br /&gt;
&lt;br /&gt;
===Autoparallelization does not work!===&lt;br /&gt;
&lt;br /&gt;
I compiled my code with the &amp;lt;tt&amp;gt;-qsmp=omp,auto&amp;lt;/tt&amp;gt; option, and then I specified that it should be run with 64 threads - with &lt;br /&gt;
 export OMP_NUM_THREADS=64&lt;br /&gt;
&lt;br /&gt;
However, when I check the load using &amp;lt;tt&amp;gt;llq1 -n&amp;lt;/tt&amp;gt;, it shows a load on the node of 1.37.  Why?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Using the autoparallelization will only get you so far.  In fact, it usually does not do too much.  What is helpful is to run the compiler with the &amp;lt;tt&amp;gt;-qreport&amp;lt;/tt&amp;gt; option, and then read the output listing carefully to see where the compiler thought it could parallelize, where it could not, and the reasons for this.  Then you can go back to your code and carefully try to address each of the issues brought up by the compiler.&lt;br /&gt;
We ''emphasize'' that this is just a rough first guide, and that the compilers are still not magical!   For more sophisticated approaches to parallelizing your code, email us at [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;]  to set up an appointment with one&lt;br /&gt;
of our technical analysts.&lt;br /&gt;
&lt;br /&gt;
===How do I link against the Intel Math Kernel Library?===&lt;br /&gt;
&lt;br /&gt;
If you need to link in the Intel Math Kernel Library (MKL) libraries, you are well advised to use the Intel(R) Math Kernel Library Link Line Advisor: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ for help in devising the list of libraries to link with your code.&lt;br /&gt;
&lt;br /&gt;
'''''Note that this gives the link line for the command line. When using it in Makefiles, replace $MKLPATH by ${MKLPATH}.'''''&lt;br /&gt;
&lt;br /&gt;
'''''Note too that, unless the integer arguments you will be passing to the MKL routines are actually 64-bit integers rather than the normal int or INTEGER types, you want to specify 32-bit integers (lp64).'''''&lt;br /&gt;
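&lt;br /&gt;
As a rough illustration only -- the exact library list depends on your MKL version and the choices you make in the link advisor -- a sequential, lp64 link line might look something like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# sequential (non-threaded) MKL with 32-bit integers (lp64); verify against the link advisor&lt;br /&gt;
icc mycode.c -o mycode -L${MKLPATH} -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;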
&lt;br /&gt;
===Can the compilers on the login nodes be disabled to prevent accidentally using them?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
You can accomplish this by modifying your .bashrc to not load the compiler modules. See [[Important .bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
===&amp;quot;relocation truncated to fit: R_X86_64_PC32&amp;quot;: Huh?===&lt;br /&gt;
&lt;br /&gt;
What does this mean, and why can't I compile this code?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Welcome to the joys of the x86 architecture!  You're probably having trouble building arrays larger than 2GB, individually or together.   Generally, you have to try to use the medium or large x86 `memory model'.   For the intel compilers, this is specified with the compile options&lt;br /&gt;
&lt;br /&gt;
  -mcmodel=medium -shared-intel&lt;br /&gt;
&lt;br /&gt;
===&amp;quot;feupdateenv is not implemented and will always fail&amp;quot;===&lt;br /&gt;
&lt;br /&gt;
How do I get rid of this and what does it mean?&lt;br /&gt;
 &lt;br /&gt;
'''Answer:'''&lt;br /&gt;
First note that, as ominous as it sounds, this is really just a warning, and has to do with the Intel math library. You can ignore it (unless you really are trying to manually change the exception handlers for floating point exceptions such as divide by zero), or take the safe road and get rid of it by linking with the Intel math functions library:&amp;lt;pre&amp;gt;-limf&amp;lt;/pre&amp;gt;See also [[#How do I link against the Intel Math Kernel Library?]]&lt;br /&gt;
&lt;br /&gt;
===Cannot find rdmacm library when compiling on GPC===&lt;br /&gt;
&lt;br /&gt;
I get the following error building my code on GPC: &amp;quot;&amp;lt;tt&amp;gt;ld: cannot find -lrdmacm&amp;lt;/tt&amp;gt;&amp;quot;.  Where can I find this library?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This library is part of the MPI libraries; if your compiler is having problems picking it up, it probably means you are mistakenly trying to compile on the login nodes (scinet01..scinet04).  The login nodes aren't part of the GPC; they are for logging into the data centre only.  From there you must go to the GPC or TCS development nodes to do any real work.&lt;br /&gt;
&lt;br /&gt;
=== Why do I get this error when I try to compile: &amp;quot;icpc: error #10001: could not find directory in which /usr/bin/g++41 resides&amp;quot; ?===&lt;br /&gt;
&lt;br /&gt;
You are trying to compile on the login nodes.   As described in the wiki ( https://support.scinet.utoronto.ca/wiki/index.php/GPC_Quickstart#Login ), or in the user's guide you received with your account, SciNet supports two main clusters with very different architectures.  Compilation must be done on the development nodes of the appropriate cluster (in this case, gpc01-04).   Thus, log into gpc01, gpc02, gpc03, or gpc04, and compile from there.&lt;br /&gt;
&lt;br /&gt;
==Testing your Code==&lt;br /&gt;
&lt;br /&gt;
=== Can I run something for a short time on the development nodes? ===&lt;br /&gt;
&lt;br /&gt;
I am in the process of playing around with the MPI calls in my code to get it to work. I do a lot of tests and each of them takes only a couple of seconds.  Can I do this on the development nodes?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Yes, as long as it's very brief (a few minutes).   People use the development nodes&lt;br /&gt;
for their work, and you don't want to bog them down for others; testing a real&lt;br /&gt;
code can chew up a lot more resources than compiling, etc.    The procedure differs&lt;br /&gt;
depending on which machine you're using.&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
On the TCS you can run small MPI jobs on the tcs02 node, which is meant for &lt;br /&gt;
development use.  But even for this test run on one node, you'll need a host file --&lt;br /&gt;
a list of hosts (in this case, all tcs-f11n06, which is the `real' name of tcs02)&lt;br /&gt;
that the job will run on.  Create a file called `hostfile' containing the following:&lt;br /&gt;
&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
&lt;br /&gt;
for a 4-task run.  When you invoke &amp;quot;poe&amp;quot; or &amp;quot;mpirun&amp;quot;, there are runtime&lt;br /&gt;
arguments that you specify pointing to this file.  You can also specify it&lt;br /&gt;
in an environment variable MP_HOSTFILE, so, if your file is in your /scratch directory, say &lt;br /&gt;
${SCRATCH}/hostfile, then you would do&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 export MP_HOSTFILE=${SCRATCH}/hostfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
in your shell.  You will also need to create a &amp;lt;tt&amp;gt;.rhosts&amp;lt;/tt&amp;gt; file in your &lt;br /&gt;
home directory, again listing &amp;lt;tt&amp;gt;tcs-f11n06&amp;lt;/tt&amp;gt;, so that &amp;lt;tt&amp;gt;poe&amp;lt;/tt&amp;gt;&lt;br /&gt;
can start jobs.   After that you can simply run your program.  You can use&lt;br /&gt;
mpiexec:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 mpiexec -n 4 my_test_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
adding &amp;lt;tt&amp;gt; -hostfile /path/to/my/hostfile&amp;lt;/tt&amp;gt; if you did not set the environment&lt;br /&gt;
variable above.  Alternatively, you can run it with the poe command (do a &amp;quot;man poe&amp;quot; for details), or even by&lt;br /&gt;
just directly running it.  In this case the number of MPI processes will by default&lt;br /&gt;
be the number of entries in your hostfile.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
On the GPC one can run short test jobs on the GPC [[GPC_Quickstart#Compile.2FDevel_Nodes | development nodes ]]&amp;lt;tt&amp;gt;gpc01&amp;lt;/tt&amp;gt;..&amp;lt;tt&amp;gt;gpc04&amp;lt;/tt&amp;gt;;&lt;br /&gt;
if they are single-node jobs (which they should be) they don't need a hostfile.  Even better, though, is to request an [[ Moab#Interactive | interactive ]] job and run the tests either in the regular batch queue or in the short, high-availability [[ Moab#debug | debug ]] queue that is reserved for this purpose.&lt;br /&gt;
&lt;br /&gt;
=== How do I run a longer (but still shorter than an hour) test job quickly ? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer'''&lt;br /&gt;
&lt;br /&gt;
On the GPC there is a high turnover short queue called [[ Moab#debug | debug ]] that is designed for&lt;br /&gt;
this purpose.  You can use it by adding &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -q debug&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to your submission script.&lt;br /&gt;
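&lt;br /&gt;
You can also request an interactive session in the debug queue for short tests; a minimal sketch (adjust the node count and walltime to your test):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# run from a GPC development node; gives you a shell on a compute node&lt;br /&gt;
qsub -I -q debug -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;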
&lt;br /&gt;
==Running your jobs==&lt;br /&gt;
&lt;br /&gt;
===My job can't write to /home===&lt;br /&gt;
&lt;br /&gt;
My code works fine when I test on the development nodes, but when I submit a job, or even run interactively in the development queue on GPC, it fails.  What's wrong?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
As [[Data_Management#Home_Disk_Space | discussed]] [https://support.scinet.utoronto.ca/wiki/images/5/54/SciNet_Tutorial.pdf elsewhere], &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is mounted read-only on the compute nodes; you can only write to &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; from the login nodes and devel nodes.  (The [[GPC_Quickstart#128Glargemem | largemem nodes]] on GPC, in this respect, are more like devel nodes than compute nodes).   In general, to run jobs you can read from &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; but you'll have to write to &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; (or, if you were allocated space through the LRAC/NRAC process, on &amp;lt;tt&amp;gt;/project&amp;lt;/tt&amp;gt;).  More information on SciNet filesytems can be found on our [[Data_Management | Data Management]] page.&lt;br /&gt;
&lt;br /&gt;
===Error Submitting My Job: qsub: Bad UID for job execution MSG=ruserok failed ===&lt;br /&gt;
&lt;br /&gt;
I write up a submission script as in the examples, but when I attempt to submit the job, I get the above error.  What's wrong?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This error will occur if you try to submit a job from the login nodes.   The login nodes are the gateway to all of SciNet's systems (GPC, TCS, P7, ARC), which have different hardware and queueing systems.  To submit a job, you must log into a development node for the particular cluster you are submitting to and submit from there.&lt;br /&gt;
&lt;br /&gt;
===OpenMP on the TCS===&lt;br /&gt;
&lt;br /&gt;
How do I run an OpenMP job on the TCS?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please look at the [[TCS_Quickstart#Submission_Script_for_an_OpenMP_Job | TCS Quickstart ]] page.&lt;br /&gt;
&lt;br /&gt;
===Can I use hybrid codes consisting of MPI and OpenMP on the GPC?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Yes. Please look at the [[GPC_Quickstart#Hybrid_MPI.2FOpenMP_jobs | GPC Quickstart ]] page.&lt;br /&gt;
&lt;br /&gt;
===How do I run serial jobs on GPC?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''':&lt;br /&gt;
&lt;br /&gt;
It should be said first that SciNet is a parallel computing resource, &lt;br /&gt;
and our priority will always be parallel jobs.   Having said that, if &lt;br /&gt;
you can make efficient use of the resources using serial jobs and get &lt;br /&gt;
good science done, that's good too, and we're happy to help you.&lt;br /&gt;
&lt;br /&gt;
The GPC nodes each have 8 processing cores, and making efficient use of these &lt;br /&gt;
nodes means using all eight cores.  As a result, we'd like to have the &lt;br /&gt;
users take up whole nodes (eg, run multiples of 8 jobs) at a time.  &lt;br /&gt;
&lt;br /&gt;
The best strategy depends on the nature of your job. Several approaches are presented on the [[User_Serial|serial run wiki page]]; the simplest is sketched below.&lt;br /&gt;
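&lt;br /&gt;
As an illustration only (the [[User_Serial|serial run wiki page]] has the recommended recipes), the basic idea is to start 8 serial runs in the background inside a single job and wait for all of them to finish; the program and file names below are hypothetical:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Illustration: bundle 8 serial runs on one 8-core GPC node&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=12:00:00&lt;br /&gt;
#PBS -N serial_bundle&lt;br /&gt;
&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
for i in $(seq 1 8); do&lt;br /&gt;
    ./serial_code input.$i &amp;gt; output.$i &amp;amp;   # start each run in the background&lt;br /&gt;
done&lt;br /&gt;
wait   # do not let the job exit until all 8 runs have finished&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;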
&lt;br /&gt;
===Why can't I request only a single cpu for my job on GPC?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''':&lt;br /&gt;
&lt;br /&gt;
On GPC, resources are allocated by the node - that is, in chunks of 8 processors.   If you want to run jobs that each require only one processor, you need to bundle them into groups of 8, so as not to waste the other 7 cores for up to 48 hours. See the [[User_Serial|serial run wiki page]].&lt;br /&gt;
&lt;br /&gt;
===How do I run serial jobs on TCS?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''': You don't.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===But in the queue I found a user who is running jobs on GPC, each of which is using only one processor, so why can't I?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''':&lt;br /&gt;
&lt;br /&gt;
The pradat* and atlaspt* jobs, amongst others, are jobs of the ATLAS high energy physics project. That they are reported as single cpu jobs is an artifact of the moab scheduler. They are in fact being automatically bundled into 8-job bundles but have to run individually to be compatible with their international grid-based systems.&lt;br /&gt;
&lt;br /&gt;
===How do I use the ramdisk on GPC?===&lt;br /&gt;
&lt;br /&gt;
To use the ramdisk, create, write to, and read from files in /dev/shm/.. just as one would in (eg) ${SCRATCH}. Only the amount of RAM needed to store the files will be taken up by the temporary file system; thus if you have 8 serial jobs each requiring 1 GB of RAM, and 1GB is taken up by various OS services, you would still have approximately 7GB available to use as ramdisk on a 16GB node. However, if you were to write 8 GB of data to the RAM disk, this would exceed available memory and your job would likely crash.&lt;br /&gt;
&lt;br /&gt;
It is very important to delete your files from ram disk at the end of your job. If you do not do this, the next user to use that node will have less RAM available than they might expect, and this might kill their jobs.&lt;br /&gt;
&lt;br /&gt;
''More details on how to set up your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].''&lt;br /&gt;
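&lt;br /&gt;
A minimal sketch of the pattern (the program and file names are hypothetical):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir -p /dev/shm/$USER                    # your own area in the ramdisk&lt;br /&gt;
cp $SCRATCH/input.dat /dev/shm/$USER/      # stage input into RAM&lt;br /&gt;
./my_code /dev/shm/$USER/input.dat /dev/shm/$USER/output.dat&lt;br /&gt;
cp /dev/shm/$USER/output.dat $SCRATCH/     # copy results back to disk&lt;br /&gt;
rm -rf /dev/shm/$USER                      # always clean up for the next user&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;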
&lt;br /&gt;
===How can I automatically resubmit a job?===&lt;br /&gt;
&lt;br /&gt;
Commonly you may have a job that you know will take longer to run than what is &lt;br /&gt;
permissible in the queue.  As long as your program contains [[Checkpoints|checkpoint]] or &lt;br /&gt;
restart capability, you can have one job automatically submit the next. In&lt;br /&gt;
the following example it is assumed that the program finishes before &lt;br /&gt;
the 48 hour limit and then resubmits itself by logging into one&lt;br /&gt;
of the development nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque example submission script for auto resubmission&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=48:00:00&lt;br /&gt;
#PBS -N my_job&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# YOUR CODE HERE&lt;br /&gt;
./run_my_code&lt;br /&gt;
&lt;br /&gt;
# RESUBMIT 10 TIMES HERE&lt;br /&gt;
num=${NUM:-0}   # NUM is passed in via qsub -v; default to 0 for the first submission&lt;br /&gt;
if [ $num -lt 10 ]; then&lt;br /&gt;
      num=$(($num+1))&lt;br /&gt;
      ssh gpc01 &amp;quot;cd $PBS_O_WORKDIR; qsub ./script_name.sh -v NUM=$num&amp;quot;;&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first job is then submitted with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub script_name.sh -v NUM=0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can alternatively use [[ Moab#Job_Dependencies | Job dependencies ]] through the queuing system which will not start one job until another job has completed.&lt;br /&gt;
&lt;br /&gt;
If your job can't be made to automatically stop before the 48 hour queue window, but it does write out checkpoints, you can use the timeout command to stop the program while you still have time to resubmit; for instance&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
    timeout 2850m ./run_my_code argument1 argument2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will run the program for 47.5 hours (2850 minutes), and then send it SIGTERM to exit the program.&lt;br /&gt;
&lt;br /&gt;
===How can I pass in arguments to my submission script?===&lt;br /&gt;
&lt;br /&gt;
If you wish to make your scripts more generic, you can use qsub's ability &lt;br /&gt;
to pass in environment variables to supply arguments to your script.&lt;br /&gt;
The following example shows a case where an input and an output &lt;br /&gt;
file are passed in on the qsub line. Multiple variables can be &lt;br /&gt;
passed in using the qsub &amp;quot;-v&amp;quot; option, comma-delimited. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque example of passing in arguments&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
# &lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=48:00:00&lt;br /&gt;
#PBS -N my_job&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# YOUR CODE HERE&lt;br /&gt;
./run_my_code -f $INFILE -o $OUTFILE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub script_name.sh -v INFILE=input.txt,OUTFILE=outfile.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== How can I run a job longer than 48 hours? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The SciNet queues have a queue limit of 48 hours.   This is pretty typical for systems of this size in Canada and elsewhere, and larger systems commonly have shorter limits.   The limits are there to ensure that every user gets a fair share of the system (so that no one user ties up lots of nodes for a long time), and for safety (so that if one memory board in one node fails in the middle of a very long job, you haven't lost a month's worth of work).&lt;br /&gt;
&lt;br /&gt;
Since many of us have simulations that require more than that much time, most widely-used scientific applications have &amp;quot;checkpoint-restart&amp;quot; functionality, where every so often the complete state of the calculation is stored as a checkpoint file, and one can restart a simulation from one of these.   In fact, these restart files tend to be quite useful for a number of purposes.&lt;br /&gt;
&lt;br /&gt;
If your job will take longer, you will have to submit your job in multiple parts, restarting from a checkpoint each time.  In this way, one can run a simulation much longer than the queue limit.  In fact, one can even write job scripts which automatically re-submit themselves until a run is completed, using [[FAQ#How_can_I_automatically_resubmit_a_job.3F | automatic resubmission. ]]&lt;br /&gt;
&lt;br /&gt;
=== Why did showstart say it would take 3 hours for my job to start before, and now it says my job will start in 10 hours? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please look at the [[FAQ#How_do_priorities_work.2Fwhy_did_that_job_jump_ahead_of_mine_in_the_queue.3F | How do priorities work/why did that job jump ahead of mine in the queue? ]] page.&lt;br /&gt;
&lt;br /&gt;
===How do priorities work/why did that job jump ahead of mine in the queue?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The [[Moab | queueing system]] used on SciNet machines is a [http://en.wikipedia.org/wiki/Priority_queue Priority Queue].  Jobs enter the queue at the back of the queue, and slowly make their way to the front as those ahead of them are run; but a job that enters the queue with a higher priority can `cut in line'.&lt;br /&gt;
&lt;br /&gt;
The main factor which determines priority is whether or not the user (or their PI) has an [http://wiki.scinethpc.ca/wiki/index.php/Application_Process LRAC or NRAC allocation].  These are competitively allocated grants of computer time; there is a call for proposals towards the end of every calendar year.    Users with an allocation have high priorities in an attempt to make sure that they can use the amount of computer time the committees granted them.   Their priority decreases as they approach their allotted usage over the current window of time; by the time that they have exhausted that allotted usage, their priority is the same as users with no allocation (unallocated, or `default' users).    Unallocated users have a fixed, low, priority.&lt;br /&gt;
&lt;br /&gt;
This priority system is called `fairshare'; the scheduler attempts to make sure everyone has their fair share of the machines, where the share that's fair has been determined by the allocation committee.    The fairshare window is a rolling window of two weeks; that is, any time you have a job in the queue, the fairshare calculation of its priority is given by how much of your allocation of the machine has been used in the last 14 days.&lt;br /&gt;
&lt;br /&gt;
A particular allocation might have some fraction of GPC - say 4% of the machine (if the PI had been allocated 10 million CPU hours on GPC). The allocations have labels; (called `Resource Allocation Proposal Identifiers', or RAPIs) they look something like&lt;br /&gt;
&lt;br /&gt;
  abc-123-ab&lt;br /&gt;
&lt;br /&gt;
where abc-123 is the PI's CCRI, and the suffix specifies which of the allocations granted to the PI is to be used.  These can be specified on a job-by-job basis.  On GPC, one adds the line&lt;br /&gt;
 #PBS -A RAPI&lt;br /&gt;
to your script; on TCS, one uses&lt;br /&gt;
 # @ account_no = RAPI&lt;br /&gt;
If the allocation to charge isn't specified, a default is used; each user has such a default, which can be changed at the same portal where one changes one's password:&lt;br /&gt;
&lt;br /&gt;
 https://portal.scinet.utoronto.ca/&lt;br /&gt;
&lt;br /&gt;
A job's priority is determined primarily by the fairshare priority of the allocation it is being charged to; the previous 14 days' worth of use under that allocation is calculated and compared to the allocated fraction (here, 4%) of the machine over that window (here, 14 days).   The fairshare priority is a decreasing function of the allocation left; if there is no allocation left (eg, jobs running under that allocation have already used 379,038 CPU hours in the past 14 days), the priority is the same as that of a user with no granted allocation.   (This last part has been the topic of some debate; as the machine gets more utilized, it will probably be the case that we allow RAC users who have greatly overused their quota to have their priorities drop below those of unallocated users, to give the unallocated users some chance to run on our increasingly crowded system; this would have no undue effect on our allocated users, as they would still be able to use the amount of resources they had been allocated by the committees.)   Note that all jobs charging the same allocation get the same fairshare priority.&lt;br /&gt;
&lt;br /&gt;
There are other factors that go into calculating priority, but fairshare is the most significant.   Other factors include&lt;br /&gt;
* amount of time waiting in queue (measured in units of the requested runtime).   A job that requests 1 hour in the queue and has been waiting 2 days will get a bump in its priority larger than a job that requests 2 days and has been waiting the same time.&lt;br /&gt;
* User adjustment of priorities ( See below ).&lt;br /&gt;
&lt;br /&gt;
The major effect of these subdominant terms is to shuffle the order of jobs running under the same allocation.&lt;br /&gt;
&lt;br /&gt;
===How do we manage job priorities within our research group?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Obviously, managing shared resources within a large group - whether it &lt;br /&gt;
is conference funding or CPU time - takes some doing.   &lt;br /&gt;
&lt;br /&gt;
It's important to note that the fairshare periods are intentionally kept &lt;br /&gt;
quite short - just two weeks long. So, for example, let us say that in your resource &lt;br /&gt;
allocation you have about 10% of the machine.   Then for someone to use &lt;br /&gt;
up the whole two week amount of time in 2 days, they'd have to use 70% &lt;br /&gt;
of the machine in those two days - which is unlikely to happen by &lt;br /&gt;
accident.  If that does happen,  &lt;br /&gt;
those using the same allocation as the person who used 70% of the &lt;br /&gt;
machine over the two days will suffer by having much lower priority for &lt;br /&gt;
their jobs, but only for the next 12 days - and even then, if there are &lt;br /&gt;
idle cpus they'll still be able to compute.&lt;br /&gt;
&lt;br /&gt;
There will be online tools for seeing how the allocation is being used, &lt;br /&gt;
and those people who are in charge in your group will be able to use &lt;br /&gt;
that information to manage the users, telling them to dial it down or &lt;br /&gt;
up.   We know that managing a large research group is hard, and we want &lt;br /&gt;
to make sure we provide you the information you need to do your job &lt;br /&gt;
effectively.&lt;br /&gt;
&lt;br /&gt;
One way for users within a group to manage their priorities within the group&lt;br /&gt;
is with [[Moab#Adjusting_Job_Priority | user-adjusted priorities]]; this is&lt;br /&gt;
described in more detail on the [[Moab | Scheduling System]] page.&lt;br /&gt;
&lt;br /&gt;
=== How do I charge jobs to my NRAC/LRAC allocation? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see the [[Moab#Accounting|accounting section of Moab page]].&lt;br /&gt;
&lt;br /&gt;
=== How does one check the amount of used CPU-hours in a project, and how does one get statistics for each user in the project? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This information is available on the SciNet portal, https://portal.scinet.utoronto.ca. See also [[SciNet Usage Reports]].&lt;br /&gt;
&lt;br /&gt;
=== How does the Infiniband Upgrade affect my 2012 NRAC allocation ?===&lt;br /&gt;
&lt;br /&gt;
The NRAC allocations for the current (2012) year that were based on ethernet and infiniband will carry over; however, the allocation will be on the full GPC, not just the subsection.  So if you were allocated 500 hours on InfiniBand, your fairshare allocation will still be 500 hours, just 500 out of 30,000 instead of 500 out of 7,000.  If you received two allocations, one on gigE and one on IB, they will simply be combined. This should benefit all users, as the desegregation of the GPC provides a greater pool of nodes, increasing the probability that your job will run.&lt;br /&gt;
&lt;br /&gt;
==Monitoring jobs in the queue==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Why hasn't my job started?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Use the moab command &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
checkjob -v jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and the last couple of lines should explain why a job hasn't started.  &lt;br /&gt;
&lt;br /&gt;
Please see [[Moab| Job Scheduling System (Moab) ]] for more detailed information&lt;br /&gt;
&lt;br /&gt;
===How do I figure out when my job will run?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Moab#Available_Resources| Job Scheduling System (Moab) ]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ===My GPC job is Held, and checkjob says &amp;quot;Batch:PolicyViolation&amp;quot; ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
When this happens, you'll see your job stuck in a BatchHold state.  &lt;br /&gt;
This happens because the job you've submitted breaks one of the rules of the queues, and is being held until you modify it or kill it and re-submit a conforming job.  The most common problems are:&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===I submit my GPC job, and I get an email saying it was rejected===&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This happens because the job you've submitted breaks one of the rules of the queues and is rejected. An email&lt;br /&gt;
is sent with the JOBID, JOBNAME, and the reason it was rejected.  The following is an example where a job&lt;br /&gt;
requests more than 48 hours and was rejected.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PBS Job Id: 3462493.gpc-sched&lt;br /&gt;
Job Name:   STDIN&lt;br /&gt;
job deleted&lt;br /&gt;
Job deleted at request of root@gpc-sched&lt;br /&gt;
MOAB_INFO:  job was rejected - job violates class configuration 'wclimit too high for class 'batch_ib' (345600 &amp;gt; 172800)'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Jobs on the TCS or GPC may only run for 48 hours at a time; this restriction greatly increases responsiveness of the queue and queue throughput for all our users.  If your computation requires longer than that, as many do, you will have to [[ Checkpoints | checkpoint ]] your job and restart it after each 48-hour queue window.   You can manually re-submit jobs, or if you can have your job cleanly exit before the 48 hour window, there are ways to [[ FAQ#How_can_I_automatically_resubmit_a_job.3F | automatically resubmit jobs ]].&lt;br /&gt;
&lt;br /&gt;
Other rejections return a more cryptic error saying &amp;quot;job violates class configuration&amp;quot; such as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PBS Job Id: 3462409.gpc-sched&lt;br /&gt;
Job Name:   STDIN&lt;br /&gt;
job deleted&lt;br /&gt;
Job deleted at request of root@gpc-sched&lt;br /&gt;
MOAB_INFO:  job was rejected - job violates class configuration 'user required by class 'batch''&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The most common problems that result in this error are:&lt;br /&gt;
&lt;br /&gt;
* '''Incorrect number of processors per node''': Jobs on the GPC are scheduled per-node not per-core and since each node has 8 processor cores (ppn=8) the smallest job allowed is one node with 8 cores (nodes=1:ppn=8).  For serial jobs users must bundle or batch them together in groups of 8. See [[ FAQ#How_do_I_run_serial_jobs_on_GPC.3F | How do I run serial jobs on GPC? ]]&lt;br /&gt;
* '''No number of nodes specified''': Jobs submitted to the main queue must request a specific number of nodes, either in the submission script (with a line like &amp;lt;tt&amp;gt;#PBS -l nodes=2:ppn=8&amp;lt;/tt&amp;gt;) or on the command line (eg, &amp;lt;tt&amp;gt;qsub -l nodes=2:ppn=8,walltime=5:00:00 script.pbs&amp;lt;/tt&amp;gt;).  Note that for the debug queue, you can get away without specifying a number of nodes and a default of one will be assigned; for both technical and policy reasons, we do not enforce such a default for the main (&amp;quot;batch&amp;quot;) queue.&lt;br /&gt;
* '''There is a 15 minute walltime minimum''' on all queues except debug and if you set your walltime less than this, it will be rejected.&lt;br /&gt;
&lt;br /&gt;
===How can I monitor my running jobs on TCS?===&lt;br /&gt;
&lt;br /&gt;
How can I monitor the load of TCS jobs?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
You can get more information with the command &lt;br /&gt;
 /xcat/tools/tcs-scripts/LL/jobState.sh&lt;br /&gt;
which you can alias as:&lt;br /&gt;
 alias llq1='/xcat/tools/tcs-scripts/LL/jobState.sh'&lt;br /&gt;
If you run &amp;quot;llq1 -n&amp;quot; you will see a listing of jobs together with a lot of information, including the load.&lt;br /&gt;
&lt;br /&gt;
==Errors in running jobs==&lt;br /&gt;
&lt;br /&gt;
===On GPC, `Job cannot be executed'===&lt;br /&gt;
&lt;br /&gt;
I get error messages like this trying to run on GPC:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PBS Job Id: 30414.gpc-sched&lt;br /&gt;
Job Name:   namd&lt;br /&gt;
Exec host:  gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0&lt;br /&gt;
Aborted by PBS Server &lt;br /&gt;
Job cannot be executed&lt;br /&gt;
See Administrator for help&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
PBS Job Id: 30414.gpc-sched&lt;br /&gt;
Job Name:   namd&lt;br /&gt;
Exec host:  gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0&lt;br /&gt;
An error has occurred processing your job, see below.&lt;br /&gt;
request to copy stageout files failed on node 'gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0' for job 30414.gpc-sched&lt;br /&gt;
&lt;br /&gt;
Unable to copy file 30414.gpc-sched.OU to USER@gpc-f101n084.scinet.local:/scratch/G/GROUP/USER/projects/sim-performance-test/runtime/l/namd/8/namd.o30414&lt;br /&gt;
*** error from copy&lt;br /&gt;
30414.gpc-sched.OU: No such file or directory&lt;br /&gt;
*** end error output&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Try doing the following:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir ${SCRATCH}/.pbs_spool&lt;br /&gt;
ln -s ${SCRATCH}/.pbs_spool ~/.pbs_spool&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is how all new accounts are setup on SciNet.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; on GPC for compute jobs is mounted as a read-only file system.   &lt;br /&gt;
PBS by default tries to spool its output  files to &amp;lt;tt&amp;gt;${HOME}/.pbs_spool&amp;lt;/tt&amp;gt;&lt;br /&gt;
which fails as it tries to write to a read-only file  &lt;br /&gt;
system.    New accounts at SciNet  get around this by having ${HOME}/.pbs_spool  &lt;br /&gt;
point to somewhere appropriate on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;, but if you've deleted that link&lt;br /&gt;
or directory, or had an old account, you will see errors like the above.&lt;br /&gt;
&lt;br /&gt;
'''On Feb 24, the input/output mechanism has been reconfigured to use a local ramdisk as the temporary location, which means that .pbs_spool is no longer needed and this error should not occur anymore.'''&lt;br /&gt;
&lt;br /&gt;
=== I couldn't find the  .o output file in the .pbs_spool directory as I used to ===&lt;br /&gt;
&lt;br /&gt;
On Feb 24 2011, the temporary location of standard input and output files was moved from the shared file system ${SCRATCH}/.pbs_spool to the&lt;br /&gt;
node-local directory /var/spool/torque/spool (which resides in ram). The final location after a job has finished is unchanged,&lt;br /&gt;
but to check the output/error of running jobs, users will now have to ssh into the (first) node assigned to the job and look in&lt;br /&gt;
/var/spool/torque/spool.&lt;br /&gt;
&lt;br /&gt;
This alleviates access contention to the temporary directory, especially for those users that are running a lot of jobs, and  reduces the burden on the file system in general.&lt;br /&gt;
&lt;br /&gt;
Note that it is good practice to redirect output to a file rather than to count on the scheduler to do this for you.&lt;br /&gt;
&lt;br /&gt;
=== My GPC job died, telling me `Copy Stageout Files Failed' ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
When a job runs on GPC, the script's standard output and error are redirected to &lt;br /&gt;
&amp;lt;tt&amp;gt;$PBS_JOBID.gpc-sched.OU&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;$PBS_JOBID.gpc-sched.ER&amp;lt;/tt&amp;gt; in&lt;br /&gt;
/var/spool/torque/spool on the (first) node on which your job is running.  At the end of the job, those .OU and .ER files are copied to where the batch script tells them to be copied, by default &amp;lt;tt&amp;gt;$PBS_JOBNAME.o$PBS_JOBID&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;$PBS_JOBNAME.e$PBS_JOBID&amp;lt;/tt&amp;gt;.   (You can set those filenames to be something clearer with the -e and -o options in your PBS script.)&lt;br /&gt;
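For example, to give the copied files clearer names you could add something like the following to your submission script (the file names here are hypothetical):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -N namd&lt;br /&gt;
#PBS -o namd_run.log&lt;br /&gt;
#PBS -e namd_run.err&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;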
&lt;br /&gt;
When you get errors like this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
An error has occurred processing your job, see below.&lt;br /&gt;
request to copy stageout files failed on node&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
it means that the copy-back process has failed in some way.  There could be a few reasons for this. The first thing to do is to '''make sure that your .bashrc does not produce any output''', as the output stageout is performed by bash and any extra output can cause it to fail.&lt;br /&gt;
But it could also have been a transient filesystem error, or your job may have failed spectacularly enough to short-circuit the normal job-termination process, so that those files never got copied.&lt;br /&gt;
&lt;br /&gt;
Write to [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;] if your input/output files got lost, as we will probably be able to retrieve them for you (please supply at least the jobid, and any other information that may be relevant). &lt;br /&gt;
&lt;br /&gt;
Keep in mind that it is good practice to redirect output to a file rather than depending on the job scheduler to do this for you.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&lt;br /&gt;
===Another transport will be used instead===&lt;br /&gt;
&lt;br /&gt;
I get error messages like the following when running on the GPC at the start of the run, although the job seems to proceed OK.   Is this a problem?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
[[45588,1],0]: A high-performance Open MPI point-to-point messaging module&lt;br /&gt;
was unable to find any relevant network interfaces:&lt;br /&gt;
&lt;br /&gt;
Module: OpenFabrics (openib)&lt;br /&gt;
  Host: gpc-f101n005&lt;br /&gt;
&lt;br /&gt;
Another transport will be used instead, although this may result in&lt;br /&gt;
lower performance.&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Everything's fine.   The two MPI libraries SciNet provides work for both the InfiniBand and the Gigabit Ethernet interconnects, and will always try to use the fastest interconnect available.   In this case, you ran on normal gigabit GPC nodes with no InfiniBand; but the MPI libraries have no way of knowing this, and try the InfiniBand first anyway.  This is just a harmless `failover' message; it tried to use the InfiniBand, which doesn't exist on this node, then fell back on using Gigabit Ethernet (`another transport').&lt;br /&gt;
&lt;br /&gt;
With OpenMPI, this can be avoided by not looking for infiniband; eg, by using the option&lt;br /&gt;
&lt;br /&gt;
--mca btl ^openib&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===IB Memory Errors, eg &amp;lt;tt&amp;gt; reg_mr Cannot allocate memory &amp;lt;/tt&amp;gt;===&lt;br /&gt;
&lt;br /&gt;
Infiniband requires more memory than ethernet; it can use RDMA (remote direct memory access) transport for which it sets aside registered memory to transfer data.&lt;br /&gt;
&lt;br /&gt;
In our current network configuration, it requires a _lot_ more memory, particularly as you go to larger process counts; unfortunately, that means you can't get around the &amp;quot;I need more memory&amp;quot; problem the usual way, by running on more nodes.   Machines with different memory or &lt;br /&gt;
network configurations may exhibit this problem at higher or lower MPI &lt;br /&gt;
task counts.&lt;br /&gt;
&lt;br /&gt;
Right now, the best workaround is to reduce the number and size of the OpenIB queues by using XRC: with OpenMPI, add the following options to your mpirun command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32 -mca btl_openib_max_send_size 12288&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With Intel MPI, you should be able to do&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load intelmpi/4.0.3.008&lt;br /&gt;
mpirun -genv I_MPI_FABRICS=shm:ofa  -genv I_MPI_OFA_USE_XRC=1 -genv I_MPI_OFA_DYNAMIC_QPS=1 -genv I_MPI_DEBUG=5 -np XX ./mycode&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
to the same end.  &lt;br /&gt;
&lt;br /&gt;
For more information see [[GPC MPI Versions]].&lt;br /&gt;
&lt;br /&gt;
===My compute job fails, saying &amp;lt;tt&amp;gt;libpng12.so.0: cannot open shared object file&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;libjpeg.so.62: cannot open shared object file&amp;lt;/tt&amp;gt;===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
To maximize the amount of memory available for compute jobs, the compute nodes have a less complete system image than the development nodes.   In particular, since graphics packages like matplotlib and gnuplot are usually used interactively, the libraries they need are included in the devel nodes' image but not in the compute nodes' image.&lt;br /&gt;
&lt;br /&gt;
Many of these extra libraries are, however, available in the &amp;quot;extras&amp;quot; module.   So adding a &amp;quot;module load extras&amp;quot; to your job submission  script - or, for overkill, to your .bashrc - should enable these scripts to run on the compute nodes.&lt;br /&gt;
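A minimal sketch of a submission script that does this (the program name is hypothetical):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
module load extras&lt;br /&gt;
./my_plotting_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;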
&lt;br /&gt;
==Data on SciNet disks==&lt;br /&gt;
&lt;br /&gt;
===How do I find out my disk usage?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The standard unix/linux utilities for finding the amount of disk space used by a directory are very slow, and notoriously inefficient on the GPFS filesystems that we run on the SciNet systems.  There are utilities that very quickly report your disk usage:&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''/scinet/gpc/bin/diskUsage'''&amp;lt;/tt&amp;gt; command, available on the login nodes, datamovers and the GPC devel nodes, provides information in a number of ways on the home, scratch, and project file systems: for instance, how much disk space is being used by yourself and your group (with the -a option), how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), and plots of your usage over time.&lt;br /&gt;
Note that this information is only updated hourly!&lt;br /&gt;
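For example, to see the usage for yourself and your group:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load extras&lt;br /&gt;
$ diskUsage -a&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;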
&lt;br /&gt;
More information about these filesystems is available on the [[Data_Management | Data Management]] page.&lt;br /&gt;
&lt;br /&gt;
===How do I transfer data to/from SciNet?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
All incoming connections to SciNet go through relatively low-speed connections to the &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; gateways, so using scp to copy files the same way you ssh in is not an effective way to move lots of data.  Better tools are described in our page on [[Data_Management#Data_Transfer | Data Transfer]].&lt;br /&gt;
&lt;br /&gt;
===My group works with data files of size 1-2 GB.  Is this too large to  transfer by scp to login.scinet.utoronto.ca ?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Generally, occasional transfers of data of less than 10GB are perfectly acceptable to do through the login nodes. See [[Data_Management#Data_Transfer | Data Transfer]].&lt;br /&gt;
&lt;br /&gt;
===How can I check if I have files in /scratch that are scheduled for automatic deletion?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Storage_Quickstart#Scratch_Disk_Purging_Policy | Storage At SciNet]]&lt;br /&gt;
&lt;br /&gt;
===How do I allow my supervisor to manage files for me using ACL-based commands?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Data_Management#File.2FOwnership_Management_.28ACL.29 | File/Ownership Management]]&lt;br /&gt;
&lt;br /&gt;
===Can we buy extra storage space on SciNet?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
Yes, please see [[Data_Management#Buying_storage_space_on_GPFS_or_HPSS | Buying storage space on GPFS or HPSS ]] for more details.&lt;br /&gt;
&lt;br /&gt;
==Keep 'em Coming!==&lt;br /&gt;
&lt;br /&gt;
===Next question, please===&lt;br /&gt;
&lt;br /&gt;
Send your question to [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;];  we'll answer it asap!&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=FAQ&amp;diff=6276</id>
		<title>FAQ</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=FAQ&amp;diff=6276"/>
		<updated>2013-07-16T17:04:42Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* When will the 2011 NRAC disk space allocation be ready? */ - time to delete this.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__TOC__&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==The Basics==&lt;br /&gt;
===Whom do I contact for support?===&lt;br /&gt;
&lt;br /&gt;
Whom do I contact if I have problems or questions about how to use the SciNet systems?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
E-mail [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;]  &lt;br /&gt;
&lt;br /&gt;
In your email, please include the following information:&lt;br /&gt;
&lt;br /&gt;
* your username on SciNet&lt;br /&gt;
* the cluster that your question pertains to (GPC or TCS; SciNet is not a cluster!),&lt;br /&gt;
* any relevant error messages&lt;br /&gt;
* the commands you typed before the errors occurred&lt;br /&gt;
* the path to your code (if applicable)&lt;br /&gt;
* the location of the job scripts (if applicable)&lt;br /&gt;
* the directory from which it was submitted (if applicable)&lt;br /&gt;
* a description of what it is supposed to do (if applicable)&lt;br /&gt;
* if your problem is about connecting to SciNet, the type of computer you are connecting from.&lt;br /&gt;
&lt;br /&gt;
Note that your password should never, never, never be sent to us, even if your question is about your account.&lt;br /&gt;
&lt;br /&gt;
Try to avoid sending email only to specific individuals at SciNet. Your chances of a quick reply increase significantly if you email our team!&lt;br /&gt;
&lt;br /&gt;
===What does ''code scaling'' mean?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Introduction_To_Performance#Parallel_Speedup|A Performance Primer]]&lt;br /&gt;
&lt;br /&gt;
===What do you mean by ''throughput''?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Introduction_To_Performance#Throughput|A Performance Primer]].&lt;br /&gt;
&lt;br /&gt;
Here is a simple example:&lt;br /&gt;
&lt;br /&gt;
Suppose you need to do 10 computations.  Say each of these runs for&lt;br /&gt;
1 day on 8 cores, but they take &amp;quot;only&amp;quot; 18 hours on 16 cores.  What is the&lt;br /&gt;
fastest way to get all 10 computations done - as 8-core jobs or as&lt;br /&gt;
16-core jobs?  Let us assume you have 2 nodes (16 cores in total) at your disposal.&lt;br /&gt;
The answer, after some simple arithmetic, is that running your 10&lt;br /&gt;
jobs as 8-core jobs (two at a time, so 5 rounds of 1 day each) will take 5 days, whereas running them&lt;br /&gt;
as 16-core jobs (one at a time, so 10 x 18 hours = 180 hours) would take 7.5 days.  Draw your own conclusions...&lt;br /&gt;
&lt;br /&gt;
===I changed my .bashrc/.bash_profile and now nothing works===&lt;br /&gt;
&lt;br /&gt;
The default startup scripts provided by SciNet, and guidelines for them, can be found [[Important_.bashrc_guidelines|here]].  Certain things - like sourcing &amp;lt;tt&amp;gt;/etc/profile&amp;lt;/tt&amp;gt;&lt;br /&gt;
and &amp;lt;tt&amp;gt;/etc/bashrc&amp;lt;/tt&amp;gt; - are ''required'' for various SciNet routines to work!   &lt;br /&gt;
&lt;br /&gt;
If the situation is so bad that you cannot even log in, please send email to [mailto:support@scinet.utoronto.ca support].&lt;br /&gt;
&lt;br /&gt;
===Could I have my login shell changed to (t)csh?===&lt;br /&gt;
&lt;br /&gt;
The login shell used on our systems is bash. While the tcsh is available on the GPC and the TCS, we do not support it as the default login shell at present.  So &amp;quot;chsh&amp;quot; will not work, but you can always run tcsh interactively. Also, csh scripts will be executed correctly provided that they have the correct &amp;quot;shebang&amp;quot; &amp;lt;tt&amp;gt;#!/bin/tcsh&amp;lt;/tt&amp;gt; at the top.&lt;br /&gt;
&lt;br /&gt;
===How can I run Matlab / IDL / Gaussian / my favourite commercial software at SciNet?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Because SciNet serves such a disparate group of user communities, there is just no way we can buy licenses for everyone's commercial package.   The only commercial software we have purchased is that which in principle can benefit everyone -- fast compilers and math libraries (Intel's on GPC, and IBM's on TCS).&lt;br /&gt;
&lt;br /&gt;
If your research group requires a commercial package that you already have or are willing to buy licenses for, contact us at [mailto:support@scinet.utoronto.ca support@scinet] and we can work together to find out if it is feasible to implement the package's licensing arrangement on the SciNet clusters, and if so, what the best way to do it is.&lt;br /&gt;
&lt;br /&gt;
Note that it is important that you contact us before installing commercially licensed software on SciNet machines, even if you have a way to do it in your own directory without requiring sysadmin intervention.   It puts us in a very awkward position if someone is found to be running unlicensed or invalidly licensed software on our systems, so we need to be aware of what is being installed where.&lt;br /&gt;
&lt;br /&gt;
===Do you have a recommended ssh program that will allow scinet access from Windows machines?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The [[Ssh#SSH_for_Windows_Users | SSH for Windows users]] programs we recommend are:&lt;br /&gt;
&lt;br /&gt;
* [http://mobaxterm.mobatek.net/en/ MobaXterm] is a tabbed ssh client with some Cygwin tools, including ssh and X, all wrapped up into one executable.&lt;br /&gt;
* [http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY]  - this is a terminal for windows that connects via ssh.  It is a quick install and will get you up and running quickly.&amp;lt;br&amp;gt;To set up your passphrase protected ssh key with putty, see [http://the.earth.li/~sgtatham/putty/0.61/htmldoc/Chapter8.html#pubkey here].&lt;br /&gt;
* [http://www.cygwin.com/ CygWin] - this is a whole Linux-like environment for Windows, which also includes an X window server so that you can display remote windows on your desktop.  Make sure you include openssh and the X window system in the installation for full functionality.  This is recommended if you will be doing a lot of work on Linux machines, as it makes a very similar environment available on your computer.&amp;lt;br&amp;gt;To set up your ssh keys, follow the Linux instructions on the [[Ssh keys]] page.&lt;br /&gt;
&lt;br /&gt;
===My ssh key does not work! WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
[[Ssh_keys#Testing_Your_Key | Testing Your Key]]&lt;br /&gt;
&lt;br /&gt;
* If this doesn't work, you should be able to log in using your password and investigate the problem. For example, if during a login session you get a message similar to the one below, just follow the instructions and delete the offending key on line 3 of your known_hosts file (in vi, you can jump to that line with ESC, then :3). This usually just means that you have logged in to SciNet from your home computer in the past, and that stored host key is now obsolete.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh USERNAME@login.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @&lt;br /&gt;
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!&lt;br /&gt;
Someone could be eavesdropping on you right now (man-in-the-middle&lt;br /&gt;
attack)!&lt;br /&gt;
It is also possible that the RSA host key has just been changed.&lt;br /&gt;
The fingerprint for the RSA key sent by the remote host is&lt;br /&gt;
53:f9:60:71:a8:0b:5d:74:83:52:**fe:ea:1a:9e:cc:d3.&lt;br /&gt;
Please contact your system administrator.&lt;br /&gt;
Add correct host key in /home/&amp;lt;user&amp;gt;/.ssh/known_hosts to get rid of&lt;br /&gt;
this message.&lt;br /&gt;
Offending key in /home/&amp;lt;user&amp;gt;/.ssh/known_hosts:3&lt;br /&gt;
RSA host key for login.scinet.utoronto.ca has&lt;br /&gt;
changed and you have requested&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* If you get the message below, you may need to log out of your gnome session and log back in, since the ssh-agent needs to be&lt;br /&gt;
restarted with the new passphrase-protected ssh key.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh USERNAME@login.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
Agent admitted failure to sign using the key.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Can't forward X:  &amp;quot;Warning: No xauth data; using fake authentication data&amp;quot;, or &amp;quot;X11 connection rejected because of wrong authentication.&amp;quot;===&lt;br /&gt;
&lt;br /&gt;
I used to be able to forward X11 windows from SciNet to my home machine, but now I'm getting these messages; what's wrong?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This very likely means that ssh/xauth can't update your ${HOME}/.Xauthority file. &lt;br /&gt;
&lt;br /&gt;
The simplest possible reason for this is that you've filled your 10GB /home quota and so can't write anything to your home directory.   Use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load extras&lt;br /&gt;
$ diskUsage&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
to check how close you are to your quota on ${HOME}.&lt;br /&gt;
&lt;br /&gt;
Alternately, this could mean your .Xauthority file has become broken, corrupted, or confused somehow, in which case you can delete that file; when you next log in you'll get a similar warning message about creating .Xauthority, but things should work.&lt;br /&gt;
&lt;br /&gt;
===How come I cannot log in to TCS?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
A SciNet account doesn't automatically entitle you to TCS access. At a minimum, TCS jobs need to run on at least 32 cores (64 preferred because of Simultaneous Multi Threading - [[TCS_Quickstart#Node_configuration|SMT]] - on these nodes) and need the large memory (4GB/core) and bandwidth on the system. Essentially you need to be able to explain why the work can't be done on the GPC.&lt;br /&gt;
&lt;br /&gt;
===How can I reset the password for my Compute Canada account?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
You can reset your password for your Compute Canada account here:&lt;br /&gt;
&lt;br /&gt;
https://ccdb.computecanada.org/security/forgot&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===How can I change or reset the password for my SciNet account?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
To reset your password at SciNet please e-mail [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
If you know your old password and want to change it, that can be done here:&lt;br /&gt;
&lt;br /&gt;
https://portal.scinet.utoronto.ca/&lt;br /&gt;
&lt;br /&gt;
===Why am I getting the error &amp;quot;Permission denied (publickey,gssapi-with-mic,password)&amp;quot;?===&lt;br /&gt;
&lt;br /&gt;
This error can pop up in a variety of situations: when trying to log in, or after a job has finished, when the error and output files fail to be copied (there are other possible reasons for this failure as well -- see [[FAQ#My_GPC_job_died.2C_telling_me_.60Copy_Stageout_Files_Failed.27|My GPC job died, telling me: Copy Stageout Files Failed]]).&lt;br /&gt;
In most cases, the &amp;quot;Permission denied&amp;quot; error is caused by incorrect permissions on the (hidden) .ssh directory. Ssh is used for logging in as well as for copying the standard error and output files back after a job. &lt;br /&gt;
&lt;br /&gt;
For security reasons, &lt;br /&gt;
the .ssh directory should be readable and writable only by you.  If it &lt;br /&gt;
has read permission for everybody, ssh refuses to use it, and these operations fail.  You can change &lt;br /&gt;
this by&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   chmod 700 ~/.ssh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
And to be sure, also do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   chmod 600 ~/.ssh/id_rsa ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===ERROR:102: Tcl command execution failed? when loading modules ===&lt;br /&gt;
Modules sometimes require other modules to be loaded first.&lt;br /&gt;
The module command will let you know if you didn't load the prerequisites.&lt;br /&gt;
For example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module purge&lt;br /&gt;
$ module load python&lt;br /&gt;
python/2.6.2(11):ERROR:151: Module ’python/2.6.2’ depends on one of the module(s) ’gcc/4.4.0’&lt;br /&gt;
python/2.6.2(11):ERROR:102: Tcl command execution failed: prereq gcc/4.4.0&lt;br /&gt;
$ module load gcc python&lt;br /&gt;
$&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Compiling your Code==&lt;br /&gt;
&lt;br /&gt;
===How can I get g77 to work?===&lt;br /&gt;
&lt;br /&gt;
The fortran 77 compilers on the GPC are ifort and gfortran. We have dropped support for g77.  This has been a conscious decision. g77 (and the associated library libg2c) were completely replaced six years ago (Apr 2005) by the gcc 4.x branch, and haven't undergone any updates at all, even bug fixes, for over five years.  &lt;br /&gt;
If we were to install g77 and libg2c, we would have to deal with the inevitable confusion caused when users accidentally link against the old, broken versions of the gcc libraries instead of the correct current versions.   &lt;br /&gt;
&lt;br /&gt;
If your code for some reason specifically requires five-plus-year-old libraries,  availability, compatibility, and unfixed-known-bug problems are only going to get worse for you over time, and this might be as good an opportunity as any to address those issues. &lt;br /&gt;
&lt;br /&gt;
''A note on porting to gfortran or ifort:''&lt;br /&gt;
&lt;br /&gt;
While gfortran and ifort are rather compatible with g77, one &lt;br /&gt;
important difference is that by default, gfortran does not preserve &lt;br /&gt;
local variables between function calls, while g77 does.   Preserved &lt;br /&gt;
local variables are for instance often used in implementations of quasi-random number &lt;br /&gt;
generators.  Proper Fortran requires such variables to be declared SAVE, &lt;br /&gt;
but not all old code does this.&lt;br /&gt;
Luckily, you can change gfortran's default behavior with the flag &lt;br /&gt;
&amp;lt;tt&amp;gt;-fno-automatic&amp;lt;/tt&amp;gt;.   For ifort, the corresponding flag is &amp;lt;tt&amp;gt;-noautomatic&amp;lt;/tt&amp;gt;.&lt;br /&gt;
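For example (the source file name is hypothetical):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gfortran -fno-automatic -O2 -o mycode mycode.f&lt;br /&gt;
ifort -noautomatic -O2 -o mycode mycode.f&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;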
&lt;br /&gt;
===Where is libg2c.so?===&lt;br /&gt;
&lt;br /&gt;
libg2c.so is part of the g77 compiler, for which we dropped support. See [[#How can I get g77 to work?]] for our reasons.&lt;br /&gt;
&lt;br /&gt;
===Autoparallelization does not work!===&lt;br /&gt;
&lt;br /&gt;
I compiled my code with the &amp;lt;tt&amp;gt;-qsmp=omp,auto&amp;lt;/tt&amp;gt; option, and then I specified that it should be run with 64 threads - with &lt;br /&gt;
 export OMP_NUM_THREADS=64&lt;br /&gt;
&lt;br /&gt;
However, when I check the load using &amp;lt;tt&amp;gt;llq1 -n&amp;lt;/tt&amp;gt;, it shows a load on the node of 1.37.  Why?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Using the autoparallelization will only get you so far.  In fact, it usually does not do too much.  What is helpful is to run the compiler with the &amp;lt;tt&amp;gt;-qreport&amp;lt;/tt&amp;gt; option, and then read the output listing carefully to see where the compiler thought it could parallelize, where it could not, and the reasons for this.  Then you can go back to your code and carefully try to address each of the issues brought up by the compiler.&lt;br /&gt;
We ''emphasize'' that this is just a rough first guide, and that the compilers are still not magical!   For more sophisticated approaches to parallelizing your code, email us at [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;]  to set up an appointment with one&lt;br /&gt;
of our technical analysts.&lt;br /&gt;
&lt;br /&gt;
===How do I link against the Intel Math Kernel Library?===&lt;br /&gt;
&lt;br /&gt;
If you need to link in the Intel Math Kernel Library (MKL) libraries, you are well advised to use the Intel(R) Math Kernel Library Link Line Advisor: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ for help in devising the list of libraries to link with your code.&lt;br /&gt;
&lt;br /&gt;
'''''Note that this gives the link line for the command line. When using this in Makefiles, replace $MKLPATH by ${MKLPATH}.'''''&lt;br /&gt;
&lt;br /&gt;
'''''Note too that, unless the integer arguments you will be passing to the MKL libraries are actually 64-bit integers, rather than the normal int or INTEGER types, you want to specify 32-bit integers (lp64) .'''''&lt;br /&gt;
&lt;br /&gt;
===Can the compilers on the login nodes be disabled to prevent accidentally using them?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
You can accomplish this by modifying your .bashrc to not load the compiler modules. See [[Important .bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
===&amp;quot;relocation truncated to fit: R_X86_64_PC32&amp;quot;: Huh?===&lt;br /&gt;
&lt;br /&gt;
What does this mean, and why can't I compile this code?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Welcome to the joys of the x86-64 architecture!  You're probably having trouble building arrays larger than 2GB, individually or together.   Generally, you have to use the medium or large x86-64 `memory model'.   For the Intel compilers, this is specified with the compile options&lt;br /&gt;
&lt;br /&gt;
  -mcmodel=medium -shared-intel&lt;br /&gt;
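For instance, a full compile line might look like this (file names hypothetical):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ifort -mcmodel=medium -shared-intel -O2 -o bigarrays bigarrays.f90&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;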
&lt;br /&gt;
===&amp;quot;feupdateenv is not implemented and will always fail&amp;quot;===&lt;br /&gt;
&lt;br /&gt;
How do I get rid of this and what does it mean?&lt;br /&gt;
 &lt;br /&gt;
'''Answer:'''&lt;br /&gt;
First note that, as ominous as it sounds, this is really just a warning, and has to do with the Intel math library. You can ignore it (unless you really are trying to manually change the exception handlers for floating point exceptions such as divide by zero), or take the safe road and get rid of it by linking with the Intel math functions library:&amp;lt;pre&amp;gt;-limf&amp;lt;/pre&amp;gt;See also [[#How do I link against the Intel Math Kernel Library?]]&lt;br /&gt;
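For example, a link line might then look like this (program and file names hypothetical):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
icc -O2 -o mycode mycode.c -limf&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;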
&lt;br /&gt;
===Cannot find rdmacm library when compiling on GPC===&lt;br /&gt;
&lt;br /&gt;
I get the following error building my code on GPC: &amp;quot;&amp;lt;tt&amp;gt;ld: cannot find -lrdmacm&amp;lt;/tt&amp;gt;&amp;quot;.  Where can I find this library?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This library is part of the MPI libraries; if your compiler is having problems picking it up, it probably means you are mistakenly trying to compile on the login nodes (scinet01..scinet04).  The login nodes aren't part of the GPC; they are for logging into the data centre only.  From there you must go to the GPC or TCS development nodes to do any real work.&lt;br /&gt;
&lt;br /&gt;
=== Why do I get this error when I try to compile: &amp;quot;icpc: error #10001: could not find directory in which /usr/bin/g++41 resides&amp;quot; ?===&lt;br /&gt;
&lt;br /&gt;
You are trying to compile on the login nodes.   As described in the wiki ( https://support.scinet.utoronto.ca/wiki/index.php/GPC_Quickstart#Login ), or in the user's guide you received with your account, SciNet supports two main clusters with very different architectures.  Compilation must be done on the development nodes of the appropriate cluster (in this case, gpc01-04).   Thus, log into gpc01, gpc02, gpc03, or gpc04, and compile from there.&lt;br /&gt;
&lt;br /&gt;
==Testing your Code==&lt;br /&gt;
&lt;br /&gt;
=== Can I run something for a short time on the development nodes? ===&lt;br /&gt;
&lt;br /&gt;
I am in the process of playing around with the mpi calls in my code to get it to work. I do a lot of tests and each of them takes a couple of seconds only.  Can I do this on the development nodes?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Yes, as long as it's very brief (a few minutes).   People use the development nodes&lt;br /&gt;
for their work, and you don't want to bog the nodes down for everyone else; testing a real&lt;br /&gt;
code can chew up a lot more resources than compiling, etc.    The procedure differs&lt;br /&gt;
depending on what machine you're using.&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
On the TCS you can run small MPI jobs on the tcs02 node, which is meant for &lt;br /&gt;
development use.  But even for this test run on one node, you'll need a host file --&lt;br /&gt;
a list of hosts (in this case, all tcs-f11n06, which is the `real' name of tcs02)&lt;br /&gt;
that the job will run on.  Create a file called `hostfile' containing the following:&lt;br /&gt;
&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
&lt;br /&gt;
for a 4-task run.  When you invoke &amp;quot;poe&amp;quot; or &amp;quot;mpirun&amp;quot;, there are runtime&lt;br /&gt;
arguments that you specify pointing to this file.  You can also specify it&lt;br /&gt;
in an environment variable MP_HOSTFILE, so, if your file is in your /scratch directory, say &lt;br /&gt;
${SCRATCH}/hostfile, then you would do&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 export MP_HOSTFILE=${SCRATCH}/hostfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
in your shell.  You will also need to create a &amp;lt;tt&amp;gt;.rhosts&amp;lt;/tt&amp;gt; file in your &lt;br /&gt;
home directory, again listing &amp;lt;tt&amp;gt;tcs-f11n06&amp;lt;/tt&amp;gt; so that &amp;lt;tt&amp;gt;poe&amp;lt;/tt&amp;gt;&lt;br /&gt;
can start jobs.   After that you can simply run your program.  You can use&lt;br /&gt;
mpiexec:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 mpiexec -n 4 my_test_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
adding &amp;lt;tt&amp;gt; -hostfile /path/to/my/hostfile&amp;lt;/tt&amp;gt; if you did not set the environment&lt;br /&gt;
variable above.  Alternatively, you can run it with the poe command (do a &amp;quot;man poe&amp;quot; for details), or even by&lt;br /&gt;
just directly running it.  In this case the number of MPI processes will by default&lt;br /&gt;
be the number of entries in your hostfile.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
On the GPC one can run short test jobs on the [[GPC_Quickstart#Compile.2FDevel_Nodes | development nodes ]] &amp;lt;tt&amp;gt;gpc01&amp;lt;/tt&amp;gt;..&amp;lt;tt&amp;gt;gpc04&amp;lt;/tt&amp;gt;;&lt;br /&gt;
if they are single-node jobs (which they should be) they don't need a hostfile.  Even better, though, is to request an [[ Moab#Interactive | interactive ]] job and run the tests either in the regular batch queue or in the short, high-availability [[ Moab#debug | debug ]] queue that is reserved for this purpose.&lt;br /&gt;
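For instance, an interactive one-node session in the debug queue can be requested with something like&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=1:ppn=8,walltime=1:00:00 -q debug -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;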
&lt;br /&gt;
=== How do I run a longer (but still shorter than an hour) test job quickly ? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer'''&lt;br /&gt;
&lt;br /&gt;
On the GPC there is a high turnover short queue called [[ Moab#debug | debug ]] that is designed for&lt;br /&gt;
this purpose.  You can use it by adding &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -q debug&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to your submission script.&lt;br /&gt;
&lt;br /&gt;
==Running your jobs==&lt;br /&gt;
&lt;br /&gt;
===My job can't write to /home===&lt;br /&gt;
&lt;br /&gt;
My code works fine when I test on the development nodes, but when I submit a job, or even run interactively in the development queue on GPC, it fails.  What's wrong?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
As [[Data_Management#Home_Disk_Space | discussed]] [https://support.scinet.utoronto.ca/wiki/images/5/54/SciNet_Tutorial.pdf elsewhere], &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is mounted read-only on the compute nodes; you can only write to &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; from the login nodes and devel nodes.  (The [[GPC_Quickstart#128Glargemem | largemem nodes]] on GPC, in this respect, are more like devel nodes than compute nodes).   In general, to run jobs you can read from &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; but you'll have to write to &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; (or, if you were allocated space through the LRAC/NRAC process, on &amp;lt;tt&amp;gt;/project&amp;lt;/tt&amp;gt;).  More information on SciNet filesytems can be found on our [[Data_Management | Data Management]] page.&lt;br /&gt;
&lt;br /&gt;
===Error Submitting My Job: qsub: Bad UID for job execution MSG=ruserok failed ===&lt;br /&gt;
&lt;br /&gt;
I write up a submission script as in the examples, but when I attempt to submit the job, I get the above error.  What's wrong?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This error will occur if you try to submit a job from the login nodes.   The login nodes are the gateway to all of SciNet's systems (GPC, TCS, P7, ARC), which have different hardware and queueing systems.  To submit a job, you must log into a development node for the particular cluster you are submitting to and submit from there.&lt;br /&gt;
&lt;br /&gt;
===OpenMP on the TCS===&lt;br /&gt;
&lt;br /&gt;
How do I run an OpenMP job on the TCS?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please look at the [[TCS_Quickstart#Submission_Script_for_an_OpenMP_Job | TCS Quickstart ]] page.&lt;br /&gt;
&lt;br /&gt;
===Can I can use hybrid codes consisting of MPI and openMP on the GPC?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Yes. Please look at the [[GPC_Quickstart#Hybrid_MPI.2FOpenMP_jobs | GPC Quickstart ]] page.&lt;br /&gt;
&lt;br /&gt;
===How do I run serial jobs on GPC?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''':&lt;br /&gt;
&lt;br /&gt;
It should be said first that SciNet is a parallel computing resource, &lt;br /&gt;
and our priority will always be parallel jobs.   Having said that, if &lt;br /&gt;
you can make efficient use of the resources using serial jobs and get &lt;br /&gt;
good science done, that's good too, and we're happy to help you.&lt;br /&gt;
&lt;br /&gt;
The GPC nodes each have 8 processing cores, and making efficient use of these &lt;br /&gt;
nodes means using all eight cores.  As a result, we'd like to have the &lt;br /&gt;
users take up whole nodes (eg, run multiples of 8 jobs) at a time.  &lt;br /&gt;
&lt;br /&gt;
The best strategy depends on the nature of your job. Several approaches are presented on the [[User_Serial|serial run wiki page]]; a minimal sketch of the basic idea is shown below.&lt;br /&gt;
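This is only a rough sketch, assuming a hypothetical serial program &amp;lt;tt&amp;gt;my_serial_code&amp;lt;/tt&amp;gt; and numbered input files; see the [[User_Serial|serial run wiki page]] for robust recipes:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=12:00:00&lt;br /&gt;
#PBS -N serialx8&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
# start 8 serial runs, one per core, in the background&lt;br /&gt;
for i in $(seq 1 8); do&lt;br /&gt;
    ./my_serial_code input.$i &amp;gt; output.$i 2&amp;gt;&amp;amp;1 &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
# wait for all 8 runs to finish before the job exits&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;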
&lt;br /&gt;
===Why can't I request only a single cpu for my job on GPC?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''':&lt;br /&gt;
&lt;br /&gt;
On the GPC, resources are allocated by the node - that is, in chunks of 8 processors.   If you want to run jobs that require only one processor each, you need to bundle them into groups of 8, so as not to waste the other 7 cores for up to 48 hours. See the [[User_Serial|serial run wiki page]].&lt;br /&gt;
&lt;br /&gt;
===How do I run serial jobs on TCS?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''': You don't.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===But in the queue I found a user who is running jobs on GPC, each of which is using only one processor, so why can't I?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''':&lt;br /&gt;
&lt;br /&gt;
The pradat* and atlaspt* jobs, amongst others, are jobs of the ATLAS high energy physics project. That they are reported as single cpu jobs is an artifact of the moab scheduler. They are in fact being automatically bundled into 8-job bundles but have to run individually to be compatible with their international grid-based systems.&lt;br /&gt;
&lt;br /&gt;
===How do I use the ramdisk on GPC?===&lt;br /&gt;
&lt;br /&gt;
To use the ramdisk, create, write, and read files in /dev/shm/.. just as one would in (eg) ${SCRATCH}. Only the amount of RAM needed to store the files will be taken up by the temporary file system; thus if you have 8 serial jobs each requiring 1 GB of RAM, and 1GB is taken up by various OS services, you would still have approximately 7GB available to use as ramdisk on a 16GB node. However, if you were to write 8 GB of data to the RAM disk, this would exceed the available memory and your job would likely crash.&lt;br /&gt;
&lt;br /&gt;
It is very important to delete your files from ram disk at the end of your job. If you do not do this, the next user to use that node will have less RAM available than they might expect, and this might kill their jobs.&lt;br /&gt;
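A minimal sketch (file and program names are hypothetical):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cp ${SCRATCH}/mydata.in /dev/shm/&lt;br /&gt;
./my_code /dev/shm/mydata.in /dev/shm/mydata.out&lt;br /&gt;
cp /dev/shm/mydata.out ${SCRATCH}/&lt;br /&gt;
rm -f /dev/shm/mydata.*      # always clean up the ramdisk at the end of the job&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;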
&lt;br /&gt;
''More details on how to setup your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].''&lt;br /&gt;
&lt;br /&gt;
===How can I automatically resubmit a job?===&lt;br /&gt;
&lt;br /&gt;
Commonly you may have a job that you know will take longer to run than what is &lt;br /&gt;
permissible in the queue.  As long as your program contains [[Checkpoints|checkpoint]] or &lt;br /&gt;
restart capability, you can have one job automatically submit the next. In&lt;br /&gt;
the following example it is assumed that the program finishes before &lt;br /&gt;
the 48 hour limit and then resubmits itself by logging into one&lt;br /&gt;
of the development nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque example submission script for auto resubmission&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=48:00:00&lt;br /&gt;
#PBS -N my_job&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# YOUR CODE HERE&lt;br /&gt;
./run_my_code&lt;br /&gt;
&lt;br /&gt;
# RESUBMIT 10 TIMES HERE&lt;br /&gt;
num=${NUM:-0}     # NUM is passed in via qsub -v; default to 0 on the first, manual submission&lt;br /&gt;
if [ $num -lt 10 ]; then&lt;br /&gt;
      num=$(($num+1))&lt;br /&gt;
      ssh gpc01 &amp;quot;cd $PBS_O_WORKDIR; qsub -v NUM=$num ./script_name.sh&amp;quot;;&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub script_name.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can alternatively use [[ Moab#Job_Dependencies | Job dependencies ]] through the queuing system which will not start one job until another job has completed.&lt;br /&gt;
&lt;br /&gt;
If your job can't be made to automatically stop before the 48 hour queue window, but it does write out checkpoints, you can use the timeout command to stop the program while you still have time to resubmit; for instance&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
    timeout 2850m ./run_my_code argument1 argument2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will run the program for 47.5 hours (2850 minutes), and then send it a SIGTERM to make it exit, leaving time to resubmit within the 48-hour window.&lt;br /&gt;
&lt;br /&gt;
===How can I pass in arguments to my submission script?===&lt;br /&gt;
&lt;br /&gt;
If you wish to make your scripts more generic you can use qsub's ability &lt;br /&gt;
to pass in environment variables to pass in arguments to your script.&lt;br /&gt;
The following example shows a case where an input and an output &lt;br /&gt;
file are passed in on the qsub line. Multiple variables can be &lt;br /&gt;
passed in using the qsub &amp;quot;-v&amp;quot; option and comma delimited. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque example of passing in arguments&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
# &lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=48:00:00&lt;br /&gt;
#PBS -N my_job&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# YOUR CODE HERE&lt;br /&gt;
./run_my_code -f $INFILE -o $OUTFILE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -v INFILE=input.txt,OUTFILE=outfile.txt script_name.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== How can I run a job longer than 48 hours? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The SciNet queues have a queue limit of 48 hours.   This is pretty typical for systems of its size in Canada and elsewhere, and larger systems commonly have shorter limits.   The limits are there to ensure that every user gets a fair share of the system (so that no one user ties up lots of nodes for a long time), and for safety (so that if one memory board in one node fails in the middle of a very long job, you haven't lost a month's worth of work).&lt;br /&gt;
&lt;br /&gt;
Since many of us have simulations that require more than that much time, most widely-used scientific applications have &amp;quot;checkpoint-restart&amp;quot; functionality, where every so often the complete state of the calculation is stored as a checkpoint file, and one can restart a simulation from one of these.   In fact, these restart files tend to be quite useful for a number of purposes.&lt;br /&gt;
&lt;br /&gt;
If your job will take longer, you will have to submit your job in multiple parts, restarting from a checkpoint each time.  In this way, one can run a simulation much longer than the queue limit.  In fact, one can even write job scripts which automatically re-submit themselves until a run is completed, using [[FAQ#How_can_I_automatically_resubmit_a_job.3F | automatic resubmission. ]]&lt;br /&gt;
&lt;br /&gt;
=== Why did showstart say it would take 3 hours for my job to start before, and now it says my job will start in 10 hours? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please look at the [[FAQ#How_do_priorities_work.2Fwhy_did_that_job_jump_ahead_of_mine_in_the_queue.3F | How do priorities work/why did that job jump ahead of mine in the queue? ]] page.&lt;br /&gt;
&lt;br /&gt;
===How do priorities work/why did that job jump ahead of mine in the queue?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The [[Moab | queueing system]] used on SciNet machines is a [http://en.wikipedia.org/wiki/Priority_queue Priority Queue].  Jobs enter the queue at the back of the queue, and slowly make their way to the front as those ahead of them are run; but a job that enters the queue with a higher priority can `cut in line'.&lt;br /&gt;
&lt;br /&gt;
The main factor which determines priority is whether or not the user (or their PI) has an [http://wiki.scinethpc.ca/wiki/index.php/Application_Process LRAC or NRAC allocation].  These are competitively allocated grants of computer time; there is a call for proposals towards the end of every calendar year.    Users with an allocation have high priorities in an attempt to make sure that they can use the amount of computer time the committees granted them.   Their priority decreases as they approach their allotted usage over the current window of time; by the time that they have exhausted that allotted usage, their priority is the same as users with no allocation (unallocated, or `default' users).    Unallocated users have a fixed, low, priority.&lt;br /&gt;
&lt;br /&gt;
This priority system is called `fairshare'; the scheduler attempts to make sure everyone has their fair share of the machines, where the share that's fair has been determined by the allocation committee.    The fairshare window is a rolling window of two weeks; that is, any time you have a job in the queue, the fairshare calculation of its priority is given by how much of your allocation of the machine has been used in the last 14 days.&lt;br /&gt;
&lt;br /&gt;
A particular allocation might have some fraction of the GPC - say 4% of the machine (if the PI had been allocated 10 million CPU hours on GPC). The allocations have labels (called `Resource Allocation Proposal Identifiers', or RAPIs) that look something like&lt;br /&gt;
&lt;br /&gt;
  abc-123-ab&lt;br /&gt;
&lt;br /&gt;
where abc-123 is the PI's CCRI, and the suffix specifies which of the allocations granted to the PI is to be used.  These can be specified on a job-by-job basis.  On GPC, one adds the line&lt;br /&gt;
 #PBS -A RAPI&lt;br /&gt;
to your script; on TCS, one uses&lt;br /&gt;
 # @ account_no = RAPI&lt;br /&gt;
If the allocation to charge isn't specified, a default is used; each user has such a default, which can be changed at the same portal where one changes one's password:&lt;br /&gt;
&lt;br /&gt;
 https://portal.scinet.utoronto.ca/&lt;br /&gt;
&lt;br /&gt;
A job's priority is determined primarily by the fairshare priority of the allocation it is being charged to; the previous 14 days' worth of use under that allocation is calculated and compared to the allocated fraction (here, 4%) of the machine over that window (here, 14 days).   The fairshare priority is a decreasing function of the allocation left; if there is no allocation left (eg, jobs running under that allocation have already used 379,038 CPU hours in the past 14 days), the priority is the same as that of a user with no granted allocation.   (This last part has been the topic of some debate; as the machine gets more utilized, it will probably be the case that we allow RAC users who have greatly overused their quota to have their priorities drop below that of unallocated users, to give the unallocated users some chance to run on our increasingly crowded system; this would have no undue effect on our allocated users, as they would still be able to use the amount of resources they had been allocated by the committees.)   Note that all jobs charging the same allocation get the same fairshare priority.&lt;br /&gt;
&lt;br /&gt;
There are other factors that go into calculating priority, but fairshare is the most significant.   Other factors include&lt;br /&gt;
* amount of time waiting in queue (measured in units of the requested runtime).   A job that requests 1 hour in the queue and has been waiting 2 days will get a bump in its priority larger than a job that requests 2 days and has been waiting the same time.&lt;br /&gt;
* User adjustment of priorities ( See below ).&lt;br /&gt;
&lt;br /&gt;
The major effect of these subdominant terms is to shuffle the order of jobs running under the same allocation.&lt;br /&gt;
&lt;br /&gt;
===How do we manage job priorities within our research group?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Obviously, managing shared resources within a large group - whether it &lt;br /&gt;
is conference funding or CPU time - takes some doing.   &lt;br /&gt;
&lt;br /&gt;
It's important to note that the fairshare periods are intentionally kept &lt;br /&gt;
quite short - just two weeks long.   (These exact numbers are subject to &lt;br /&gt;
change as the year goes on and we better understand use patterns, but &lt;br /&gt;
they're unlikely to change radically).   So, for example, let us say that in your resource &lt;br /&gt;
allocation you have about 10% of the machine.   Then for someone to use &lt;br /&gt;
up the whole two week amount of time in 2 days, they'd have to use 70% &lt;br /&gt;
of the machine in those two days - which is unlikely to happen by &lt;br /&gt;
accident.  If that does happen,  &lt;br /&gt;
those using the same allocation as the person who used 70% of the &lt;br /&gt;
machine over the two days will suffer by having much lower priority for &lt;br /&gt;
their jobs, but only for the next 12 days - and even then, if there are &lt;br /&gt;
idle cpus they'll still be able to compute.&lt;br /&gt;
&lt;br /&gt;
There will be online tools for seeing how the allocation is being used, &lt;br /&gt;
and those people who are in charge in your group will be able to use &lt;br /&gt;
that information to manage the users, telling them to dial it down or &lt;br /&gt;
up.   We know that managing a large research group is hard, and we want &lt;br /&gt;
to make sure we provide you the information you need to do your job &lt;br /&gt;
effectively.&lt;br /&gt;
&lt;br /&gt;
One way for users within a group to manage their priorities within the group&lt;br /&gt;
is with [[Moab#Adjusting_Job_Priority | user-adjusted priorities]]; this is&lt;br /&gt;
described in more detail on the [[Moab | Scheduling System]] page.&lt;br /&gt;
&lt;br /&gt;
=== How do I charge jobs to my NRAC/LRAC allocation? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see the [[Moab#Accounting|accounting section of Moab page]].&lt;br /&gt;
&lt;br /&gt;
=== How does one check the amount of used CPU-hours in a project, and how does one get statistics for each user in the project? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This information is available on the SciNet portal, https://portal.scinet.utoronto.ca. See also [[SciNet Usage Reports]].&lt;br /&gt;
&lt;br /&gt;
=== How does the Infiniband Upgrade affect my 2012 NRAC allocation ?===&lt;br /&gt;
&lt;br /&gt;
The NRAC allocations for the current (2012) year that were based on ethernet and infiniband will carry over; however, the allocation will be on the full GPC, not just the subsection.  So if you were allocated 500 hours on InfiniBand, your fairshare allocation will still be 500 hours, just 500 out of 30,000 instead of 500 out of 7,000.  If you received two allocations, one on gigE and one on IB, they will simply be combined. This should benefit all users, as the desegregation of the GPC provides a greater pool of nodes, increasing the probability that your job will run.&lt;br /&gt;
&lt;br /&gt;
==Monitoring jobs in the queue==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Why hasn't my job started?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Use the moab command &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
checkjob -v jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and the last couple of lines should explain why a job hasn't started.  &lt;br /&gt;
&lt;br /&gt;
Please see [[Moab| Job Scheduling System (Moab) ]] for more detailed information&lt;br /&gt;
&lt;br /&gt;
===How do I figure out when my job will run?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Moab#Available_Resources| Job Scheduling System (Moab) ]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ===My GPC job is Held, and checkjob says &amp;quot;Batch:PolicyViolation&amp;quot; ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
When this happens, you'll see your job stuck in a BatchHold state.  &lt;br /&gt;
This happens because the job you've submitted breaks one of the rules of the queues, and is being held until you modify it or kill it and re-submit a conforming job.  The most common problems are:&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===I submit my GPC job, and I get an email saying it was rejected===&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This happens because the job you've submitted breaks one of the rules of the queues and is rejected. An email&lt;br /&gt;
is sent with the JOBID, JOBNAME, and the reason it was rejected.  The following is an example where a job&lt;br /&gt;
requests more than 48 hours and was rejected.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PBS Job Id: 3462493.gpc-sched&lt;br /&gt;
Job Name:   STDIN&lt;br /&gt;
job deleted&lt;br /&gt;
Job deleted at request of root@gpc-sched&lt;br /&gt;
MOAB_INFO:  job was rejected - job violates class configuration 'wclimit too high for class 'batch_ib' (345600 &amp;gt; 172800)'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Jobs on the TCS or GPC may only run for 48 hours at a time; this restriction greatly increases responsiveness of the queue and queue throughput for all our users.  If your computation requires longer than that, as many do, you will have to [[ Checkpoints | checkpoint ]] your job and restart it after each 48-hour queue window.   You can manually re-submit jobs, or if you can have your job cleanly exit before the 48 hour window, there are ways to [[ FAQ#How_can_I_automatically_resubmit_a_job.3F | automatically resubmit jobs ]].&lt;br /&gt;
&lt;br /&gt;
Other rejections return a more cryptic error saying &amp;quot;job violates class configuration&amp;quot; such as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PBS Job Id: 3462409.gpc-sched&lt;br /&gt;
Job Name:   STDIN&lt;br /&gt;
job deleted&lt;br /&gt;
Job deleted at request of root@gpc-sched&lt;br /&gt;
MOAB_INFO:  job was rejected - job violates class configuration 'user required by class 'batch''&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The most common problems that result in this error are:&lt;br /&gt;
&lt;br /&gt;
* '''Incorrect number of processors per node''': Jobs on the GPC are scheduled per-node not per-core and since each node has 8 processor cores (ppn=8) the smallest job allowed is one node with 8 cores (nodes=1:ppn=8).  For serial jobs users must bundle or batch them together in groups of 8. See [[ FAQ#How_do_I_run_serial_jobs_on_GPC.3F | How do I run serial jobs on GPC? ]]&lt;br /&gt;
* '''No number of nodes specified''': Jobs submitted to the main queue must request a specific number of nodes, either in the submission script (with a line like &amp;lt;tt&amp;gt;#PBS -l nodes=2:ppn=8&amp;lt;/tt&amp;gt;) or on the command line (eg, &amp;lt;tt&amp;gt;qsub -l nodes=2:ppn=8,walltime=5:00:00 script.pbs&amp;lt;/tt&amp;gt;).  Note that for the debug queue, you can get away without specifying a number of nodes and a default of one will be assigned; for both technical and policy reasons, we do not enforce such a default for the main (&amp;quot;batch&amp;quot;) queue.&lt;br /&gt;
* '''There is a 15 minute walltime minimum''' on all queues except debug and if you set your walltime less than this, it will be rejected.&lt;br /&gt;
&lt;br /&gt;
===How can I monitor my running jobs on TCS?===&lt;br /&gt;
&lt;br /&gt;
How can I monitor the load of TCS jobs?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
You can get more information with the command &lt;br /&gt;
 /xcat/tools/tcs-scripts/LL/jobState.sh&lt;br /&gt;
which I alias as:&lt;br /&gt;
 alias llq1='/xcat/tools/tcs-scripts/LL/jobState.sh'&lt;br /&gt;
If you run &amp;quot;llq1 -n&amp;quot; you will see a listing of jobs together with a lot of information, including the load.&lt;br /&gt;
&lt;br /&gt;
==Errors in running jobs==&lt;br /&gt;
&lt;br /&gt;
===On GPC, `Job cannot be executed'===&lt;br /&gt;
&lt;br /&gt;
I get error messages like this trying to run on GPC:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PBS Job Id: 30414.gpc-sched&lt;br /&gt;
Job Name:   namd&lt;br /&gt;
Exec host:  gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0&lt;br /&gt;
Aborted by PBS Server &lt;br /&gt;
Job cannot be executed&lt;br /&gt;
See Administrator for help&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
PBS Job Id: 30414.gpc-sched&lt;br /&gt;
Job Name:   namd&lt;br /&gt;
Exec host:  gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0&lt;br /&gt;
An error has occurred processing your job, see below.&lt;br /&gt;
request to copy stageout files failed on node 'gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0' for job 30414.gpc-sched&lt;br /&gt;
&lt;br /&gt;
Unable to copy file 30414.gpc-sched.OU to USER@gpc-f101n084.scinet.local:/scratch/G/GROUP/USER/projects/sim-performance-test/runtime/l/namd/8/namd.o30414&lt;br /&gt;
*** error from copy&lt;br /&gt;
30414.gpc-sched.OU: No such file or directory&lt;br /&gt;
*** end error output&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Try doing the following:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir ${SCRATCH}/.pbs_spool&lt;br /&gt;
ln -s ${SCRATCH}/.pbs_spool ~/.pbs_spool&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is how all new accounts are set up on SciNet.&lt;br /&gt;
&lt;br /&gt;
On the GPC compute nodes, &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is mounted as a read-only file system.  PBS by default tries to spool its output files to &amp;lt;tt&amp;gt;${HOME}/.pbs_spool&amp;lt;/tt&amp;gt;, which fails because it cannot write to a read-only file system.&lt;br /&gt;
New accounts at SciNet get around this by having ${HOME}/.pbs_spool point to somewhere appropriate on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;, but if you've deleted that link or directory, or have an old account, you will see errors like the above.&lt;br /&gt;
&lt;br /&gt;
'''On Feb 24, the input/output mechanism was reconfigured to use a local ramdisk as the temporary location, which means that .pbs_spool is no longer needed and this error should no longer occur.'''&lt;br /&gt;
&lt;br /&gt;
=== I couldn't find the  .o output file in the .pbs_spool directory as I used to ===&lt;br /&gt;
&lt;br /&gt;
On Feb 24 2011, the temporary location of standard output and error files was moved from the shared file system ${SCRATCH}/.pbs_spool to the&lt;br /&gt;
node-local directory /var/spool/torque/spool (which resides in ram). The final location after a job has finished is unchanged,&lt;br /&gt;
but to check the output/error of running jobs, users will now have to ssh into the (first) node assigned to the job and look in&lt;br /&gt;
/var/spool/torque/spool.&lt;br /&gt;
&lt;br /&gt;
This alleviates access contention to the temporary directory, especially for those users that are running a lot of jobs, and  reduces the burden on the file system in general.&lt;br /&gt;
&lt;br /&gt;
Note that it is good practice to redirect output to a file rather than to count on the scheduler to do this for you.&lt;br /&gt;
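&lt;br /&gt;
For example, inside your job script you could capture the program's output yourself with a redirection like the following (the file name is just an example; $PBS_JOBID is the job identifier assigned by the scheduler):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# capture stdout and stderr of the run in a log file in the working directory&lt;br /&gt;
./mycode &amp;gt; mycode-${PBS_JOBID}.log 2&amp;gt;&amp;amp;1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;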
&lt;br /&gt;
=== My GPC job died, telling me `Copy Stageout Files Failed' ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
When a job runs on GPC, the script's standard output and error are redirected to &lt;br /&gt;
&amp;lt;tt&amp;gt;$PBS_JOBID.gpc-sched.OU&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;$PBS_JOBID.gpc-sched.ER&amp;lt;/tt&amp;gt; in&lt;br /&gt;
/var/spool/torque/spool on the (first) node on which your job is running.  At the end of the job, those .OU and .ER files are copied to where the batch script tells them to be copied, by default &amp;lt;tt&amp;gt;$PBS_JOBNAME.o$PBS_JOBID&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;$PBS_JOBNAME.e$PBS_JOBID&amp;lt;/tt&amp;gt;.   (You can set those filenames to be something clearer with the -e and -o options in your PBS script.)&lt;br /&gt;
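&lt;br /&gt;
For instance, a job script header along the following lines would send the output and error streams to files of your choosing (the file names here are only illustrative):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -N myrun&lt;br /&gt;
#PBS -o myrun.out&lt;br /&gt;
#PBS -e myrun.err&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;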
&lt;br /&gt;
When you get errors like this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
An error has occurred processing your job, see below.&lt;br /&gt;
request to copy stageout files failed on node&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
it means that the copying back process has failed in some way.  There could be a few reasons for this. The first thing to check is to '''make sure that your .bashrc does not produce any output''', as the output stageout is performed by bash and extra output can cause it to fail.&lt;br /&gt;
It could also have been a random filesystem error, or your job may have failed spectacularly enough to short-circuit the normal job-termination process, so that those files simply never got copied.&lt;br /&gt;
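&lt;br /&gt;
One common way to keep a .bashrc quiet for batch jobs is to print messages only in interactive shells; a sketch (the echoed message is just an example):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# in ~/.bashrc: only produce output when the shell is interactive,&lt;br /&gt;
# so batch jobs and the output stageout see nothing extra&lt;br /&gt;
if [[ $- == *i* ]]; then&lt;br /&gt;
    echo &amp;quot;Welcome to SciNet&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;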
&lt;br /&gt;
Write to [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;] if your input/output files got lost, as we will probably be able to retrieve them for you (please supply at least the jobid, and any other information that may be relevant). &lt;br /&gt;
&lt;br /&gt;
Keep in mind that it is good practice to redirect output to a file rather than depending on the job scheduler to do this for you.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&lt;br /&gt;
===Another transport will be used instead===&lt;br /&gt;
&lt;br /&gt;
I get error messages like the following when running on the GPC at the start of the run, although the job seems to proceed OK.   Is this a problem?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
[[45588,1],0]: A high-performance Open MPI point-to-point messaging module&lt;br /&gt;
was unable to find any relevant network interfaces:&lt;br /&gt;
&lt;br /&gt;
Module: OpenFabrics (openib)&lt;br /&gt;
  Host: gpc-f101n005&lt;br /&gt;
&lt;br /&gt;
Another transport will be used instead, although this may result in&lt;br /&gt;
lower performance.&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Everything's fine.   The two MPI libraries SciNet provides work for both the InfiniBand and the Gigabit Ethernet interconnects, and will always try to use the fastest interconnect available.   In this case, you ran on normal gigabit GPC nodes with no InfiniBand; but the MPI libraries have no way of knowing this, and try InfiniBand first anyway.  This is just a harmless `failover' message; it tried to use InfiniBand, which doesn't exist on this node, then fell back on using Gigabit Ethernet (`another transport').&lt;br /&gt;
&lt;br /&gt;
With OpenMPI, this can be avoided by not looking for infiniband; eg, by using the option&lt;br /&gt;
&lt;br /&gt;
--mca btl ^openib&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===IB Memory Errors, eg &amp;lt;tt&amp;gt; reg_mr Cannot allocate memory &amp;lt;/tt&amp;gt;===&lt;br /&gt;
&lt;br /&gt;
Infiniband requires more memory than ethernet; it can use RDMA (remote direct memory access) transport for which it sets aside registered memory to transfer data.&lt;br /&gt;
&lt;br /&gt;
In our current network configuration, it requires a _lot_ more memory, particularly as you go to larger process counts; unfortunately, that means you can't get around the &amp;quot;I need more memory&amp;quot; problem the usual way, by running on more nodes.   Machines with different memory or &lt;br /&gt;
network configurations may exhibit this problem at higher or lower MPI &lt;br /&gt;
task counts.&lt;br /&gt;
&lt;br /&gt;
Right now, the best workaround is to reduce the number and size of OpenIB queues using XRC: with OpenMPI, add the following options to your mpirun command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32 -mca btl_openib_max_send_size 12288&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With Intel MPI, you should be able to do&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load intelmpi/4.0.3.008&lt;br /&gt;
mpirun -genv I_MPI_FABRICS=shm:ofa  -genv I_MPI_OFA_USE_XRC=1 -genv I_MPI_OFA_DYNAMIC_QPS=1 -genv I_MPI_DEBUG=5 -np XX ./mycode&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
to the same end.  &lt;br /&gt;
&lt;br /&gt;
For more information see [[GPC MPI Versions]].&lt;br /&gt;
&lt;br /&gt;
===My compute job fails, saying &amp;lt;tt&amp;gt;libpng12.so.0: cannot open shared object file&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;libjpeg.so.62: cannot open shared object file&amp;lt;/tt&amp;gt;===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
To maximize the amount of memory available for compute jobs, the compute nodes have a less complete system image than the development nodes.   In particular, since graphics packages like matplotlib and gnuplot are usually used interactively, the libraries they depend on are included in the devel nodes' image but not in the compute nodes' image.&lt;br /&gt;
&lt;br /&gt;
Many of these extra libraries are, however, available in the &amp;quot;extras&amp;quot; module.   So adding a &amp;quot;module load extras&amp;quot; to your job submission  script - or, for overkill, to your .bashrc - should enable these scripts to run on the compute nodes.&lt;br /&gt;
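&lt;br /&gt;
As a sketch, the relevant lines of a submission script might look like this (the executable name is a placeholder):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
# pull in the extra shared libraries (libpng, libjpeg, ...) on the compute nodes&lt;br /&gt;
module load extras&lt;br /&gt;
./my_plotting_script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;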
&lt;br /&gt;
==Data on SciNet disks==&lt;br /&gt;
&lt;br /&gt;
===How do I find out my disk usage?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The standard unix/linux utilities for finding the amount of disk space used by a directory are very slow, and notoriously inefficient on the GPFS filesystems that we run on the SciNet systems.  There are utilities that very quickly report your disk usage:&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''/scinet/gpc/bin/diskUsage'''&amp;lt;/tt&amp;gt; command, available on the login nodes, datamovers and the GPC devel nodes, provides information in a number of ways on the home, scratch, and project file systems: for instance, how much disk space is being used by yourself and your group (with the -a option), how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), or plots of your usage over time.&lt;br /&gt;
Note that this information is only updated hourly.&lt;br /&gt;
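&lt;br /&gt;
For example (a sketch; only the -a option is described above, other flags may exist):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# usage summary for your own directories&lt;br /&gt;
/scinet/gpc/bin/diskUsage&lt;br /&gt;
# include the usage of everyone in your group&lt;br /&gt;
/scinet/gpc/bin/diskUsage -a&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;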
&lt;br /&gt;
More information about these filesystems is available on the [[Data_Management | Data Management]] page.&lt;br /&gt;
&lt;br /&gt;
===How do I transfer data to/from SciNet?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
All incoming connections to SciNet go through relatively low-speed connections to the &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; gateways, so using scp to copy files the same way you ssh in is not an effective way to move lots of data.  Better tools are described in our page on [[Data_Management#Data_Transfer | Data Transfer]].&lt;br /&gt;
&lt;br /&gt;
===My group works with data files of size 1-2 GB.  Is this too large to  transfer by scp to login.scinet.utoronto.ca ?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Generally, occasional transfers of less than 10GB are perfectly acceptable to do through the login nodes. See [[Data_Management#Data_Transfer | Data Transfer]].&lt;br /&gt;
&lt;br /&gt;
===How can I check if I have files in /scratch that are scheduled for automatic deletion?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Storage_Quickstart#Scratch_Disk_Purging_Policy | Storage At SciNet]]&lt;br /&gt;
&lt;br /&gt;
===How to allow my supervisor to manage files for me using ACL-based commands?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Data_Management#File.2FOwnership_Management_.28ACL.29 | File/Ownership Management]]&lt;br /&gt;
&lt;br /&gt;
===Can we buy extra storage space on SciNet?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
Yes, please see [[Data_Management#Buying_storage_space_on_GPFS_or_HPSS | Buying storage space on GPFS or HPSS ]] for more details.&lt;br /&gt;
&lt;br /&gt;
==Keep 'em Coming!==&lt;br /&gt;
&lt;br /&gt;
===Next question, please===&lt;br /&gt;
&lt;br /&gt;
Send your question to [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;];  we'll answer it asap!&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Oldwiki.scinet.utoronto.ca:System_Alerts&amp;diff=6073</id>
		<title>Oldwiki.scinet.utoronto.ca:System Alerts</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Oldwiki.scinet.utoronto.ca:System_Alerts&amp;diff=6073"/>
		<updated>2013-05-04T15:18:10Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== System Status==&lt;br /&gt;
[[File:down.png|down|link=GPC Quickstart]]GPC&lt;br /&gt;
[[File:down.png|down|link=TCS Quickstart]]TCS&lt;br /&gt;
[[File:down.png|down|link=GPU Devel Nodes]]ARC&lt;br /&gt;
[[File:down.png|down|link=P7 Linux Cluster]]P7&lt;br /&gt;
[[File:down.png|down|link=BGQ]]BGQ&lt;br /&gt;
[[File:down.png|down|link=HPSS]]HPSS&lt;br /&gt;
&lt;br /&gt;
Systems unavailable due to power glitch at data center; will update shortly&lt;br /&gt;
&lt;br /&gt;
Last updated:  Tue May 4 11:17:52 EDT 2013&lt;br /&gt;
&lt;br /&gt;
([[Previous_messages:|Previous messages]])&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Oldwiki.scinet.utoronto.ca:System_Alerts&amp;diff=6072</id>
		<title>Oldwiki.scinet.utoronto.ca:System Alerts</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Oldwiki.scinet.utoronto.ca:System_Alerts&amp;diff=6072"/>
		<updated>2013-05-04T15:17:54Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== System Status==&lt;br /&gt;
[[File:down.png|down|link=GPC Quickstart]]GPC&lt;br /&gt;
[[File:down.png|down|link=TCS Quickstart]]TCS&lt;br /&gt;
[[File:down.png|down|link=GPU Devel Nodes]]ARC&lt;br /&gt;
[[File:down.png|down|link=P7 Linux Cluster]]P7&lt;br /&gt;
[[File:down..png|down|link=BGQ]]BGQ&lt;br /&gt;
[[File:down..png|down|link=HPSS]]HPSS&lt;br /&gt;
&lt;br /&gt;
Systems unavailable due to power glitch at data center; will update shortly&lt;br /&gt;
&lt;br /&gt;
Last updated:  Tue May 4 11:17:52 EDT 2013&lt;br /&gt;
&lt;br /&gt;
([[Previous_messages:|Previous messages]])&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=File:Lecture23-2013.pdf&amp;diff=5938</id>
		<title>File:Lecture23-2013.pdf</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=File:Lecture23-2013.pdf&amp;diff=5938"/>
		<updated>2013-04-09T14:05:43Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: uploaded a new version of &amp;amp;quot;File:Lecture23-2013.pdf&amp;amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=File:Lecture23-2013.pdf&amp;diff=5937</id>
		<title>File:Lecture23-2013.pdf</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=File:Lecture23-2013.pdf&amp;diff=5937"/>
		<updated>2013-04-09T14:04:02Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Scientific_Computing_Course&amp;diff=5936</id>
		<title>Scientific Computing Course</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Scientific_Computing_Course&amp;diff=5936"/>
		<updated>2013-04-09T14:03:41Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* Topics */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;''This wiki page concerns the 2013 installment of SciNet's Scientific Computing course. Material from the previous installment can be found on [[Scientific Software Development Course]], [[Numerical Tools for Physical Scientists (course)]], and [[High Performance Scientific Computing]]''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
=Syllabus=&lt;br /&gt;
&lt;br /&gt;
==About the course==&lt;br /&gt;
* Whole-term graduate course&lt;br /&gt;
* Prerequisite: basic C, C++ or Fortran experience.&lt;br /&gt;
* Will use `C++ light' and Python&lt;br /&gt;
* Topics include: Scientific computing and programming skills, Parallel programming, and Hybrid programming.  &lt;br /&gt;
&lt;br /&gt;
There are three parts to this course:&lt;br /&gt;
&lt;br /&gt;
# Scientific Software Development: Jan/Feb 2013&amp;lt;br&amp;gt;''python, C++, git, make, modular programming, debugging''&lt;br /&gt;
# Numerical Tools for Physical Scientists: Feb/Mar 2013&amp;lt;br&amp;gt;''modelling, floating point, Monte Carlo, ODE, linear algebra, fft''&lt;br /&gt;
# High Performance Scientific Computing: Mar/Apr 2013&amp;lt;br&amp;gt;''openmp, mpi and hybrid programming''&lt;br /&gt;
&lt;br /&gt;
Each part consists of eight one-hour lectures, two per week.&lt;br /&gt;
&lt;br /&gt;
These can be taken separately by astrophysics graduate students at the University of Toronto as mini-courses, and by physics graduate students at the University of Toronto as modular courses.&lt;br /&gt;
&lt;br /&gt;
The first two parts count towards the SciNet Certificate in Scientific Computing, while the third part can count towards the SciNet HPC Certificate. For more info about the SciNet Certificates, see http://www.scinethpc.ca/2012/12/scinet-hpc-certificate-program.&lt;br /&gt;
&lt;br /&gt;
==Location and Times==&lt;br /&gt;
[http://www.scinethpc.ca/2010/08/contact-us SciNet HeadQuarters]&amp;lt;br&amp;gt;&lt;br /&gt;
256 McCaul Street, Toronto, ON&amp;lt;br&amp;gt;&lt;br /&gt;
Room 229 (Conference Room)&amp;lt;br&amp;gt;&lt;br /&gt;
Tuesdays 11:00 am - 12:00 noon&amp;lt;br&amp;gt;&lt;br /&gt;
Thursdays 11:00 am - 12:00 noon&lt;br /&gt;
&lt;br /&gt;
==Instructors and office hours==&lt;br /&gt;
&lt;br /&gt;
* Ramses van Zon - 256 McCaul Street, Rm 228 - Mondays 3-4pm&lt;br /&gt;
* L. Jonathan Dursi - 256 McCaul Street, Rm 216 - Wednesdays 3-4pm&lt;br /&gt;
&lt;br /&gt;
==Grading scheme==&lt;br /&gt;
&lt;br /&gt;
Attendance at lectures.&lt;br /&gt;
&lt;br /&gt;
Four homework sets (i.e., one per week), to be returned by email by 9:00 am the next Thursday.&lt;br /&gt;
&lt;br /&gt;
==Sign up==&lt;br /&gt;
Sign up for this graduate course goes through SciNet's course website.&amp;lt;br&amp;gt;The direct link is https://support.scinet.utoronto.ca/courses/?q=node/99.&amp;lt;br&amp;gt;  If you do not have a SciNet account but wish to register for this course, please email support@scinet.utoronto.ca . &amp;lt;br&amp;gt;&lt;br /&gt;
Sign up is closed.&lt;br /&gt;
&lt;br /&gt;
=Part 1: Scientific Software Development=&lt;br /&gt;
&lt;br /&gt;
==Prerequisites==&lt;br /&gt;
&lt;br /&gt;
Some programming experience. Some unix prompt experience.&lt;br /&gt;
&lt;br /&gt;
'''Software that you'll need:'''&lt;br /&gt;
&lt;br /&gt;
A unix-like environment with the GNU compiler suite (e.g. Cygwin), and Python 2, IPython, Numpy, SciPy and Matplotlib (all of which you get if you use the Enthought distribution) installed on your laptop. Links are given at the bottom of this page.&lt;br /&gt;
&lt;br /&gt;
==Dates==&lt;br /&gt;
&lt;br /&gt;
January 15, 17, 22, 24, 29, and 31, 2013&amp;lt;br&amp;gt;&lt;br /&gt;
February 5 and 7, 2013&lt;br /&gt;
&lt;br /&gt;
==Topics (with lecture slides and recordings)==&lt;br /&gt;
&lt;br /&gt;
===''Lecture 1:'' C++ introduction===&lt;br /&gt;
:::[[File:Lecture1-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture1-2013/lecture1-2013.html]]&lt;br /&gt;
:::[[Media:Lecture1-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture1-2013/lecture1-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 2:'' More C++, build and version control&amp;lt;br&amp;gt;===&lt;br /&gt;
:::[[File:Lecture2-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture2-2013/lecture2-2013.html]]&lt;br /&gt;
:::Guest lecturer: Michael Nolta (CITA) for the git portion of the lecture.&lt;br /&gt;
:::[[Media:Lecture2-2013.pdf|C++ and Make slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture2-2013/lecture2-2013.mp4 C++ and Make video recording] &amp;amp;nbsp;/ &amp;amp;nbsp; [[Media:Git-Nolta.pdf|Git slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [[#HW1|Homework assignment 1]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 3:'' Python and visualization===&lt;br /&gt;
:::[[File:Lecture3-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture3-2013/lecture3-2013.html]]&lt;br /&gt;
:::[[Media:Lecture3-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture3-2013/lecture3-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 4:'' Modular programming, refactoring, testing===&lt;br /&gt;
:::[[File:Lecture4-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture4-2013/lecture4-2013.html]]&lt;br /&gt;
:::[[Media:Lecture4-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture4-2013/lecture4-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp;  [[#HW2|Homework assignment 2]]&lt;br /&gt;
:::[http://wiki.scinethpc.ca/wiki/images/f/f0/diffuse.cc diffuse.cc (course project source file)] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://wiki.scinethpc.ca/wiki/images/f/f0/plotdata.py plotdata.py (corresponding python movie generator)]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 5:'' Object oriented programming===&lt;br /&gt;
:::[[Media:Lecture5-2013.pdf|Slides]]&lt;br /&gt;
:::Recordings of this lecture are missing, but you could view the videos of SciNet's [[One-Day Scientific C++ Class]], in particular the parts on classes, polymorphism, and inheritance.&lt;br /&gt;
&lt;br /&gt;
===''Lecture 6:'' ODE, interpolation===&lt;br /&gt;
:::[[File:Lecture6-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture6-2013/lecture6-2013.html]]&lt;br /&gt;
:::[[Media:ScientificComputing2013-Lecture5-ODE.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture6-2013/lecture6-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp; [[#HW3|Homework assignment 3]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 7:'' Development tools: debugging and profiling===&lt;br /&gt;
:::[[File:Lecture7-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture7-2013/lecture7-2013.html]]&lt;br /&gt;
:::[[Media:ScientificComputing2013-Debugging.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture7-2013/lecture7-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 8:'' Objects in Python, linking C++ and Python===&lt;br /&gt;
:::[[File:Lecture8-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture8-2013/lecture8-2013.html]]&lt;br /&gt;
:::[[Media:Lecture8-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture8-2013/lecture8-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
==Homework assignments==&lt;br /&gt;
&lt;br /&gt;
===HW1===&lt;br /&gt;
&lt;br /&gt;
'''''Multi-file C++ program to create a data file'''''&lt;br /&gt;
&lt;br /&gt;
We’ve learned programming in basic C++, use of make and Makefiles to build projects, and local use of git for version control. In this first assignment, you’ll use these to make a multi-file C++ program, built with make, which computes and outputs a data file.&lt;br /&gt;
&lt;br /&gt;
* Start a git repository, and begin writing a C++ program to&lt;br /&gt;
:# Get an array size and a standard deviation from user input,&lt;br /&gt;
:# Allocate a 2d array (use the code given in lecture 2),&lt;br /&gt;
:# Store a 2d Gaussian with a maximum at the centre of the array and given standard deviation (in units of grid points),&lt;br /&gt;
:# Output that array to a text file,&lt;br /&gt;
:# Free the array, and exit. &lt;br /&gt;
* The output text file should contain just the data in text format, with a row of the file corresponding to a row of the array and with whitespace between the numbers. &lt;br /&gt;
* The 2d array creation/freeing routines should be in one file (with an associated header file), the gaussian calculation be in another (ditto), and the output routine be in a third, with the main program calling each of these. &lt;br /&gt;
* Use a makefile to build your code (add it to the repository).&lt;br /&gt;
* You can start with everything in one file, with hardcoded values for sizes and standard deviation and a static array, then refactor things into multiple files, adding the other features.&lt;br /&gt;
* As a test, use the ipython executable that came with your Enthought python distribution to read your data and plot it.&amp;lt;br&amp;gt;If your data file is named ‘data.txt’, running the following:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ipython --pylab&lt;br /&gt;
In [1]: data = numpy.genfromtxt('data.txt') &lt;br /&gt;
In [2]: contour(data) &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
should give a nice contour plot of a 2-dimensional gaussian.&lt;br /&gt;
* Email your source code, makefile and the &amp;quot;git log&amp;quot; output of all your commits by 9:00 am Thursday Jan 24th, 2013. Please zip or tar these files together as one attachment, with a file name that includes your name and &amp;quot;HW1&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
===HW2===&lt;br /&gt;
'''''Refactor legacy code to a modular project with unit tests'''''&lt;br /&gt;
&lt;br /&gt;
In class, today, we talked about modular programming and testing, and the project we’ll be working on for the next three weeks. This homework will start advancing on that project by working on the “legacy” code given to us by our supervisor ([http://wiki.scinethpc.ca/wiki/images/f/f0/diffuse.cc diffuse.cc]), with a corresponding python plotting script ([http://wiki.scinethpc.ca/wiki/images/f/f0/plotdata.py plotdata.py]), and whipping it into shape before we start adding new physics.&lt;br /&gt;
* Start a git repository for this project, and add the two files.&lt;br /&gt;
* Create a Makefile and add it to the repository.&lt;br /&gt;
* Since we have no tests, run the program with console output redirected to a file:&lt;br /&gt;
:&amp;lt;pre&amp;gt;$ diffuse &amp;gt; original-output.txt&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;''It turns out the code has a bug that can make the output different when the same code is run again, which obviously would not be good for a baseline test. Replace 'float error;' by 'float error=0.0;' to fix this.''&lt;br /&gt;
* Also save the two .npy output files, e.g. to original-data.npy and original-theory.npy. The triplet of files (original-output.txt, original-data.npy and original-theory.npy) serves as a baseline integrated test (add these to the repository). &lt;br /&gt;
* Then write a 'test' target in your makefile that:&lt;br /&gt;
** Runs 'diffuse' with output to a new file.&lt;br /&gt;
** Compares the file with the baseline test file, and compares the .npy files.&lt;br /&gt;
:: (hint: the unix commands diff or cmp can compare files; a sketch of such a comparison is given after this list).&lt;br /&gt;
* First refactoring: Move the global variables into the main routine.&lt;br /&gt;
* ''Chorus: Test your modified code, and commit.''&lt;br /&gt;
* Second refactoring: Extract a diffusion operator routine, that gets called from main.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* Create a .cc/.h module for the diffusion operator.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* Add two tests for the diffusion operator: for a constant and for a linear input field (&amp;lt;tt&amp;gt;rho[i][j]=a*i+b*j&amp;lt;/tt&amp;gt;). Add these to the test target in the makefile.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* More refactoring: Extract three more .cc/.h modules:&lt;br /&gt;
** for output (should not contain hardcoded filenames)    &lt;br /&gt;
** computation of the theory&lt;br /&gt;
** and for the array allocation stuff.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* Describe, but don't implement in the .h and .cc, what would be appropriate unit tests for these three modules.&lt;br /&gt;
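&lt;br /&gt;
As a rough sketch, the comparison commands run by the 'test' target mentioned above could look like the following (the new .npy file names are placeholders for whatever names diffuse actually writes):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# regenerate the output and compare it against the saved baseline&lt;br /&gt;
diffuse &amp;gt; new-output.txt&lt;br /&gt;
diff new-output.txt original-output.txt&lt;br /&gt;
cmp data.npy original-data.npy&lt;br /&gt;
cmp theory.npy original-theory.npy&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;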
&lt;br /&gt;
Email your source code and the git log of all your commits as a .zip or .tar file to rzon@scinethpc.ca and ljdursi@scinethpc.ca by 9:00 am on Thursday January 31, 2013.&lt;br /&gt;
&lt;br /&gt;
===HW3===&lt;br /&gt;
This week, we learned about object oriented programming, which fits nicely within the modular programming idea.  In this homework, we are going to use some of it to restructure our code and get it ready to add the tracer particle, the goal of the course project. &lt;br /&gt;
&lt;br /&gt;
The goal will be to have an instance of a &amp;lt;tt&amp;gt;Diffusion&amp;lt;/tt&amp;gt; class,&lt;br /&gt;
as well as an instance of &amp;lt;tt&amp;gt;Tracer&amp;lt;/tt&amp;gt;, which for now will be a&lt;br /&gt;
free particle moving as ('''x'''(t),'''y'''(t)) = ('''x'''(0) +&lt;br /&gt;
'''vx''' t, '''y'''(0) + '''vy''' t), without any coupling yet (we&lt;br /&gt;
will handle this next week).&lt;br /&gt;
&lt;br /&gt;
To be more specific:&lt;br /&gt;
* Clean up your code, using the feedback from your HW2 grading, such that the modules are as independent as possible. &lt;br /&gt;
* If you have not done so yet, add comments to the header files of your modules to explain exactly what each function does (without going into implementation details), what its arguments mean and what it returns (unless it's a void function, of course). &lt;br /&gt;
* Objectify the &amp;lt;tt&amp;gt;main&amp;lt;/tt&amp;gt; routine, by creating a class &amp;lt;tt&amp;gt;Diffusion&amp;lt;/tt&amp;gt;.&lt;br /&gt;
* Put this class in its own module (declaration in .h, implementation in .cc). For instance, the declaration could be&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// diffusion.h&lt;br /&gt;
#ifndef DIFFUSIONH&lt;br /&gt;
#define DIFFUSIONH&lt;br /&gt;
#include &amp;lt;fstream&amp;gt;&lt;br /&gt;
class Diffusion {&lt;br /&gt;
  public:&lt;br /&gt;
    Diffusion(float x1, float x2, float D, int numPoints);&lt;br /&gt;
    void init(float a0, float sigma0); // set initial field&lt;br /&gt;
    void timeStep(float dt);           // solve diff. equation over dt&lt;br /&gt;
    void toFile(std::ofstream&amp;amp; f);     // write to file (binary,no npyheader)&lt;br /&gt;
    void toScreen();                   // report a line to screen&lt;br /&gt;
    float getRho(int i, int j);        // get a value of the field&lt;br /&gt;
    ~Diffusion();&lt;br /&gt;
  private:&lt;br /&gt;
    float*** rho;&lt;br /&gt;
    ...&lt;br /&gt;
};&lt;br /&gt;
#endif&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(this is not supposed to be prescriptive.)&lt;br /&gt;
* In the implementation file you'd have things like&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// diffusion.cc&lt;br /&gt;
#include &amp;quot;diffusion.h&amp;quot;&lt;br /&gt;
...&lt;br /&gt;
void Diffusion::timeStep(float dt) &lt;br /&gt;
{&lt;br /&gt;
   // code for the timeStep ...&lt;br /&gt;
}&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(note the inclusion of the module's header file at the top of the implementation, so the class is declared).&lt;br /&gt;
* Let &amp;lt;tt&amp;gt;int main()&amp;lt;/tt&amp;gt; have the same functionality as before, but now by defining the parameters of the run, creating an object of this class, setting up file streams, and taking time steps and writing out by using calls to member functions of this object. &lt;br /&gt;
* Additionally, write a class &amp;lt;tt&amp;gt;Tracer&amp;lt;/tt&amp;gt; which for now implements a free particle in 2d. Something like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
class Tracer {&lt;br /&gt;
  public:&lt;br /&gt;
    Tracer(float x1, float x2);&lt;br /&gt;
    void init(float x0, float y0, float vx, float vy);&lt;br /&gt;
    void timeStep(float dt);           // solve diff. equation over dt&lt;br /&gt;
    void toFile(std::ofstream&amp;amp; f);     // write to file (binary,no npyheader)&lt;br /&gt;
    void toScreen();                   // report a line to screen&lt;br /&gt;
    ~Tracer();&lt;br /&gt;
  private:&lt;br /&gt;
    ...&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
:The timeStep implementation can in this case use the infamous forward Euler integration scheme, because it happens to be exact here.&lt;br /&gt;
:When it comes to output to an npy file, let's view the data of the tracer particle at one point in time as a 2x2 matrix &amp;lt;tt&amp;gt;[[x,y],[vx,vy]]&amp;lt;/tt&amp;gt;, so we can reuse much of the npy output code that we used for the diffusion field, which was a (numPoints+2)x(numPoints+2) matrix.&lt;br /&gt;
* This class too should be its own module (Often, &amp;quot;one class, one module&amp;quot; is a good paradigm, though occasionally you'll have closely related classes).&lt;br /&gt;
* Add some code to int main to  have the Tracer particle evolve at the same time as the diffusion field (although the two are completely uncoupled).&lt;br /&gt;
* Keep using git and make, run the tests that you have regularly to make sure your program still works.&lt;br /&gt;
&lt;br /&gt;
Note that because we've now set up our program in a modular fashion, you can do&lt;br /&gt;
the different parts of this assignment in any order you want.  For instance, to wrap your head around object-oriented programming, you may prefer to implement the tracer particle first, so that your diffusion code stays intact.  Or you might want to postpone commenting until the end if you think you'll have to change a module for this assignment.&lt;br /&gt;
&lt;br /&gt;
Email your source code and the git log file of all your commits as a .zip or .tar file to rzon@scinethpc.ca and ljdursi@scinethpc.ca by &lt;br /&gt;
&amp;lt;span style=&amp;quot;color:#ee3300&amp;quot;&amp;gt;3:00 pm on Friday February 8, 2013&amp;lt;/span&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===HW4===&lt;br /&gt;
&lt;br /&gt;
In this homework, we are going to implement the class project of a tracer particle coupled to a diffusion equation. &lt;br /&gt;
The full specification of the physical problem is [[Media:ScClassProject.pdf|here]].  &lt;br /&gt;
* Augment the tracer particle to include a force in the x and in the y direction, and a friction coefficient alpha, which at first can be constant.&lt;br /&gt;
* Implement the so-called leapfrog integration algorithm for the tracer particle&lt;br /&gt;
:::v &amp;amp;larr; v + f(v) &amp;amp;Delta;t / m&lt;br /&gt;
:::r &amp;amp;larr; r + v &amp;amp;Delta;t&lt;br /&gt;
:where v, r, and f are 2d vectors and f(v) is the total, velocity-dependent force specified in the class project, i.e., the sum of the external force F=qE and the friction force -&amp;amp;alpha;v.&amp;lt;br/&amp;gt;(Note: the v dependence of f makes this not strictly a leapfrog integration, but we'll ignore that here.)&lt;br /&gt;
* Further augment the tracer class with a member function 'couple' which takes a diffusion field as input, and adjusts the friction constant. &lt;br /&gt;
* Your implementation of the 'couple' member function will need to interpolate the diffusion field to the current position of the particle. Use [[Media:CppInterpolation.tgz|this interpolation module]].&lt;br /&gt;
* Rewrite your main routine so that the coupling is called before the tracer's time step. You may need to modify the Diffusion class a bit to get &amp;lt;tt&amp;gt;rho[active]&amp;lt;/tt&amp;gt; out.&lt;br /&gt;
* For simplicity, use the same time step for both the diffusion and the tracer particle.&lt;br /&gt;
* Keep using git and make.&lt;br /&gt;
&lt;br /&gt;
You will hand in your source code, makefiles and the git log file of all your commits by email by &amp;lt;span style=&amp;quot;color:#ee3300&amp;quot;&amp;gt;9:00 am on Thursday February 21, 2013&amp;lt;/span&amp;gt;.  Email the files, preferably zipped or tarred, to rzon@scinethpc.ca and ljdursi@scinethpc.ca.&lt;br /&gt;
&lt;br /&gt;
=Part 2: Numerical Tools for Physical Scientists=&lt;br /&gt;
&lt;br /&gt;
==Prerequisites==&lt;br /&gt;
&lt;br /&gt;
Part 1 or solid C++ programming skills, including make and unix/linux prompt experience.&lt;br /&gt;
&lt;br /&gt;
'''Software that you'll need'''&lt;br /&gt;
&lt;br /&gt;
A unix-like environment with the GNU compiler suite (e.g. Cygwin), and Python (Enthought) installed on your laptop.&lt;br /&gt;
&lt;br /&gt;
==Dates==&lt;br /&gt;
&lt;br /&gt;
February 12, 14, 26, and 28, 2013&amp;lt;br&amp;gt;&lt;br /&gt;
March 5, 7, 12, and 14, 2013&lt;br /&gt;
&lt;br /&gt;
==Topics==&lt;br /&gt;
&lt;br /&gt;
===''Lecture 1:'' Numerics ===&lt;br /&gt;
:::[[File:Lecture9-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture9-2013/lecture9-2013.html]]&lt;br /&gt;
:::[[Media:Lecture9-2013-Numerics.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture9-2013/lecture9-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 2:'' Random numbers ===&lt;br /&gt;
:::[[File:Lecture10-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture10-2013/lecture10-2013.html]]&lt;br /&gt;
:::[[Media:Lecture10-2013-PRNG.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture10-2013/lecture10-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp;[http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW1_2 Homework assignment 1]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 3:'' Numerical integration and ODEs ===&lt;br /&gt;
:::[[File:Lecture11-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture11-2013/lecture11-2013.html]]&lt;br /&gt;
:::[[Media:Lecture11-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture11-2013/lecture11-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 4:'' Molecular Dynamics ===&lt;br /&gt;
:::[[File:Lecture12-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture12-2013/lecture12-2013.html]]&lt;br /&gt;
:::[[Media:Lecture12-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture12-2013/lecture12-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp;[http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW2_2 Homework assignment 2]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 5:'' Linear Algebra part I ===&lt;br /&gt;
:::[[Media:Lecture13-2013.pdf|Slides (combined with lecture 6)]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 6:'' Linear Algebra part II and PDEs===&lt;br /&gt;
:::[[File:Lecture14-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture14-2013/lecture14-2013.html]]&lt;br /&gt;
:::[[Media:Lecture13-2013.pdf|Slides (combined with lecture 5)]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture14-2013/lecture14-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp;[http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW3_2 Homework assignment 3]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 7:'' Fast Fourier Transform===&lt;br /&gt;
:::[[File:Lecture15-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture15-2013/lecture15-2013.html]]&lt;br /&gt;
:::[[Media:Lecture15-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture15-2013/lecture15-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp;[[Media:Sincfftw.cc|example code]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 8:'' FFT for real and multidimensional data===&lt;br /&gt;
:::[[File:Lecture15-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture16-2013/lecture16-2013.html]]&lt;br /&gt;
:::[[Media:Lecture16-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture16-2013/lecture16-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp; [http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW4_2 Homework assignment 4]&lt;br /&gt;
&lt;br /&gt;
==Homework Assignments==&lt;br /&gt;
&lt;br /&gt;
===HW1===&lt;br /&gt;
This week's homework consists of two assignments.&lt;br /&gt;
&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
&lt;br /&gt;
* Consider the sequence of numbers: 1 followed by 10&amp;lt;sup&amp;gt;8&amp;lt;/sup&amp;gt; values of 10&amp;lt;sup&amp;gt;-8&amp;lt;/sup&amp;gt;&lt;br /&gt;
* The sequence should sum to 2.&lt;br /&gt;
* Write code which sums up those values in order. What answer does it get?&lt;br /&gt;
* Add a routine to the program which sums up the values in reverse order. Does it get the correct answer?&lt;br /&gt;
* How would you get the correct answer?&lt;br /&gt;
* Submit code, Makefile, text file with answers.&lt;br /&gt;
&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
&lt;br /&gt;
* Implement a linear congruential generator with a = 106, c = 1283, m = 6075 that generates random numbers from 0..1&lt;br /&gt;
* Using that and MT: generate 10,000 pairs (dx, dy) with dx, dy each in -0.1 .. +0.1. Generate histograms of dx and dy (say 200 bins). Does it look okay? What would you expect the variation to be?&lt;br /&gt;
* For 10,000 points: take random walks from (x,y)=(0,0) until the radius exceeds 2, then stop. Plot a histogram of the final angles for the two pseudo-random number generators. What do you see?&lt;br /&gt;
* Submit makefile, code, plots, git log.&lt;br /&gt;
&lt;br /&gt;
Both assignments due on Thursday Feb 28th, 2013, at 9:00 am. Email the files to rzon@scinethpc.ca and ljdursi@scinethpc.ca.&lt;br /&gt;
&lt;br /&gt;
===HW2===&lt;br /&gt;
&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
&lt;br /&gt;
* Compute numerically (using the GSL):&lt;br /&gt;
&lt;br /&gt;
::&amp;amp;int;&amp;lt;sub&amp;gt;0&amp;lt;/sub&amp;gt;&amp;lt;sup&amp;gt;3&amp;lt;/sup&amp;gt; f(x) &amp;amp;nbsp;dx&lt;br /&gt;
&lt;br /&gt;
:(that is the integral of f(x) from x=0 to x=3)&lt;br /&gt;
&lt;br /&gt;
:with&lt;br /&gt;
&lt;br /&gt;
::f(x) = ln(x) sin(x) e&amp;lt;sup&amp;gt;-x&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:using three different methods:&lt;br /&gt;
# Extended Simpson's rule&lt;br /&gt;
# Gauss-Legendre quadrature&lt;br /&gt;
# Monte Carlo sampling &lt;br /&gt;
&lt;br /&gt;
*Hint: what is f(0)?&lt;br /&gt;
&lt;br /&gt;
*Compare the convergence of these methods by increasing the number of function evaluations.&lt;br /&gt;
&lt;br /&gt;
*Submit makefile, code, plots, version control log. &lt;br /&gt;
&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
&lt;br /&gt;
* Using an adaptive 4th order Runge-Kutta approach, with a relative accuracy of 1e-4, compute the solution for t = [0,100] of the following set of coupled ODEs (Lorenz oscillator)&lt;br /&gt;
&lt;br /&gt;
::dx/dt = &amp;amp;sigma;(y - x)&lt;br /&gt;
&lt;br /&gt;
::dy/dt = (&amp;amp;rho;-z)x-y&lt;br /&gt;
&lt;br /&gt;
::dz/dt = xy - &amp;amp;beta;z&lt;br /&gt;
&lt;br /&gt;
:with &amp;amp;sigma;=10; &amp;amp;beta;=8/3; &amp;amp;rho; = 28, and with initial conditions&lt;br /&gt;
&lt;br /&gt;
::x(0) = 10&lt;br /&gt;
&lt;br /&gt;
::y(0) = 20&lt;br /&gt;
&lt;br /&gt;
::z(0) = 30&lt;br /&gt;
&lt;br /&gt;
* Hint: study the GSL documentation.&lt;br /&gt;
&lt;br /&gt;
*Submit makefile, code, plots, version control log.&lt;br /&gt;
&lt;br /&gt;
Both assignments due on Thursday Mar 7th, 2013, at 9:00 am. Email the files to rzon@scinethpc.ca and ljdursi@scinethpc.ca.&lt;br /&gt;
&lt;br /&gt;
===HW3===&lt;br /&gt;
&lt;br /&gt;
Part 1:&lt;br /&gt;
&lt;br /&gt;
The time-explicit formulation of the 1d diffusion equation looks like this:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{eqnarray*}&lt;br /&gt;
q^{n+1} &amp;amp; = &amp;amp; q^n + \frac{D \Delta t}{\Delta x^2} &lt;br /&gt;
\left (&lt;br /&gt;
\begin{matrix}&lt;br /&gt;
-2 &amp;amp; 1 \\&lt;br /&gt;
1 &amp;amp; -2 &amp;amp; 1 \\&lt;br /&gt;
&amp;amp; 1 &amp;amp; -2 &amp;amp; 1 \\&lt;br /&gt;
&amp;amp;  &amp;amp;  &amp;amp; \cdots &amp;amp; \\&lt;br /&gt;
&amp;amp;  &amp;amp;  &amp;amp; 1 &amp;amp; -2 &amp;amp; 1 \\&lt;br /&gt;
&amp;amp;  &amp;amp;  &amp;amp; &amp;amp; 1 &amp;amp; -2 \\&lt;br /&gt;
\end{matrix}&lt;br /&gt;
\right ) q^n \\&lt;br /&gt;
&amp;amp; = &amp;amp; \left ( 1 + \frac{D \Delta t}{\Delta x^2} A \right ) q^n&lt;br /&gt;
\end{eqnarray*}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
What are the eigenvalues of the matrix A?   What modes would we expect to be amplified/damped by this operator?&lt;br /&gt;
&lt;br /&gt;
* Consider 100 points in the discretization (eg, A is 100x100)&lt;br /&gt;
* Calculate the eigenvalues and eigenvectors (using D__EV ; which sort of matrix are we using here?)&lt;br /&gt;
* Plot the modes with the largest and smallest absolute-value of eigenvalues, and explain their physical significance&lt;br /&gt;
* The numerical method will become unstable when one eigenmode $v$ begins to grow uncontrollably whenever it is present, e.g.&lt;br /&gt;
$ \frac{D \Delta t}{\Delta x^2} A v = \frac{D \Delta t}{\Delta x^2} \lambda v &amp;gt; v$.   In a timestepping solution, the only way to avoid this for a given physical set of parameters and grid size is to reduce the timestep, $\Delta t$.   Use the eigenvalue with the largest absolute value to place a constraint on $\Delta t$ for stability.&lt;br /&gt;
&lt;br /&gt;
Part 2:&lt;br /&gt;
&lt;br /&gt;
Using the above constraint on $\Delta t$, for a 1d grid of size 100 (eg, a 100x100 matrix A), using lapack, evolve this PDE. Plot and explain results.&lt;br /&gt;
&lt;br /&gt;
* Have an initial condition of $q(x=0,t=0) = 1$, and $q(t=0)$ everywhere else being zero (eg, hot plate just turned on at the left)&lt;br /&gt;
* Take ~100 timesteps and plot the evolution of $q(x,t)$ at 5 times over that period.&lt;br /&gt;
* You'll want to use a BLAS routine to compute the matrix-vector multiply ( http://www.gnu.org/software/gsl/manual/html_node/Level-2-GSL-BLAS-Interface.html). Do the multiply in double precision (D__MV). Which one should you use?&lt;br /&gt;
* The GSL has a cblas interface, http://www.gnu.org/software/gsl/manual/html_node/Level-2-GSL-BLAS-Interface.html ; an example of its use can be found here http://www.gnu.org/software/gsl/manual/html_node/GSL-CBLAS-Examples.html&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Important things to know about lapack:&lt;br /&gt;
* If you are using an nxn array, the “leading dimension” of the array is n. (This argument is so that you could work on sub-matrices if you wanted)&lt;br /&gt;
* You have to make sure the 2d array is a contiguous block of memory&lt;br /&gt;
* You'll (presumably) want to use the C bindings for LAPACK - [http://www.netlib.org/lapack/lapacke.html lapacke].  Note that the usual C arrays are row-major.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Here's a simple example of calling a LAPACKE routine; note that how the matrix is described (here with a pointer to the data, a leading dimension, and the number of rows and columns) will vary with different types of matrix:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;mkl_lapacke.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
double **matrix(int n,int m);&lt;br /&gt;
void free_matrix(double **a);&lt;br /&gt;
&lt;br /&gt;
int main (int argc, const char * argv[])&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
   const int n=5;             // number of rows, columns of the matrix&lt;br /&gt;
   const int m = n;           // nrows&lt;br /&gt;
   const int leading_dim_A=n; // leading dimension (# of cols for row major);&lt;br /&gt;
                              // lets us operate on sub-matrices in principle&lt;br /&gt;
   const int leading_dim_b=n; // similarly for b&lt;br /&gt;
   double **A;&lt;br /&gt;
   double *b;&lt;br /&gt;
&lt;br /&gt;
   b = new double[leading_dim_b];&lt;br /&gt;
   A = matrix(n,leading_dim_A);&lt;br /&gt;
&lt;br /&gt;
   for (int i=0; i&amp;lt;n; i++)&lt;br /&gt;
       for (int j=0; j&amp;lt;leading_dim_A; j++)&lt;br /&gt;
            A[i][j] = 0.;&lt;br /&gt;
&lt;br /&gt;
   // let's do a trivial solve&lt;br /&gt;
   // It should be pretty clear that the solution to this system&lt;br /&gt;
   // is x = {0,1,2...n-1}&lt;br /&gt;
&lt;br /&gt;
   for (int i=0; i&amp;lt;leading_dim_A; i++) {&lt;br /&gt;
        A[i][i] = 2.;&lt;br /&gt;
   }&lt;br /&gt;
&lt;br /&gt;
   for (int i=0; i&amp;lt;leading_dim_b; i++) {&lt;br /&gt;
        b[i]    = 2*i;&lt;br /&gt;
   }&lt;br /&gt;
&lt;br /&gt;
   const char transpose='N';     //solve Ax=b, not A^T x = b&lt;br /&gt;
   const int  nrhs = 1;          //  we're only solving 1 right hand side&lt;br /&gt;
   int info;&lt;br /&gt;
&lt;br /&gt;
   // Call DGELS; b will be overwritten with the value of x.&lt;br /&gt;
   info = LAPACKE_dgels(LAPACK_COL_MAJOR,transpose,m,n,nrhs,&lt;br /&gt;
                          &amp;amp;(A[0][0]),leading_dim_A, &amp;amp;(b[0]),leading_dim_b);&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
   // print results&lt;br /&gt;
   for(int i=0;i&amp;lt;n;i++)&lt;br /&gt;
   {&lt;br /&gt;
      if (i != n/2)&lt;br /&gt;
        std::cout &amp;lt;&amp;lt; &amp;quot;    &amp;quot; &amp;lt;&amp;lt; b[i] &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
      else&lt;br /&gt;
        std::cout &amp;lt;&amp;lt; &amp;quot;x = &amp;quot; &amp;lt;&amp;lt; b[i] &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
   }&lt;br /&gt;
   return(info);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
double **matrix(int n,int m) {&lt;br /&gt;
   double **a = new double * [n];&lt;br /&gt;
   a[0] = new double [n*m];&lt;br /&gt;
&lt;br /&gt;
   for (int i=1; i&amp;lt;n; i++)&lt;br /&gt;
         a[i] = &amp;amp;a[0][i*m];&lt;br /&gt;
&lt;br /&gt;
   return a;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
void free_matrix(double **a) {&lt;br /&gt;
   delete[] a[0];&lt;br /&gt;
   delete[] a;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===HW4===&lt;br /&gt;
&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
&lt;br /&gt;
Trigonometric interpolation uses an n-point Fourier series to find values at intermediate points. It is one way of downscaling data, and was a motivation for Gauss, who applied it to planetary motion.&lt;br /&gt;
&lt;br /&gt;
The way it works is:&lt;br /&gt;
&lt;br /&gt;
# You fourier-transform your data&lt;br /&gt;
# You add frequencies above the Nyquist frequency (in absolute value), but set all the amplitudes of the new frequencies to zero.&lt;br /&gt;
# Note that the frequencies are stored such that eg. f&amp;lt;sub&amp;gt;n-1&amp;lt;/sub&amp;gt; is a low frequency -1.&lt;br /&gt;
# The resulting 2n array can be back transformed, and now gives an interpolated signal.&lt;br /&gt;
&lt;br /&gt;
For this assignment, write an application that reads in an image from a binary file into a 2d double precision array (this will require converting from bytes to doubles), and creates an image twice the size in all directions using trigonometric interpolation. Use a real-to-half-complex version of the fftw (note: in 2d, this version of the fftw mixes fourier components with the same physical magnitude of their wave number k, so this will work).&lt;br /&gt;
You can process the red, green and blue values separately. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
&lt;br /&gt;
Write an application which reads an image and performs a low pass filter on the image, i.e., any fourier components with magnitudes k larger than n/8 are to be set to zero, after which the fourier inverse is taken and the image is written out to disk again. Use the same fft technique as in the first assignment.&lt;br /&gt;
&lt;br /&gt;
'''Input image'''&lt;br /&gt;
&lt;br /&gt;
Use [[Media:gauss256.tgz|this image of Gauss]].&lt;br /&gt;
&lt;br /&gt;
'''Image format:'''&lt;br /&gt;
&lt;br /&gt;
Use the following simple PPM format:&lt;br /&gt;
&lt;br /&gt;
First line (ascii): 'P6\n'&amp;lt;br&amp;gt;&lt;br /&gt;
Second line, in ascii, 'width height\n'&amp;lt;br&amp;gt;&lt;br /&gt;
Third line, in ascii, 'maxcolorvalue\n' (this is typically just 255)&amp;lt;br&amp;gt;&lt;br /&gt;
Following that, in binary, are byte-triplets with the red, green and blue values of each pixel.&amp;lt;br&amp;gt;&lt;br /&gt;
Note: in C, the 'unsigned char' data type matches the concept of a byte best (for most machines anyway).&lt;br /&gt;
&lt;br /&gt;
In fact, between the first and second line, one can have comment lines that start with '#'.&lt;br /&gt;
&lt;br /&gt;
=Part 3: High Performance Scientific Computing=&lt;br /&gt;
&lt;br /&gt;
==Prerequisites==&lt;br /&gt;
&lt;br /&gt;
Part 1 or good C++ programming skills, including make and unix/linux prompt experience.&lt;br /&gt;
&lt;br /&gt;
'''Software that you'll need'''&lt;br /&gt;
&lt;br /&gt;
You will need to bring a laptop with an ssh facility. Hands-on parts will be done on SciNet's GPC cluster.&lt;br /&gt;
&lt;br /&gt;
For those who don't have a SciNet account yet, the instructions can be found at http://wiki.scinethpc.ca/wiki/index.php/Essentials#Accounts&lt;br /&gt;
&lt;br /&gt;
==Dates==&lt;br /&gt;
March 19, 21, 26, and 28, 2013&amp;lt;br&amp;gt;&lt;br /&gt;
April 2, 4, 9, and 11, 2013&lt;br /&gt;
&lt;br /&gt;
==Topics==&lt;br /&gt;
===''Lecture 1:'' Introduction to Parallel Programming ===&lt;br /&gt;
:::[[File:Lecture17-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture17-2013/lecture17-2013.html]]&lt;br /&gt;
:::[[Media:Lecture17-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture17-2013/lecture17-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 2:'' Parallel Computing Paradigms ===&lt;br /&gt;
&lt;br /&gt;
:::[[File:Lecture18-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture18-2013/lecture18-2013.html]]&lt;br /&gt;
:::[[Media:Lecture18-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture18-2013/lecture18-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp; [[#HW1_3|homework 1]]&lt;br /&gt;
&lt;br /&gt;
===''Lectures 3,4:''  Shared Memory Programming with OpenMP, part 1,2===&lt;br /&gt;
&lt;br /&gt;
:::[[Media:Lecture19-2013.pdf|Slides]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 5:'' Distributed Parallel Programming with MPI, part 1===&lt;br /&gt;
&lt;br /&gt;
:::[[Media:Lecture21-2013.pdf|Slides]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 6:'' Distributed Parallel Programming with MPI, part 2===&lt;br /&gt;
&lt;br /&gt;
:::[[Media:Lecture22-2013.pdf|Slides]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 7''&amp;amp;nbsp;&amp;amp;nbsp; Distributed Parallel Programming with MPI, part 3===&lt;br /&gt;
&lt;br /&gt;
:::[[Media:Lecture23-2013.pdf|Slides]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 8''&amp;amp;nbsp;&amp;amp;nbsp; Hybrid OpenMP+MPI Programming===&lt;br /&gt;
&lt;br /&gt;
== Homework assignments ==&lt;br /&gt;
&lt;br /&gt;
=== HW1 ===&lt;br /&gt;
&lt;br /&gt;
* Read the SciNet tutorial (as it pertains to the GPC)&lt;br /&gt;
* Read the GPC Quick Start.&lt;br /&gt;
* Get the first set of code:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
   $ cd $SCRATCH&lt;br /&gt;
   $ git clone /scinet/course/sc3/homework1&lt;br /&gt;
   $ cd homework1&lt;br /&gt;
   $ source setup&lt;br /&gt;
   $ make&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
* This contains the threaded program 'blurppm' and 266 ppm images to be blurred. Usage:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
  blurppm INPUTPPM OUTPUTPPM BLURRADIUS NUMBEROFTHREADS&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
* Simple test:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
  $ qsub -l nodes=1:ppn=8,walltime=2:00:00 -I -X -qdebug&lt;br /&gt;
  $ cd $SCRATCH/homework1&lt;br /&gt;
  $ time blurppm 001.ppm new001.ppm 30 1&lt;br /&gt;
  real  0m52.900s&lt;br /&gt;
  user  0m52.881s&lt;br /&gt;
  sys   0m0.008s&lt;br /&gt;
  $ display 001.ppm &amp;amp;&lt;br /&gt;
  $ display new001.ppm &amp;amp;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
* Time blurppm with a BLURRADIUS ranging from 1 to 41 in steps of 4, and for NUMBEROFTHREADS ranging from 1 to 16.  Record the (real) duration of each run.&lt;br /&gt;
* Plot the duration as a function of NUMBEROFTHREADS, as well as  the speed-up and efficiency.&lt;br /&gt;
* Submit the script and plots of the duration, speedup and efficiency as a function of NUMBEROFTHREADS.&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
* Use GNU parallel to run blurppm on all 266 images with a radius of 41.&lt;br /&gt;
* Investigate different scenarios:&lt;br /&gt;
:# Have GNU parallel run 16 at a time with just 1 thread.&lt;br /&gt;
:# Have GNU parallel run 8 at a time with 2 threads.&lt;br /&gt;
:# Have GNU parallel run 4 at a time with 4 threads.&lt;br /&gt;
:# Have GNU parallel run 2 at a time with 8 threads.&lt;br /&gt;
:# Have GNU parallel run 1 at a time with 16 threads.&lt;br /&gt;
:Record the total time it takes in each of these scenarios.&lt;br /&gt;
* Repeat this with a BLURRADIUS of 3.&lt;br /&gt;
* Submit scripts, timing data  and plots.&lt;br /&gt;
&lt;br /&gt;
=== HW2 ===&lt;br /&gt;
&lt;br /&gt;
In the course materials ( /scinet/course/ppp/nbodyc or nbodyf ) there is the source code for a serial N-body integrator.  This, like the molecular dynamics code you've seen earlier, calculates the long-range forces exerted by each particle on all of the other particles.&lt;br /&gt;
&lt;br /&gt;
Parallelize the force calculation with OpenMP, and present timing results for 1, 4, and 8 threads compared to the serial version.  Note that you can turn off graphic output by removing the &amp;quot;USEPGPLOT = -DPGPLOT&amp;quot; line in Makefile.inc in the top level directory.&lt;br /&gt;
&lt;br /&gt;
Begin by doubling the work by _not_ calculating two forces at once (eg, not making use of f&amp;lt;sub&amp;gt;ji&amp;lt;/sub&amp;gt; = -f&amp;lt;sub&amp;gt;ij&amp;lt;/sub&amp;gt;), and simply parallelizing the outer force loop.  Then find a way to implement the forces efficiently but also in parallel.  Is there any other part of the problem which could usefully be parallelized?&lt;br /&gt;
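&lt;br /&gt;
A minimal sketch of the first, work-doubling version is given below; the variable names and the softened force law are illustrative assumptions, not the actual code in the course directory:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Illustrative only: names and the force law are assumptions, not the course's nbody code.&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
void forces(int n, const double *x, const double *y, const double *m,&lt;br /&gt;
            double *fx, double *fy)&lt;br /&gt;
{&lt;br /&gt;
    const double G = 1.0, eps = 1.e-3;   // units and softening: assumed&lt;br /&gt;
    #pragma omp parallel for schedule(static)&lt;br /&gt;
    for (int i = 0; i &amp;lt; n; i++) {&lt;br /&gt;
        fx[i] = fy[i] = 0.;&lt;br /&gt;
        for (int j = 0; j &amp;lt; n; j++) {   // every pair visited twice (2x work),&lt;br /&gt;
            if (j == i) continue;        // but each thread only writes its own i&lt;br /&gt;
            double dx = x[j]-x[i], dy = y[j]-y[i];&lt;br /&gt;
            double r  = sqrt(dx*dx + dy*dy + eps*eps);&lt;br /&gt;
            fx[i] += G*m[i]*m[j]*dx/(r*r*r);&lt;br /&gt;
            fy[i] += G*m[i]*m[j]*dy/(r*r*r);&lt;br /&gt;
        }&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
For the more efficient version that exploits f&amp;lt;sub&amp;gt;ji&amp;lt;/sub&amp;gt; = -f&amp;lt;sub&amp;gt;ij&amp;lt;/sub&amp;gt;, one common approach is to give each thread its own private force accumulators (or to use atomic updates), so that updating both particles of a pair does not cause race conditions.&lt;br /&gt;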
&lt;br /&gt;
=== HW3 ===&lt;br /&gt;
&lt;br /&gt;
In the same area (  /scinet/course/ppp/diffusion/diffusion.c ) there is a 1d diffusion problem of a form you may recognize from earlier modules.   Your task is to parallelize this with MPI.  I'd encourage you to use the graphics output while you're developing this (use -DPGPLOT on the compile line); then omit the graphics, use a larger totpoints (say 16000), and run timings on 1, 2, 4, and 8 processors.   Note that for this simple problem, we don't necessarily expect a huge speedup; it's already pretty fast.&lt;br /&gt;
&lt;br /&gt;
I'd suggest doing this in steps:&lt;br /&gt;
&lt;br /&gt;
* include mpi.h, add mpi_init/finalize/comm_size/comm_rank, and compile with mpicc and make sure it runs;&lt;br /&gt;
* calculate the local number of points from the total number of points and the size (and possibly rank); treat the case where the total number of points isn't divisible by the number of processors however you like, but make sure you're consistent about it.&lt;br /&gt;
* Once you've calculated the local number of points, you won't need the variable totpoints; no arrays will be declared of that size, no plots will be made of that size, etc.   Make those changes, compile, and run on (say) 2 procs.&lt;br /&gt;
* Now you'll find that you'll need to figure out the local xleft and local xright of the domain; again, once this is done you won't need to know the global variables any more.   Make those changes, compile, and run.&lt;br /&gt;
* Finally, after the &amp;quot;old&amp;quot; boundary condition setting, do the internal boundary conditions, as in our example in class on Tuesday, sending messages around to our neighbours (a minimal sketch of such an exchange follows below).&lt;br /&gt;
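&lt;br /&gt;
A minimal sketch of that guard-cell exchange is given below.  It assumes each rank stores its local points in x[1..localn] with guard cells x[0] and x[localn+1]; that layout, and the function name, are assumptions for illustration, not necessarily what diffusion.c uses:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Illustrative guard-cell exchange; array layout and names are assumptions.&lt;br /&gt;
#include &amp;lt;mpi.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
void exchange_guardcells(double *x, int localn, int rank, int size)&lt;br /&gt;
{&lt;br /&gt;
    const int tag = 1;&lt;br /&gt;
    int left  = (rank == 0)      ? MPI_PROC_NULL : rank - 1;  // no-ops at the edges&lt;br /&gt;
    int right = (rank == size-1) ? MPI_PROC_NULL : rank + 1;&lt;br /&gt;
    MPI_Status status;&lt;br /&gt;
&lt;br /&gt;
    // send my last real point right, receive my left guard cell from the left&lt;br /&gt;
    MPI_Sendrecv(&amp;amp;x[localn], 1, MPI_DOUBLE, right, tag,&lt;br /&gt;
                 &amp;amp;x[0],      1, MPI_DOUBLE, left,  tag,&lt;br /&gt;
                 MPI_COMM_WORLD, &amp;amp;status);&lt;br /&gt;
    // send my first real point left, receive my right guard cell from the right&lt;br /&gt;
    MPI_Sendrecv(&amp;amp;x[1],        1, MPI_DOUBLE, left,  tag,&lt;br /&gt;
                 &amp;amp;x[localn+1], 1, MPI_DOUBLE, right, tag,&lt;br /&gt;
                 MPI_COMM_WORLD, &amp;amp;status);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Using MPI_PROC_NULL for the ranks at the physical boundaries turns those sends and receives into no-ops, so the same call works on every rank.&lt;br /&gt;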
&lt;br /&gt;
=Links=&lt;br /&gt;
&lt;br /&gt;
==Unix==&lt;br /&gt;
* Cygwin: http://www.cygwin.com&lt;br /&gt;
* Linux Command Line: A Primer (June 2012) [[Media:SS_IntroToShell.pdf|Slides,]] [[Media:SS_IntroToShell.tgz|Files]]&lt;br /&gt;
* Intro to unix shell from software carpentry: http://software-carpentry.org/4_0/shell&lt;br /&gt;
&lt;br /&gt;
==C/C++==&lt;br /&gt;
* [[One-Day Scientific C++ Class]] at SciNet&lt;br /&gt;
* C++ library reference: http://www.cplusplus.com/reference&lt;br /&gt;
* C preprocessor: http://www.cprogramming.com/tutorial/cpreprocessor.html&lt;br /&gt;
* Boost: http://www.boost.org&lt;br /&gt;
* Boost Python tutorial: http://www.boost.org/doc/libs/1_53_0/libs/python/doc/tutorial/doc/html/index.html&lt;br /&gt;
&lt;br /&gt;
==Git==&lt;br /&gt;
* Git: http://git-scm.com&lt;br /&gt;
* Version Control: [http://support.scinet.utoronto.ca/CourseVideo/PPPcourse/Thursday_Morning_BP_Revision_Control/Thursday_Morning_BP_Revision_Control.mp4 Video]/ [[Media:Snug_techtalk_revcontrol.pdf | Slides]]&lt;br /&gt;
* Git cheat sheet from Git Tower: http://www.git-tower.com/files/cheatsheet/Git_Cheat_Sheet_grey.pdf&lt;br /&gt;
&lt;br /&gt;
==Python==&lt;br /&gt;
* Python: http://www.python.org&lt;br /&gt;
* IPython: http://ipython.org&lt;br /&gt;
* Matplotlib: http://www.matplotlib.org&lt;br /&gt;
* Enthought python distribution: http://www.enthought.com/products/edudownload.php&amp;lt;br/&amp;gt;&lt;br /&gt;
(this gives you numpy, matplotlib and ipython all installed in one fell swoop)&lt;br /&gt;
&lt;br /&gt;
* Intro to python from software carpentry: http://software-carpentry.org/4_0/python&lt;br /&gt;
* Tutorial on matplotlib: http://conference.scipy.org/scipy2011/tutorials.php#jonathan&lt;br /&gt;
* Npy file format: https://github.com/numpy/numpy/blob/master/doc/neps/npy-format.txt&lt;br /&gt;
* Boost Python tutorial: http://www.boost.org/doc/libs/1_53_0/libs/python/doc/tutorial/doc/html/index.html&lt;br /&gt;
&lt;br /&gt;
==ODEs==&lt;br /&gt;
* Integrators for particle based ODEs (i.e. molecular dynamics): http://www.chem.utoronto.ca/~rzon/simcourse/partmd.pdf. &amp;lt;br&amp;gt;'''Focus on 4.1.4 - 4.1.6 for practical aspects.'''&lt;br /&gt;
* Numerical algorithm to solve ODEs (General) in ''Numerical Recipes for C'': http://apps.nrbook.com/c/index.html Chapter 16&lt;br /&gt;
&lt;br /&gt;
==Interpolation (2D) ==&lt;br /&gt;
* Interpolation in ''Numerical Recipes for C'': http://apps.nrbook.com/c/index.html Pages 123-128&lt;br /&gt;
* Wikipedia pages on [http://en.wikipedia.org/wiki/Bilinear_interpolation Bilinear Interpolation] and [http://en.wikipedia.org/wiki/Bicubic_interpolation Bicubic Interpolation] are not bad either.&lt;br /&gt;
&lt;br /&gt;
==BLAS==&lt;br /&gt;
* [http://www.tacc.utexas.edu/tacc-projects/gotoblas2 gotoblas]&lt;br /&gt;
* [http://math-atlas.sourceforge.net/ ATLAS]&lt;br /&gt;
&lt;br /&gt;
==LAPACK==&lt;br /&gt;
* http://www.netlib.org/lapack&lt;br /&gt;
&lt;br /&gt;
==GSL==&lt;br /&gt;
* GNU Scientific Library: http://www.gnu.org/s/gsl&lt;br /&gt;
&lt;br /&gt;
==FFT==&lt;br /&gt;
* FFTW: http://www.fftw.org&lt;br /&gt;
&lt;br /&gt;
==Top500==&lt;br /&gt;
* TOP500 Supercomputing Sites: http://top500.org&lt;br /&gt;
&lt;br /&gt;
==OpenMP==&lt;br /&gt;
* OpenMP (open multi-processing) application programming interface for shared memory programming: http://openmp.org&lt;br /&gt;
&lt;br /&gt;
==GNU parallel==&lt;br /&gt;
* Official citation: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.&lt;br /&gt;
* [[Media:Tech-talk-gnu-parallel.pdf|Slides of the SciNet TechTalk on Gnu Parallel (14 Nov 2012)]]&lt;br /&gt;
* The documentation for GNU parallel can be found at http://www.gnu.org/software/parallel/&lt;br /&gt;
* Its man page can be found here http://www.gnu.org/software/parallel/man.html&lt;br /&gt;
* The man page is also available on the GPC when the gnu-parallel module is loaded, with the command &amp;lt;code&amp;gt;$ man parallel&amp;lt;/code&amp;gt;. The man page contains options, such as how to make sure the output is not all scrambled, and examples.&lt;br /&gt;
&lt;br /&gt;
==SciNet==&lt;br /&gt;
&lt;br /&gt;
Anything on this wiki, really, but specifically:&lt;br /&gt;
* [[Essentials|SciNet Essentials]]&lt;br /&gt;
* [[GPC Quickstart]]&lt;br /&gt;
* [[Media:SciNet_Tutorial.pdf |SciNet User Tutorial]]&lt;br /&gt;
* [[Software and Libraries]]&lt;br /&gt;
&lt;br /&gt;
==Other Resources==&lt;br /&gt;
* [http://galileo.phys.virginia.edu/classes/551.jvn.fall01/goldberg.pdf What Every Computer Scientist Should Know About Floating-Point Arithmetic] - the classic (and extremely comprehensive) overview of the basics of floating point math.   The first few pages, in particular, are very useful.&lt;br /&gt;
* [http://arxiv.org/abs/1005.4117 Random Numbers In Scientific Computing: An Introduction] by Katzgraber.   A very lucid discussion of pseudo random number generators for science.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Scientific_Computing_Course&amp;diff=5927</id>
		<title>Scientific Computing Course</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Scientific_Computing_Course&amp;diff=5927"/>
		<updated>2013-04-05T20:23:32Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* Homework assignments */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;''This wiki page concerns the 2013 installment of SciNet's Scientific Computing course. Material from the previous installment can be found on [[Scientific Software Development Course]], [[Numerical Tools for Physical Scientists (course)]], and [[High Performance Scientific Computing]]''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
=Syllabus=&lt;br /&gt;
&lt;br /&gt;
==About the course==&lt;br /&gt;
* Whole-term graduate course&lt;br /&gt;
* Prerequisite: basic C, C++ or Fortran experience.&lt;br /&gt;
* Will use `C++ light' and Python&lt;br /&gt;
* Topics include: Scientific computing and programming skills, Parallel programming, and Hybrid programming.  &lt;br /&gt;
&lt;br /&gt;
There are three parts to this course:&lt;br /&gt;
&lt;br /&gt;
# Scientific Software Development: Jan/Feb 2013&amp;lt;br&amp;gt;''python, C++, git, make, modular programming, debugging''&lt;br /&gt;
# Numerical Tools for Physical Scientists: Feb/Mar 2013&amp;lt;br&amp;gt;''modelling, floating point, Monte Carlo, ODE, linear algebra,fft''&lt;br /&gt;
# High Performance Scientific Computing: Mar/Apr 2013&amp;lt;br&amp;gt;''openmp, mpi and hybrid programming''&lt;br /&gt;
&lt;br /&gt;
Each part consists of eight one-hour lectures, two per week.&lt;br /&gt;
&lt;br /&gt;
These can be taken separately by astrophysics graduate students at the University of Toronto as mini-courses, and by physics graduate students at the University of Toronto as modular courses.&lt;br /&gt;
&lt;br /&gt;
The first two parts count towards the SciNet Certificate in Scientific Computing, while the third part can count towards the SciNet HPC Certificate. For more info about the SciNet Certificates, see http://www.scinethpc.ca/2012/12/scinet-hpc-certificate-program.&lt;br /&gt;
&lt;br /&gt;
==Location and Times==&lt;br /&gt;
[http://www.scinethpc.ca/2010/08/contact-us SciNet HeadQuarters]&amp;lt;br&amp;gt;&lt;br /&gt;
256 McCaul Street, Toronto, ON&amp;lt;br&amp;gt;&lt;br /&gt;
Room 229 (Conference Room)&amp;lt;br&amp;gt;&lt;br /&gt;
Tuesdays 11:00 am - 12:00 noon&amp;lt;br&amp;gt;&lt;br /&gt;
Thursdays 11:00 am - 12:00 noon&lt;br /&gt;
&lt;br /&gt;
==Instructors and office hours==&lt;br /&gt;
&lt;br /&gt;
* Ramses van Zon - 256 McCaul Street, Rm 228 - Mondays 3-4pm&lt;br /&gt;
* L. Jonathan Dursi - 256 McCaul Street, Rm 216 - Wednesdays 3-4pm&lt;br /&gt;
&lt;br /&gt;
==Grading scheme==&lt;br /&gt;
&lt;br /&gt;
Attendance at lectures.&lt;br /&gt;
&lt;br /&gt;
Four homework sets (i.e., one per week), to be returned by email by 9:00 am the next Thursday.&lt;br /&gt;
&lt;br /&gt;
==Sign up==&lt;br /&gt;
Sign up for this graduate course goes through SciNet's course website.&amp;lt;br&amp;gt;The direct link is https://support.scinet.utoronto.ca/courses/?q=node/99.&amp;lt;br&amp;gt;  If you do not have a SciNet account but wish to register for this course, please email support@scinet.utoronto.ca . &amp;lt;br&amp;gt;&lt;br /&gt;
Sign up is closed.&lt;br /&gt;
&lt;br /&gt;
=Part 1: Scientific Software Development=&lt;br /&gt;
&lt;br /&gt;
==Prerequisites==&lt;br /&gt;
&lt;br /&gt;
Some programming experience. Some unix prompt experience.&lt;br /&gt;
&lt;br /&gt;
'''Software that you'll need:'''&lt;br /&gt;
&lt;br /&gt;
A unix-like environment with the GNU compiler suite (e.g. Cygwin), and Python 2, IPython, Numpy, SciPy and Matplotlib (which you all get if you use the Enthought distribution) installed on your laptop. Links are given at the bottom of this page.&lt;br /&gt;
&lt;br /&gt;
==Dates==&lt;br /&gt;
&lt;br /&gt;
January 15, 17, 22, 24, 29, and 31, 2013&amp;lt;br&amp;gt;&lt;br /&gt;
February 5 and 7, 2013&lt;br /&gt;
&lt;br /&gt;
==Topics (with lecture slides and recordings)==&lt;br /&gt;
&lt;br /&gt;
===''Lecture 1:'' C++ introduction===&lt;br /&gt;
:::[[File:Lecture1-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture1-2013/lecture1-2013.html]]&lt;br /&gt;
:::[[Media:Lecture1-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture1-2013/lecture1-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 2:'' More C++, build and version control&amp;lt;br&amp;gt;===&lt;br /&gt;
:::[[File:Lecture2-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture2-2013/lecture2-2013.html]]&lt;br /&gt;
:::Guest lecturer: Michael Nolta (CITA) for the git portion of the lecture.&lt;br /&gt;
:::[[Media:Lecture2-2013.pdf|C++ and Make slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture2-2013/lecture2-2013.mp4 C++ and Make video recording] &amp;amp;nbsp;/ &amp;amp;nbsp; [[Media:Git-Nolta.pdf|Git slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [[#HW1|Homework assignment 1]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 3:'' Python and visualization===&lt;br /&gt;
:::[[File:Lecture3-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture3-2013/lecture3-2013.html]]&lt;br /&gt;
:::[[Media:Lecture3-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture3-2013/lecture3-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 4:'' Modular programming, refactoring, testing===&lt;br /&gt;
:::[[File:Lecture4-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture4-2013/lecture4-2013.html]]&lt;br /&gt;
:::[[Media:Lecture4-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture4-2013/lecture4-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp;  [[#HW2|Homework assignment 2]]&lt;br /&gt;
:::[http://wiki.scinethpc.ca/wiki/images/f/f0/diffuse.cc diffuse.cc (course project source file)] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://wiki.scinethpc.ca/wiki/images/f/f0/plotdata.py plotdata.py (corresponding python movie generator)]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 5:'' Object oriented programming===&lt;br /&gt;
:::[[Media:Lecture5-2013.pdf|Slides]]&lt;br /&gt;
:::Recordings of this lecture are missing, but you could view the videos of SciNet's [[One-Day Scientific C++ Class]], in particular the parts on classes, polymorphism, and inheritance.&lt;br /&gt;
&lt;br /&gt;
===''Lecture 6:'' ODE, interpolation===&lt;br /&gt;
:::[[File:Lecture6-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture6-2013/lecture6-2013.html]]&lt;br /&gt;
:::[[Media:ScientificComputing2013-Lecture5-ODE.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture6-2013/lecture6-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp; [[#HW3|Homework assignment 3]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 7:'' Development tools: debugging and profiling===&lt;br /&gt;
:::[[File:Lecture7-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture7-2013/lecture7-2013.html]]&lt;br /&gt;
:::[[Media:ScientificComputing2013-Debugging.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture7-2013/lecture7-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 8:'' Objects in Python, linking C++ and Python===&lt;br /&gt;
:::[[File:Lecture8-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture8-2013/lecture8-2013.html]]&lt;br /&gt;
:::[[Media:Lecture8-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture8-2013/lecture8-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
==Homework assignments==&lt;br /&gt;
&lt;br /&gt;
===HW1===&lt;br /&gt;
&lt;br /&gt;
'''''Multi-file C++ program to create a data file'''''&lt;br /&gt;
&lt;br /&gt;
We’ve learned programming in basic C++, use of make and Makefiles to build projects, and local use of git for version control. In this first assignment, you’ll use these to make a multi-file C++ program, built with make, which computes and outputs a data file.&lt;br /&gt;
&lt;br /&gt;
* Start a git repository, and begin writing a C++ program to&lt;br /&gt;
:# Get an array size and a standard deviation from user input,&lt;br /&gt;
:# Allocate a 2d array (use the code given in lecture 2),&lt;br /&gt;
:# Store a 2d Gaussian with a maximum at the centre of the array and the given standard deviation (in units of grid points; a minimal sketch of this step is given at the end of this assignment),&lt;br /&gt;
:# Output that array to a text file,&lt;br /&gt;
:# Free the array, and exit. &lt;br /&gt;
* The output text file should contain just the data in text format, with a row of the file corresponding to a row of the array and with whitespace between the numbers. &lt;br /&gt;
* The 2d array creation/freeing routines should be in one file (with an associated header file), the gaussian calculation be in another (ditto), and the output routine be in a third, with the main program calling each of these. &lt;br /&gt;
* Use a makefile to build your code (add it to the repository).&lt;br /&gt;
* You can start with everything in one file, with hardcoded values for sizes and standard deviation and a static array, then refactor things into multiple files, adding the other features.&lt;br /&gt;
* As a test, use the ipython executable that came with your Enthought python distribution to read your data and plot it.&amp;lt;br&amp;gt;If your data file is named ‘data.txt’, running the following:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ipython --pylab&lt;br /&gt;
In [1]: data = numpy.genfromtxt('data.txt') &lt;br /&gt;
In [2]: contour(data) &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
should give a nice contour plot of a 2-dimensional gaussian.&lt;br /&gt;
* Email in your source code, makefile and the &amp;quot;git log&amp;quot; output of all your commits by 9:00 am Thursday Jan 24th, 2013. Please zip or tar these files together as one attachment, with a file name that includes your name and &amp;quot;HW1&amp;quot;.&lt;br /&gt;
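&lt;br /&gt;
For reference, a minimal sketch of the Gaussian-fill step; the function name, the use of &amp;lt;tt&amp;gt;float&amp;lt;/tt&amp;gt;, and the square-grid assumption are illustrative choices, not requirements:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Illustrative sketch: fill an n x n array with a Gaussian centred on the grid.&lt;br /&gt;
// Names and types are assumptions; adapt to your own allocation routines.&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
void fill_gaussian(float **a, int n, float sigma)&lt;br /&gt;
{&lt;br /&gt;
    float xc = 0.5f*(n-1), yc = 0.5f*(n-1);     // centre of the grid&lt;br /&gt;
    for (int i = 0; i &amp;lt; n; i++)&lt;br /&gt;
        for (int j = 0; j &amp;lt; n; j++) {&lt;br /&gt;
            float dx = i - xc, dy = j - yc;     // distances in grid points&lt;br /&gt;
            a[i][j] = exp(-(dx*dx + dy*dy)/(2.0f*sigma*sigma));&lt;br /&gt;
        }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;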
&lt;br /&gt;
===HW2===&lt;br /&gt;
'''''Refactor legacy code to a modular project with unit tests'''''&lt;br /&gt;
&lt;br /&gt;
In class, today, we talked about modular programming and testing, and the project we’ll be working on for the next three weeks. This homework will start advancing on that project by working on the “legacy” code given to us by our supervisor ([http://wiki.scinethpc.ca/wiki/images/f/f0/diffuse.cc diffuse.cc]), with a corresponding python plotting script ([http://wiki.scinethpc.ca/wiki/images/f/f0/plotdata.py plotdata.py]), and whipping it into shape before we start adding new physics.&lt;br /&gt;
* Start a git repository for this project, and add the two files.&lt;br /&gt;
* Create a Makefile and add it to the repository.&lt;br /&gt;
* Since we have no tests, run the program with console output redirected to a file:&lt;br /&gt;
:&amp;lt;pre&amp;gt;$ diffuse &amp;gt; original-output.txt&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;''It turns out the code has a bug that can make the output different when the same code is run again, which obviously would not be good for a baseline test. Replace 'float error;' by 'float error=0.0;' to fix this.''&lt;br /&gt;
* Also save the two .npy output files, e.g. to original-data.npy and original-theory.npy. The triplet of files (original-output.txt, original-data.npy and original-theory.npy) serve as a baseline integrated test (add these to repository). &lt;br /&gt;
* Then write a 'test' target in your makefile that:&lt;br /&gt;
** Runs 'diffuse' with output to a new file.&lt;br /&gt;
** Compares the file with the baseline test file, and compares the .npy files.&lt;br /&gt;
:: (hint: the unix command diff or cmp can compare files).&lt;br /&gt;
* First refactoring: Move the global variables into the main routine.&lt;br /&gt;
* ''Chorus: Test your modified code, and commit.''&lt;br /&gt;
* Second refactoring: Extract a diffusion operator routine, that gets called from main.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* Create a .cc/.h module for the diffusion operator.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* Add two tests for the diffusion operator: for a constant and for a linear input field (&amp;lt;tt&amp;gt;rho[i][j]=a*i+b*j&amp;lt;/tt&amp;gt;). Add these to the test target in the makefile.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* More refactoring: Extract three more .cc/.h modules:&lt;br /&gt;
** for output (should not contain hardcoded filenames)    &lt;br /&gt;
** computation of the theory&lt;br /&gt;
** and for the array allocation stuff.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* Describe, but don't implement in the .h and .cc, what would be appropriate unit tests for these three modules.&lt;br /&gt;
&lt;br /&gt;
Email in your source code and the git log file of all your commits as a .zip or .tar file by email to rzon@scinethpc.ca and ljdursi@scinethpc.ca by 9:00 am on Thursday January 31, 2013.&lt;br /&gt;
&lt;br /&gt;
===HW3===&lt;br /&gt;
This week, we learned about object oriented programming, which fits nicely within the modular programming idea.  In this homework, we are going to use some of it to restructure our code and get it ready to add the tracer particle, the goal of the course project. &lt;br /&gt;
&lt;br /&gt;
The goal will be to have an instance of a &amp;lt;tt&amp;gt;Diffusion&amp;lt;/tt&amp;gt; class,&lt;br /&gt;
as well as an instance of &amp;lt;tt&amp;gt;Tracer&amp;lt;/tt&amp;gt;, which for now will be a&lt;br /&gt;
free particle moving as ('''x'''(t),'''y'''(t)) = ('''x'''(0) +&lt;br /&gt;
'''vx''' t, '''y'''(0) + '''vy''' t), without any coupling yet (we&lt;br /&gt;
will handle this next week).&lt;br /&gt;
&lt;br /&gt;
To be more specific:&lt;br /&gt;
* Clean up your code, using the feedback from your HW2 grading, such that the modules are as independent as possible. &lt;br /&gt;
* If you have not done so yet, add comments to the header files of your modules to explain exactly what each function does (without going into implementation details), what its arguments mean and what it returns (unless it's a void function, of course). &lt;br /&gt;
* Objectify the &amp;lt;tt&amp;gt;main&amp;lt;/tt&amp;gt; routine, by creating a class &amp;lt;tt&amp;gt;Diffusion&amp;lt;/tt&amp;gt;.&lt;br /&gt;
* Put this class in its own module (declaration in .h, implementation in .cc). For instance, the declaration could be&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// diffusion.h&lt;br /&gt;
#ifndef DIFFUSIONH&lt;br /&gt;
#define DIFFUSIONH&lt;br /&gt;
#include &amp;lt;fstream&amp;gt;&lt;br /&gt;
class Diffusion {&lt;br /&gt;
  public:&lt;br /&gt;
    Diffusion(float x1, float x2, float D, int numPoints);&lt;br /&gt;
    void init(float a0, float sigma0); // set initial field&lt;br /&gt;
    void timeStep(float dt);           // solve diff. equation over dt&lt;br /&gt;
    void toFile(std::ofstream&amp;amp; f);     // write to file (binary,no npyheader)&lt;br /&gt;
    void toScreen();                   // report a line to screen&lt;br /&gt;
    float getRho(int i, int j);        // get a value of the field&lt;br /&gt;
    ~Diffusion();&lt;br /&gt;
  private:&lt;br /&gt;
    float*** rho;&lt;br /&gt;
    ...&lt;br /&gt;
};&lt;br /&gt;
#endif&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(this is not supposed to be prescriptive.)&lt;br /&gt;
* In the implementation file you'd have things like&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// diffusion.cc&lt;br /&gt;
#include &amp;quot;diffusion.h&amp;quot;&lt;br /&gt;
...&lt;br /&gt;
void Diffusion::timeStep(float dt) &lt;br /&gt;
{&lt;br /&gt;
   // code for the timeStep ...&lt;br /&gt;
}&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(note the inclusion of the module's header file on the top of the implementation, so the class is declared).&lt;br /&gt;
* Let &amp;lt;tt&amp;gt;int main()&amp;lt;/tt&amp;gt; have the same functionality as before, but now by defining the parameters of the run, creating an object of this class, setting up file streams, and taking time steps and writing out by using calls to member functions of this object. &lt;br /&gt;
* Additionally, write a class &amp;lt;tt&amp;gt;Tracer&amp;lt;/tt&amp;gt; which for now implements a free particle in 2d. Something like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
class Tracer {&lt;br /&gt;
  public:&lt;br /&gt;
    Tracer(float x1, float x2);&lt;br /&gt;
    void init(float x0, float y0, float vx, float vy);&lt;br /&gt;
    void timeStep(float dt);           // solve diff. equation over dt&lt;br /&gt;
    void toFile(std::ofstream&amp;amp; f);     // write to file (binary,no npyheader)&lt;br /&gt;
    void toScreen();                   // report a line to screen&lt;br /&gt;
    ~Tracer();&lt;br /&gt;
  private:&lt;br /&gt;
    ...&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
:The timeStep implementation can in this case use the infamous forward Euler integration scheme, because it happens to be exact here.&lt;br /&gt;
:When it comes to output to an npy file, let's view the data of the tracer particle at one point in time as a 2x2 matrix &amp;lt;tt&amp;gt;[[x,y],[vx,vy]]&amp;lt;/tt&amp;gt;, so we can use much of the npy output code that we used for the diffusion field, which was a (numPoints+2)x(numPoints+2) matrix.&lt;br /&gt;
* This class too should be its own module (Often, &amp;quot;one class, one module&amp;quot; is a good paradigm, though occasionally you'll have closely related classes).&lt;br /&gt;
* Add some code to int main to  have the Tracer particle evolve at the same time as the diffusion field (although the two are completely uncoupled).&lt;br /&gt;
* Keep using git and make, run the tests that you have regularly to make sure your program still works.&lt;br /&gt;
&lt;br /&gt;
Note that because we've now set up our program in a modular fashion, you can do&lt;br /&gt;
different parts of this assignment in any order you want.  For instance, to wrap your head around object oriented programming, you may like implementing the tracer particle first, so that your diffusion code stays intact.  Or you might want to wait with commenting until the end if you think you'll have to change a module for this assignment.&lt;br /&gt;
&lt;br /&gt;
Email in your source code and the git log file of all your commits as a .zip or .tar file by email to rzon@scinethpc.ca and ljdursi@scinethpc.ca by &lt;br /&gt;
&amp;lt;span style=&amp;quot;color:#ee3300&amp;quot;&amp;gt;3:00 pm on Friday February 8, 2013&amp;lt;/span&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===HW4===&lt;br /&gt;
&lt;br /&gt;
In this homework, we are going to implement the class project of a tracer particle coupled to a diffusion equation. &lt;br /&gt;
The full specification of the physical problem is [[Media:ScClassProject.pdf|here]].  &lt;br /&gt;
* Augment the tracer particle to include a force in the x and in the y direction, and a friction coefficient alpha, which at first can be constant.&lt;br /&gt;
* Implement the so-called leapfrog integration algorithm for the tracer particle&lt;br /&gt;
:::v &amp;amp;larr; v + f(v) &amp;amp;Delta;t / m&lt;br /&gt;
:::r &amp;amp;larr; r + v &amp;amp;Delta;t&lt;br /&gt;
:where v, r, and f are 2d vectors and f(v) is the total, velocity-dependent force specified in the class project, i.e., the sum of the external force F=qE and the friction force -&amp;amp;alpha;v.&amp;lt;br/&amp;gt;(Note: the v dependence of f makes this strictly not a leapfrog integration, but we'll ignore that here.) A minimal sketch of this update step is given after this list.&lt;br /&gt;
* Further augment the tracer class with a member function 'couple' which takes a diffusion field as input, and adjusts the friction constant. &lt;br /&gt;
* Your implementation of the 'couple' member function will need to interpolate the diffusion field to the current position of the particle. Use [[Media:CppInterpolation.tgz|this interpolation module]].&lt;br /&gt;
* Rewrite your main routine so that the coupling is called before the tracer's time step. You may need to modify the Diffusion class a bit to get &amp;lt;tt&amp;gt;rho[active]&amp;lt;/tt&amp;gt; out.&lt;br /&gt;
* For simplicity, use the same time step for both the diffusion and the tracer particle.&lt;br /&gt;
* Keep using git and make.&lt;br /&gt;
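&lt;br /&gt;
A minimal sketch of that update step for the tracer is given below; the member names (q, Ex, Ey, alpha, m, x, y, vx, vy) are assumptions about your own Tracer class, and the force is the F=qE plus friction combination described above:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Illustrative update step; member names and the constant external field are assumptions.&lt;br /&gt;
void Tracer::timeStep(float dt)&lt;br /&gt;
{&lt;br /&gt;
    float fx = q*Ex - alpha*vx;   // total x force: external plus friction&lt;br /&gt;
    float fy = q*Ey - alpha*vy;   // total y force&lt;br /&gt;
    vx += fx*dt/m;                // kick:  v goes to v + f(v) dt / m&lt;br /&gt;
    vy += fy*dt/m;&lt;br /&gt;
    x  += vx*dt;                  // drift: r goes to r + v dt&lt;br /&gt;
    y  += vy*dt;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;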
&lt;br /&gt;
You will hand in your source code, makefiles and the git log file of all your commits by email by &amp;lt;span style=&amp;quot;color:#ee3300&amp;quot;&amp;gt;9:00 am on Thursday February 21, 2013&amp;lt;/span&amp;gt;.  Email the files, preferably zipped or tarred, to rzon@scinethpc.ca and ljdursi@scinethpc.ca.&lt;br /&gt;
&lt;br /&gt;
=Part 2: Numerical Tools for Physical Scientists=&lt;br /&gt;
&lt;br /&gt;
==Prerequisites==&lt;br /&gt;
&lt;br /&gt;
Part 1 or solid c++ programming skills, including make and unix/linux prompt experience.&lt;br /&gt;
&lt;br /&gt;
'''Software that you'll need'''&lt;br /&gt;
&lt;br /&gt;
A unix-like environment with the GNU compiler suite (e.g. Cygwin), and Python (Enthought) installed on your laptop.&lt;br /&gt;
&lt;br /&gt;
==Dates==&lt;br /&gt;
&lt;br /&gt;
February 12, 14, 26, and 28, 2013&amp;lt;br&amp;gt;&lt;br /&gt;
March 5, 7, 12, and 14, 2013&lt;br /&gt;
&lt;br /&gt;
==Topics==&lt;br /&gt;
&lt;br /&gt;
===''Lecture 1:'' Numerics ===&lt;br /&gt;
:::[[File:Lecture9-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture9-2013/lecture9-2013.html]]&lt;br /&gt;
:::[[Media:Lecture9-2013-Numerics.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture9-2013/lecture9-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 2:'' Random numbers ===&lt;br /&gt;
:::[[File:Lecture10-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture10-2013/lecture10-2013.html]]&lt;br /&gt;
:::[[Media:Lecture10-2013-PRNG.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture10-2013/lecture10-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp;[http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW1_2 Homework assignment 1]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 3:'' Numerical integration and ODEs ===&lt;br /&gt;
:::[[File:Lecture11-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture11-2013/lecture11-2013.html]]&lt;br /&gt;
:::[[Media:Lecture11-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture11-2013/lecture11-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 4:'' Molecular Dynamics ===&lt;br /&gt;
:::[[File:Lecture12-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture12-2013/lecture12-2013.html]]&lt;br /&gt;
:::[[Media:Lecture12-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture12-2013/lecture12-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp;[http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW2_2 Homework assignment 2]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 5:'' Linear Algebra part I ===&lt;br /&gt;
:::[[Media:Lecture13-2013.pdf|Slides (combined with lecture 6)]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 6:'' Linear Algebra part II and PDEs===&lt;br /&gt;
:::[[File:Lecture14-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture14-2013/lecture14-2013.html]]&lt;br /&gt;
:::[[Media:Lecture13-2013.pdf|Slides (combined with lecture 5)]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture14-2013/lecture14-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp;[http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW3_2 Homework assignment 3]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 7:'' Fast Fourier Transform===&lt;br /&gt;
:::[[File:Lecture15-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture15-2013/lecture15-2013.html]]&lt;br /&gt;
:::[[Media:Lecture15-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture15-2013/lecture15-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp;[[Media:Sincfftw.cc|example code]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 8:'' FFT for real and multidimensional data===&lt;br /&gt;
:::[[File:Lecture15-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture16-2013/lecture16-2013.html]]&lt;br /&gt;
:::[[Media:Lecture16-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture16-2013/lecture16-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp; [http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW4_2 Homework assignment 4]&lt;br /&gt;
&lt;br /&gt;
==Homework Assignments==&lt;br /&gt;
&lt;br /&gt;
===HW1===&lt;br /&gt;
This week's homework consists of two assignments.&lt;br /&gt;
&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
&lt;br /&gt;
* Consider the sequence of numbers: 1 followed by 10&amp;lt;sup&amp;gt;8&amp;lt;/sup&amp;gt; values of 10&amp;lt;sup&amp;gt;-8&amp;lt;/sup&amp;gt;&lt;br /&gt;
* Should sum to 2&lt;br /&gt;
* Write code which sums up those values in order. What answer does it get?&lt;br /&gt;
* Add to the program a routine which sums up the values in reverse order. Does it get the correct answer?&lt;br /&gt;
* How would you get the correct answer? (A minimal sketch of the forward sum is given after this list.)&lt;br /&gt;
* Submit code, Makefile, text file with answers.&lt;br /&gt;
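&lt;br /&gt;
A minimal sketch of the forward (in-order) sum; single precision is assumed here, since that is where the effect is most dramatic:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Sum 1 followed by 1e8 values of 1e-8, in order; the exact answer is 2.&lt;br /&gt;
// Single precision is assumed to make the round-off behaviour easy to see.&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
    const int n = 100000000;     // 1e8 terms of 1e-8&lt;br /&gt;
    float sum = 1.0f;            // the leading 1&lt;br /&gt;
    for (int i = 0; i &amp;lt; n; i++)&lt;br /&gt;
        sum += 1.e-8f;           // each term is tiny relative to sum: what happens?&lt;br /&gt;
    std::printf(&amp;quot;forward sum = %f\n&amp;quot;, sum);&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;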
&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
&lt;br /&gt;
* Implement a linear congruential generator with a = 106, c = 1283, m = 6075 that generates random numbers in the range 0..1 (a minimal sketch is given after this list).&lt;br /&gt;
* Using that and MT: generate 10,000 pairs (dx, dy) with dx, dy each in -0.1 .. +0.1. Generate histograms of dx and dy (say 200 bins). Do they look okay? What would you expect the variation to be?&lt;br /&gt;
* For 10,000 points: take random walks from (x,y)=(0,0) until the walk exceeds a radius of 2, then stop. Plot a histogram of the final angles for the two pseudo random number generators. What do you see?&lt;br /&gt;
* Submit makefile, code, plots, git log.&lt;br /&gt;
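&lt;br /&gt;
A minimal sketch of such a generator; the seed handling and the decision to return a double are assumptions:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Linear congruential generator with the given (deliberately modest) parameters;&lt;br /&gt;
// returns uniform deviates in [0,1).  Seed handling is an assumption.&lt;br /&gt;
static unsigned int lcg_state = 1;&lt;br /&gt;
&lt;br /&gt;
double lcg_uniform()&lt;br /&gt;
{&lt;br /&gt;
    const unsigned int a = 106, c = 1283, m = 6075;&lt;br /&gt;
    lcg_state = (a*lcg_state + c) % m;               // x_{n+1} = (a x_n + c) mod m&lt;br /&gt;
    return lcg_state/static_cast&amp;lt;double&amp;gt;(m);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Note that the period of this generator can be at most m = 6075.&lt;br /&gt;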
&lt;br /&gt;
Both assignments due on Thursday Feb 28th, 2013, at 9:00 am. Email the files to rzon@scinethpc.ca and ljdursi@scinethpc.ca.&lt;br /&gt;
&lt;br /&gt;
===HW2===&lt;br /&gt;
&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
&lt;br /&gt;
* Compute numerically (using the GSL):&lt;br /&gt;
&lt;br /&gt;
::&amp;amp;int;&amp;lt;sub&amp;gt;0&amp;lt;/sub&amp;gt;&amp;lt;sup&amp;gt;3&amp;lt;/sup&amp;gt; f(x) &amp;amp;nbsp;dx&lt;br /&gt;
&lt;br /&gt;
:(that is the integral of f(x) from x=0 to x=3)&lt;br /&gt;
&lt;br /&gt;
:with&lt;br /&gt;
&lt;br /&gt;
::f(x) = ln(x) sin(x) e&amp;lt;sup&amp;gt;-x&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:using three different methods:&lt;br /&gt;
# Extended Simpson's rule (a minimal sketch is given at the end of this assignment)&lt;br /&gt;
# Gauss-Legendre quadrature&lt;br /&gt;
# Monte Carlo sampling &lt;br /&gt;
&lt;br /&gt;
*Hint: what is f(0)?&lt;br /&gt;
&lt;br /&gt;
*Compare the convergence of these methods by increasing the number of function evaluations.&lt;br /&gt;
&lt;br /&gt;
*Submit makefile, code, plots, version control log. &lt;br /&gt;
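&lt;br /&gt;
For the Simpson part, a minimal sketch of the composite (extended) rule applied to this integrand is given below; note the special handling of the x = 0 endpoint, where the limit of f is 0 even though ln(0) is not finite:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Composite (extended) Simpson's rule for f(x) = ln(x) sin(x) exp(-x) on [a,b].&lt;br /&gt;
// n must be even; the x = 0 endpoint is handled via the limit f(0+) = 0.&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
double f(double x)&lt;br /&gt;
{&lt;br /&gt;
    if (x == 0.0) return 0.0;            // ln(x) sin(x) exp(-x) goes to 0 as x goes to 0&lt;br /&gt;
    return log(x)*sin(x)*exp(-x);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
double simpson(double a, double b, int n)&lt;br /&gt;
{&lt;br /&gt;
    double h = (b - a)/n;&lt;br /&gt;
    double s = f(a) + f(b);&lt;br /&gt;
    for (int i = 1; i &amp;lt; n; i++)&lt;br /&gt;
        s += f(a + i*h) * ((i % 2) ? 4.0 : 2.0);   // interior weights 4,2,4,...,2,4&lt;br /&gt;
    return s*h/3.0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;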
&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
&lt;br /&gt;
* Using an adaptive 4th order Runge-Kutta approach, with a relative accuracy of 1e-4, compute the solution for t = [0,100] of the following set of coupled ODEs (Lorenz oscillator)&lt;br /&gt;
&lt;br /&gt;
::dx/dt = &amp;amp;sigma;(y - x)&lt;br /&gt;
&lt;br /&gt;
::dy/dt = (&amp;amp;rho;-z)x-y&lt;br /&gt;
&lt;br /&gt;
::dz/dt = xy - &amp;amp;beta;z&lt;br /&gt;
&lt;br /&gt;
:with &amp;amp;sigma;=10; &amp;amp;beta;=8/3; &amp;amp;rho; = 28, and with initial conditions&lt;br /&gt;
&lt;br /&gt;
::x(0) = 10&lt;br /&gt;
&lt;br /&gt;
::y(0) = 20&lt;br /&gt;
&lt;br /&gt;
::z(0) = 30&lt;br /&gt;
&lt;br /&gt;
* Hint: study the GSL documentation (a minimal sketch of the right-hand side, in the form the GSL ODE routines expect, is given below).&lt;br /&gt;
&lt;br /&gt;
*Submit makefile, code, plots, version control log.&lt;br /&gt;
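&lt;br /&gt;
A minimal sketch of the Lorenz right-hand side, written with the call signature the GSL ODE routines expect; the struct name and the way parameters are passed are assumptions, and the driver/stepper setup itself is left to the GSL manual, as the hint suggests:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Lorenz right-hand side in the form used by the GSL ODE solvers;&lt;br /&gt;
// sigma, rho, beta are passed through the params pointer (an assumed struct).&lt;br /&gt;
struct LorenzParams { double sigma, rho, beta; };&lt;br /&gt;
&lt;br /&gt;
int lorenz_rhs(double t, const double y[], double dydt[], void *params)&lt;br /&gt;
{&lt;br /&gt;
    (void)t;                                    // the system is autonomous&lt;br /&gt;
    const LorenzParams *p = static_cast&amp;lt;const LorenzParams*&amp;gt;(params);&lt;br /&gt;
    dydt[0] = p-&amp;gt;sigma*(y[1] - y[0]);           // dx/dt&lt;br /&gt;
    dydt[1] = (p-&amp;gt;rho - y[2])*y[0] - y[1];      // dy/dt&lt;br /&gt;
    dydt[2] = y[0]*y[1] - p-&amp;gt;beta*y[2];         // dz/dt&lt;br /&gt;
    return 0;                                   // GSL_SUCCESS&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;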
&lt;br /&gt;
Both assignments due on Thursday Mar 7th, 2013, at 9:00 am. Email the files to rzon@scinethpc.ca and ljdursi@scinethpc.ca.&lt;br /&gt;
&lt;br /&gt;
===HW3===&lt;br /&gt;
&lt;br /&gt;
Part 1:&lt;br /&gt;
&lt;br /&gt;
The time-explicit formulation of the 1d diffusion equation looks like this:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{eqnarray*}&lt;br /&gt;
q^{n+1} &amp;amp; = &amp;amp; q^n + \frac{D \Delta t}{\Delta x^2} &lt;br /&gt;
\left (&lt;br /&gt;
\begin{matrix}&lt;br /&gt;
-2 &amp;amp; 1 \\&lt;br /&gt;
1 &amp;amp; -2 &amp;amp; 1 \\&lt;br /&gt;
&amp;amp; 1 &amp;amp; -2 &amp;amp; 1 \\&lt;br /&gt;
&amp;amp;  &amp;amp;  &amp;amp; \cdots &amp;amp; \\&lt;br /&gt;
&amp;amp;  &amp;amp;  &amp;amp; 1 &amp;amp; -2 &amp;amp; 1 \\&lt;br /&gt;
&amp;amp;  &amp;amp;  &amp;amp; &amp;amp; 1 &amp;amp; -2 \\&lt;br /&gt;
\end{matrix}&lt;br /&gt;
\right ) q^n \\&lt;br /&gt;
&amp;amp; = &amp;amp; \left ( 1 + \frac{D \Delta t}{\Delta x^2} A \right ) q^n&lt;br /&gt;
\end{eqnarray*}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
What are the eigenvalues of the matrix A?   What modes would we expect to be amplified/damped by this operator?&lt;br /&gt;
&lt;br /&gt;
* Consider 100 points in the discretization (eg, A is 100x100)&lt;br /&gt;
* Calculate the eigenvalues and eigenvectors (using D__EV ; which sort of matrix are we using here?)&lt;br /&gt;
* Plot the modes with the largest and smallest absolute-value of eigenvalues, and explain their physical significance&lt;br /&gt;
* The numerical method becomes unstable when one eigenmode $v$ begins to grow uncontrollably whenever it is present, i.e.&lt;br /&gt;
$ \frac{D \Delta t}{\Delta x^2} A v = \frac{D \Delta t}{\Delta x^2} \lambda v &amp;gt; v$.   In a timestepping solution, the only way to avoid this for a given physical set of parameters and grid size is to reduce the timestep, $\Delta t$.   Use the largest-magnitude eigenvalue to place a constraint on $\Delta t$ for stability.&lt;br /&gt;
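&lt;br /&gt;
For reference, a sketch of the standard reasoning you are being asked to confirm numerically: the eigenvalues of this tridiagonal matrix are known in closed form,&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\lambda_k = -2 + 2\cos\left(\frac{k\pi}{N+1}\right), \qquad k = 1,\ldots,N,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
so that $-4 &amp;lt; \lambda_k &amp;lt; 0$.  Each step multiplies the mode $v_k$ by the factor&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
g_k = 1 + \frac{D \Delta t}{\Delta x^2}\lambda_k ,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and requiring $|g_k| \le 1$ for the most negative eigenvalue (which approaches -4) gives the constraint&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\Delta t \le \frac{\Delta x^2}{2 D}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;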
&lt;br /&gt;
Part 2:&lt;br /&gt;
&lt;br /&gt;
Using the above constraint on $\Delta t$, for a 1d grid of size 100 (eg, a 100x100 matrix A), using lapack, evolve this PDE. Plot and explain results.&lt;br /&gt;
&lt;br /&gt;
* Have an initial condition of $q(x=0,t=0) = 1$, and $q(t=0)$ everywhere else being zero (eg, hot plate just turned on at the left)&lt;br /&gt;
* Take ~100 timesteps and plot the evolution of $q(x,t)$ at 5 times over that period.&lt;br /&gt;
* You’ll want to use a BLAS routine to compute the matrix-vector multiply ( http://www.gnu.org/software/gsl/manual/html_node/Level-2-GSL-BLAS-Interface.html). Do the multiply in double precision (D__MV). Which one should you use?&lt;br /&gt;
* The GSL has a cblas interface, http://www.gnu.org/software/gsl/manual/html_node/Level-2-GSL-BLAS-Interface.html ; an example of its use can be found here http://www.gnu.org/software/gsl/manual/html_node/GSL-CBLAS-Examples.html&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Important things to know about lapack:&lt;br /&gt;
* If you are using an nxn array, the “leading dimension” of the array is n. (This argument is so that you could work on sub-matrices if you wanted)&lt;br /&gt;
* You have to make sure the 2d array is a contiguous block of memory.&lt;br /&gt;
* You'll (presumably) want to use the C bindings for LAPACK - [http://www.netlib.org/lapack/lapacke.html lapacke].  Note that the usual C arrays are row-major.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Here's a simple example of calling a LAPACKE routine; note that how the matrix is described (here with a pointer to the data, a leading dimension, and the number of rows and columns) will vary with different types of matrix:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;mkl_lapacke.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
double **matrix(int n,int m);&lt;br /&gt;
void free_matrix(double **a);&lt;br /&gt;
&lt;br /&gt;
int main (int argc, const char * argv[])&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
   const int n=5;             // number of rows, columns of the matrix&lt;br /&gt;
   const int m = n;           // nrows&lt;br /&gt;
   const int leading_dim_A=n; // leading dimension (# of cols for row major);&lt;br /&gt;
                              // lets us operate on sub-matrices in principle&lt;br /&gt;
   const int leading_dim_b=n; // similarly for b&lt;br /&gt;
   double **A;&lt;br /&gt;
   double *b;&lt;br /&gt;
&lt;br /&gt;
   b = new double[leading_dim_b];&lt;br /&gt;
   A = matrix(n,leading_dim_A);&lt;br /&gt;
&lt;br /&gt;
   for (int i=0; i&amp;lt;n; i++)&lt;br /&gt;
       for (int j=0; j&amp;lt;leading_dim_A; j++)&lt;br /&gt;
            A[i][j] = 0.;&lt;br /&gt;
&lt;br /&gt;
   // let's do a trivial solve&lt;br /&gt;
   // It should be pretty clear that the solution to this system&lt;br /&gt;
   // is x = {0,1,2...n-1}&lt;br /&gt;
&lt;br /&gt;
   for (int i=0; i&amp;lt;leading_dim_A; i++) {&lt;br /&gt;
        A[i][i] = 2.;&lt;br /&gt;
   }&lt;br /&gt;
&lt;br /&gt;
   for (int i=0; i&amp;lt;leading_dim_b; i++) {&lt;br /&gt;
        b[i]    = 2*i;&lt;br /&gt;
   }&lt;br /&gt;
&lt;br /&gt;
   const char transpose='N';     //solve Ax=b, not A^T x = b&lt;br /&gt;
   const int  nrhs = 1;          //  we're only solving 1 right hand side&lt;br /&gt;
   int info;&lt;br /&gt;
&lt;br /&gt;
   // Call DGELS; b will be overwritten with the value of x.&lt;br /&gt;
   info = LAPACKE_dgels(LAPACK_COL_MAJOR,transpose,m,n,nrhs,&lt;br /&gt;
                          &amp;amp;(A[0][0]),leading_dim_A, &amp;amp;(b[0]),leading_dim_b);&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
   // print results&lt;br /&gt;
   for(int i=0;i&amp;lt;n;i++)&lt;br /&gt;
   {&lt;br /&gt;
      if (i != n/2)&lt;br /&gt;
        std::cout &amp;lt;&amp;lt; &amp;quot;    &amp;quot; &amp;lt;&amp;lt; b[i] &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
      else&lt;br /&gt;
        std::cout &amp;lt;&amp;lt; &amp;quot;x = &amp;quot; &amp;lt;&amp;lt; b[i] &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
   }&lt;br /&gt;
   free_matrix(A);            // release the matrix and the right-hand side&lt;br /&gt;
   delete[] b;&lt;br /&gt;
&lt;br /&gt;
   return(info);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
double **matrix(int n,int m) {&lt;br /&gt;
   double **a = new double * [n];&lt;br /&gt;
   a[0] = new double [n*m];&lt;br /&gt;
&lt;br /&gt;
   for (int i=1; i&amp;lt;n; i++)&lt;br /&gt;
         a[i] = &amp;amp;a[0][i*m];&lt;br /&gt;
&lt;br /&gt;
   return a;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
void free_matrix(double **a) {&lt;br /&gt;
   delete[] a[0];&lt;br /&gt;
   delete[] a;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===HW4===&lt;br /&gt;
&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
&lt;br /&gt;
Trigonometric interpolation uses an n-point Fourier series to find values at intermediate points. It is one way of interpolating data onto a finer grid, and it was a motivation for Gauss, who applied it to planetary motion.&lt;br /&gt;
&lt;br /&gt;
The way it works is:&lt;br /&gt;
&lt;br /&gt;
# You fourier-transform your data&lt;br /&gt;
# You add frequencies above the Nyquist frequency (in absolute value), but set all the amplitudes of the new frequencies to zero.&lt;br /&gt;
# Note that the frequencies are stored such that e.g. f&amp;lt;sub&amp;gt;n-1&amp;lt;/sub&amp;gt; corresponds to the low frequency -1.&lt;br /&gt;
# The resulting 2n array can be back transformed, and now gives an interpolated signal.&lt;br /&gt;
&lt;br /&gt;
For this assignment, write an application that will read in an image from a binary file into a 2d double precision array (this will require converting from bytes to doubles), and creates an image twice the size in all directions using trigonometric interpolation. Use a real-to-half-complex version of the fftw (note: in 2d, this version of the fftw mixes fourier components with the same physical magnitude of their wave number k, so this will work).&lt;br /&gt;
You can process the red, green and blue values separately. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
&lt;br /&gt;
Write an application which reads an image and performs a low pass filter on the image, i.e., any fourier components with magnitudes k larger than n/8 are to be set to zero, after which the fourier inverse is taken and the image is written out to disk again. Use the same fft technique as in the first assignment.&lt;br /&gt;
&lt;br /&gt;
'''Input image'''&lt;br /&gt;
&lt;br /&gt;
Use [[Media:gauss256.tgz|this image of Gauss]].&lt;br /&gt;
&lt;br /&gt;
'''Image format:'''&lt;br /&gt;
&lt;br /&gt;
Use the following simple PPM format:&lt;br /&gt;
&lt;br /&gt;
First line (ascii): 'P6\n'&amp;lt;br&amp;gt;&lt;br /&gt;
Second line, in ascii, 'width height\n'&amp;lt;br&amp;gt;&lt;br /&gt;
Third line, in ascii, 'maxcolorvalue\n' (this is typically just 255)&amp;lt;br&amp;gt;&lt;br /&gt;
Following that, in binary, are byte-triplets with the red, green and blue values of each pixel.&amp;lt;br&amp;gt;&lt;br /&gt;
Note: in C, the 'unsigned char' data type matches the concept of a byte best (for most machines anyway).&lt;br /&gt;
&lt;br /&gt;
In fact, between the first and second line, one can have comment lines that start with '#'.&lt;br /&gt;
&lt;br /&gt;
=Part 3: High Performance Scientific Computing=&lt;br /&gt;
&lt;br /&gt;
==Prerequisites==&lt;br /&gt;
&lt;br /&gt;
Part 1 or good c++ programming skills, including make and unix/linux prompt experience.&lt;br /&gt;
&lt;br /&gt;
'''Software that you'll need'''&lt;br /&gt;
&lt;br /&gt;
You will need to bring a laptop with a ssh facility. Hands-on parts will be done on SciNet's GPC cluster.&lt;br /&gt;
&lt;br /&gt;
For those who don't have a SciNet account yet, the instructions can be found at http://wiki.scinethpc.ca/wiki/index.php/Essentials#Accounts&lt;br /&gt;
&lt;br /&gt;
==Dates==&lt;br /&gt;
March 19, 21, 26, and 28, 2013&amp;lt;br&amp;gt;&lt;br /&gt;
April 2, 4, 9, and 11, 2013&lt;br /&gt;
&lt;br /&gt;
==Topics==&lt;br /&gt;
===''Lecture 1:'' Introduction to Parallel Programming ===&lt;br /&gt;
:::[[File:Lecture17-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture17-2013/lecture17-2013.html]]&lt;br /&gt;
:::[[Media:Lecture17-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture17-2013/lecture17-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 2:'' Parallel Computing Paradigms ===&lt;br /&gt;
&lt;br /&gt;
:::[[File:Lecture18-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture18-2013/lecture18-2013.html]]&lt;br /&gt;
:::[[Media:Lecture18-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture18-2013/lecture18-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp; [[#HW1_3|homework 1]]&lt;br /&gt;
&lt;br /&gt;
===''Lectures 3,4:''  Shared Memory Programming with OpenMP, part 1,2===&lt;br /&gt;
&lt;br /&gt;
:::[[Media:Lecture19-2013.pdf|Slides]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 5:'' Distributed Parallel Programming with MPI, part 1===&lt;br /&gt;
&lt;br /&gt;
:::[[Media:Lecture21-2013.pdf|Slides]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 6:'' Distributed Parallel Programming with MPI, part 2===&lt;br /&gt;
&lt;br /&gt;
:::[[Media:Lecture22-2013.pdf|Slides]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 7:'' Distributed Parallel Programming with MPI, part 3===&lt;br /&gt;
&lt;br /&gt;
===''Lecture 8:'' Hybrid OpenMP+MPI Programming===&lt;br /&gt;
&lt;br /&gt;
== Homework assignments ==&lt;br /&gt;
&lt;br /&gt;
=== HW1 ===&lt;br /&gt;
&lt;br /&gt;
* Read the SciNet tutorial (as it pertains to the GPC)&lt;br /&gt;
* Read the GPC Quick Start.&lt;br /&gt;
* Get the first set of code:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
   $ cd $SCRATCH&lt;br /&gt;
   $ git clone /scinet/course/sc3/homework1&lt;br /&gt;
   $ cd homework1&lt;br /&gt;
   $ source setup&lt;br /&gt;
   $ make&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
*This contains threaded program 'blurppm' and 266 ppm images to be blurred. Usage:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
  blurppm INPUTPPM OUTPUTPPM BLURRADIUS NUMBEROFTHREADS&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
* Simple test:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
  $ qsub -l nodes=1:ppn=8,walltime=2:00:00 -I -X -qdebug&lt;br /&gt;
  $ cd $SCRATCH/homework1&lt;br /&gt;
  $ time blurppm 001.ppm new001.ppm 30 1&lt;br /&gt;
  real  0m52.900s&lt;br /&gt;
  user  0m52.881s&lt;br /&gt;
  sys   0m0.008s&lt;br /&gt;
  $ display 001.ppm &amp;amp;&lt;br /&gt;
  $ display new001.ppm &amp;amp;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
* Time blurppm with a BLURRADIUS ranging from 1 to 41 in steps of 4, and for NUMBEROFTHREADS ranging from 1 to 16.  Record the (real) duration of each run.&lt;br /&gt;
* Plot the duration as a function of NUMBEROFTHREADS, as well as  the speed-up and efficiency.&lt;br /&gt;
* Submit the script and plots of the duration, speedup and efficiency as a function of NUMBEROFTHREADS.&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
* Use GNU parallel to run blurppm on all 266 images with a radius of 41.&lt;br /&gt;
* Investigate different scenarios:&lt;br /&gt;
:# Have GNU parallel run 16 at a time with just 1 thread.&lt;br /&gt;
:# Have GNU parallel run 8 at a time with 2 threads.&lt;br /&gt;
:# Have GNU parallel run 4 at a time with 4 threads.&lt;br /&gt;
:# Have GNU parallel run 2 at a time with 8 threads.&lt;br /&gt;
:# Have GNU parallel run 1 at a time with 16 threads.&lt;br /&gt;
:Record the total time it takes in each of these scenarios.&lt;br /&gt;
* Repeat this with a BLURRADIUS of 3.&lt;br /&gt;
* Submit scripts, timing data  and plots.&lt;br /&gt;
&lt;br /&gt;
=== HW2 ===&lt;br /&gt;
&lt;br /&gt;
In the course materials ( /scinet/course/ppp/nbodyc or nbodyf ) there is the source code for a serial N-body integrator.  This, like the molecular dynamics code you've seen earlier, calculates the long-range forces exerted by each particle on all of the other particles.&lt;br /&gt;
&lt;br /&gt;
Parallelize the force calculation with OpenMP, and present timing results for 1, 4, and 8 threads compared to the serial version.  Note that you can turn off graphic output by removing the &amp;quot;USEPGPLOT = -DPGPLOT&amp;quot; line in Makefile.inc in the top level directory.&lt;br /&gt;
&lt;br /&gt;
Begin by doubling the work by _not_ calculating two forces at once (eg, not making use of f&amp;lt;sub&amp;gt;ji&amp;lt;/sub&amp;gt; = -f&amp;lt;sub&amp;gt;ij&amp;lt;/sub&amp;gt;), and simply parallelizing the outer force loop.  Then find a way to implement the forces efficiently but also in parallel.  Is there any other part of the problem which could usefully be parallelized?&lt;br /&gt;
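&lt;br /&gt;
A minimal sketch of the first, work-doubling version is given below; the variable names and the softened force law are illustrative assumptions, not the actual code in the course directory:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Illustrative only: names and the force law are assumptions, not the course's nbody code.&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
void forces(int n, const double *x, const double *y, const double *m,&lt;br /&gt;
            double *fx, double *fy)&lt;br /&gt;
{&lt;br /&gt;
    const double G = 1.0, eps = 1.e-3;   // units and softening: assumed&lt;br /&gt;
    #pragma omp parallel for schedule(static)&lt;br /&gt;
    for (int i = 0; i &amp;lt; n; i++) {&lt;br /&gt;
        fx[i] = fy[i] = 0.;&lt;br /&gt;
        for (int j = 0; j &amp;lt; n; j++) {   // every pair visited twice (2x work),&lt;br /&gt;
            if (j == i) continue;        // but each thread only writes its own i&lt;br /&gt;
            double dx = x[j]-x[i], dy = y[j]-y[i];&lt;br /&gt;
            double r  = sqrt(dx*dx + dy*dy + eps*eps);&lt;br /&gt;
            fx[i] += G*m[i]*m[j]*dx/(r*r*r);&lt;br /&gt;
            fy[i] += G*m[i]*m[j]*dy/(r*r*r);&lt;br /&gt;
        }&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
For the more efficient version that exploits f&amp;lt;sub&amp;gt;ji&amp;lt;/sub&amp;gt; = -f&amp;lt;sub&amp;gt;ij&amp;lt;/sub&amp;gt;, one common approach is to give each thread its own private force accumulators (or to use atomic updates), so that updating both particles of a pair does not cause race conditions.&lt;br /&gt;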
&lt;br /&gt;
=== HW3 ===&lt;br /&gt;
&lt;br /&gt;
In the same area (  /scinet/course/ppp/diffusion/diffusion.c ) there is a 1d diffusion problem of a form you may recognize from earlier modules.   Your task is to parallelize this with MPI.  I'd encourage you to use the graphics output while you're developing this (use -DPGPLOT on the compile line); then omit the graphics, use a larger totpoints (say 16000), and run timings on 1, 2, 4, and 8 processors.   Note that for this simple problem, we don't necessarily expect a huge speedup; it's already pretty fast.&lt;br /&gt;
&lt;br /&gt;
I'd suggest doing this in steps:&lt;br /&gt;
&lt;br /&gt;
* include mpi.h, add mpi_init/finalize/comm_size/comm_rank, and compile with mpicc and make sure it runs;&lt;br /&gt;
* calculate the local number of points from the total number of points and the size (and possibly rank); treat the case where the total number of points isn't divisible by the number of processors however you like, but make sure you're consistent about it.&lt;br /&gt;
* Once you've calculated the local number of points, you won't need the variable totpoints; no arrays will be declared of that size, no plots will be made of that size, etc.   Make those changes, compile, and run on (say) 2 procs.&lt;br /&gt;
* Now you'll find that you'll need to figure out the local xleft and local xright of the domain; again, once this is done you won't need to know the global variables any more.   Make those changes, compile, and run.&lt;br /&gt;
* Finally, after the &amp;quot;old&amp;quot; boundary condition setting, do the internal boundary conditions, as in our example in class on Tuesday, sending messages around to our neighbours (a minimal sketch of such an exchange follows below).&lt;br /&gt;
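&lt;br /&gt;
A minimal sketch of that guard-cell exchange is given below.  It assumes each rank stores its local points in x[1..localn] with guard cells x[0] and x[localn+1]; that layout, and the function name, are assumptions for illustration, not necessarily what diffusion.c uses:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Illustrative guard-cell exchange; array layout and names are assumptions.&lt;br /&gt;
#include &amp;lt;mpi.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
void exchange_guardcells(double *x, int localn, int rank, int size)&lt;br /&gt;
{&lt;br /&gt;
    const int tag = 1;&lt;br /&gt;
    int left  = (rank == 0)      ? MPI_PROC_NULL : rank - 1;  // no-ops at the edges&lt;br /&gt;
    int right = (rank == size-1) ? MPI_PROC_NULL : rank + 1;&lt;br /&gt;
    MPI_Status status;&lt;br /&gt;
&lt;br /&gt;
    // send my last real point right, receive my left guard cell from the left&lt;br /&gt;
    MPI_Sendrecv(&amp;amp;x[localn], 1, MPI_DOUBLE, right, tag,&lt;br /&gt;
                 &amp;amp;x[0],      1, MPI_DOUBLE, left,  tag,&lt;br /&gt;
                 MPI_COMM_WORLD, &amp;amp;status);&lt;br /&gt;
    // send my first real point left, receive my right guard cell from the right&lt;br /&gt;
    MPI_Sendrecv(&amp;amp;x[1],        1, MPI_DOUBLE, left,  tag,&lt;br /&gt;
                 &amp;amp;x[localn+1], 1, MPI_DOUBLE, right, tag,&lt;br /&gt;
                 MPI_COMM_WORLD, &amp;amp;status);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Using MPI_PROC_NULL for the ranks at the physical boundaries turns those sends and receives into no-ops, so the same call works on every rank.&lt;br /&gt;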
&lt;br /&gt;
=Links=&lt;br /&gt;
&lt;br /&gt;
==Unix==&lt;br /&gt;
* Cygwin: http://www.cygwin.com&lt;br /&gt;
* Linux Command Line: A Primer (June 2012) [[Media:SS_IntroToShell.pdf|Slides,]] [[Media:SS_IntroToShell.tgz|Files]]&lt;br /&gt;
* Intro to unix shell from software carpentry: http://software-carpentry.org/4_0/shell&lt;br /&gt;
&lt;br /&gt;
==C/C++==&lt;br /&gt;
* [[One-Day Scientific C++ Class]] at SciNet&lt;br /&gt;
* C++ library reference: http://www.cplusplus.com/reference&lt;br /&gt;
* C preprocessor: http://www.cprogramming.com/tutorial/cpreprocessor.html&lt;br /&gt;
* Boost: http://www.boost.org&lt;br /&gt;
* Boost Python tutorial: http://www.boost.org/doc/libs/1_53_0/libs/python/doc/tutorial/doc/html/index.html&lt;br /&gt;
&lt;br /&gt;
==Git==&lt;br /&gt;
* Git: http://git-scm.com&lt;br /&gt;
* Version Control: [http://support.scinet.utoronto.ca/CourseVideo/PPPcourse/Thursday_Morning_BP_Revision_Control/Thursday_Morning_BP_Revision_Control.mp4 Video]/ [[Media:Snug_techtalk_revcontrol.pdf | Slides]]&lt;br /&gt;
* Git cheat sheet from Git Tower: http://www.git-tower.com/files/cheatsheet/Git_Cheat_Sheet_grey.pdf&lt;br /&gt;
&lt;br /&gt;
==Python==&lt;br /&gt;
* Python: http://www.python.org&lt;br /&gt;
* IPython: http://ipython.org&lt;br /&gt;
* Matplotlib: http://www.matplotlib.org&lt;br /&gt;
* Enthought python distribution: http://www.enthought.com/products/edudownload.php&amp;lt;br/&amp;gt;&lt;br /&gt;
(this gives you numpy, matplotlib and ipython all installed in one fell swoop)&lt;br /&gt;
&lt;br /&gt;
* Intro to python from software carpentry: http://software-carpentry.org/4_0/python&lt;br /&gt;
* Tutorial on matplotlib: http://conference.scipy.org/scipy2011/tutorials.php#jonathan&lt;br /&gt;
* Npy file format: https://github.com/numpy/numpy/blob/master/doc/neps/npy-format.txt&lt;br /&gt;
* Boost Python tutorial: http://www.boost.org/doc/libs/1_53_0/libs/python/doc/tutorial/doc/html/index.html&lt;br /&gt;
&lt;br /&gt;
==ODEs==&lt;br /&gt;
* Integrators for particle based ODEs (i.e. molecular dynamics): http://www.chem.utoronto.ca/~rzon/simcourse/partmd.pdf. &amp;lt;br&amp;gt;'''Focus on 4.1.4 - 4.1.6 for practical aspects.'''&lt;br /&gt;
* Numerical algorithm to solve ODEs (General) in ''Numerical Recipes for C'': http://apps.nrbook.com/c/index.html Chapter 16&lt;br /&gt;
&lt;br /&gt;
==Interpolation (2D) ==&lt;br /&gt;
* Interpolation in ''Numerical Recipes for C'': http://apps.nrbook.com/c/index.html Pages 123-128&lt;br /&gt;
* Wikipedia pages on [http://en.wikipedia.org/wiki/Bilinear_interpolation Bilinear Interpolation] and [http://en.wikipedia.org/wiki/Bicubic_interpolation Bicubic Interpolation] are not bad either.&lt;br /&gt;
&lt;br /&gt;
==BLAS==&lt;br /&gt;
* [http://www.tacc.utexas.edu/tacc-projects/gotoblas2 gotoblas]&lt;br /&gt;
* [http://math-atlas.sourceforge.net/ ATLAS]&lt;br /&gt;
&lt;br /&gt;
==LAPACK==&lt;br /&gt;
* http://www.netlib.org/lapack&lt;br /&gt;
&lt;br /&gt;
==GSL==&lt;br /&gt;
* GNU Scientific Library: http://www.gnu.org/s/gsl&lt;br /&gt;
&lt;br /&gt;
==FFT==&lt;br /&gt;
* FFTW: http://www.fftw.org&lt;br /&gt;
&lt;br /&gt;
==Top500==&lt;br /&gt;
* TOP500 Supercomputing Sites: http://top500.org&lt;br /&gt;
&lt;br /&gt;
==OpenMP==&lt;br /&gt;
* OpenMP (open multi-processing) application programming interface for shared memory programming: http://openmp.org&lt;br /&gt;
&lt;br /&gt;
==GNU parallel==&lt;br /&gt;
* Official citation: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.&lt;br /&gt;
* [[Media:Tech-talk-gnu-parallel.pdf|Slides of the SciNet TechTalk on Gnu Parallel (14 Nov 2012)]]&lt;br /&gt;
* The documentation for GNU parallel can be found at http://www.gnu.org/software/parallel/&lt;br /&gt;
* Its man page can be found here http://www.gnu.org/software/parallel/man.html&lt;br /&gt;
* The man page is also available on the GPC when the gnu-parallel module is loaded, with the command &amp;lt;code&amp;gt;$ man parallel&amp;lt;/code&amp;gt;. The man page contains options, such as how to make sure the output is not all scrambled, and examples.&lt;br /&gt;
&lt;br /&gt;
==SciNet==&lt;br /&gt;
&lt;br /&gt;
Anything on this wiki, really, but specifically:&lt;br /&gt;
* [[Essentials|SciNet Essentials]]&lt;br /&gt;
* [[GPC Quickstart]]&lt;br /&gt;
* [[Media:SciNet_Tutorial.pdf |SciNet User Tutorial]]&lt;br /&gt;
* [[Software and Libraries]]&lt;br /&gt;
&lt;br /&gt;
==Other Resources==&lt;br /&gt;
* [http://galileo.phys.virginia.edu/classes/551.jvn.fall01/goldberg.pdf What Every Computer Scientist Should Know About Floating-Point Arithmetic] - the classic (and extremely comprehensive) overview of the basics of floating point math.   The first few pages, in particular, are very useful.&lt;br /&gt;
* [http://arxiv.org/abs/1005.4117 Random Numbers In Scientific Computing: An Introduction] by Katzgraber.   A very lucid discussion of pseudo random number generators for science.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=File:Lecture22-2013.pdf&amp;diff=5922</id>
		<title>File:Lecture22-2013.pdf</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=File:Lecture22-2013.pdf&amp;diff=5922"/>
		<updated>2013-04-04T13:38:37Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Scientific_Computing_Course&amp;diff=5921</id>
		<title>Scientific Computing Course</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Scientific_Computing_Course&amp;diff=5921"/>
		<updated>2013-04-04T13:38:12Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* Lecture 5: Distributed Parallel Programming with MPI, part 1 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;''This wiki page concerns the 2013 installment of SciNet's Scientific Computing course. Material from the previous installment can be found on [[Scientific Software Development Course]], [[Numerical Tools for Physical Scientists (course)]], and [[High Performance Scientific Computing]]''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
=Syllabus=&lt;br /&gt;
&lt;br /&gt;
==About the course==&lt;br /&gt;
* Whole-term graduate course&lt;br /&gt;
* Prerequisite: basic C, C++ or Fortran experience.&lt;br /&gt;
* Will use `C++ light' and Python&lt;br /&gt;
* Topics include: Scientific computing and programming skills, Parallel programming, and Hybrid programming.  &lt;br /&gt;
&lt;br /&gt;
There are three parts to this course:&lt;br /&gt;
&lt;br /&gt;
# Scientific Software Development: Jan/Feb 2013&amp;lt;br&amp;gt;''python, C++, git, make, modular programming, debugging''&lt;br /&gt;
# Numerical Tools for Physical Scientists: Feb/Mar 2013&amp;lt;br&amp;gt;''modelling, floating point, Monte Carlo, ODE, linear algebra, fft''&lt;br /&gt;
# High Performance Scientific Computing: Mar/Apr 2013&amp;lt;br&amp;gt;''openmp, mpi and hybrid programming''&lt;br /&gt;
&lt;br /&gt;
Each part consists of eight one-hour lectures, two per week.&lt;br /&gt;
&lt;br /&gt;
These can be taken separately by astrophysics graduate students at the University of Toronto as mini-courses, and by physics graduate students at the University of Toronto as modular courses.&lt;br /&gt;
&lt;br /&gt;
The first two parts count towards the SciNet Certificate in Scientific Computing, while the third part can count towards the SciNet HPC Certificate. For more info about the SciNet Certificates, see http://www.scinethpc.ca/2012/12/scinet-hpc-certificate-program.&lt;br /&gt;
&lt;br /&gt;
==Location and Times==&lt;br /&gt;
[http://www.scinethpc.ca/2010/08/contact-us SciNet HeadQuarters]&amp;lt;br&amp;gt;&lt;br /&gt;
256 McCaul Street, Toronto, ON&amp;lt;br&amp;gt;&lt;br /&gt;
Room 229 (Conference Room)&amp;lt;br&amp;gt;&lt;br /&gt;
Tuesdays 11:00 am - 12:00 noon&amp;lt;br&amp;gt;&lt;br /&gt;
Thursdays 11:00 am - 12:00 noon&lt;br /&gt;
&lt;br /&gt;
==Instructors and office hours==&lt;br /&gt;
&lt;br /&gt;
* Ramses van Zon - 256 McCaul Street, Rm 228 - Mondays 3-4pm&lt;br /&gt;
* L. Jonathan Dursi - 256 McCaul Street, Rm 216 - Wednesdays 3-4pm&lt;br /&gt;
&lt;br /&gt;
==Grading scheme==&lt;br /&gt;
&lt;br /&gt;
Attendance at lectures.&lt;br /&gt;
&lt;br /&gt;
Four homework sets (one per week), to be returned by email by 9:00 am the following Thursday.&lt;br /&gt;
&lt;br /&gt;
==Sign up==&lt;br /&gt;
Sign up for this graduate course goes through SciNet's course website.&amp;lt;br&amp;gt;The direct link is https://support.scinet.utoronto.ca/courses/?q=node/99.&amp;lt;br&amp;gt;  If you do not have a SciNet account but wish to register for this course, please email support@scinet.utoronto.ca . &amp;lt;br&amp;gt;&lt;br /&gt;
Sign up is closed.&lt;br /&gt;
&lt;br /&gt;
=Part 1: Scientific Software Development=&lt;br /&gt;
&lt;br /&gt;
==Prerequisites==&lt;br /&gt;
&lt;br /&gt;
Some programming experience. Some unix prompt experience.&lt;br /&gt;
&lt;br /&gt;
'''Software that you'll need:'''&lt;br /&gt;
&lt;br /&gt;
A unix-like environment with the GNU compiler suite (e.g. Cygwin), and Python 2, IPython, Numpy, SciPy and Matplotlib (all of which you get if you use the Enthought distribution) installed on your laptop. Links are given at the bottom of this page.&lt;br /&gt;
&lt;br /&gt;
==Dates==&lt;br /&gt;
&lt;br /&gt;
January 15, 17, 22, 24, 29, and 31, 2013&amp;lt;br&amp;gt;&lt;br /&gt;
February 5 and 7, 2013&lt;br /&gt;
&lt;br /&gt;
==Topics (with lecture slides and recordings)==&lt;br /&gt;
&lt;br /&gt;
===''Lecture 1:'' C++ introduction===&lt;br /&gt;
:::[[File:Lecture1-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture1-2013/lecture1-2013.html]]&lt;br /&gt;
:::[[Media:Lecture1-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture1-2013/lecture1-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 2:'' More C++, build and version control&amp;lt;br&amp;gt;===&lt;br /&gt;
:::[[File:Lecture2-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture2-2013/lecture2-2013.html]]&lt;br /&gt;
:::Guest lecturer: Michael Nolta (CITA) for the git portion of the lecture.&lt;br /&gt;
:::[[Media:Lecture2-2013.pdf|C++ and Make slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture2-2013/lecture2-2013.mp4 C++ and Make video recording] &amp;amp;nbsp;/ &amp;amp;nbsp; [[Media:Git-Nolta.pdf|Git slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [[#HW1|Homework assignment 1]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 3:'' Python and visualization===&lt;br /&gt;
:::[[File:Lecture3-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture3-2013/lecture3-2013.html]]&lt;br /&gt;
:::[[Media:Lecture3-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture3-2013/lecture3-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 4:'' Modular programming, refactoring, testing===&lt;br /&gt;
:::[[File:Lecture4-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture4-2013/lecture4-2013.html]]&lt;br /&gt;
:::[[Media:Lecture4-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture4-2013/lecture4-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp;  [[#HW2|Homework assignment 2]]&lt;br /&gt;
:::[http://wiki.scinethpc.ca/wiki/images/f/f0/diffuse.cc diffuse.cc (course project source file)] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://wiki.scinethpc.ca/wiki/images/f/f0/plotdata.py plotdata.py (corresponding python movie generator)]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 5:'' Object oriented programming===&lt;br /&gt;
:::[[Media:Lecture5-2013.pdf|Slides]]&lt;br /&gt;
:::Recordings of this lecture are missing, but you could view the videos of SciNet's [[One-Day Scientific C++ Class]], in particular the parts on classes, polymorphism, and inheritance.&lt;br /&gt;
&lt;br /&gt;
===''Lecture 6:'' ODE, interpolation===&lt;br /&gt;
:::[[File:Lecture6-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture6-2013/lecture6-2013.html]]&lt;br /&gt;
:::[[Media:ScientificComputing2013-Lecture5-ODE.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture6-2013/lecture6-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp; [[#HW3|Homework assignment 3]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 7:'' Development tools: debugging and profiling===&lt;br /&gt;
:::[[File:Lecture7-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture7-2013/lecture7-2013.html]]&lt;br /&gt;
:::[[Media:ScientificComputing2013-Debugging.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture7-2013/lecture7-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 8:'' Objects in Python, linking C++ and Python===&lt;br /&gt;
:::[[File:Lecture8-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture8-2013/lecture8-2013.html]]&lt;br /&gt;
:::[[Media:Lecture8-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture8-2013/lecture8-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
==Homework assignments==&lt;br /&gt;
&lt;br /&gt;
===HW1===&lt;br /&gt;
&lt;br /&gt;
'''''Multi-file C++ program to create a data file'''''&lt;br /&gt;
&lt;br /&gt;
We’ve learned programming in basic C++, use of make and Makefiles to build projects, and local use of git for version control. In this first assignment, you’ll use these to make a multi-file C++ program, built with make, which computes and outputs a data file.&lt;br /&gt;
&lt;br /&gt;
* Start a git repository, and begin writing a C++ program to&lt;br /&gt;
:# Get an array size and a standard deviation from user input,&lt;br /&gt;
:# Allocate a 2d array (use the code given in lecture 2),&lt;br /&gt;
:# Store a 2d Gaussian with a maximum at the centre of the array and given standard deviation (in units of grid points),&lt;br /&gt;
:# Output that array to a text file,&lt;br /&gt;
:# Free the array, and exit. &lt;br /&gt;
* The output text file should contain just the data in text format, with a row of the file corresponding to a row of the array and with whitespace between the numbers. &lt;br /&gt;
* The 2d array creation/freeing routines should be in one file (with an associated header file), the gaussian calculation in another (ditto), and the output routine in a third, with the main program calling each of these. &lt;br /&gt;
* Use a makefile to build your code (add it to the repository).&lt;br /&gt;
* You can start with everything in one file, with hardcoded values for the sizes and standard deviation and a static array, then refactor things into multiple files, adding the other features. (A minimal sketch of the Gaussian fill is given after this list.)&lt;br /&gt;
* As a test, use the ipython executable that came with your Enthought python distribution to read your data and plot it.&amp;lt;br&amp;gt;If your data file is named ‘data.txt’, running the following:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ipython --pylab&lt;br /&gt;
In [1]: data = numpy.genfromtxt('data.txt') &lt;br /&gt;
In [2]: contour(data) &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
should give a nice contour plot of a 2-dimensional gaussian.&lt;br /&gt;
* Email your source code, makefile and the &amp;quot;git log&amp;quot; output of all your commits by 9:00 am on Thursday, Jan 24th, 2013. Please zip or tar these files together as one attachment, with a file name that includes your name and &amp;quot;HW1&amp;quot;.&lt;br /&gt;
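&lt;br /&gt;
As a very rough illustration (not meant to be prescriptive), the core of the Gaussian calculation could look something like the following sketch; the names nx, ny and sigma stand in for whatever you read from user input:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// gaussian.cc (sketch): fill a 2d array with a Gaussian centred on the array&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
void fill_gaussian(float **f, int nx, int ny, float sigma)&lt;br /&gt;
{&lt;br /&gt;
    float xc = 0.5*(nx-1);    // centre of the array, in grid points&lt;br /&gt;
    float yc = 0.5*(ny-1);&lt;br /&gt;
    for (int i = 0; i &amp;lt; nx; i++)&lt;br /&gt;
        for (int j = 0; j &amp;lt; ny; j++) {&lt;br /&gt;
            float r2 = (i-xc)*(i-xc) + (j-yc)*(j-yc);&lt;br /&gt;
            f[i][j] = std::exp(-r2/(2.0*sigma*sigma));&lt;br /&gt;
        }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;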
&lt;br /&gt;
===HW2===&lt;br /&gt;
'''''Refactor legacy code to a modular project with unit tests'''''&lt;br /&gt;
&lt;br /&gt;
In class, today, we talked about modular programming and testing, and the project we’ll be working on for the next three weeks. This homework will start advancing on that project by working on the “legacy” code given to us by our supervisor ([http://wiki.scinethpc.ca/wiki/images/f/f0/diffuse.cc diffuse.cc]), with a corresponding python plotting script ([http://wiki.scinethpc.ca/wiki/images/f/f0/plotdata.py plotdata.py]), and whipping it into shape before we start adding new physics.&lt;br /&gt;
* Start a git repository for this project, and add the two files.&lt;br /&gt;
* Create a Makefile and add it to the repository.&lt;br /&gt;
* Since we have no tests, run the program with console output redirected to a file:&lt;br /&gt;
:&amp;lt;pre&amp;gt;$ diffuse &amp;gt; original-output.txt&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;''It turns out the code has a bug that can make the output different when the same code is run again, which obviously would not be good for a baseline test. Replace 'float error;' by 'float error=0.0;' to fix this.''&lt;br /&gt;
* Also save the two .npy output files, e.g. to original-data.npy and original-theory.npy. The triplet of files (original-output.txt, original-data.npy and original-theory.npy) serves as a baseline integrated test (add these to the repository). &lt;br /&gt;
* Then write a 'test' target in your makefile that:&lt;br /&gt;
** Runs 'diffuse' with output to a new file.&lt;br /&gt;
** Compares that file with the baseline test file, and compares the .npy files.&lt;br /&gt;
:: (hint: the unix commands diff and cmp can compare files).&lt;br /&gt;
* First refactoring: Move the global variables into the main routine.&lt;br /&gt;
* ''Chorus: Test your modified code, and commit.''&lt;br /&gt;
* Second refactoring: Extract a diffusion operator routine, that gets called from main.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* Create a .cc/.h module for the diffusion operator.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* Add two tests for the diffusion operator: for a constant and for a linear input field (&amp;lt;tt&amp;gt;rho[i][j]=a*i+b*j&amp;lt;/tt&amp;gt;). Add these to the test target in the makefile.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* More refactoring: Extract three more .cc/.h modules:&lt;br /&gt;
** for output (should not contain hardcoded filenames)    &lt;br /&gt;
** computation of the theory&lt;br /&gt;
** and for the array allocation stuff.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* Describe, but don't implement in the .h and .cc, what would be appropriate unit tests for these three modules.&lt;br /&gt;
&lt;br /&gt;
Email your source code and the git log file of all your commits as a .zip or .tar file to rzon@scinethpc.ca and ljdursi@scinethpc.ca by 9:00 am on Thursday January 31, 2013.&lt;br /&gt;
&lt;br /&gt;
===HW3===&lt;br /&gt;
This week, we learned about object oriented programming, which fits nicely within the modular programming idea.  In this homework, we are going to use some of it to restructure our code and get it ready to add the tracer particle, the goal of the course project. &lt;br /&gt;
&lt;br /&gt;
The goal will be to have an instance of a &amp;lt;tt&amp;gt;Diffusion&amp;lt;/tt&amp;gt; class,&lt;br /&gt;
as well as an instance of &amp;lt;tt&amp;gt;Tracer&amp;lt;/tt&amp;gt;, which for now will be a&lt;br /&gt;
free particle moving as ('''x'''(t),'''y'''(t)) = ('''x'''(0) +&lt;br /&gt;
'''vx''' t, '''y'''(0) + '''vy''' t), without any coupling yet (we&lt;br /&gt;
will handle this next week).&lt;br /&gt;
&lt;br /&gt;
To be more specific:&lt;br /&gt;
* Clean up your code, using the feedback from your HW2 grading, such that the modules are as independent as possible. &lt;br /&gt;
* If you have not done so yet, add comments to the header files of your modules to explain exactly what each function does (without going into implementation details), what its arguments mean and what it returns (unless it's a void function, of course). &lt;br /&gt;
* Objectify the &amp;lt;tt&amp;gt;main&amp;lt;/tt&amp;gt; routine, by creating a class &amp;lt;tt&amp;gt;Diffusion&amp;lt;/tt&amp;gt;.&lt;br /&gt;
* Put this class in its own module (declaration in .h, implementation in .cc). For instance, the declaration could be&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// diffusion.h&lt;br /&gt;
#ifndef DIFFUSIONH&lt;br /&gt;
#define DIFFUSIONH&lt;br /&gt;
#include &amp;lt;fstream&amp;gt;&lt;br /&gt;
class Diffusion {&lt;br /&gt;
  public:&lt;br /&gt;
    Diffusion(float x1, float x2, float D, int numPoints);&lt;br /&gt;
    void init(float a0, float sigma0); // set initial field&lt;br /&gt;
    void timeStep(float dt);           // solve diff. equation over dt&lt;br /&gt;
    void toFile(std::ofstream&amp;amp; f);     // write to file (binary,no npyheader)&lt;br /&gt;
    void toScreen();                   // report a line to screen&lt;br /&gt;
    float getRho(int i, int j);        // get a value of the field&lt;br /&gt;
    ~Diffusion();&lt;br /&gt;
  private:&lt;br /&gt;
    float*** rho;&lt;br /&gt;
    ...&lt;br /&gt;
};&lt;br /&gt;
#endif&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(this is not supposed to be prescriptive.)&lt;br /&gt;
* In the implementation file you'd have things like&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// diffusion.cc&lt;br /&gt;
#include &amp;quot;diffusion.h&amp;quot;&lt;br /&gt;
...&lt;br /&gt;
void Diffusion::timeStep(float dt) &lt;br /&gt;
{&lt;br /&gt;
   // code for the timeStep ...&lt;br /&gt;
}&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(note the inclusion of the module's header file at the top of the implementation, so the class is declared).&lt;br /&gt;
* Let &amp;lt;tt&amp;gt;int main()&amp;lt;/tt&amp;gt; have the same functionality as before, but now by defining the parameters of the run, creating an object of this class, setting up file streams, and taking time steps and writing out through calls to member functions of this object (a minimal sketch is given after this list). &lt;br /&gt;
* Additionally, write a class &amp;lt;tt&amp;gt;Tracer&amp;lt;/tt&amp;gt; which for now implements a free particle in 2d. Something like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
class Tracer {&lt;br /&gt;
  public:&lt;br /&gt;
    Tracer(float x1, float x2);&lt;br /&gt;
    void init(float x0, float y0, float vx, float vy);&lt;br /&gt;
    void timeStep(float dt);           // solve diff. equation over dt&lt;br /&gt;
    void toFile(std::ofstream&amp;amp; f);     // write to file (binary,no npyheader)&lt;br /&gt;
    void toScreen();                   // report a line to screen&lt;br /&gt;
    ~Tracer();&lt;br /&gt;
  private:&lt;br /&gt;
    ...&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
:The timeStep implementation can in this case use the infamous forward Euler integration scheme, because it happens to be exact here.&lt;br /&gt;
:When it comes to output to an npy file, let's view the data of the tracer particle at one point in time as a 2x2 matrix &amp;lt;tt&amp;gt;[[x,y],[vx,vy]]&amp;lt;/tt&amp;gt;, so we can use much of the npy output code that we used for the diffusion field, which was a (numPoints+2)x(numPoints+2) matrix.&lt;br /&gt;
* This class too should be its own module (Often, &amp;quot;one class, one module&amp;quot; is a good paradigm, though occasionally you'll have closely related classes).&lt;br /&gt;
* Add some code to int main to  have the Tracer particle evolve at the same time as the diffusion field (although the two are completely uncoupled).&lt;br /&gt;
* Keep using git and make, run the tests that you have regularly to make sure your program still works.&lt;br /&gt;
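&lt;br /&gt;
For orientation only, here is a minimal sketch of what such a main routine could look like; the constructor arguments, file names and the tracer.h header name are illustrative, not prescriptive:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// main.cc (sketch only)&lt;br /&gt;
#include &amp;lt;fstream&amp;gt;&lt;br /&gt;
#include &amp;quot;diffusion.h&amp;quot;&lt;br /&gt;
#include &amp;quot;tracer.h&amp;quot;&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
    const int   numPoints = 100;&lt;br /&gt;
    const float dt = 0.001;&lt;br /&gt;
    Diffusion d(-5.0, 5.0, 1.0, numPoints);   // domain, diffusion constant, grid&lt;br /&gt;
    Tracer    p(-5.0, 5.0);&lt;br /&gt;
    d.init(1.0, 0.5);&lt;br /&gt;
    p.init(0.0, 0.0, 1.0, 0.5);               // x0, y0, vx, vy&lt;br /&gt;
    std::ofstream fieldfile(&amp;quot;field.dat&amp;quot;, std::ios::binary);&lt;br /&gt;
    std::ofstream tracerfile(&amp;quot;tracer.dat&amp;quot;, std::ios::binary);&lt;br /&gt;
    for (int step = 0; step &amp;lt; 1000; step++) {&lt;br /&gt;
        d.timeStep(dt);&lt;br /&gt;
        p.timeStep(dt);     // evolves independently; the coupling comes later&lt;br /&gt;
        if (step % 100 == 0) {&lt;br /&gt;
            d.toFile(fieldfile);&lt;br /&gt;
            p.toFile(tracerfile);&lt;br /&gt;
            d.toScreen();&lt;br /&gt;
            p.toScreen();&lt;br /&gt;
        }&lt;br /&gt;
    }&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;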
&lt;br /&gt;
Note that because we've now set up our program in a modular fashion, you can do&lt;br /&gt;
different parts of this assignment in any order you want.  For instance, to wrap your head around object oriented programming, you may prefer to implement the tracer particle first, so that your diffusion code stays intact.  Or you might want to postpone commenting until the end if you think you'll have to change a module for this assignment.&lt;br /&gt;
&lt;br /&gt;
Email your source code and the git log file of all your commits as a .zip or .tar file to rzon@scinethpc.ca and ljdursi@scinethpc.ca by &lt;br /&gt;
&amp;lt;span style=&amp;quot;color:#ee3300&amp;quot;&amp;gt;3:00 pm on Friday February 8, 2013&amp;lt;/span&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===HW4===&lt;br /&gt;
&lt;br /&gt;
In this homework, we are going to implement the class project of a tracer particle coupled to a diffusion equation. &lt;br /&gt;
The full specification of the physical problem is [[Media:ScClassProject.pdf|here]].  &lt;br /&gt;
* Augment the tracer particle to include a force in the x and in the y direction, and a friction coefficient alpha, which at first can be constant.&lt;br /&gt;
* Implement the so-called leapfrog integration algorithm for the tracer particle&lt;br /&gt;
:::v &amp;amp;larr; v + f(v) &amp;amp;Delta;t / m&lt;br /&gt;
:::r &amp;amp;larr; r + v &amp;amp;Delta;t&lt;br /&gt;
:where v, r, and f are 2d vectors and f(v) is the total, velocity-dependent force specified in the class project, i.e., the sum of the external force F=qE and the friction force -&amp;amp;alpha;v.&amp;lt;br/&amp;gt;(Note: the v dependence of f makes this strictly not a leapfrog integration, but we'll ignore that here. A minimal sketch of one such step is given after this list.)&lt;br /&gt;
* Further augment the tracer class with a member function 'couple' which takes a diffusion field as input, and adjusts the friction constant. &lt;br /&gt;
* Your implementation of the 'couple' member function will need to interpolate the diffusion field to the current position of the particle. Use [[Media:CppInterpolation.tgz|this interpolation module]].&lt;br /&gt;
* Rewrite your main routine so that the coupling is called before the tracer's time step. You may need to modify the Diffusion class a bit to get &amp;lt;tt&amp;gt;rho[active]&amp;lt;/tt&amp;gt; out.&lt;br /&gt;
* For simplicity, use the same time step for both the diffusion and the tracer particle.&lt;br /&gt;
* Keep using git and make.&lt;br /&gt;
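&lt;br /&gt;
A minimal sketch of one such step is below; the member names Fx, Fy, alpha, m and the velocity/position members are purely illustrative:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// tracer.cc (sketch): one leapfrog-style step with a velocity-dependent force&lt;br /&gt;
void Tracer::timeStep(float dt)&lt;br /&gt;
{&lt;br /&gt;
    float fx = Fx - alpha*vx;   // external force plus friction, x component&lt;br /&gt;
    float fy = Fy - alpha*vy;&lt;br /&gt;
    vx += fx*dt/m;              // kick: update the velocities first&lt;br /&gt;
    vy += fy*dt/m;&lt;br /&gt;
    x  += vx*dt;                // drift: then update the positions&lt;br /&gt;
    y  += vy*dt;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;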
&lt;br /&gt;
You will hand in your source code, makefiles and the git log file of all your commits by email by &amp;lt;span style=&amp;quot;color:#ee3300&amp;quot;&amp;gt;9:00 am on Thursday February 21, 2013&amp;lt;/span&amp;gt;.  Email the files, preferably zipped or tarred, to rzon@scinethpc.ca and ljdursi@scinethpc.ca.&lt;br /&gt;
&lt;br /&gt;
=Part 2: Numerical Tools for Physical Scientists=&lt;br /&gt;
&lt;br /&gt;
==Prerequisites==&lt;br /&gt;
&lt;br /&gt;
Part 1 or solid C++ programming skills, including make and unix/linux prompt experience.&lt;br /&gt;
&lt;br /&gt;
'''Software that you'll need'''&lt;br /&gt;
&lt;br /&gt;
A unix-like environment with the GNU compiler suite (e.g. Cygwin), and Python (Enthought) installed on your laptop.&lt;br /&gt;
&lt;br /&gt;
==Dates==&lt;br /&gt;
&lt;br /&gt;
February 12, 14, 26, and 28, 2013&amp;lt;br&amp;gt;&lt;br /&gt;
March 5, 7, 12, and 14, 2013&lt;br /&gt;
&lt;br /&gt;
==Topics==&lt;br /&gt;
&lt;br /&gt;
===''Lecture 1:'' Numerics ===&lt;br /&gt;
:::[[File:Lecture9-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture9-2013/lecture9-2013.html]]&lt;br /&gt;
:::[[Media:Lecture9-2013-Numerics.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture9-2013/lecture9-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 2:'' Random numbers ===&lt;br /&gt;
:::[[File:Lecture10-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture10-2013/lecture10-2013.html]]&lt;br /&gt;
:::[[Media:Lecture10-2013-PRNG.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture10-2013/lecture10-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp;[http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW1_2 Homework assignment 1]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 3:'' Numerical integration and ODEs ===&lt;br /&gt;
:::[[File:Lecture11-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture11-2013/lecture11-2013.html]]&lt;br /&gt;
:::[[Media:Lecture11-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture11-2013/lecture11-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 4:'' Molecular Dynamics ===&lt;br /&gt;
:::[[File:Lecture12-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture12-2013/lecture12-2013.html]]&lt;br /&gt;
:::[[Media:Lecture12-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture12-2013/lecture12-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp;[http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW2_2 Homework assignment 2]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 5:'' Linear Algebra part I ===&lt;br /&gt;
:::[[Media:Lecture13-2013.pdf|Slides (combined with lecture 6)]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 6:'' Linear Algebra part II and PDEs===&lt;br /&gt;
:::[[File:Lecture14-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture14-2013/lecture14-2013.html]]&lt;br /&gt;
:::[[Media:Lecture13-2013.pdf|Slides (combined with lecture 5)]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture14-2013/lecture14-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp;[http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW3_2 Homework assignment 3]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 7:'' Fast Fourier Transform===&lt;br /&gt;
:::[[File:Lecture15-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture15-2013/lecture15-2013.html]]&lt;br /&gt;
:::[[Media:Lecture15-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture15-2013/lecture15-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp;[[Media:Sincfftw.cc|example code]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 8:'' FFT for real and multidimensional data===&lt;br /&gt;
:::[[File:Lecture15-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture16-2013/lecture16-2013.html]]&lt;br /&gt;
:::[[Media:Lecture16-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture16-2013/lecture16-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp; [http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW4_2 Homework assignment 4]&lt;br /&gt;
&lt;br /&gt;
==Homework Assignments==&lt;br /&gt;
&lt;br /&gt;
===HW1===&lt;br /&gt;
This week's homework consists of two assignments.&lt;br /&gt;
&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
&lt;br /&gt;
* Consider the sequence of numbers: 1 followed by 10&amp;lt;sup&amp;gt;8&amp;lt;/sup&amp;gt; values of 10&amp;lt;sup&amp;gt;-8&amp;lt;/sup&amp;gt;&lt;br /&gt;
* These should sum to 2.&lt;br /&gt;
* Write code which sums up those values in order. What answer does it get?&lt;br /&gt;
* Add a routine to the program which sums up the values in reverse order. Does it get the correct answer?&lt;br /&gt;
* How would you get the correct answer? (A minimal sketch of the two summation orders is given after this list.)&lt;br /&gt;
* Submit code, Makefile, text file with answers.&lt;br /&gt;
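&lt;br /&gt;
A minimal sketch of the two summation orders, in single precision where the effect is easy to see (whether either order gives 2, and what to do about it, is the point of the assignment):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// sumorder.cc (sketch): 1 followed by 10^8 values of 10^-8, summed both ways&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
    const int n = 100000000;       // 10^8 small terms&lt;br /&gt;
&lt;br /&gt;
    float forward = 1.0f;          // start with the large term&lt;br /&gt;
    for (int i = 0; i &amp;lt; n; i++)&lt;br /&gt;
        forward += 1.0e-8f;&lt;br /&gt;
&lt;br /&gt;
    float backward = 0.0f;         // accumulate the small terms first&lt;br /&gt;
    for (int i = 0; i &amp;lt; n; i++)&lt;br /&gt;
        backward += 1.0e-8f;&lt;br /&gt;
    backward += 1.0f;&lt;br /&gt;
&lt;br /&gt;
    std::cout &amp;lt;&amp;lt; &amp;quot;in order:      &amp;quot; &amp;lt;&amp;lt; forward  &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
    std::cout &amp;lt;&amp;lt; &amp;quot;reverse order: &amp;quot; &amp;lt;&amp;lt; backward &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;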
&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
&lt;br /&gt;
* Implement a linear congruential generator with a = 106, c = 1283, m = 6075 that generates random numbers from 0 to 1 (a minimal sketch is given after this list).&lt;br /&gt;
* Using that and the Mersenne Twister (MT): generate 10,000 pairs (dx, dy) with dx, dy each in -0.1 .. +0.1. Generate histograms of dx and dy (say 200 bins). Do they look okay? What would you expect the variation to be?&lt;br /&gt;
* For 10,000 points: take random walks from (x,y)=(0,0) until they exceed a radius of 2, then stop. Plot a histogram of the final angles for the two pseudo random number generators. What do you see?&lt;br /&gt;
* Submit makefile, code, plots, git log.&lt;br /&gt;
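&lt;br /&gt;
A minimal sketch of the LCG part, with the given constants; mapping its output onto the (dx, dy) pairs and comparing with the Mersenne Twister is left to you:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// lcg.cc (sketch): linear congruential generator x -&amp;gt; (a*x + c) mod m,&lt;br /&gt;
// returned as a floating point number in [0,1)&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
&lt;br /&gt;
class LCG {&lt;br /&gt;
  public:&lt;br /&gt;
    LCG(unsigned long seed = 1) : a(106), c(1283), m(6075), x(seed % m) {}&lt;br /&gt;
    double next() {&lt;br /&gt;
        x = (a*x + c) % m;&lt;br /&gt;
        return double(x)/m;&lt;br /&gt;
    }&lt;br /&gt;
  private:&lt;br /&gt;
    const unsigned long a, c, m;&lt;br /&gt;
    unsigned long x;&lt;br /&gt;
};&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
    LCG rng(42);&lt;br /&gt;
    for (int i = 0; i &amp;lt; 5; i++)&lt;br /&gt;
        std::cout &amp;lt;&amp;lt; rng.next() &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;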
&lt;br /&gt;
Both assignments due on Thursday Feb 28th, 2013, at 9:00 am. Email the files to rzon@scinethpc.ca and ljdursi@scinethpc.ca.&lt;br /&gt;
&lt;br /&gt;
===HW2===&lt;br /&gt;
&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
&lt;br /&gt;
* Compute numerically (using the GSL):&lt;br /&gt;
&lt;br /&gt;
::&amp;amp;int;&amp;lt;sub&amp;gt;0&amp;lt;/sub&amp;gt;&amp;lt;sup&amp;gt;3&amp;lt;/sup&amp;gt; f(x) &amp;amp;nbsp;dx&lt;br /&gt;
&lt;br /&gt;
:(that is the integral of f(x) from x=0 to x=3)&lt;br /&gt;
&lt;br /&gt;
:with&lt;br /&gt;
&lt;br /&gt;
::f(x) = ln(x) sin(x) e&amp;lt;sup&amp;gt;-x&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:using three different methods:&lt;br /&gt;
# Extended Simpson's rule&lt;br /&gt;
# Gauss-Legendre quadrature&lt;br /&gt;
# Monte Carlo sampling &lt;br /&gt;
&lt;br /&gt;
*Hint: what is f(0)?&lt;br /&gt;
&lt;br /&gt;
*Compare the convergence of these methods by increasing the number of function evaluations. (A minimal Gauss-Legendre sketch using the GSL is given after this list.)&lt;br /&gt;
&lt;br /&gt;
*Submit makefile, code, plots, version control log. &lt;br /&gt;
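&lt;br /&gt;
As one possible starting point (assuming the GSL is installed; link with -lgsl -lgslcblas -lm), here is a minimal Gauss-Legendre sketch; extended Simpson's rule and the Monte Carlo estimate are left to you:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// glquad.cc (sketch): fixed-order Gauss-Legendre quadrature with the GSL&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;gsl/gsl_integration.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
double f(double x, void *params)&lt;br /&gt;
{&lt;br /&gt;
    // ln(x) sin(x) exp(-x); the integrand goes to 0 as x goes to 0 from above&lt;br /&gt;
    if (x &amp;lt;= 0.0) return 0.0;&lt;br /&gt;
    return log(x)*sin(x)*exp(-x);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
    gsl_function F;&lt;br /&gt;
    F.function = &amp;amp;f;&lt;br /&gt;
    F.params   = 0;&lt;br /&gt;
&lt;br /&gt;
    // double the number of quadrature points to watch the convergence&lt;br /&gt;
    for (int n = 2; n &amp;lt;= 64; n *= 2) {&lt;br /&gt;
        gsl_integration_glfixed_table *t = gsl_integration_glfixed_table_alloc(n);&lt;br /&gt;
        double result = gsl_integration_glfixed(&amp;amp;F, 0.0, 3.0, t);&lt;br /&gt;
        std::cout &amp;lt;&amp;lt; n &amp;lt;&amp;lt; &amp;quot; points: &amp;quot; &amp;lt;&amp;lt; result &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
        gsl_integration_glfixed_table_free(t);&lt;br /&gt;
    }&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;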
&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
&lt;br /&gt;
* Using an adaptive 4th order Runge-Kutta approach, with a relative accuracy of 1e-4, compute the solution for t = [0,100] of the following set of coupled ODEs (Lorenz oscillator)&lt;br /&gt;
&lt;br /&gt;
::dx/dt = &amp;amp;sigma;(y - x)&lt;br /&gt;
&lt;br /&gt;
::dy/dt = (&amp;amp;rho;-z)x-y&lt;br /&gt;
&lt;br /&gt;
::dz/dt = xy - &amp;amp;beta;z&lt;br /&gt;
&lt;br /&gt;
:with &amp;amp;sigma;=10; &amp;amp;beta;=8/3; &amp;amp;rho; = 28, and with initial conditions&lt;br /&gt;
&lt;br /&gt;
::x(0) = 10&lt;br /&gt;
&lt;br /&gt;
::y(0) = 20&lt;br /&gt;
&lt;br /&gt;
::z(0) = 30&lt;br /&gt;
&lt;br /&gt;
* Hint: study the GSL documentation. (A minimal driver sketch using the GSL ODE solver is given after this list.)&lt;br /&gt;
&lt;br /&gt;
*Submit makefile, code, plots, version control log.&lt;br /&gt;
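&lt;br /&gt;
A minimal driver sketch using the GSL ODE solver (the rkf45 stepper is an embedded Runge-Kutta 4(5) pair; link with -lgsl -lgslcblas -lm); output handling and plotting are up to you:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// lorenz.cc (sketch): adaptive integration of the Lorenz system with the GSL&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
#include &amp;lt;gsl/gsl_errno.h&amp;gt;&lt;br /&gt;
#include &amp;lt;gsl/gsl_odeiv2.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int lorenz(double t, const double y[], double dydt[], void *params)&lt;br /&gt;
{&lt;br /&gt;
    const double sigma = 10.0, rho = 28.0, beta = 8.0/3.0;&lt;br /&gt;
    dydt[0] = sigma*(y[1] - y[0]);&lt;br /&gt;
    dydt[1] = (rho - y[2])*y[0] - y[1];&lt;br /&gt;
    dydt[2] = y[0]*y[1] - beta*y[2];&lt;br /&gt;
    return GSL_SUCCESS;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
    gsl_odeiv2_system sys = {lorenz, 0, 3, 0};      // no Jacobian, no parameters&lt;br /&gt;
    gsl_odeiv2_driver *d =&lt;br /&gt;
        gsl_odeiv2_driver_alloc_y_new(&amp;amp;sys, gsl_odeiv2_step_rkf45,&lt;br /&gt;
                                      1e-6, 0.0, 1e-4); // hstart, epsabs, epsrel&lt;br /&gt;
    double t = 0.0;&lt;br /&gt;
    double y[3] = {10.0, 20.0, 30.0};               // initial conditions&lt;br /&gt;
&lt;br /&gt;
    for (int i = 1; i &amp;lt;= 1000; i++) {&lt;br /&gt;
        double ti = 100.0*i/1000;                   // 1000 output points on [0,100]&lt;br /&gt;
        if (gsl_odeiv2_driver_apply(d, &amp;amp;t, ti, y) != GSL_SUCCESS) break;&lt;br /&gt;
        printf(&amp;quot;%g %g %g %g\n&amp;quot;, t, y[0], y[1], y[2]);&lt;br /&gt;
    }&lt;br /&gt;
    gsl_odeiv2_driver_free(d);&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;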
&lt;br /&gt;
Both assignments due on Thursday Mar 7th, 2013, at 9:00 am. Email the files to rzon@scinethpc.ca and ljdursi@scinethpc.ca.&lt;br /&gt;
&lt;br /&gt;
===HW3===&lt;br /&gt;
&lt;br /&gt;
Part 1:&lt;br /&gt;
&lt;br /&gt;
The time-explicit formulation of the 1d diffusion equation looks like this:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{eqnarray*}&lt;br /&gt;
q^{n+1} &amp;amp; = &amp;amp; q^n + \frac{D \Delta t}{\Delta x^2} &lt;br /&gt;
\left (&lt;br /&gt;
\begin{matrix}&lt;br /&gt;
-2 &amp;amp; 1 \\&lt;br /&gt;
1 &amp;amp; -2 &amp;amp; 1 \\&lt;br /&gt;
&amp;amp; 1 &amp;amp; -2 &amp;amp; 1 \\&lt;br /&gt;
&amp;amp;  &amp;amp;  &amp;amp; \cdots &amp;amp; \\&lt;br /&gt;
&amp;amp;  &amp;amp;  &amp;amp; 1 &amp;amp; -2 &amp;amp; 1 \\&lt;br /&gt;
&amp;amp;  &amp;amp;  &amp;amp; &amp;amp; 1 &amp;amp; -2 \\&lt;br /&gt;
\end{matrix}&lt;br /&gt;
\right ) q^n \\&lt;br /&gt;
&amp;amp; = &amp;amp; \left ( 1 + \frac{D \Delta t}{\Delta x^2} A \right ) q^n&lt;br /&gt;
\end{eqnarray*}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
What are the eigenvalues of the matrix A?   What modes would we expect to be amplified/damped by this operator?&lt;br /&gt;
&lt;br /&gt;
* Consider 100 points in the discretization (e.g., A is 100x100)&lt;br /&gt;
* Calculate the eigenvalues and eigenvectors (using D__EV; which sort of matrix are we using here?)&lt;br /&gt;
* Plot the modes with the largest and smallest absolute-value of eigenvalues, and explain their physical significance&lt;br /&gt;
* The numerical method will become unstable when one eigenmode $v$ begins to grow uncontrollably whenever it is present, i.e.&lt;br /&gt;
$ \frac{D \Delta t}{\Delta x^2} A v = \frac{D \Delta t}{\Delta x^2} \lambda v &amp;gt; v$.   In a timestepping solution, the only way to avoid this for a given physical set of parameters and grid size is to reduce the timestep, $\Delta t$.   Use the largest absolute value eigenvalue to place a constraint on $\Delta t$ for stability.&lt;br /&gt;
&lt;br /&gt;
Part 2:&lt;br /&gt;
&lt;br /&gt;
Using the above constraint on $\Delta t$, for a 1d grid of size 100 (e.g., a 100x100 matrix A), evolve this PDE using lapack. Plot and explain the results.&lt;br /&gt;
&lt;br /&gt;
* Have an initial condition of $q(x=0,t=0) = 1$, and $q(t=0)$ everywhere else zero (e.g., a hot plate just turned on at the left)&lt;br /&gt;
* Take ~100 timesteps and plot the evolution of $q(x,t)$ at 5 times over that period.&lt;br /&gt;
* You’ll want to use a BLAS call to compute the matrix-vector multiply ( http://www.gnu.org/software/gsl/manual/html_node/Level-2-GSL-BLAS-Interface.html). Do the multiply in double precision (D__MV). Which one should you use?&lt;br /&gt;
* The GSL has a cblas interface, http://www.gnu.org/software/gsl/manual/html_node/Level-2-GSL-BLAS-Interface.html ; an example of its use can be found here http://www.gnu.org/software/gsl/manual/html_node/GSL-CBLAS-Examples.html&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Important things to know about lapack:&lt;br /&gt;
* If you are using an nxn array, the “leading dimension” of the array is n. (This argument is so that you could work on sub-matrices if you wanted)&lt;br /&gt;
* You have to make sure the 2d array is a contiguous block of memory&lt;br /&gt;
* You'll (presumably) want to use the C bindings for LAPACK - [http://www.netlib.org/lapack/lapacke.html lapacke].  Note that the usual C arrays are row-major.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Here's a simple example of calling a LAPACKE routine; note that how the matrix is described (here with a pointer to the data, a leading dimension, and the number of rows and columns) will vary with different types of matrix:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;mkl_lapacke.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
double **matrix(int n,int m);&lt;br /&gt;
void free_matrix(double **a);&lt;br /&gt;
&lt;br /&gt;
int main (int argc, const char * argv[])&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
   const int n=5;             // number of rows, columns of the matrix&lt;br /&gt;
   const int m = n;           // nrows&lt;br /&gt;
   const int leading_dim_A=n; // leading dimension (# of cols for row major);&lt;br /&gt;
                              // lets us operate on sub-matrices in principle&lt;br /&gt;
   const int leading_dim_b=n; // similarly for b&lt;br /&gt;
   double **A;&lt;br /&gt;
   double *b;&lt;br /&gt;
&lt;br /&gt;
   b = new double[leading_dim_b];&lt;br /&gt;
   A = matrix(n,leading_dim_A);&lt;br /&gt;
&lt;br /&gt;
   for (int i=0; i&amp;lt;n; i++)&lt;br /&gt;
       for (int j=0; j&amp;lt;leading_dim_A; j++)&lt;br /&gt;
            A[i][j] = 0.;&lt;br /&gt;
&lt;br /&gt;
   // let's do a trivial solve&lt;br /&gt;
   // It should be pretty clear that the solution to this system&lt;br /&gt;
   // is x = {0,1,2...n-1}&lt;br /&gt;
&lt;br /&gt;
   for (int i=0; i&amp;lt;leading_dim_A; i++) {&lt;br /&gt;
        A[i][i] = 2.;&lt;br /&gt;
   }&lt;br /&gt;
&lt;br /&gt;
   for (int i=0; i&amp;lt;leading_dim_b; i++) {&lt;br /&gt;
        b[i]    = 2*i;&lt;br /&gt;
   }&lt;br /&gt;
&lt;br /&gt;
   const char transpose='N';     //solve Ax=b, not A^T x = b&lt;br /&gt;
   const int  nrhs = 1;          //  we're only solving 1 right hand side&lt;br /&gt;
   int info;&lt;br /&gt;
&lt;br /&gt;
   // Call DGELS; b will be overwritten with the value of x.&lt;br /&gt;
   info = LAPACKE_dgels(LAPACK_COL_MAJOR,transpose,m,n,nrhs,&lt;br /&gt;
                          &amp;amp;(A[0][0]),leading_dim_A, &amp;amp;(b[0]),leading_dim_b);&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
   // print results&lt;br /&gt;
   for(int i=0;i&amp;lt;n;i++)&lt;br /&gt;
   {&lt;br /&gt;
      if (i != n/2)&lt;br /&gt;
        std::cout &amp;lt;&amp;lt; &amp;quot;    &amp;quot; &amp;lt;&amp;lt; b[i] &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
      else&lt;br /&gt;
        std::cout &amp;lt;&amp;lt; &amp;quot;x = &amp;quot; &amp;lt;&amp;lt; b[i] &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
   }&lt;br /&gt;
   free_matrix(A);    // release the storage allocated by matrix()&lt;br /&gt;
   delete[] b;&lt;br /&gt;
   return(info);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
double **matrix(int n,int m) {&lt;br /&gt;
   double **a = new double * [n];&lt;br /&gt;
   a[0] = new double [n*m];&lt;br /&gt;
&lt;br /&gt;
   for (int i=1; i&amp;lt;n; i++)&lt;br /&gt;
         a[i] = &amp;amp;a[0][i*m];&lt;br /&gt;
&lt;br /&gt;
   return a;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
void free_matrix(double **a) {&lt;br /&gt;
   delete[] a[0];&lt;br /&gt;
   delete[] a;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===HW4===&lt;br /&gt;
&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
&lt;br /&gt;
Trigonometric interpolation uses an n-point Fourier series to find values at intermediate points. It is one way of downscaling data, and was a motivation for Gauss, who applied it to planetary motion.&lt;br /&gt;
&lt;br /&gt;
The way it works is:&lt;br /&gt;
&lt;br /&gt;
# You Fourier-transform your data.&lt;br /&gt;
# You add frequencies above the Nyquist frequency (in absolute value), but set all the amplitudes of the new frequencies to zero.&lt;br /&gt;
# Note that the frequencies are stored such that, e.g., f&amp;lt;sub&amp;gt;n-1&amp;lt;/sub&amp;gt; is the low frequency -1.&lt;br /&gt;
# The resulting 2n array can be transformed back, and now gives an interpolated signal.&lt;br /&gt;
&lt;br /&gt;
For this assignment, write an application that reads in an image from a binary file into a 2d double precision array (this will require converting from bytes to doubles), and creates an image twice the size in all directions using trigonometric interpolation. Use a real-to-half-complex version of FFTW (note: in 2d, this version of the FFT mixes Fourier components with the same physical magnitude of their wave number k, so this will work).&lt;br /&gt;
You can process the red, green and blue values separately. A minimal 1D sketch of the zero-padding step is given below. &lt;br /&gt;
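&lt;br /&gt;
As a minimal, purely illustrative 1D sketch of the zero-padding step with FFTW's real-to-half-complex transforms (link with -lfftw3; the 2d image version is left to you):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// triginterp1d.cc (sketch): interpolate n real samples onto 2n points&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
#include &amp;lt;fftw3.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
    const int n = 16, m = 2*n;&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; x(n), X(n), Y(m, 0.0), y(m);&lt;br /&gt;
&lt;br /&gt;
    for (int j = 0; j &amp;lt; n; j++)         // a smooth periodic test signal&lt;br /&gt;
        x[j] = sin(2*M_PI*j/n) + 0.5*cos(4*M_PI*j/n);&lt;br /&gt;
&lt;br /&gt;
    fftw_plan fwd = fftw_plan_r2r_1d(n, &amp;amp;x[0], &amp;amp;X[0], FFTW_R2HC, FFTW_ESTIMATE);&lt;br /&gt;
    fftw_execute(fwd);&lt;br /&gt;
&lt;br /&gt;
    // Half-complex layout for size n: X[k] is the real part of frequency k&lt;br /&gt;
    // (k = 0..n/2) and X[n-k] the imaginary part (k = 1..n/2-1).  Copy these&lt;br /&gt;
    // into the size-2n half-complex array Y; the new frequencies stay zero.&lt;br /&gt;
    for (int k = 0; k &amp;lt; n/2; k++)&lt;br /&gt;
        Y[k] = X[k];                     // real parts&lt;br /&gt;
    Y[n/2] = 0.5*X[n/2];                 // one convention: split the old Nyquist term&lt;br /&gt;
    for (int k = 1; k &amp;lt; n/2; k++)&lt;br /&gt;
        Y[m-k] = X[n-k];                 // imaginary parts&lt;br /&gt;
&lt;br /&gt;
    fftw_plan bwd = fftw_plan_r2r_1d(m, &amp;amp;Y[0], &amp;amp;y[0], FFTW_HC2R, FFTW_ESTIMATE);&lt;br /&gt;
    fftw_execute(bwd);&lt;br /&gt;
&lt;br /&gt;
    // FFTW transforms are unnormalized, so divide by the original length n;&lt;br /&gt;
    // even-indexed points then reproduce the original samples, odd-indexed&lt;br /&gt;
    // points are the interpolated values.&lt;br /&gt;
    for (int j = 0; j &amp;lt; m; j++)&lt;br /&gt;
        y[j] /= n;&lt;br /&gt;
&lt;br /&gt;
    for (int j = 0; j &amp;lt; n; j++)&lt;br /&gt;
        std::cout &amp;lt;&amp;lt; x[j] &amp;lt;&amp;lt; &amp;quot;  &amp;quot; &amp;lt;&amp;lt; y[2*j] &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
&lt;br /&gt;
    fftw_destroy_plan(fwd);&lt;br /&gt;
    fftw_destroy_plan(bwd);&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;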
&lt;br /&gt;
&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
&lt;br /&gt;
Write an application which reads an image and performs a low-pass filter on the image, i.e., any Fourier components with magnitudes k larger than n/8 are to be set to zero, after which the inverse Fourier transform is taken and the image is written out to disk again. Use the same FFT technique as in the first assignment.&lt;br /&gt;
&lt;br /&gt;
'''Input image'''&lt;br /&gt;
&lt;br /&gt;
Use [[Media:gauss256.tgz|this image of Gauss]].&lt;br /&gt;
&lt;br /&gt;
'''Image format:'''&lt;br /&gt;
&lt;br /&gt;
Use the following simple PPM format:&lt;br /&gt;
&lt;br /&gt;
First line (ascii): 'P6\n'&amp;lt;br&amp;gt;&lt;br /&gt;
Second line, in ascii, 'width height\n'&amp;lt;br&amp;gt;&lt;br /&gt;
Third line, in ascii, 'maxcolorvalue\n' (this is typically just 255)&amp;lt;br&amp;gt;&lt;br /&gt;
Following that, in binary, are byte-triplets with the red, green and blue values of each pixel.&amp;lt;br&amp;gt;&lt;br /&gt;
Note: in C, the 'unsigned char' data type matches the concept of a byte best (for most machines anyway).&lt;br /&gt;
&lt;br /&gt;
In fact, between the first and second lines there can be comment lines that start with '#'. A minimal reader sketch is given below.&lt;br /&gt;
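&lt;br /&gt;
A minimal sketch of reading this format into per-channel double precision arrays (it skips error handling and the optional '#' comment lines for brevity):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// readppm.cc (sketch): read a binary P6 PPM file&lt;br /&gt;
#include &amp;lt;fstream&amp;gt;&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;string&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main(int argc, char *argv[])&lt;br /&gt;
{&lt;br /&gt;
    if (argc &amp;lt; 2) { std::cerr &amp;lt;&amp;lt; &amp;quot;usage: readppm file.ppm\n&amp;quot;; return 1; }&lt;br /&gt;
&lt;br /&gt;
    std::ifstream f(argv[1], std::ios::binary);&lt;br /&gt;
    std::string magic;&lt;br /&gt;
    int width, height, maxval;&lt;br /&gt;
    f &amp;gt;&amp;gt; magic &amp;gt;&amp;gt; width &amp;gt;&amp;gt; height &amp;gt;&amp;gt; maxval;  // &amp;quot;P6&amp;quot;, dimensions, max colour value&lt;br /&gt;
    f.get();                         // consume the single whitespace after maxval&lt;br /&gt;
    if (magic != &amp;quot;P6&amp;quot;) { std::cerr &amp;lt;&amp;lt; &amp;quot;not a P6 file\n&amp;quot;; return 1; }&lt;br /&gt;
&lt;br /&gt;
    // byte triplets: red, green, blue for each pixel, row by row&lt;br /&gt;
    std::vector&amp;lt;unsigned char&amp;gt; raw(3*width*height);&lt;br /&gt;
    f.read(reinterpret_cast&amp;lt;char*&amp;gt;(&amp;amp;raw[0]), raw.size());&lt;br /&gt;
&lt;br /&gt;
    // convert to double precision, one array per colour channel&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; red(width*height), green(width*height), blue(width*height);&lt;br /&gt;
    for (int i = 0; i &amp;lt; width*height; i++) {&lt;br /&gt;
        red[i]   = raw[3*i + 0];&lt;br /&gt;
        green[i] = raw[3*i + 1];&lt;br /&gt;
        blue[i]  = raw[3*i + 2];&lt;br /&gt;
    }&lt;br /&gt;
&lt;br /&gt;
    std::cout &amp;lt;&amp;lt; width &amp;lt;&amp;lt; &amp;quot; x &amp;quot; &amp;lt;&amp;lt; height &amp;lt;&amp;lt; &amp;quot; image, maxval &amp;quot; &amp;lt;&amp;lt; maxval &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;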
&lt;br /&gt;
=Part 3: High Performance Scientific Computing=&lt;br /&gt;
&lt;br /&gt;
==Prerequisites==&lt;br /&gt;
&lt;br /&gt;
Part 1 or good C++ programming skills, including make and unix/linux prompt experience.&lt;br /&gt;
&lt;br /&gt;
'''Software that you'll need'''&lt;br /&gt;
&lt;br /&gt;
You will need to bring a laptop with an ssh client. Hands-on parts will be done on SciNet's GPC cluster.&lt;br /&gt;
&lt;br /&gt;
For those who don't have a SciNet account yet, the instructions can be found at http://wiki.scinethpc.ca/wiki/index.php/Essentials#Accounts&lt;br /&gt;
&lt;br /&gt;
==Dates==&lt;br /&gt;
March 19, 21, 26, and 28, 2013&amp;lt;br&amp;gt;&lt;br /&gt;
April 2, 4, 9, and 11, 2013&lt;br /&gt;
&lt;br /&gt;
==Topics==&lt;br /&gt;
===''Lecture 1:'' Introduction to Parallel Programming ===&lt;br /&gt;
:::[[File:Lecture17-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture17-2013/lecture17-2013.html]]&lt;br /&gt;
:::[[Media:Lecture17-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture17-2013/lecture17-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 2:'' Parallel Computing Paradigms ===&lt;br /&gt;
&lt;br /&gt;
:::[[File:Lecture18-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture18-2013/lecture18-2013.html]]&lt;br /&gt;
:::[[Media:Lecture18-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture18-2013/lecture18-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp; [[#HW1_3|homework 1]]&lt;br /&gt;
&lt;br /&gt;
===''Lectures 3,4:''  Shared Memory Programming with OpenMP, part 1,2===&lt;br /&gt;
&lt;br /&gt;
:::[[Media:Lecture19-2013.pdf|Slides]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 5:'' Distributed Parallel Programming with MPI, part 1===&lt;br /&gt;
&lt;br /&gt;
:::[[Media:Lecture21-2013.pdf|Slides]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 6:'' Distributed Parallel Programming with MPI, part 2===&lt;br /&gt;
&lt;br /&gt;
:::[[Media:Lecture22-2013.pdf|Slides]]&lt;br /&gt;
&lt;br /&gt;
''Lecture 7''&amp;amp;nbsp;&amp;amp;nbsp; Distributed Parallel Programming with MPI, part 3&amp;lt;br&amp;gt;&lt;br /&gt;
''Lecture 8''&amp;amp;nbsp;&amp;amp;nbsp; Hybrid OpenMP+MPI Programming&lt;br /&gt;
&lt;br /&gt;
== Homework assignments ==&lt;br /&gt;
&lt;br /&gt;
=== HW1 ===&lt;br /&gt;
&lt;br /&gt;
* Read the SciNet tutorial (as it pertains to the GPC)&lt;br /&gt;
* Read the GPC Quick Start.&lt;br /&gt;
* Get the first set of code:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
   $ cd $SCRATCH&lt;br /&gt;
   $ git clone /scinet/course/sc3/homework1&lt;br /&gt;
   $ cd homework1&lt;br /&gt;
   $ source setup&lt;br /&gt;
   $ make&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
*This contains the threaded program 'blurppm' and 266 ppm images to be blurred. Usage:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
  blurppm INPUTPPM OUTPUTPPM BLURRADIUS NUMBEROFTHREADS&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
* Simple test:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
  $ qsub -l nodes=1:ppn=8,walltime=2:00:00 -I -X -qdebug&lt;br /&gt;
  $ cd $SCRATCH/homework1&lt;br /&gt;
  $ time blurppm 001.ppm new001.ppm 30 1&lt;br /&gt;
  real  0m52.900s&lt;br /&gt;
  user  0m52.881s&lt;br /&gt;
  sys   0m0.008s&lt;br /&gt;
  $ display 001.ppm &amp;amp;&lt;br /&gt;
  $ display new001.ppm &amp;amp;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
* Time blurppm with a BLURRADIUS ranging from 1 to 41 in steps of 4, and for NUMBEROFTHREADS ranging from 1 to 16.  Record the (real) duration of each run.&lt;br /&gt;
* Plot the duration as a function of NUMBEROFTHREADS, as well as  the speed-up and efficiency.&lt;br /&gt;
* Submit the script and plots of the duration, speedup and efficiency as a function of NUMBEROFTHREADS.&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
* Use GNU parallel to run blurppm on all 266 images with a radius of 41.&lt;br /&gt;
* Investigate different scenarios:&lt;br /&gt;
:# Have GNU parallel run 16 at a time with just 1 thread.&lt;br /&gt;
:# Have GNU parallel run 8 at a time with 2 threads.&lt;br /&gt;
:# Have GNU parallel run 4 at a time with 4 threads.&lt;br /&gt;
:# Have GNU parallel run 2 at a time with 8 threads.&lt;br /&gt;
:# Have GNU parallel run 1 at a time with 16 threads.&lt;br /&gt;
:Record the total time it takes in each of these scenarios.&lt;br /&gt;
* Repeat this with a BLURRADIUS of 3.&lt;br /&gt;
* Submit scripts, timing data  and plots.&lt;br /&gt;
&lt;br /&gt;
=== HW2 ===&lt;br /&gt;
&lt;br /&gt;
In the course materials ( /scinet/course/ppp/nbodyc or nbodyf ) there is the source code for a serial N-body integrator.  This, like the molecular dynamics code you've seen earlier, calculates the long-range forces exerted by each particle on all of the other particles.&lt;br /&gt;
&lt;br /&gt;
Parallelize the force calculation with OpenMP, and present timing results for 1, 4, and 8 threads compared to the serial version.  Note that you can turn off graphical output by removing the &amp;quot;USEPGPLOT = -DPGPLOT&amp;quot; line in Makefile.inc in the top level directory.&lt;br /&gt;
&lt;br /&gt;
Begin by doubling the work by _not_ calculating two forces at once (e.g., not making use of f&amp;lt;sub&amp;gt;ji&amp;lt;/sub&amp;gt; = -f&amp;lt;sub&amp;gt;ij&amp;lt;/sub&amp;gt;), and simply parallelizing the outer force loop (a minimal sketch is given below).  Then find a way to implement the forces efficiently but also in parallel.  Is there any other part of the problem which could usefully be parallelized?&lt;br /&gt;
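&lt;br /&gt;
A minimal sketch of that first step, shown in 2d for brevity; the array names x, y, m, fx, fy and the constant G are illustrative placeholders, not the names in the course code. Compile with your compiler's OpenMP flag (e.g. -fopenmp for gcc).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// forces.cc (sketch): parallelize the outer loop, recomputing each pair twice&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
void forces(int n, const double *x, const double *y, const double *m,&lt;br /&gt;
            double *fx, double *fy, double G)&lt;br /&gt;
{&lt;br /&gt;
    #pragma omp parallel for&lt;br /&gt;
    for (int i = 0; i &amp;lt; n; i++) {&lt;br /&gt;
        double fxi = 0.0, fyi = 0.0;    // private accumulators: no race on fx[i]&lt;br /&gt;
        for (int j = 0; j &amp;lt; n; j++) {&lt;br /&gt;
            if (j == i) continue;&lt;br /&gt;
            double dx = x[j] - x[i];&lt;br /&gt;
            double dy = y[j] - y[i];&lt;br /&gt;
            double r  = sqrt(dx*dx + dy*dy);&lt;br /&gt;
            double f  = G*m[i]*m[j]/(r*r*r);   // pair force divided by r&lt;br /&gt;
            fxi += f*dx;&lt;br /&gt;
            fyi += f*dy;&lt;br /&gt;
        }&lt;br /&gt;
        fx[i] = fxi;&lt;br /&gt;
        fy[i] = fyi;&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;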
&lt;br /&gt;
=Links=&lt;br /&gt;
&lt;br /&gt;
==Unix==&lt;br /&gt;
* Cygwin: http://www.cygwin.com&lt;br /&gt;
* Linux Command Line: A Primer (June 2012) [[Media:SS_IntroToShell.pdf|Slides,]] [[Media:SS_IntroToShell.tgz|Files]]&lt;br /&gt;
* Intro to unix shell from software carpentry: http://software-carpentry.org/4_0/shell&lt;br /&gt;
&lt;br /&gt;
==C/C++==&lt;br /&gt;
* [[One-Day Scientific C++ Class]] at SciNet&lt;br /&gt;
* C++ library reference: http://www.cplusplus.com/reference&lt;br /&gt;
* C preprocessor: http://www.cprogramming.com/tutorial/cpreprocessor.html&lt;br /&gt;
* Boost: http://www.boost.org&lt;br /&gt;
* Boost Python tutorial: http://www.boost.org/doc/libs/1_53_0/libs/python/doc/tutorial/doc/html/index.html&lt;br /&gt;
&lt;br /&gt;
==Git==&lt;br /&gt;
* Git: http://git-scm.com&lt;br /&gt;
* Version Control: [http://support.scinet.utoronto.ca/CourseVideo/PPPcourse/Thursday_Morning_BP_Revision_Control/Thursday_Morning_BP_Revision_Control.mp4 Video]/ [[Media:Snug_techtalk_revcontrol.pdf | Slides]]&lt;br /&gt;
* Git cheat sheet from Git Tower: http://www.git-tower.com/files/cheatsheet/Git_Cheat_Sheet_grey.pdf&lt;br /&gt;
&lt;br /&gt;
==Python==&lt;br /&gt;
* Python: http://www.python.org&lt;br /&gt;
* IPython: http://ipython.org&lt;br /&gt;
* Matplotlib: http://www.matplotlib.org&lt;br /&gt;
* Enthought python distribution: http://www.enthought.com/products/edudownload.php&amp;lt;br/&amp;gt;&lt;br /&gt;
(this gives you numpy, matplotlib and ipython all installed in one fell swoop)&lt;br /&gt;
&lt;br /&gt;
* Intro to python from software carpentry: http://software-carpentry.org/4_0/python&lt;br /&gt;
* Tutorial on matplotlib: http://conference.scipy.org/scipy2011/tutorials.php#jonathan&lt;br /&gt;
* Npy file format: https://github.com/numpy/numpy/blob/master/doc/neps/npy-format.txt&lt;br /&gt;
* Boost Python tutorial: http://www.boost.org/doc/libs/1_53_0/libs/python/doc/tutorial/doc/html/index.html&lt;br /&gt;
&lt;br /&gt;
==ODEs==&lt;br /&gt;
* Integrators for particle based ODEs (i.e. molecular dynamics): http://www.chem.utoronto.ca/~rzon/simcourse/partmd.pdf. &amp;lt;br&amp;gt;'''Focus on 4.1.4 - 4.1.6 for practical aspects.'''&lt;br /&gt;
* Numerical algorithm to solve ODEs (General) in ''Numerical Recipes for C'': http://apps.nrbook.com/c/index.html Chapter 16&lt;br /&gt;
&lt;br /&gt;
==Interpolation (2D) ==&lt;br /&gt;
* Interpolation in ''Numerical Recipes for C'': http://apps.nrbook.com/c/index.html Pages 123-128&lt;br /&gt;
* Wikipedia pages on [http://en.wikipedia.org/wiki/Bilinear_interpolation Bilinear Interpolation] and [http://en.wikipedia.org/wiki/Bicubic_interpolation Bicubic Interpolation] are not bad either.&lt;br /&gt;
&lt;br /&gt;
==BLAS==&lt;br /&gt;
* [http://www.tacc.utexas.edu/tacc-projects/gotoblas2 gotoblas]&lt;br /&gt;
* [http://math-atlas.sourceforge.net/ ATLAS]&lt;br /&gt;
&lt;br /&gt;
==LAPACK==&lt;br /&gt;
* http://www.netlib.org/lapack&lt;br /&gt;
&lt;br /&gt;
==GSL==&lt;br /&gt;
* GNU Scientific Library: http://www.gnu.org/s/gsl&lt;br /&gt;
&lt;br /&gt;
==FFT==&lt;br /&gt;
* FFTW: http://www.fftw.org&lt;br /&gt;
&lt;br /&gt;
==Top500==&lt;br /&gt;
* TOP500 Supercomputing Sites: http://top500.org&lt;br /&gt;
&lt;br /&gt;
==OpenMP==&lt;br /&gt;
* OpenMP (open multi-processing) application programming interface for shared memory programming: http://openmp.org&lt;br /&gt;
&lt;br /&gt;
==GNU parallel==&lt;br /&gt;
* Official citation: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.&lt;br /&gt;
* [[Media:Tech-talk-gnu-parallel.pdf|Slides of the SciNet TechTalk on Gnu Parallel (14 Nov 2012)]]&lt;br /&gt;
* The documentation for GNU parallel can be found at http://www.gnu.org/software/parallel/&lt;br /&gt;
* Its man page can be found here http://www.gnu.org/software/parallel/man.html&lt;br /&gt;
* The man page is also available on the GPC when the gnu-parallel module is loaded, with the command &amp;lt;code&amp;gt;$ man parallel&amp;lt;/code&amp;gt;. The man page contains options, such as how to make sure the output is not all scrambled, and examples.&lt;br /&gt;
&lt;br /&gt;
==SciNet==&lt;br /&gt;
&lt;br /&gt;
Anything on this wiki, really, but specifically:&lt;br /&gt;
* [[Essentials|SciNet Essentials]]&lt;br /&gt;
* [[GPC Quickstart]]&lt;br /&gt;
* [[Media:SciNet_Tutorial.pdf |SciNet User Tutorial]]&lt;br /&gt;
* [[Software and Libraries]]&lt;br /&gt;
&lt;br /&gt;
==Other Resources==&lt;br /&gt;
* [http://galileo.phys.virginia.edu/classes/551.jvn.fall01/goldberg.pdf What Every Computer Scientist Should Know About Floating-Point Arithmetic] - the classic (and extremely comprehensive) overview of the basics of floating point math.   The first few pages, in particular, are very useful.&lt;br /&gt;
* [http://arxiv.org/abs/1005.4117 Random Numbers In Scientific Computing: An Introduction] by Katzgraber.   A very lucid discussion of pseudo random number generators for science.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Scientific_Computing_Course&amp;diff=5916</id>
		<title>Scientific Computing Course</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Scientific_Computing_Course&amp;diff=5916"/>
		<updated>2013-04-02T15:10:11Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* Homework assignments */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;''This wiki page concerns the 2013 installment of SciNet's Scientific Computing course. Material from the previous installment can be found on [[Scientific Software Development Course]], [[Numerical Tools for Physical Scientists (course)]], and [[High Performance Scientific Computing]]''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
=Syllabus=&lt;br /&gt;
&lt;br /&gt;
==About the course==&lt;br /&gt;
* Whole-term graduate course&lt;br /&gt;
* Prerequisite: basic C, C++ or Fortran experience.&lt;br /&gt;
* Will use `C++ light' and Python&lt;br /&gt;
* Topics include: Scientific computing and programming skills, Parallel programming, and Hybrid programming.  &lt;br /&gt;
&lt;br /&gt;
There are three parts to this course:&lt;br /&gt;
&lt;br /&gt;
# Scientific Software Development: Jan/Feb 2013&amp;lt;br&amp;gt;''python, C++, git, make, modular programming, debugging''&lt;br /&gt;
# Numerical Tools for Physical Scientists: Feb/Mar 2013&amp;lt;br&amp;gt;''modelling, floating point, Monte Carlo, ODE, linear algebra, fft''&lt;br /&gt;
# High Performance Scientific Computing: Mar/Apr 2013&amp;lt;br&amp;gt;''openmp, mpi and hybrid programming''&lt;br /&gt;
&lt;br /&gt;
Each part consists of eight one-hour lectures, two per week.&lt;br /&gt;
&lt;br /&gt;
These can be taken separately by astrophysics graduate students at the University of Toronto as mini-courses, and by physics graduate students at the University of Toronto as modular courses.&lt;br /&gt;
&lt;br /&gt;
The first two parts count towards the SciNet Certificate in Scientific Computing, while the third part can count towards the SciNet HPC Certificate. For more info about the SciNet Certificates, see http://www.scinethpc.ca/2012/12/scinet-hpc-certificate-program.&lt;br /&gt;
&lt;br /&gt;
==Location and Times==&lt;br /&gt;
[http://www.scinethpc.ca/2010/08/contact-us SciNet HeadQuarters]&amp;lt;br&amp;gt;&lt;br /&gt;
256 McCaul Street, Toronto, ON&amp;lt;br&amp;gt;&lt;br /&gt;
Room 229 (Conference Room)&amp;lt;br&amp;gt;&lt;br /&gt;
Tuesdays 11:00 am - 12:00 noon&amp;lt;br&amp;gt;&lt;br /&gt;
Thursdays 11:00 am - 12:00 noon&lt;br /&gt;
&lt;br /&gt;
==Instructors and office hours==&lt;br /&gt;
&lt;br /&gt;
* Ramses van Zon - 256 McCaul Street, Rm 228 - Mondays 3-4pm&lt;br /&gt;
* L. Jonathan Dursi - 256 McCaul Street, Rm 216 - Wednesdays 3-4pm&lt;br /&gt;
&lt;br /&gt;
==Grading scheme==&lt;br /&gt;
&lt;br /&gt;
Attendance at lectures.&lt;br /&gt;
&lt;br /&gt;
Four homework sets (one per week), to be returned by email by 9:00 am the following Thursday.&lt;br /&gt;
&lt;br /&gt;
==Sign up==&lt;br /&gt;
Sign up for this graduate course goes through SciNet's course website.&amp;lt;br&amp;gt;The direct link is https://support.scinet.utoronto.ca/courses/?q=node/99.&amp;lt;br&amp;gt;  If you do not have a SciNet account but wish to register for this course, please email support@scinet.utoronto.ca . &amp;lt;br&amp;gt;&lt;br /&gt;
Sign up is closed.&lt;br /&gt;
&lt;br /&gt;
=Part 1: Scientific Software Development=&lt;br /&gt;
&lt;br /&gt;
==Prerequisites==&lt;br /&gt;
&lt;br /&gt;
Some programming experience. Some unix prompt experience.&lt;br /&gt;
&lt;br /&gt;
'''Software that you'll need:'''&lt;br /&gt;
&lt;br /&gt;
A unix-like environment with the GNU compiler suite (e.g. Cygwin), and Python 2, IPython, Numpy, SciPy and Matplotlib (all of which you get if you use the Enthought distribution) installed on your laptop. Links are given at the bottom of this page.&lt;br /&gt;
&lt;br /&gt;
==Dates==&lt;br /&gt;
&lt;br /&gt;
January 15, 17, 22, 24, 29, and 31, 2013&amp;lt;br&amp;gt;&lt;br /&gt;
February 5 and 7, 2013&lt;br /&gt;
&lt;br /&gt;
==Topics (with lecture slides and recordings)==&lt;br /&gt;
&lt;br /&gt;
===''Lecture 1:'' C++ introduction===&lt;br /&gt;
:::[[File:Lecture1-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture1-2013/lecture1-2013.html]]&lt;br /&gt;
:::[[Media:Lecture1-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture1-2013/lecture1-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 2:'' More C++, build and version control&amp;lt;br&amp;gt;===&lt;br /&gt;
:::[[File:Lecture2-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture2-2013/lecture2-2013.html]]&lt;br /&gt;
:::Guest lecturer: Michael Nolta (CITA) for the git portion of the lecture.&lt;br /&gt;
:::[[Media:Lecture2-2013.pdf|C++ and Make slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture2-2013/lecture2-2013.mp4 C++ and Make video recording] &amp;amp;nbsp;/ &amp;amp;nbsp; [[Media:Git-Nolta.pdf|Git slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [[#HW1|Homework assignment 1]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 3:'' Python and visualization===&lt;br /&gt;
:::[[File:Lecture3-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture3-2013/lecture3-2013.html]]&lt;br /&gt;
:::[[Media:Lecture3-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture3-2013/lecture3-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 4:'' Modular programming, refactoring, testing===&lt;br /&gt;
:::[[File:Lecture4-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture4-2013/lecture4-2013.html]]&lt;br /&gt;
:::[[Media:Lecture4-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture4-2013/lecture4-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp;  [[#HW2|Homework assignment 2]]&lt;br /&gt;
:::[http://wiki.scinethpc.ca/wiki/images/f/f0/diffuse.cc diffuse.cc (course project source file)] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://wiki.scinethpc.ca/wiki/images/f/f0/plotdata.py plotdata.py (corresponding python movie generator)]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 5:'' Object oriented programming===&lt;br /&gt;
:::[[Media:Lecture5-2013.pdf|Slides]]&lt;br /&gt;
:::Recordings of this lecture are missing, but you could view the videos of SciNet's [[One-Day Scientific C++ Class]], in particular the parts on classes, polymorphism, and inheritance.&lt;br /&gt;
&lt;br /&gt;
===''Lecture 6:'' ODE, interpolation===&lt;br /&gt;
:::[[File:Lecture6-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture6-2013/lecture6-2013.html]]&lt;br /&gt;
:::[[Media:ScientificComputing2013-Lecture5-ODE.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture6-2013/lecture6-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp; [[#HW3|Homework assignment 3]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 7:'' Development tools: debugging and profiling===&lt;br /&gt;
:::[[File:Lecture7-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture7-2013/lecture7-2013.html]]&lt;br /&gt;
:::[[Media:ScientificComputing2013-Debugging.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture7-2013/lecture7-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 8:'' Objects in Python, linking C++ and Python===&lt;br /&gt;
:::[[File:Lecture8-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture8-2013/lecture8-2013.html]]&lt;br /&gt;
:::[[Media:Lecture8-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture8-2013/lecture8-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
==Homework assignments==&lt;br /&gt;
&lt;br /&gt;
===HW1===&lt;br /&gt;
&lt;br /&gt;
'''''Multi-file C++ program to create a data file'''''&lt;br /&gt;
&lt;br /&gt;
We’ve learned programming in basic C++, use of make and Makefiles to build projects, and local use of git for version control. In this first assignment, you’ll use these to make a multi-file C++ program, built with make, which computes and outputs a data file.&lt;br /&gt;
&lt;br /&gt;
* Start a git repository, and begin writing a C++ program to&lt;br /&gt;
:# Get an array size and a standard deviation from user input,&lt;br /&gt;
:# Allocate a 2d array (use the code given in lecture 2),&lt;br /&gt;
:# Store a 2d Gaussian with a maximum at the centre of the array and given standard deviation (in units of grid points),&lt;br /&gt;
:# Output that array to a text file,&lt;br /&gt;
:# Free the array, and exit. &lt;br /&gt;
* The output text file should contain just the data in text format, with a row of the file corresponding to a row of the array and with whitespace between the numbers. &lt;br /&gt;
* The 2d array creation/freeing routines should be in one file (with an associated header file), the gaussian calculation be in another (ditto), and the output routine be in a third, with the main program calling each of these. &lt;br /&gt;
* Use a makefile to build your code (add it to the repository).&lt;br /&gt;
* You can start with everything in one file, with hardcoded values for sizes and standard deviation and a static array, then refactor things into multiple files, adding the other features.&lt;br /&gt;
* As a test, use the ipython executable that came with your Enthought python distribution to read your data and plot it.&amp;lt;br&amp;gt;If your data file is named ‘data.txt’, running the following:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ipython --pylab&lt;br /&gt;
In [1]: data = numpy.genfromtxt('data.txt') &lt;br /&gt;
In [2]: contour(data) &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
should give a nice contour plot of a 2-dimensional gaussian.&lt;br /&gt;
* Email in your source code, makefile and the &amp;quot;git log&amp;quot; output of all your commits by 9:00 am Thursday Jan 24th, 2013. Please zip or tar these files together as one attachment, with a file name that includes your name and &amp;quot;HW1&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
===HW2===&lt;br /&gt;
'''''Refactor legacy code to a modular project with unit tests'''''&lt;br /&gt;
&lt;br /&gt;
In class, today, we talked about modular programming and testing, and the project we’ll be working on for the next three weeks. This homework will start advancing on that project by working on the “legacy” code given to us by our supervisor ([http://wiki.scinethpc.ca/wiki/images/f/f0/diffuse.cc diffuse.cc]), with a corresponding python plotting script ([http://wiki.scinethpc.ca/wiki/images/f/f0/plotdata.py plotdata.py]), and whipping it into shape before we start adding new physics.&lt;br /&gt;
* Start a git repository for this project, and add the two files.&lt;br /&gt;
* Create a Makefile and add it to the repository.&lt;br /&gt;
* Since we have no tests, run the program with console output redirected to a file:&lt;br /&gt;
:&amp;lt;pre&amp;gt;$ diffuse &amp;gt; original-output.txt&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;''It turns out the code has a bug that can make the output different when the same code is run again, which obviously would not be good for a baseline test. Replace 'float error;' by 'float error=0.0;' to fix this.''&lt;br /&gt;
* Also save the two .npy output files, e.g. to original-data.npy and original-theory.npy. The triplet of files (original-output.txt, original-data.npy and original-theory.npy) serve as a baseline integrated test (add these to repository). &lt;br /&gt;
* Then write a 'test' target in your makefile that:&lt;br /&gt;
** Runs 'diffuse' with output to a new file.&lt;br /&gt;
** Compares the file with the baseline test file, and compares the .npy files.&lt;br /&gt;
:: (hint: the unix command diff or cmp can compare files).&lt;br /&gt;
* First refactoring: Move the global variables into the main routine.&lt;br /&gt;
* ''Chorus: Test your modified code, and commit.''&lt;br /&gt;
* Second refactoring: Extract a diffusion operator routine, that gets called from main.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* Create a .cc/.h module for the diffusion operator.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* Add two tests for the diffusion operator: for a constant and for a linear input field (&amp;lt;tt&amp;gt;rho[i][j]=a*i+b*j&amp;lt;/tt&amp;gt;). Add these to the test target in the makefile.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* More refactoring: Extract three more .cc/.h modules:&lt;br /&gt;
** for output (should not contain hardcoded filenames)    &lt;br /&gt;
** computation of the theory&lt;br /&gt;
** and for the array allocation stuff.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* Describe, but don't implement in the .h and .cc, what would be appropriate unit tests for these three modules.&lt;br /&gt;
&lt;br /&gt;
Email in your source code and the git log file of all your commits as a .zip or .tar file by email to rzon@scinethpc.ca and ljdursi@scinethpc.ca by 9:00 am on Thursday January 31, 2013.&lt;br /&gt;
&lt;br /&gt;
===HW3===&lt;br /&gt;
This week, we learned about object oriented programming, which fits nicely within the modular programming idea.  In this homework, we are going to use some of it to restructure our code and get it ready to add the tracer particle, the goal of the course project. &lt;br /&gt;
&lt;br /&gt;
The goal will be to have an instance of a &amp;lt;tt&amp;gt;Diffusion&amp;lt;/tt&amp;gt; class,&lt;br /&gt;
as well as an instance of &amp;lt;tt&amp;gt;Tracer&amp;lt;/tt&amp;gt;, which for now will be a&lt;br /&gt;
free particle moving as ('''x'''(t),'''y'''(t)) = ('''x'''(0) +&lt;br /&gt;
'''vx''' t, '''y'''(0) + '''vy''' t), without any coupling yet (we&lt;br /&gt;
will handle this next week).&lt;br /&gt;
&lt;br /&gt;
To be more specific:&lt;br /&gt;
* Clean up your code, using the feedback from your HW2 grading, such that the modules are as independent as possible. &lt;br /&gt;
* If you have not done so yet, add comments to the header files of your modules to explain exactly what each function does (without going into implementation details), what its arguments mean and what it returns (unless it's a void function, of course). &lt;br /&gt;
* Objectify the &amp;lt;tt&amp;gt;main&amp;lt;/tt&amp;gt; routine, by creating a class &amp;lt;tt&amp;gt;Diffusion&amp;lt;/tt&amp;gt;.&lt;br /&gt;
* Put this class in its own module (declaration in .h, implementation in .cc). For instance, the declaration could be&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// diffusion.h&lt;br /&gt;
#ifndef DIFFUSIONH&lt;br /&gt;
#define DIFFUSIONH&lt;br /&gt;
#include &amp;lt;fstream&amp;gt;&lt;br /&gt;
class Diffusion {&lt;br /&gt;
  public:&lt;br /&gt;
    Diffusion(float x1, float x2, float D, int numPoints);&lt;br /&gt;
    void init(float a0, float sigma0); // set initial field&lt;br /&gt;
    void timeStep(float dt);           // solve diff. equation over dt&lt;br /&gt;
    void toFile(std::ofstream&amp;amp; f);     // write to file (binary,no npyheader)&lt;br /&gt;
    void toScreen();                   // report a line to screen&lt;br /&gt;
    float getRho(int i, int j);        // get a value of the field&lt;br /&gt;
    ~Diffusion();&lt;br /&gt;
  private:&lt;br /&gt;
    float*** rho;&lt;br /&gt;
    ...&lt;br /&gt;
};&lt;br /&gt;
#endif&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(this is not supposed to be prescriptive.)&lt;br /&gt;
* In the implementation file you'd have things like&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// diffusion.cc&lt;br /&gt;
#include &amp;quot;diffusion.h&amp;quot;&lt;br /&gt;
...&lt;br /&gt;
void Diffusion::timeStep(float dt) &lt;br /&gt;
{&lt;br /&gt;
   // code for the timeStep ...&lt;br /&gt;
}&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(note the inclusion of the module's header file on the top of the implementation, so the class is declared).&lt;br /&gt;
* Let &amp;lt;tt&amp;gt;int main()&amp;lt;/tt&amp;gt; have the same functionality as before, but now by defining the parameters of the run, creating an object of this class, setting up file streams, and taking time steps and writing out by using calls to member functions of this object. &lt;br /&gt;
* Additionally, write a class &amp;lt;tt&amp;gt;Tracer&amp;lt;/tt&amp;gt; which for now implements a free particle in 2d. Something like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
class Tracer {&lt;br /&gt;
  public:&lt;br /&gt;
    Tracer(float x1, float x2);&lt;br /&gt;
    void init(float x0, float y0, float vx, float vy);&lt;br /&gt;
    void timeStep(float dt);           // solve diff. equation over dt&lt;br /&gt;
    void toFile(std::ofstream&amp;amp; f);     // write to file (binary,no npyheader)&lt;br /&gt;
    void toScreen();                   // report a line to screen&lt;br /&gt;
    ~Tracer();&lt;br /&gt;
  private:&lt;br /&gt;
    ...&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
:The timeStep implementation can in this case use the infamous forward Euler integration scheme, because it happens to be exact here.&lt;br /&gt;
:When it comes to output to an npy file, let's view the data of the tracer particle at one point in time as a 2x2 matrix &amp;lt;tt&amp;gt;[[x,y],[vx,vy]]&amp;lt;/tt&amp;gt;, so we can use much of the npy output code that we used for the diffusion field, which was a (numPoints+2)x(numPoints+2) matrix.&lt;br /&gt;
* This class too should be its own module (Often, &amp;quot;one class, one module&amp;quot; is a good paradigm, though occasionally you'll have closely related classes).&lt;br /&gt;
* Add some code to int main to  have the Tracer particle evolve at the same time as the diffusion field (although the two are completely uncoupled).&lt;br /&gt;
* Keep using git and make, run the tests that you have regularly to make sure your program still works.&lt;br /&gt;
&lt;br /&gt;
Note that because we've now set up our program in a modular fashion, you can do&lt;br /&gt;
different parts of this assignment in any order you want.  For instance, to wrap your head around object oriented programming, you may like implementing the tracer particle first, so that your diffusion code stays intact.  Or you might want to wait with commenting until the end if you think you'll have to change a module for this assignment.&lt;br /&gt;
&lt;br /&gt;
Email in your source code and the git log file of all your commits as a .zip or .tar file by email to rzon@scinethpc.ca and ljdursi@scinethpc.ca by &lt;br /&gt;
&amp;lt;span style=&amp;quot;color:#ee3300&amp;quot;&amp;gt;3:00 pm on Friday February 8, 2013&amp;lt;/span&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===HW4===&lt;br /&gt;
&lt;br /&gt;
In this homework, we are going to implement the class project of a tracer particle coupled to a diffusion equation. &lt;br /&gt;
The full specification of the physical problem is [[Media:ScClassProject.pdf|here]].  &lt;br /&gt;
* Augment the tracer particle to include a force in the x and in the y direction, and a friction coefficient alpha, which at first can be constant.&lt;br /&gt;
* Implement the so-called leapfrog integration algorithm for the tracer particle&lt;br /&gt;
:::v &amp;amp;larr; v + f(v) &amp;amp;Delta;t / m&lt;br /&gt;
:::r &amp;amp;larr; r + v &amp;amp;Delta;t&lt;br /&gt;
:where v, r, and f are 2d vectors and f(v) is the total, velocity-dependent force specified in the class project, i.e., the sum of the external force F=qE and the friction force -&amp;amp;alpha;v.&amp;lt;br/&amp;gt;(Note: the v dependence of f makes this strictly not a leapfrog integration, but we'll ignore that here. A minimal sketch of this update is given after this list.)&lt;br /&gt;
* Further augment the tracer class with a member function 'couple' which takes a diffusion field as input, and adjusts the friction constant. &lt;br /&gt;
* Your implementation of the 'couple' member function will need to interpolate the diffusion field to the current position of the particle. Use [[Media:CppInterpolation.tgz|this interpolation module]].&lt;br /&gt;
* Rewrite your main routine so that the coupling is called before the tracer's time step. You may need to modify the Diffusion class a bit to get &amp;lt;tt&amp;gt;rho[active]&amp;lt;/tt&amp;gt; out.&lt;br /&gt;
* For simplicity, use the same time step for both the diffusion and the tracer particle.&lt;br /&gt;
* Keep using git and make.&lt;br /&gt;
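&lt;br /&gt;
For concreteness, here is a minimal, non-prescriptive sketch of what the tracer update described above could look like. All member names (x, y, vx, vy, alpha, mass, q, Ex, Ey) are illustrative only and will differ depending on how you designed your &amp;lt;tt&amp;gt;Tracer&amp;lt;/tt&amp;gt; class in HW3.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Hypothetical sketch of the leapfrog-style update for the Tracer class.&lt;br /&gt;
// Member names are placeholders; use whatever your own class actually stores.&lt;br /&gt;
void Tracer::timeStep(float dt)&lt;br /&gt;
{&lt;br /&gt;
   // total velocity-dependent force: external force F = qE plus friction -alpha*v&lt;br /&gt;
   float fx = q*Ex - alpha*vx;&lt;br /&gt;
   float fy = q*Ey - alpha*vy;&lt;br /&gt;
   // update the velocity first, then the position&lt;br /&gt;
   vx += fx*dt/mass;&lt;br /&gt;
   vy += fy*dt/mass;&lt;br /&gt;
   x  += vx*dt;&lt;br /&gt;
   y  += vy*dt;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;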
&lt;br /&gt;
You will hand in your source code, makefiles and the git log file of all your commits by email by &amp;lt;span style=&amp;quot;color:#ee3300&amp;quot;&amp;gt;9:00 am on Thursday February 21, 2013&amp;lt;/span&amp;gt;.  Email the files, preferably zipped or tarred, to rzon@scinethpc.ca and ljdursi@scinethpc.ca.&lt;br /&gt;
&lt;br /&gt;
=Part 2: Numerical Tools for Physical Scientists=&lt;br /&gt;
&lt;br /&gt;
==Prerequisites==&lt;br /&gt;
&lt;br /&gt;
Part 1 or solid C++ programming skills, including make and unix/linux prompt experience.&lt;br /&gt;
&lt;br /&gt;
'''Software that you'll need'''&lt;br /&gt;
&lt;br /&gt;
A unix-like environment with the GNU compiler suite (e.g. Cygwin), and Python (Enthought) installed on your laptop.&lt;br /&gt;
&lt;br /&gt;
==Dates==&lt;br /&gt;
&lt;br /&gt;
February 12, 14, 26, and 28, 2013&amp;lt;br&amp;gt;&lt;br /&gt;
March 5, 7, 12, and 14, 2013&lt;br /&gt;
&lt;br /&gt;
==Topics==&lt;br /&gt;
&lt;br /&gt;
===''Lecture 1:'' Numerics ===&lt;br /&gt;
:::[[File:Lecture9-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture9-2013/lecture9-2013.html]]&lt;br /&gt;
:::[[Media:Lecture9-2013-Numerics.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture9-2013/lecture9-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 2:'' Random numbers ===&lt;br /&gt;
:::[[File:Lecture10-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture10-2013/lecture10-2013.html]]&lt;br /&gt;
:::[[Media:Lecture10-2013-PRNG.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture10-2013/lecture10-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp;[http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW1_2 Homework assignment 1]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 3:'' Numerical integration and ODEs ===&lt;br /&gt;
:::[[File:Lecture11-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture11-2013/lecture11-2013.html]]&lt;br /&gt;
:::[[Media:Lecture11-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture11-2013/lecture11-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 4:'' Molecular Dynamics ===&lt;br /&gt;
:::[[File:Lecture12-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture12-2013/lecture12-2013.html]]&lt;br /&gt;
:::[[Media:Lecture12-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture12-2013/lecture12-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp;[http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW2_2 Homework assignment 2]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 5:'' Linear Algebra part I ===&lt;br /&gt;
:::[[Media:Lecture13-2013.pdf|Slides (combined with lecture 6)]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 6:'' Linear Algebra part II and PDEs===&lt;br /&gt;
:::[[File:Lecture14-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture14-2013/lecture14-2013.html]]&lt;br /&gt;
:::[[Media:Lecture13-2013.pdf|Slides (combined with lecture 5)]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture14-2013/lecture14-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp;[http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW3_2 Homework assignment 3]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 7:'' Fast Fourier Transform===&lt;br /&gt;
:::[[File:Lecture15-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture15-2013/lecture15-2013.html]]&lt;br /&gt;
:::[[Media:Lecture15-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture15-2013/lecture15-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp;[[Media:Sincfftw.cc|example code]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 8:'' FFT for real and multidimensional data===&lt;br /&gt;
:::[[File:Lecture15-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture16-2013/lecture16-2013.html]]&lt;br /&gt;
:::[[Media:Lecture16-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture16-2013/lecture16-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp; [http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW4_2 Homework assignment 4]&lt;br /&gt;
&lt;br /&gt;
==Homework Assignments==&lt;br /&gt;
&lt;br /&gt;
===HW1===&lt;br /&gt;
This week's homework consists of two assignments.&lt;br /&gt;
&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
&lt;br /&gt;
* Consider the sequence of numbers: 1 followed by 10&amp;lt;sup&amp;gt;8&amp;lt;/sup&amp;gt; values of 10&amp;lt;sup&amp;gt;-8&amp;lt;/sup&amp;gt;&lt;br /&gt;
* This should sum to 2.&lt;br /&gt;
* Write code which sums up those values in order. What answer does it get?&lt;br /&gt;
* Add a routine to the program which sums up the values in reverse order. Does it get the correct answer?&lt;br /&gt;
* How would you get the correct answer? (A minimal sketch of the two sums, in single precision, follows this list.)&lt;br /&gt;
* Submit code, Makefile, text file with answers.&lt;br /&gt;
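&lt;br /&gt;
A minimal sketch of the setup for the first part, assuming single-precision floats (which is the point of the exercise); the answers to the questions above are of course up to you.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Minimal sketch: sum 1 followed by 1e8 copies of 1e-8, forward and in reverse.&lt;br /&gt;
// Single precision is assumed here; that choice is the crux of the exercise.&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
   const int n = 100000000;      // 1e8 small terms&lt;br /&gt;
&lt;br /&gt;
   float forward = 1.0f;         // start with the large value...&lt;br /&gt;
   for (int i = 0; i &amp;lt; n; i++)&lt;br /&gt;
      forward += 1.0e-8f;        // ...then add the tiny ones&lt;br /&gt;
&lt;br /&gt;
   float backward = 0.0f;        // accumulate the tiny values first...&lt;br /&gt;
   for (int i = 0; i &amp;lt; n; i++)&lt;br /&gt;
      backward += 1.0e-8f;&lt;br /&gt;
   backward += 1.0f;             // ...and add the large value last&lt;br /&gt;
&lt;br /&gt;
   std::cout &amp;lt;&amp;lt; &amp;quot;forward  sum = &amp;quot; &amp;lt;&amp;lt; forward  &amp;lt;&amp;lt; &amp;quot;\n&amp;quot;&lt;br /&gt;
             &amp;lt;&amp;lt; &amp;quot;backward sum = &amp;quot; &amp;lt;&amp;lt; backward &amp;lt;&amp;lt; &amp;quot;\n&amp;quot;;&lt;br /&gt;
   return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;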
&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
&lt;br /&gt;
* Implement a linear congruential generator with a = 106, c = 1283, m = 6075 that generates random numbers from 0 to 1 (a minimal sketch follows this list).&lt;br /&gt;
* Using that and MT: generate 10,000 pairs (dx, dy) with dx, dy each in -0.1 .. +0.1. Generate histograms of dx and dy (say 200 bins). Do they look okay? What would you expect the variation to be?&lt;br /&gt;
* For 10,000 points: take random walks from (x,y)=(0,0) until they exceed a radius of 2, then stop. Plot a histogram of the final angles for the two pseudo random number generators. What do you see?&lt;br /&gt;
* Submit makefile, code, plots, git log.&lt;br /&gt;
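&lt;br /&gt;
A minimal sketch of the generator itself (state handling kept deliberately simple; you may well want to wrap it in a class or add a seed argument):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Minimal sketch of the linear congruential generator with the given constants:&lt;br /&gt;
// x_{k+1} = (a*x_k + c) mod m, returned as a uniform deviate in [0,1).&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
&lt;br /&gt;
double lcg()&lt;br /&gt;
{&lt;br /&gt;
   static long x = 1;                        // the seed; kept as simple as possible here&lt;br /&gt;
   const long a = 106, c = 1283, m = 6075;&lt;br /&gt;
   x = (a*x + c) % m;&lt;br /&gt;
   return double(x)/m;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
   for (int i = 0; i &amp;lt; 5; i++)&lt;br /&gt;
      std::cout &amp;lt;&amp;lt; lcg() &amp;lt;&amp;lt; &amp;quot;\n&amp;quot;;&lt;br /&gt;
   return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;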
&lt;br /&gt;
Both assignments due on Thursday Feb 28th, 2013, at 9:00 am. Email the files to rzon@scinethpc.ca and ljdursi@scinethpc.ca.&lt;br /&gt;
&lt;br /&gt;
===HW2===&lt;br /&gt;
&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
&lt;br /&gt;
* Compute numerically (using the GSL):&lt;br /&gt;
&lt;br /&gt;
::&amp;amp;int;&amp;lt;sub&amp;gt;0&amp;lt;/sub&amp;gt;&amp;lt;sup&amp;gt;3&amp;lt;/sup&amp;gt; f(x) &amp;amp;nbsp;dx&lt;br /&gt;
&lt;br /&gt;
:(that is the integral of f(x) from x=0 to x=3)&lt;br /&gt;
&lt;br /&gt;
:with&lt;br /&gt;
&lt;br /&gt;
::f(x) = ln(x) sin(x) e&amp;lt;sup&amp;gt;-x&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:using three different methods:&lt;br /&gt;
# Extended Simpson's rule&lt;br /&gt;
# Gauss-Legendre quadrature&lt;br /&gt;
# Monte Carlo sampling &lt;br /&gt;
&lt;br /&gt;
*Hint: what is f(0)?&lt;br /&gt;
&lt;br /&gt;
*Compare the convergence of these methods by increasing the number of function evaluations; a minimal sketch of the GSL calls for the Gauss-Legendre option is given after this list.&lt;br /&gt;
&lt;br /&gt;
*Submit makefile, code, plots, version control log. &lt;br /&gt;
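&lt;br /&gt;
The sketch referred to above: a minimal example of the GSL calls for the Gauss-Legendre option, using the fixed-order &amp;lt;tt&amp;gt;glfixed&amp;lt;/tt&amp;gt; interface (GSL 1.14 or later). The function name and the order are placeholders, and the other two methods are left to you.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Minimal sketch: Gauss-Legendre quadrature of f(x) = ln(x) sin(x) exp(-x) on [0,3]&lt;br /&gt;
// with the GSL &amp;quot;glfixed&amp;quot; interface.  Gauss-Legendre nodes lie strictly inside&lt;br /&gt;
// the interval, so f is never evaluated at x = 0.&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;gsl/gsl_integration.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
double myf(double x, void*)&lt;br /&gt;
{&lt;br /&gt;
   return log(x)*sin(x)*exp(-x);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
   gsl_function F;&lt;br /&gt;
   F.function = myf;&lt;br /&gt;
   F.params   = 0;&lt;br /&gt;
&lt;br /&gt;
   const size_t order = 32;   // number of function evaluations; vary this for the convergence study&lt;br /&gt;
   gsl_integration_glfixed_table* tbl = gsl_integration_glfixed_table_alloc(order);&lt;br /&gt;
   double result = gsl_integration_glfixed(&amp;amp;F, 0.0, 3.0, tbl);&lt;br /&gt;
   gsl_integration_glfixed_table_free(tbl);&lt;br /&gt;
&lt;br /&gt;
   std::cout &amp;lt;&amp;lt; &amp;quot;integral ~ &amp;quot; &amp;lt;&amp;lt; result &amp;lt;&amp;lt; &amp;quot;\n&amp;quot;;&lt;br /&gt;
   return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;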
&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
&lt;br /&gt;
* Using an adaptive 4th order Runge-Kutta approach, with a relative accuracy of 1e-4, compute the solution for t = [0,100] of the following set of coupled ODEs (Lorenz oscillator)&lt;br /&gt;
&lt;br /&gt;
::dx/dt = &amp;amp;sigma;(y - x)&lt;br /&gt;
&lt;br /&gt;
::dy/dt = (&amp;amp;rho;-z)x-y&lt;br /&gt;
&lt;br /&gt;
::dz/dt = xy - &amp;amp;beta;z&lt;br /&gt;
&lt;br /&gt;
:with &amp;amp;sigma;=10; &amp;amp;beta;=8/3; &amp;amp;rho; = 28, and with initial conditions&lt;br /&gt;
&lt;br /&gt;
::x(0) = 10&lt;br /&gt;
&lt;br /&gt;
::y(0) = 20&lt;br /&gt;
&lt;br /&gt;
::z(0) = 30&lt;br /&gt;
&lt;br /&gt;
* Hint: study the GSL documentation; a minimal sketch of its &amp;lt;tt&amp;gt;odeiv2&amp;lt;/tt&amp;gt; driver interface is given after this list.&lt;br /&gt;
&lt;br /&gt;
*Submit makefile, code, plots, version control log.&lt;br /&gt;
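&lt;br /&gt;
The sketch referred to in the hint: a minimal example of GSL's adaptive ODE driver applied to the Lorenz system. The stepper choice (rkf45, an embedded Runge-Kutta method) and the output cadence are just one possibility, meant as a starting point rather than the required solution.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Minimal sketch of driving the Lorenz system with GSL's adaptive ODE driver.&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;gsl/gsl_errno.h&amp;gt;&lt;br /&gt;
#include &amp;lt;gsl/gsl_odeiv2.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int lorenz(double, const double y[], double dydt[], void*)&lt;br /&gt;
{&lt;br /&gt;
   const double sigma = 10.0, beta = 8.0/3.0, rho = 28.0;&lt;br /&gt;
   dydt[0] = sigma*(y[1] - y[0]);&lt;br /&gt;
   dydt[1] = (rho - y[2])*y[0] - y[1];&lt;br /&gt;
   dydt[2] = y[0]*y[1] - beta*y[2];&lt;br /&gt;
   return GSL_SUCCESS;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
   gsl_odeiv2_system sys = {lorenz, 0, 3, 0};   // no Jacobian needed for an explicit stepper&lt;br /&gt;
   gsl_odeiv2_driver* d =&lt;br /&gt;
      gsl_odeiv2_driver_alloc_y_new(&amp;amp;sys, gsl_odeiv2_step_rkf45, 1e-3, 0.0, 1e-4);&lt;br /&gt;
&lt;br /&gt;
   double t = 0.0;&lt;br /&gt;
   double y[3] = {10.0, 20.0, 30.0};            // initial conditions&lt;br /&gt;
   for (int i = 1; i &amp;lt;= 1000; i++) {            // write out 1000 points over t = [0,100]&lt;br /&gt;
      double ti = i*0.1;&lt;br /&gt;
      if (gsl_odeiv2_driver_apply(d, &amp;amp;t, ti, y) != GSL_SUCCESS) break;&lt;br /&gt;
      std::cout &amp;lt;&amp;lt; t &amp;lt;&amp;lt; &amp;quot; &amp;quot; &amp;lt;&amp;lt; y[0] &amp;lt;&amp;lt; &amp;quot; &amp;quot; &amp;lt;&amp;lt; y[1] &amp;lt;&amp;lt; &amp;quot; &amp;quot; &amp;lt;&amp;lt; y[2] &amp;lt;&amp;lt; &amp;quot;\n&amp;quot;;&lt;br /&gt;
   }&lt;br /&gt;
   gsl_odeiv2_driver_free(d);&lt;br /&gt;
   return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;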
&lt;br /&gt;
Both assignments due on Thursday Mar 7th, 2013, at 9:00 am. Email the files to rzon@scinethpc.ca and ljdursi@scinethpc.ca.&lt;br /&gt;
&lt;br /&gt;
===HW3===&lt;br /&gt;
&lt;br /&gt;
Part 1:&lt;br /&gt;
&lt;br /&gt;
The time-explicit formulation of the 1d diffusion equation looks like this:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{eqnarray*}&lt;br /&gt;
q^{n+1} &amp;amp; = &amp;amp; q^n + \frac{D \Delta t}{\Delta x^2} &lt;br /&gt;
\left (&lt;br /&gt;
\begin{matrix}&lt;br /&gt;
-2 &amp;amp; 1 \\&lt;br /&gt;
1 &amp;amp; -2 &amp;amp; 1 \\&lt;br /&gt;
&amp;amp; 1 &amp;amp; -2 &amp;amp; 1 \\&lt;br /&gt;
&amp;amp;  &amp;amp;  &amp;amp; \cdots &amp;amp; \\&lt;br /&gt;
&amp;amp;  &amp;amp;  &amp;amp; 1 &amp;amp; -2 &amp;amp; 1 \\&lt;br /&gt;
&amp;amp;  &amp;amp;  &amp;amp; &amp;amp; 1 &amp;amp; -2 \\&lt;br /&gt;
\end{matrix}&lt;br /&gt;
\right ) q^n \\&lt;br /&gt;
&amp;amp; = &amp;amp; \left ( 1 + \frac{D \Delta t}{\Delta x^2} A \right ) q^n&lt;br /&gt;
\end{eqnarray*}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
What are the eigenvalues of the matrix A? What modes would we expect to be amplified/damped by this operator?&lt;br /&gt;
&lt;br /&gt;
* Consider 100 points in the discretization (e.g., A is 100x100).&lt;br /&gt;
* Calculate the eigenvalues and eigenvectors (using D__EV ; which sort of matrix are we using here?)&lt;br /&gt;
* Plot the modes with the largest and smallest absolute-value of eigenvalues, and explain their physical significance&lt;br /&gt;
* The numerical method will become unstable when one eigenmode $v$ begins to grow uncontrollably whenever it is present, e.g.&lt;br /&gt;
$ \frac{D \Delta t}{\Delta x^2} A v = \frac{D \Delta t}{\Delta x^2} \lambda v &amp;gt; v$.   In a timestepping solution, the only way to avoid this for a given physical set of parameters and grid size is to reduce the timestep, $\Delta t$.   Use the largest absolute value eigenvalue to place a constraint on $\Delta t$ for stability.&lt;br /&gt;
&lt;br /&gt;
Part 2:&lt;br /&gt;
&lt;br /&gt;
Using the above constraint on $\Delta t$, for a 1d grid of size 100 (e.g., a 100x100 matrix A), evolve this PDE using BLAS/LAPACK. Plot and explain the results.&lt;br /&gt;
&lt;br /&gt;
* Have an initial condition of $q(x=0,t=0) = 1$, and $q(t=0)$ zero everywhere else (e.g., a hot plate just turned on at the left).&lt;br /&gt;
* Take ~100 timesteps and plot the evolution of $q(x,t)$ at 5 times over that period.&lt;br /&gt;
* You’ll want to use a BLAS routine to compute the matrix-vector multiply ( http://www.gnu.org/software/gsl/manual/html_node/Level-2-GSL-BLAS-Interface.html). Do the multiply in double precision (D__MV). Which one should you use?&lt;br /&gt;
* The GSL has a cblas interface, http://www.gnu.org/software/gsl/manual/html_node/Level-2-GSL-BLAS-Interface.html ; an example of its use can be found here http://www.gnu.org/software/gsl/manual/html_node/GSL-CBLAS-Examples.html&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Important things to know about lapack:&lt;br /&gt;
* If you are using an nxn array, the “leading dimension” of the array is n. (This argument is so that you could work on sub-matrices if you wanted)&lt;br /&gt;
* You have to make sure the 2d array is a contiguous block of memory.&lt;br /&gt;
* You'll (presumably) want to use the C bindings for LAPACK - [http://www.netlib.org/lapack/lapacke.html lapacke].  Note that the usual C arrays are row-major.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Here's a simple example of calling a LAPACKE routine; note that how the matrix is described (here with a pointer to the data, a leading dimension, and the number of rows and columns) will vary with different types of matrix:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;mkl_lapacke.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
double **matrix(int n,int m);&lt;br /&gt;
void free_matrix(double **a);&lt;br /&gt;
&lt;br /&gt;
int main (int argc, const char * argv[])&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
   const int n=5;             // number of rows, columns of the matrix&lt;br /&gt;
   const int m = n;           // nrows&lt;br /&gt;
   const int leading_dim_A=n; // leading dimension (# of cols for row major);&lt;br /&gt;
                              // lets us operate on sub-matrices in principle&lt;br /&gt;
   const int leading_dim_b=n; // similarly for b&lt;br /&gt;
   double **A;&lt;br /&gt;
   double *b;&lt;br /&gt;
&lt;br /&gt;
   b = new double[leading_dim_b];&lt;br /&gt;
   A = matrix(n,leading_dim_A);&lt;br /&gt;
&lt;br /&gt;
   for (int i=0; i&amp;lt;n; i++)&lt;br /&gt;
       for (int j=0; j&amp;lt;leading_dim_A; j++)&lt;br /&gt;
            A[i][j] = 0.;&lt;br /&gt;
&lt;br /&gt;
   // let's do a trivial solve&lt;br /&gt;
   // It should be pretty clear that the solution to this system&lt;br /&gt;
   // is x = {0,1,2...n-1}&lt;br /&gt;
&lt;br /&gt;
   for (int i=0; i&amp;lt;leading_dim_A; i++) {&lt;br /&gt;
        A[i][i] = 2.;&lt;br /&gt;
   }&lt;br /&gt;
&lt;br /&gt;
   for (int i=0; i&amp;lt;leading_dim_b; i++) {&lt;br /&gt;
        b[i]    = 2*i;&lt;br /&gt;
   }&lt;br /&gt;
&lt;br /&gt;
   const char transpose='N';     //solve Ax=b, not A^T x = b&lt;br /&gt;
   const int  nrhs = 1;          //  we're only solving 1 right hand side&lt;br /&gt;
   int info;&lt;br /&gt;
&lt;br /&gt;
   // Call DGELS; b will be overwritten with the value of x.&lt;br /&gt;
   info = LAPACKE_dgels(LAPACK_COL_MAJOR,transpose,m,n,nrhs,&lt;br /&gt;
                          &amp;amp;(A[0][0]),leading_dim_A, &amp;amp;(b[0]),leading_dim_b);&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
   // print results&lt;br /&gt;
   for(int i=0;i&amp;lt;n;i++)&lt;br /&gt;
   {&lt;br /&gt;
      if (i != n/2)&lt;br /&gt;
        std::cout &amp;lt;&amp;lt; &amp;quot;    &amp;quot; &amp;lt;&amp;lt; b[i] &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
      else&lt;br /&gt;
        std::cout &amp;lt;&amp;lt; &amp;quot;x = &amp;quot; &amp;lt;&amp;lt; b[i] &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
   }&lt;br /&gt;
   free_matrix(A);            // release the matrix ...&lt;br /&gt;
   delete[] b;                // ... and the right-hand side&lt;br /&gt;
   return(info);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
double **matrix(int n,int m) {&lt;br /&gt;
   double **a = new double * [n];&lt;br /&gt;
   a[0] = new double [n*m];&lt;br /&gt;
&lt;br /&gt;
   for (int i=1; i&amp;lt;n; i++)&lt;br /&gt;
         a[i] = &amp;amp;a[0][i*m];&lt;br /&gt;
&lt;br /&gt;
   return a;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
void free_matrix(double **a) {&lt;br /&gt;
   delete[] a[0];&lt;br /&gt;
   delete[] a;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===HW4===&lt;br /&gt;
&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
&lt;br /&gt;
Trigonometric interpolation uses an n-point Fourier series to find values at intermediate points. It is one way of interpolating data onto a finer grid, and was a motivation for Gauss in his work on planetary motion.&lt;br /&gt;
&lt;br /&gt;
The way it works is:&lt;br /&gt;
&lt;br /&gt;
# You Fourier-transform your data.&lt;br /&gt;
# You add frequencies above the Nyquist frequency (in absolute values), but set all the amplitudes of the new frequencies to zero.&lt;br /&gt;
# Note that the frequencies are stored such that, e.g., f&amp;lt;sub&amp;gt;n-1&amp;lt;/sub&amp;gt; corresponds to the low frequency -1.&lt;br /&gt;
# The resulting 2n array can be back-transformed, and now gives an interpolated signal.&lt;br /&gt;
&lt;br /&gt;
For this assignment, write an application that will read in an image from a binary file into a 2d double precision array (this will require converting from bytes to doubles), and create an image twice the size in all directions using trigonometric interpolation. Use a real-to-half-complex version of the FFTW (note: in 2d, this version of the FFTW mixes Fourier components with the same physical magnitude of their wave number k, so this will work). A minimal 1d sketch of the doubling step is given below.&lt;br /&gt;
You can process the red, green and blue values separately. &lt;br /&gt;
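&lt;br /&gt;
The bookkeeping of FFTW's half-complex storage is the fiddly part, so here is the minimal 1d sketch of the doubling step mentioned above; the 2d transforms, the byte/double conversion and the error handling are left to you, and the function name is just a suggestion.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Minimal 1d sketch of trigonometric interpolation with FFTW's real-to-half-complex&lt;br /&gt;
// transforms: transform n samples, copy the spectrum into a length-2n half-complex&lt;br /&gt;
// array that is zero above the old Nyquist frequency, and back-transform.&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
#include &amp;lt;fftw3.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
std::vector&amp;lt;double&amp;gt; double_resolution(std::vector&amp;lt;double&amp;gt; in)&lt;br /&gt;
{&lt;br /&gt;
   int n = in.size();                        // assumed even&lt;br /&gt;
   std::vector&amp;lt;double&amp;gt; hc(n), hc2(2*n, 0.0), out(2*n);&lt;br /&gt;
&lt;br /&gt;
   fftw_plan fwd = fftw_plan_r2r_1d(n,   &amp;amp;in[0],  &amp;amp;hc[0],  FFTW_R2HC, FFTW_ESTIMATE);&lt;br /&gt;
   fftw_plan bwd = fftw_plan_r2r_1d(2*n, &amp;amp;hc2[0], &amp;amp;out[0], FFTW_HC2R, FFTW_ESTIMATE);&lt;br /&gt;
   fftw_execute(fwd);&lt;br /&gt;
&lt;br /&gt;
   // half-complex layout: real part of bin k in hc[k], imaginary part of bin k in hc[n-k]&lt;br /&gt;
   for (int k = 0; k &amp;lt; n/2; k++)  hc2[k]     = hc[k];       // real parts&lt;br /&gt;
   hc2[n/2] = 0.5*hc[n/2];                                   // old Nyquist bin, now an interior bin&lt;br /&gt;
   for (int k = 1; k &amp;lt; n/2; k++)  hc2[2*n-k] = hc[n-k];     // imaginary parts&lt;br /&gt;
   fftw_execute(bwd);&lt;br /&gt;
&lt;br /&gt;
   for (int i = 0; i &amp;lt; 2*n; i++) out[i] /= n;   // FFTW transforms are unnormalized&lt;br /&gt;
   fftw_destroy_plan(fwd);&lt;br /&gt;
   fftw_destroy_plan(bwd);&lt;br /&gt;
   return out;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;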
&lt;br /&gt;
&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
&lt;br /&gt;
Write an application which reads an image and performs a low-pass filter on the image, i.e., any Fourier components with magnitude k larger than n/8 are to be set to zero, after which the inverse Fourier transform is taken and the image is written out to disk again. Use the same FFT technique as in the first assignment.&lt;br /&gt;
&lt;br /&gt;
'''Input image'''&lt;br /&gt;
&lt;br /&gt;
Use [[Media:gauss256.tgz|this image of Gauss]].&lt;br /&gt;
&lt;br /&gt;
'''Image format:'''&lt;br /&gt;
&lt;br /&gt;
Use the following simple PPM format:&lt;br /&gt;
&lt;br /&gt;
First line (ascii): 'P6\n'&amp;lt;br&amp;gt;&lt;br /&gt;
Second line, in ascii, 'width height\n'&amp;lt;br&amp;gt;&lt;br /&gt;
Third line, in ascii, 'maxcolorvalue\n' (this is typically just 255)&amp;lt;br&amp;gt;&lt;br /&gt;
Following that, in binary, are byte-triplets with the red, green and blue values of each pixel.&amp;lt;br&amp;gt;&lt;br /&gt;
Note: in C, the 'unsigned char' data type matches the concept of a byte best (for most machines anyway).&lt;br /&gt;
&lt;br /&gt;
In fact, between the first and second line, one can have comment lines that start with '#'.&lt;br /&gt;
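&lt;br /&gt;
A minimal sketch of reading this format in C++ is given below. It ignores the optional comment lines, does no error checking, and the file name is just an assumption about what the tarball contains.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Minimal sketch: read a binary (P6) PPM file into a buffer of red, green, blue bytes.&lt;br /&gt;
// Comment lines starting with '#' are not handled here.&lt;br /&gt;
#include &amp;lt;fstream&amp;gt;&lt;br /&gt;
#include &amp;lt;string&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
   std::ifstream f(&amp;quot;gauss256.ppm&amp;quot;, std::ios::binary);&lt;br /&gt;
   std::string magic;&lt;br /&gt;
   int width, height, maxval;&lt;br /&gt;
   f &amp;gt;&amp;gt; magic &amp;gt;&amp;gt; width &amp;gt;&amp;gt; height &amp;gt;&amp;gt; maxval;   // &amp;quot;P6&amp;quot;, dimensions, maximum colour value&lt;br /&gt;
   f.get();                                    // consume the single whitespace after maxval&lt;br /&gt;
&lt;br /&gt;
   std::vector&amp;lt;unsigned char&amp;gt; pixel(3*width*height);   // r,g,b byte triplets&lt;br /&gt;
   f.read(reinterpret_cast&amp;lt;char*&amp;gt;(&amp;amp;pixel[0]), pixel.size());&lt;br /&gt;
&lt;br /&gt;
   // e.g. convert the red channel to doubles for the FFT:&lt;br /&gt;
   std::vector&amp;lt;double&amp;gt; red(width*height);&lt;br /&gt;
   for (int i = 0; i &amp;lt; width*height; i++)&lt;br /&gt;
      red[i] = pixel[3*i];&lt;br /&gt;
   return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;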
&lt;br /&gt;
=Part 3: High Performance Scientific Computing=&lt;br /&gt;
&lt;br /&gt;
==Prerequisites==&lt;br /&gt;
&lt;br /&gt;
Part 1 or good C++ programming skills, including make and unix/linux prompt experience.&lt;br /&gt;
&lt;br /&gt;
'''Software that you'll need'''&lt;br /&gt;
&lt;br /&gt;
You will need to bring a laptop with an ssh facility. Hands-on parts will be done on SciNet's GPC cluster.&lt;br /&gt;
&lt;br /&gt;
For those who don't have a SciNet account yet, the instructions can be found at http://wiki.scinethpc.ca/wiki/index.php/Essentials#Accounts&lt;br /&gt;
&lt;br /&gt;
==Dates==&lt;br /&gt;
March 19, 21, 26, and 28, 2013&amp;lt;br&amp;gt;&lt;br /&gt;
April 2, 4, 9, and 11, 2013&lt;br /&gt;
&lt;br /&gt;
==Topics==&lt;br /&gt;
===''Lecture 1:'' Introduction to Parallel Programming ===&lt;br /&gt;
:::[[File:Lecture17-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture17-2013/lecture17-2013.html]]&lt;br /&gt;
:::[[Media:Lecture17-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture17-2013/lecture17-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 2:'' Parallel Computing Paradigms ===&lt;br /&gt;
&lt;br /&gt;
:::[[File:Lecture18-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture18-2013/lecture18-2013.html]]&lt;br /&gt;
:::[[Media:Lecture18-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture18-2013/lecture18-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp; [[#HW1_3|homework 1]]&lt;br /&gt;
&lt;br /&gt;
===''Lectures 3,4:''  Shared Memory Programming with OpenMP, part 1,2===&lt;br /&gt;
&lt;br /&gt;
:::[[Media:Lecture19-2013.pdf|Slides]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 5:'' Distributed Parallel Programming with MPI, part 1===&lt;br /&gt;
&lt;br /&gt;
:::[[Media:Lecture21-2013.pdf|Slides]]&lt;br /&gt;
&lt;br /&gt;
''Lecture 6''&amp;amp;nbsp;&amp;amp;nbsp; Distributed Parallel Programming with MPI, part 2&amp;lt;br&amp;gt;&lt;br /&gt;
''Lecture 7''&amp;amp;nbsp;&amp;amp;nbsp; Distributed Parallel Programming with MPI, part 3&amp;lt;br&amp;gt;&lt;br /&gt;
''Lecture 8''&amp;amp;nbsp;&amp;amp;nbsp; Hybrid OpenMP+MPI Programming&lt;br /&gt;
&lt;br /&gt;
== Homework assignments ==&lt;br /&gt;
&lt;br /&gt;
=== HW1 ===&lt;br /&gt;
&lt;br /&gt;
* Read the SciNet tutorial (as it pertains to the GPC)&lt;br /&gt;
* Read the GPC Quick Start.&lt;br /&gt;
* Get the first set of code:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
   $ cd $SCRATCH&lt;br /&gt;
   $ git clone /scinet/course/sc3/homework1&lt;br /&gt;
   $ cd homework1&lt;br /&gt;
   $ source setup&lt;br /&gt;
   $ make&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
*This contains the threaded program 'blurppm' and 266 ppm images to be blurred. Usage:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
  blurppm INPUTPPM OUTPUTPPM BLURRADIUS NUMBEROFTHREADS&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
* Simple test:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
  $ qsub -l nodes=1:ppn=8,walltime=2:00:00 -I -X -qdebug&lt;br /&gt;
  $ cd $SCRATCH/homework1&lt;br /&gt;
  $ time blurppm 001.ppm new001.ppm 30 1&lt;br /&gt;
  real  0m52.900s&lt;br /&gt;
  user  0m52.881s&lt;br /&gt;
  sys   0m0.008s&lt;br /&gt;
  $ display 001.ppm &amp;amp;&lt;br /&gt;
  $ display new001.ppm &amp;amp;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
* Time blurppm with a BLURRADIUS ranging from 1 to 41 in steps of 4, and for NUMBEROFTHREADS ranging from 1 to 16.  Record the (real) duration of each run.&lt;br /&gt;
* Plot the duration as a function of NUMBEROFTHREADS, as well as  the speed-up and efficiency.&lt;br /&gt;
* Submit the script and plots of the duration, speedup and efficiency as a function of NUMBEROFTHREADS.&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
* Use GNU parallel to run blurppm on all 266 images with a radius of 41.&lt;br /&gt;
* Investigate different scenarios:&lt;br /&gt;
:# Have GNU parallel run 16 at a time with just 1 thread.&lt;br /&gt;
:# Have GNU parallel run 8 at a time with 2 threads.&lt;br /&gt;
:# Have GNU parallel run 4 at a time with 4 threads.&lt;br /&gt;
:# Have GNU parallel run 2 at a time with 8 threads.&lt;br /&gt;
:# Have GNU parallel run 1 at a time with 16 threads.&lt;br /&gt;
:Record the total time it takes in each of these scenarios.&lt;br /&gt;
* Repeat this with a BLURRADIUS of 3.&lt;br /&gt;
* Submit scripts, timing data  and plots.&lt;br /&gt;
&lt;br /&gt;
=== HW2 ===&lt;br /&gt;
&lt;br /&gt;
In the course materials ( /scinet/course/ppp/nbodyc or nbodyf ) there is the source code for a serial N-body integrator.  This, like the molecular dynamics code you've seen earlier, calculates the long-range forces exerted by each particle on all of the other particles.&lt;br /&gt;
&lt;br /&gt;
Parallelize the force calculation with OpenMP, and present timing results for 1, 4, and 8 threads compared to the serial version.  Note that you can turn off graphical output by removing the &amp;quot;USEPGPLOT = -DPGPLOT&amp;quot; line in Makefile.inc in the top level directory.&lt;br /&gt;
&lt;br /&gt;
Begin by doubling the work by ''not'' calculating two forces at once (e.g., not making use of f&amp;lt;sub&amp;gt;ji&amp;lt;/sub&amp;gt; = -f&amp;lt;sub&amp;gt;ij&amp;lt;/sub&amp;gt;), and simply parallelizing the outer force loop; a rough sketch of this first step is given below.  Then find a way to implement the forces efficiently but also in parallel.  Is there any other part of the problem which could usefully be parallelized?&lt;br /&gt;
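&lt;br /&gt;
The rough sketch referred to above, under the assumption of a plain gravitational force law with no softening; the array names are placeholders for whatever the nbody code actually uses, and it needs to be compiled with -fopenmp (the pragma is simply ignored otherwise).&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Rough sketch of the first step: parallelize the outer loop of an O(N^2) force&lt;br /&gt;
// calculation with OpenMP.  Each i accumulates only into its own force entries,&lt;br /&gt;
// so there are no write conflicts even though every pair is visited twice.&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
&lt;br /&gt;
void forces(int n, const std::vector&amp;lt;double&amp;gt;&amp;amp; x, const std::vector&amp;lt;double&amp;gt;&amp;amp; y,&lt;br /&gt;
            const std::vector&amp;lt;double&amp;gt;&amp;amp; z, const std::vector&amp;lt;double&amp;gt;&amp;amp; m,&lt;br /&gt;
            std::vector&amp;lt;double&amp;gt;&amp;amp; fx, std::vector&amp;lt;double&amp;gt;&amp;amp; fy,&lt;br /&gt;
            std::vector&amp;lt;double&amp;gt;&amp;amp; fz, double G)&lt;br /&gt;
{&lt;br /&gt;
   #pragma omp parallel for&lt;br /&gt;
   for (int i = 0; i &amp;lt; n; i++) {&lt;br /&gt;
      fx[i] = fy[i] = fz[i] = 0.;&lt;br /&gt;
      for (int j = 0; j &amp;lt; n; j++) {&lt;br /&gt;
         if (j == i) continue;&lt;br /&gt;
         double dx = x[j]-x[i], dy = y[j]-y[i], dz = z[j]-z[i];&lt;br /&gt;
         double r  = std::sqrt(dx*dx + dy*dy + dz*dz);&lt;br /&gt;
         double f  = G*m[i]*m[j]/(r*r*r);    // pair force magnitude divided by r&lt;br /&gt;
         fx[i] += f*dx;  fy[i] += f*dy;  fz[i] += f*dz;&lt;br /&gt;
      }&lt;br /&gt;
   }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;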
&lt;br /&gt;
=Links=&lt;br /&gt;
&lt;br /&gt;
==Unix==&lt;br /&gt;
* Cygwin: http://www.cygwin.com&lt;br /&gt;
* Linux Command Line: A Primer (June 2012) [[Media:SS_IntroToShell.pdf|Slides,]] [[Media:SS_IntroToShell.tgz|Files]]&lt;br /&gt;
* Intro to unix shell from software carpentry: http://software-carpentry.org/4_0/shell&lt;br /&gt;
&lt;br /&gt;
==C/C++==&lt;br /&gt;
* [[One-Day Scientific C++ Class]] at SciNet&lt;br /&gt;
* C++ library reference: http://www.cplusplus.com/reference&lt;br /&gt;
* C preprocessor: http://www.cprogramming.com/tutorial/cpreprocessor.html&lt;br /&gt;
* Boost: http://www.boost.org&lt;br /&gt;
* Boost Python tutorial: http://www.boost.org/doc/libs/1_53_0/libs/python/doc/tutorial/doc/html/index.html&lt;br /&gt;
&lt;br /&gt;
==Git==&lt;br /&gt;
* Git: http://git-scm.com&lt;br /&gt;
* Version Control: [http://support.scinet.utoronto.ca/CourseVideo/PPPcourse/Thursday_Morning_BP_Revision_Control/Thursday_Morning_BP_Revision_Control.mp4 Video]/ [[Media:Snug_techtalk_revcontrol.pdf | Slides]]&lt;br /&gt;
* Git cheat sheet from Git Tower: http://www.git-tower.com/files/cheatsheet/Git_Cheat_Sheet_grey.pdf&lt;br /&gt;
&lt;br /&gt;
==Python==&lt;br /&gt;
* Python: http://www.python.org&lt;br /&gt;
* IPython: http://ipython.org&lt;br /&gt;
* Matplotlib: http://www.matplotlib.org&lt;br /&gt;
* Enthought python distribution: http://www.enthought.com/products/edudownload.php&amp;lt;br/&amp;gt;&lt;br /&gt;
(this gives you numpy, matplotlib and ipython all installed in one fell swoop)&lt;br /&gt;
&lt;br /&gt;
* Intro to python from software carpentry: http://software-carpentry.org/4_0/python&lt;br /&gt;
* Tutorial on matplotlib: http://conference.scipy.org/scipy2011/tutorials.php#jonathan&lt;br /&gt;
* Npy file format: https://github.com/numpy/numpy/blob/master/doc/neps/npy-format.txt&lt;br /&gt;
* Boost Python tutorial: http://www.boost.org/doc/libs/1_53_0/libs/python/doc/tutorial/doc/html/index.html&lt;br /&gt;
&lt;br /&gt;
==ODEs==&lt;br /&gt;
* Integrators for particle based ODEs (i.e. molecular dynamics): http://www.chem.utoronto.ca/~rzon/simcourse/partmd.pdf. &amp;lt;br&amp;gt;'''Focus on 4.1.4 - 4.1.6 for practical aspects.'''&lt;br /&gt;
* Numerical algorithm to solve ODEs (General) in ''Numerical Recipes for C'': http://apps.nrbook.com/c/index.html Chapter 16&lt;br /&gt;
&lt;br /&gt;
==Interpolation (2D) ==&lt;br /&gt;
* Interpolation in ''Numerical Recipes for C'': http://apps.nrbook.com/c/index.html Pages 123-128&lt;br /&gt;
* Wikipedia pages on [http://en.wikipedia.org/wiki/Bilinear_interpolation Bilinear Interpolation] and [http://en.wikipedia.org/wiki/Bicubic_interpolation Bicubic Interpolation] are not bad either.&lt;br /&gt;
&lt;br /&gt;
==BLAS==&lt;br /&gt;
* [http://www.tacc.utexas.edu/tacc-projects/gotoblas2 gotoblas]&lt;br /&gt;
* [http://math-atlas.sourceforge.net/ ATLAS]&lt;br /&gt;
&lt;br /&gt;
==LAPACK==&lt;br /&gt;
* http://www.netlib.org/lapack&lt;br /&gt;
&lt;br /&gt;
==GSL==&lt;br /&gt;
* GNU Scientific Library: http://www.gnu.org/s/gsl&lt;br /&gt;
&lt;br /&gt;
==FFT==&lt;br /&gt;
* FFTW: http://www.fftw.org&lt;br /&gt;
&lt;br /&gt;
==Top500==&lt;br /&gt;
* TOP500 Supercomputing Sites: http://top500.org&lt;br /&gt;
&lt;br /&gt;
==OpenMP==&lt;br /&gt;
* OpenMP (open multi-processing) application programming interface for shared memory programming: http://openmp.org&lt;br /&gt;
&lt;br /&gt;
==GNU parallel==&lt;br /&gt;
* Official citation: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.&lt;br /&gt;
* [[Media:Tech-talk-gnu-parallel.pdf|Slides of the SciNet TechTalk on Gnu Parallel (14 Nov 2012)]]&lt;br /&gt;
* The documentation for GNU parallel can be found at http://www.gnu.org/software/parallel/&lt;br /&gt;
* Its man page can be found here http://www.gnu.org/software/parallel/man.html&lt;br /&gt;
* The man page is also available on the GPC when the gnu-parallel module is loaded, with the command &amp;lt;code&amp;gt;$ man parallel&amp;lt;/code&amp;gt;. The man page contains options, such as how to make sure the output is not all scrambled, and examples.&lt;br /&gt;
&lt;br /&gt;
==SciNet==&lt;br /&gt;
&lt;br /&gt;
Anything on this wiki, really, but specifically:&lt;br /&gt;
* [[Essentials|SciNet Essentials]]&lt;br /&gt;
* [[GPC Quickstart]]&lt;br /&gt;
* [[Media:SciNet_Tutorial.pdf |SciNet User Tutorial]]&lt;br /&gt;
* [[Software and Libraries]]&lt;br /&gt;
&lt;br /&gt;
==Other Resources==&lt;br /&gt;
* [http://galileo.phys.virginia.edu/classes/551.jvn.fall01/goldberg.pdf What Every Computer Scientist Should Know About Floating-Point Arithmetic] - the classic (and extremely comprehensive) overview of the basics of floating point math.   The first few pages, in particular, are very useful.&lt;br /&gt;
* [http://arxiv.org/abs/1005.4117 Random Numbers In Scientific Computing: An Introduction] by Katzgraber.   A very lucid discussion of pseudo random number generators for science.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=File:Lecture21-2013.pdf&amp;diff=5915</id>
		<title>File:Lecture21-2013.pdf</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=File:Lecture21-2013.pdf&amp;diff=5915"/>
		<updated>2013-04-02T13:59:17Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Scientific_Computing_Course&amp;diff=5914</id>
		<title>Scientific Computing Course</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Scientific_Computing_Course&amp;diff=5914"/>
		<updated>2013-04-02T13:58:49Z</updated>

		<summary type="html">&lt;p&gt;Ljdursi: /* Topics */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;''This wiki page concerns the 2013 installment of SciNet's Scientific Computing course. Material from the previous installment can be found on [[Scientific Software Development Course]], [[Numerical Tools for Physical Scientists (course)]], and [[High Performance Scientific Computing]]''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
=Syllabus=&lt;br /&gt;
&lt;br /&gt;
==About the course==&lt;br /&gt;
* Whole-term graduate course&lt;br /&gt;
* Prerequisite: basic C, C++ or Fortran experience.&lt;br /&gt;
* Will use `C++ light' and Python&lt;br /&gt;
* Topics include: Scientific computing and programming skills, Parallel programming, and Hybrid programming.  &lt;br /&gt;
&lt;br /&gt;
There are three parts to this course:&lt;br /&gt;
&lt;br /&gt;
# Scientific Software Development: Jan/Feb 2013&amp;lt;br&amp;gt;''python, C++, git, make, modular programming, debugging''&lt;br /&gt;
# Numerical Tools for Physical Scientists: Feb/Mar 2013&amp;lt;br&amp;gt;''modelling, floating point, Monte Carlo, ODE, linear algebra,fft''&lt;br /&gt;
# High Performance Scientific Computing: Mar/Apr 2013&amp;lt;br&amp;gt;''openmp, mpi and hybrid programming''&lt;br /&gt;
&lt;br /&gt;
Each part consists of eight one-hour lectures, two per week.&lt;br /&gt;
&lt;br /&gt;
These can be taken separately by astrophysics graduate students at the University of Toronto as mini-courses, and by physics graduate students at the University of Toronto as modular courses.&lt;br /&gt;
&lt;br /&gt;
The first two parts count towards the SciNet Certificate in Scientific Computing, while the third part can count towards the SciNet HPC Certificate. For more info about the SciNet Certificates, see http://www.scinethpc.ca/2012/12/scinet-hpc-certificate-program.&lt;br /&gt;
&lt;br /&gt;
==Location and Times==&lt;br /&gt;
[http://www.scinethpc.ca/2010/08/contact-us SciNet HeadQuarters]&amp;lt;br&amp;gt;&lt;br /&gt;
256 McCaul Street, Toronto, ON&amp;lt;br&amp;gt;&lt;br /&gt;
Room 229 (Conference Room)&amp;lt;br&amp;gt;&lt;br /&gt;
Tuesdays 11:00 am - 12:00 noon&amp;lt;br&amp;gt;&lt;br /&gt;
Thursdays 11:00 am - 12:00 noon&lt;br /&gt;
&lt;br /&gt;
==Instructors and office hours==&lt;br /&gt;
&lt;br /&gt;
* Ramses van Zon - 256 McCaul Street, Rm 228 - Mondays 3-4pm&lt;br /&gt;
* L. Jonathan Dursi - 256 McCaul Street, Rm 216 - Wednesdays 3-4pm&lt;br /&gt;
&lt;br /&gt;
==Grading scheme==&lt;br /&gt;
&lt;br /&gt;
Attendance at lectures.&lt;br /&gt;
&lt;br /&gt;
Four homework sets (i.e., one per week), to be returned by email by 9:00 am the next Thursday.&lt;br /&gt;
&lt;br /&gt;
==Sign up==&lt;br /&gt;
Sign up for this graduate course goes through SciNet's course website.&amp;lt;br&amp;gt;The direct link is https://support.scinet.utoronto.ca/courses/?q=node/99.&amp;lt;br&amp;gt;  If you do not have a SciNet account but wish to register for this course, please email support@scinet.utoronto.ca . &amp;lt;br&amp;gt;&lt;br /&gt;
Sign up is closed.&lt;br /&gt;
&lt;br /&gt;
=Part 1: Scientific Software Development=&lt;br /&gt;
&lt;br /&gt;
==Prerequisites==&lt;br /&gt;
&lt;br /&gt;
Some programming experience. Some unix prompt experience.&lt;br /&gt;
&lt;br /&gt;
'''Software that you'll need:'''&lt;br /&gt;
&lt;br /&gt;
A unix-like environment with the GNU compiler suite (e.g. Cygwin), and Python 2, IPython, Numpy, SciPy and Matplotlib (which you all get if you use the Enthought distribution) installed on your laptop. Links are given at the bottom of this page.&lt;br /&gt;
&lt;br /&gt;
==Dates==&lt;br /&gt;
&lt;br /&gt;
January 15, 17, 22, 24, 29, and 31, 2013&amp;lt;br&amp;gt;&lt;br /&gt;
February 5 and 7, 2013&lt;br /&gt;
&lt;br /&gt;
==Topics (with lecture slides and recordings)==&lt;br /&gt;
&lt;br /&gt;
===''Lecture 1:'' C++ introduction===&lt;br /&gt;
:::[[File:Lecture1-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture1-2013/lecture1-2013.html]]&lt;br /&gt;
:::[[Media:Lecture1-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture1-2013/lecture1-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 2:'' More C++, build and version control&amp;lt;br&amp;gt;===&lt;br /&gt;
:::[[File:Lecture2-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture2-2013/lecture2-2013.html]]&lt;br /&gt;
:::Guest lecturer: Michael Nolta (CITA) for the git portion of the lecture.&lt;br /&gt;
:::[[Media:Lecture2-2013.pdf|C++ and Make slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture2-2013/lecture2-2013.mp4 C++ and Make video recording] &amp;amp;nbsp;/ &amp;amp;nbsp; [[Media:Git-Nolta.pdf|Git slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [[#HW1|Homework assignment 1]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 3:'' Python and visualization===&lt;br /&gt;
:::[[File:Lecture3-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture3-2013/lecture3-2013.html]]&lt;br /&gt;
:::[[Media:Lecture3-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture3-2013/lecture3-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 4:'' Modular programming, refactoring, testing===&lt;br /&gt;
:::[[File:Lecture4-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture4-2013/lecture4-2013.html]]&lt;br /&gt;
:::[[Media:Lecture4-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture4-2013/lecture4-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp;  [[#HW2|Homework assignment 2]]&lt;br /&gt;
:::[http://wiki.scinethpc.ca/wiki/images/f/f0/diffuse.cc diffuse.cc (course project source file)] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://wiki.scinethpc.ca/wiki/images/f/f0/plotdata.py plotdata.py (corresponding python movie generator)]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 5:'' Object oriented programming===&lt;br /&gt;
:::[[Media:Lecture5-2013.pdf|Slides]]&lt;br /&gt;
:::Recordings of this lecture are missing, but you could view the videos of SciNet's [[One-Day Scientific C++ Class]], in particular the parts on classes, polymorphism, and inheritance.&lt;br /&gt;
&lt;br /&gt;
===''Lecture 6:'' ODE, interpolation===&lt;br /&gt;
:::[[File:Lecture6-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture6-2013/lecture6-2013.html]]&lt;br /&gt;
:::[[Media:ScientificComputing2013-Lecture5-ODE.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture6-2013/lecture6-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp; [[#HW3|Homework assignment 3]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 7:'' Development tools: debugging and profiling===&lt;br /&gt;
:::[[File:Lecture7-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture7-2013/lecture7-2013.html]]&lt;br /&gt;
:::[[Media:ScientificComputing2013-Debugging.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture7-2013/lecture7-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 8:'' Objects in Python, linking C++ and Python===&lt;br /&gt;
:::[[File:Lecture8-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture8-2013/lecture8-2013.html]]&lt;br /&gt;
:::[[Media:Lecture8-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture8-2013/lecture8-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
==Homework assignments==&lt;br /&gt;
&lt;br /&gt;
===HW1===&lt;br /&gt;
&lt;br /&gt;
'''''Multi-file C++ program to create a data file'''''&lt;br /&gt;
&lt;br /&gt;
We’ve learned programming in basic C++, use of make and Makefiles to build projects, and local use of git for version control. In this first assignment, you’ll use these to make a multi-file C++ program, built with make, which computes and outputs a data file.&lt;br /&gt;
&lt;br /&gt;
* Start a git repository, and begin writing a C++ program to&lt;br /&gt;
:# Get an array size and a standard deviation from user input,&lt;br /&gt;
:# Allocate a 2d array (use the code given in lecture 2),&lt;br /&gt;
:# Store a 2d Gaussian with a maximum at the centre of the array and given standard deviation (in units of grid points),&lt;br /&gt;
:# Output that array to a text file,&lt;br /&gt;
:# Free the array, and exit. &lt;br /&gt;
* The output text file should contain just the data in text format, with a row of the file corresponding to a row of the array and with whitespace between the numbers. &lt;br /&gt;
* The 2d array creation/freeing routines should be in one file (with an associated header file), the gaussian calculation be in another (ditto), and the output routine be in a third, with the main program calling each of these. &lt;br /&gt;
* Use a makefile to build your code (add it to the repository).&lt;br /&gt;
* You can start with everything in one file, with hardcoded values for sizes and standard deviation and a static array, then refactor things into multiple files, adding the other features.&lt;br /&gt;
* As a test, use the ipython executable that came with your Enthought python distribution to read your data and plot it.&amp;lt;br&amp;gt;If your data file is named ‘data.txt’, running the following:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ipython --pylab&lt;br /&gt;
In [1]: data = numpy.genfromtxt('data.txt') &lt;br /&gt;
In [2]: contour(data) &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
should give a nice contour plot of a 2-dimensional gaussian.&lt;br /&gt;
* Email in your source code, makefile and the &amp;quot;git log&amp;quot; output of all your commits by 9:00 am Thursday Jan 24th, 2013. Please zip or tar these files together as one attachment, with a file name that includes your name and &amp;quot;HW1&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
===HW2===&lt;br /&gt;
'''''Refactor legacy code to a modular project with unit tests'''''&lt;br /&gt;
&lt;br /&gt;
In class, today, we talked about modular programming and testing, and the project we’ll be working on for the next three weeks. This homework will start advancing on that project by working on the “legacy” code given to us by our supervisor ([http://wiki.scinethpc.ca/wiki/images/f/f0/diffuse.cc diffuse.cc]), with a corresponding python plotting script ([http://wiki.scinethpc.ca/wiki/images/f/f0/plotdata.py plotdata.py]), and whipping it into shape before we start adding new physics.&lt;br /&gt;
* Start a git repository for this project, and add the two files.&lt;br /&gt;
* Create a Makefile and add it to the repository.&lt;br /&gt;
* Since we have no tests, run the program with console output redirected to a file:&lt;br /&gt;
:&amp;lt;pre&amp;gt;$ diffuse &amp;gt; original-output.txt&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;''It turns out the code has a bug that can make the output different when the same code is run again, which obviously would not be good for a baseline test. Replace 'float error;' by 'float error=0.0;' to fix this.''&lt;br /&gt;
* Also save the two .npy output files, e.g. to original-data.npy and original-theory.npy. The triplet of files (original-output.txt, original-data.npy and original-theory.npy) serve as a baseline integrated test (add these to repository). &lt;br /&gt;
* Then write a 'test' target in your makefile that:&lt;br /&gt;
** Runs 'diffuse' with output to a new file.&lt;br /&gt;
** Compares the file with the baseline test file, and compares the .npy files.&lt;br /&gt;
:: (hint: the unix command diff or cmp can compare files).&lt;br /&gt;
* First refactoring: Move the global variables into the main routine.&lt;br /&gt;
* ''Chorus: Test your modified code, and commit.''&lt;br /&gt;
* Second refactoring: Extract a diffusion operator routine, that gets called from main.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* Create a .cc/.h module for the diffusion operator.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* Add two tests for the diffusion operator: for a constant and for a linear input field (&amp;lt;tt&amp;gt;rho[i][j]=a*i+b*j&amp;lt;/tt&amp;gt;). Add these to the test target in the makefile.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* More refactoring: Extract three more .cc/.h modules:&lt;br /&gt;
** for output (should not contain hardcoded filenames)    &lt;br /&gt;
** computation of the theory&lt;br /&gt;
** and for the array allocation stuff.&lt;br /&gt;
* ''Chorus''&lt;br /&gt;
* Describe, but don't implement in the .h and .cc, what would be appropriate unit tests for these three modules.&lt;br /&gt;
&lt;br /&gt;
Email in your source code and the git log file of all your commits as a .zip or .tar file by email to rzon@scinethpc.ca and ljdursi@scinethpc.ca by 9:00 am on Thursday January 31, 2013.&lt;br /&gt;
&lt;br /&gt;
===HW3===&lt;br /&gt;
This week, we learned about object oriented programming, which fits nicely within the modular programming idea.  In this homework, we are going to use some of it to restructure our code and get it ready to add the tracer particle, the goal of the course project. &lt;br /&gt;
&lt;br /&gt;
The goal will be to have an instance of a &amp;lt;tt&amp;gt;Diffusion&amp;lt;/tt&amp;gt; class,&lt;br /&gt;
as well as an instance of &amp;lt;tt&amp;gt;Tracer&amp;lt;/tt&amp;gt;, which for now will be a&lt;br /&gt;
free particle moving as ('''x'''(t),'''y'''(t)) = ('''x'''(0) +&lt;br /&gt;
'''vx''' t, '''y'''(0) + '''vy''' t), without any coupling yet (we&lt;br /&gt;
will handle this next week).&lt;br /&gt;
&lt;br /&gt;
To be more specific:&lt;br /&gt;
* Clean up your code, using the feedback from your HW2 grading, such that the modules are as independent as possible. &lt;br /&gt;
* If you have not done so yet, add comments to the header files of your modules to explain exactly what each function does (without going into implementation details), what its arguments mean and what it returns (unless it's a void function, of course). &lt;br /&gt;
* Objectify the &amp;lt;tt&amp;gt;main&amp;lt;/tt&amp;gt; routine, by creating a class &amp;lt;tt&amp;gt;Diffusion&amp;lt;/tt&amp;gt;.&lt;br /&gt;
* Put this class in its own module (declaration in .h, implementation in .cc). For instance, the declaration could be&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// diffusion.h&lt;br /&gt;
#ifndef DIFFUSIONH&lt;br /&gt;
#define DIFFUSIONH&lt;br /&gt;
#include &amp;lt;fstream&amp;gt;&lt;br /&gt;
class Diffusion {&lt;br /&gt;
  public:&lt;br /&gt;
    Diffusion(float x1, float x2, float D, int numPoints);&lt;br /&gt;
    void init(float a0, float sigma0); // set initial field&lt;br /&gt;
    void timeStep(float dt);           // solve diff. equation over dt&lt;br /&gt;
    void toFile(std::ofstream&amp;amp; f);     // write to file (binary,no npyheader)&lt;br /&gt;
    void toScreen();                   // report a line to screen&lt;br /&gt;
    float getRho(int i, int j);        // get a value of the field&lt;br /&gt;
    ~Diffusion();&lt;br /&gt;
  private:&lt;br /&gt;
    float*** rho;&lt;br /&gt;
    ...&lt;br /&gt;
};&lt;br /&gt;
#endif&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(This is not meant to be prescriptive.)&lt;br /&gt;
* In the implementation file you'd have things like&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// diffusion.cc&lt;br /&gt;
#include &amp;quot;diffusion.h&amp;quot;&lt;br /&gt;
...&lt;br /&gt;
void Diffusion::timeStep(float dt) &lt;br /&gt;
{&lt;br /&gt;
   // code for the timeStep ...&lt;br /&gt;
}&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(Note the inclusion of the module's header file at the top of the implementation, so that the class is declared.)&lt;br /&gt;
* Let &amp;lt;tt&amp;gt;int main()&amp;lt;/tt&amp;gt; have the same functionality as before, but now by defining the parameters of the run, creating an object of this class, setting up file streams, taking time steps, and writing output through calls to member functions of this object. &lt;br /&gt;
* Additionally, write a class &amp;lt;tt&amp;gt;Tracer&amp;lt;/tt&amp;gt; which for now implements a free particle in 2d. Something like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
class Tracer {&lt;br /&gt;
  public:&lt;br /&gt;
    Tracer(float x1, float x2);&lt;br /&gt;
    void init(float x0, float y0, float vx, float vy);&lt;br /&gt;
    void timeStep(float dt);           // advance the particle over dt&lt;br /&gt;
    void toFile(std::ofstream&amp;amp; f);     // write to file (binary,no npyheader)&lt;br /&gt;
    void toScreen();                   // report a line to screen&lt;br /&gt;
    ~Tracer();&lt;br /&gt;
  private:&lt;br /&gt;
    ...&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
:The timeStep implementation can in this case use the infamous forward Euler integration scheme, because it happens to be exact here.&lt;br /&gt;
:When it comes to output to a npy file, let's view the data of the tracer particle at one point in time as a 2x2 matrix &amp;lt;tt&amp;gt;[[x,y],[vx,vy]]&amp;lt;/tt&amp;gt;, so we can use much of the npy output code that we used for the diffusion field, which was a (numPoints+2)x(numPoints+2) matrix.&lt;br /&gt;
* This class too should be its own module (Often, &amp;quot;one class, one module&amp;quot; is a good paradigm, though occasionally you'll have closely related classes).&lt;br /&gt;
* Add some code to int main to have the Tracer particle evolve at the same time as the diffusion field (although the two are completely uncoupled); a sketch of such a main routine follows this list.&lt;br /&gt;
* Keep using git and make, and run the tests that you have regularly to make sure your program still works.&lt;br /&gt;
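&lt;br /&gt;
To make the structure concrete, here is a minimal, non-prescriptive sketch of what such a main routine could look like; it assumes the member functions suggested above, and the parameter values and file names are made up.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// main.cc -- sketch only: assumes the Diffusion and Tracer interfaces suggested above&lt;br /&gt;
#include &amp;lt;fstream&amp;gt;&lt;br /&gt;
#include &amp;quot;diffusion.h&amp;quot;&lt;br /&gt;
#include &amp;quot;tracer.h&amp;quot;&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
    const int   numPoints = 100;&lt;br /&gt;
    const float x1 = -5.0, x2 = 5.0, D = 1.0;   // made-up run parameters&lt;br /&gt;
    const float dt = 0.001;&lt;br /&gt;
    const int   numSteps = 1000, outputEvery = 100;&lt;br /&gt;
&lt;br /&gt;
    Diffusion rho(x1, x2, D, numPoints);&lt;br /&gt;
    rho.init(1.0, 1.0);                  // a0, sigma0&lt;br /&gt;
&lt;br /&gt;
    Tracer particle(x1, x2);&lt;br /&gt;
    particle.init(0.0, 0.0, 0.1, 0.2);   // x0, y0, vx, vy&lt;br /&gt;
&lt;br /&gt;
    std::ofstream rhofile(&amp;quot;rho.npy&amp;quot;, std::ios::binary);&lt;br /&gt;
    std::ofstream trcfile(&amp;quot;tracer.npy&amp;quot;, std::ios::binary);&lt;br /&gt;
&lt;br /&gt;
    for (int step = 1; step &amp;lt;= numSteps; step++) {&lt;br /&gt;
        rho.timeStep(dt);                // the two evolve side by side,&lt;br /&gt;
        particle.timeStep(dt);           // but are not coupled yet&lt;br /&gt;
        if (step % outputEvery == 0) {&lt;br /&gt;
            rho.toFile(rhofile);&lt;br /&gt;
            particle.toFile(trcfile);&lt;br /&gt;
            rho.toScreen();&lt;br /&gt;
            particle.toScreen();&lt;br /&gt;
        }&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;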
&lt;br /&gt;
Note that because we've now set up our program in a modular fashion, you can do&lt;br /&gt;
different parts of this assignment in any order you want.  For instance, to wrap your head around object-oriented programming, you may want to implement the tracer particle first, so that your diffusion code stays intact.  Or you might want to postpone commenting until the end if you think you'll have to change a module for this assignment.&lt;br /&gt;
&lt;br /&gt;
Email your source code and the git log file of all your commits as a .zip or .tar file to rzon@scinethpc.ca and ljdursi@scinethpc.ca by &lt;br /&gt;
&amp;lt;span style=&amp;quot;color:#ee3300&amp;quot;&amp;gt;3:00 pm on Friday February 8, 2013&amp;lt;/span&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===HW4===&lt;br /&gt;
&lt;br /&gt;
In this homework, we are going to implement the class project of a tracer particle coupled to a diffusion equation. &lt;br /&gt;
The full specification of the physical problem is [[Media:ScClassProject.pdf|here]].  &lt;br /&gt;
* Augment the tracer particle to include a force in the x and in the y direction, and a friction coefficient alpha, which at first can be constant.&lt;br /&gt;
* Implement the so-called leapfrog integration algorithm for the tracer particle&lt;br /&gt;
:::v &amp;amp;larr; v + f(v) &amp;amp;Delta;t / m&lt;br /&gt;
:::r &amp;amp;larr; r + v &amp;amp;Delta;t&lt;br /&gt;
:where v, r, and f are 2d vectors and f(v) is the total, velocity-dependent force specified in the class project, i.e., the sum of the external force F=qE and the friction force -&amp;amp;alpha;v.&amp;lt;br/&amp;gt;(Note: the v dependence of f makes this, strictly speaking, not a leapfrog integration, but we'll ignore that here. A minimal sketch of this update follows this list.)&lt;br /&gt;
* Further augment the tracer class with a member function 'couple' which takes a diffusion field as input, and adjusts the friction constant. &lt;br /&gt;
* Your implementation of the 'couple' member function will need to interpolate the diffusion field to the current position of the particle. Use [[Media:CppInterpolation.tgz|this interpolation module]].&lt;br /&gt;
* Rewrite your main routine so that the coupling is called before the tracer's time step. You may need to modify the Diffusion class a bit to get &amp;lt;tt&amp;gt;rho[active]&amp;lt;/tt&amp;gt; out.&lt;br /&gt;
* For simplicity, use the same time step for both the diffusion and the tracer particle.&lt;br /&gt;
* Keep using git and make.&lt;br /&gt;
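&lt;br /&gt;
For reference, here is a minimal sketch of the kick-drift update inside the tracer's timeStep; the member names (x, y, vx, vy, fx, fy, alpha, m) are illustrative only and will depend on your own Tracer class.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// Sketch of Tracer::timeStep with the (not-quite-)leapfrog update&lt;br /&gt;
//     v &amp;lt;- v + f(v) dt / m,    r &amp;lt;- r + v dt&lt;br /&gt;
// fx, fy are the external force components (F = qE), alpha is the friction&lt;br /&gt;
// coefficient and m the mass; all member names here are illustrative.&lt;br /&gt;
void Tracer::timeStep(float dt)&lt;br /&gt;
{&lt;br /&gt;
    // total force, including the velocity-dependent friction -alpha*v&lt;br /&gt;
    float ftotx = fx - alpha*vx;&lt;br /&gt;
    float ftoty = fy - alpha*vy;&lt;br /&gt;
&lt;br /&gt;
    // kick: update the velocities&lt;br /&gt;
    vx += ftotx*dt/m;&lt;br /&gt;
    vy += ftoty*dt/m;&lt;br /&gt;
&lt;br /&gt;
    // drift: update the positions with the new velocities&lt;br /&gt;
    x += vx*dt;&lt;br /&gt;
    y += vy*dt;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;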
&lt;br /&gt;
You will hand in your source code, makefiles and the git log file of all your commits by email by &amp;lt;span style=&amp;quot;color:#ee3300&amp;quot;&amp;gt;9:00 am on Thursday February 21, 2013&amp;lt;/span&amp;gt;.  Email the files, preferably zipped or tarred, to rzon@scinethpc.ca and ljdursi@scinethpc.ca.&lt;br /&gt;
&lt;br /&gt;
=Part 2: Numerical Tools for Physical Scientists=&lt;br /&gt;
&lt;br /&gt;
==Prerequisites==&lt;br /&gt;
&lt;br /&gt;
Part 1 or solid C++ programming skills, including experience with make and the unix/linux command prompt.&lt;br /&gt;
&lt;br /&gt;
'''Software that you'll need'''&lt;br /&gt;
&lt;br /&gt;
A unix-like environment with the GNU compiler suite (e.g. Cygwin) and Python (Enthought) installed on your laptop.&lt;br /&gt;
&lt;br /&gt;
==Dates==&lt;br /&gt;
&lt;br /&gt;
February 12, 14, 26, and 28, 2013&amp;lt;br&amp;gt;&lt;br /&gt;
March 5, 7, 12, and 14, 2013&lt;br /&gt;
&lt;br /&gt;
==Topics==&lt;br /&gt;
&lt;br /&gt;
===''Lecture 1:'' Numerics ===&lt;br /&gt;
:::[[File:Lecture9-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture9-2013/lecture9-2013.html]]&lt;br /&gt;
:::[[Media:Lecture9-2013-Numerics.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture9-2013/lecture9-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 2:'' Random numbers ===&lt;br /&gt;
:::[[File:Lecture10-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture10-2013/lecture10-2013.html]]&lt;br /&gt;
:::[[Media:Lecture10-2013-PRNG.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture10-2013/lecture10-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp;[http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW1_2 Homework assignment 1]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 3:'' Numerical integration and ODEs ===&lt;br /&gt;
:::[[File:Lecture11-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture11-2013/lecture11-2013.html]]&lt;br /&gt;
:::[[Media:Lecture11-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture11-2013/lecture11-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 4:'' Molecular Dynamics ===&lt;br /&gt;
:::[[File:Lecture12-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture12-2013/lecture12-2013.html]]&lt;br /&gt;
:::[[Media:Lecture12-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture12-2013/lecture12-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp;[http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW2_2 Homework assignment 2]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 5:'' Linear Algebra part I ===&lt;br /&gt;
:::[[Media:Lecture13-2013.pdf|Slides (combined with lecture 6)]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 6:'' Linear Algebra part II and PDEs===&lt;br /&gt;
:::[[File:Lecture14-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture14-2013/lecture14-2013.html]]&lt;br /&gt;
:::[[Media:Lecture13-2013.pdf|Slides (combined with lecture 5)]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture14-2013/lecture14-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp;[http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW3_2 Homework assignment 3]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 7:'' Fast Fourier Transform===&lt;br /&gt;
:::[[File:Lecture15-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture15-2013/lecture15-2013.html]]&lt;br /&gt;
:::[[Media:Lecture15-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture15-2013/lecture15-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp;[[Media:Sincfftw.cc|example code]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 8:'' FFT for real and multidimensional data===&lt;br /&gt;
:::[[File:Lecture15-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture16-2013/lecture16-2013.html]]&lt;br /&gt;
:::[[Media:Lecture16-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture16-2013/lecture16-2013.mp4 Video recording]  &amp;amp;nbsp;/ &amp;amp;nbsp; [http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW4_2 Homework assignment 4]&lt;br /&gt;
&lt;br /&gt;
==Homework Assignments==&lt;br /&gt;
&lt;br /&gt;
===HW1===&lt;br /&gt;
This week's homework consists of two assignments.&lt;br /&gt;
&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
&lt;br /&gt;
* Consider the sequence of numbers: 1 followed by 10&amp;lt;sup&amp;gt;8&amp;lt;/sup&amp;gt; values of 10&amp;lt;sup&amp;gt;-8&amp;lt;/sup&amp;gt;&lt;br /&gt;
* These should sum to 2.&lt;br /&gt;
* Write code which sums up those values in order. What answer does it get? (A minimal sketch follows this list.)&lt;br /&gt;
* Add to the program a routine which sums up the values in reverse order. Does it get the correct answer?&lt;br /&gt;
* How would you get the correct answer?&lt;br /&gt;
* Submit your code, Makefile, and a text file with your answers.&lt;br /&gt;
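&lt;br /&gt;
A minimal sketch of the summation experiment, in single precision (the effect is easiest to see with float; it is instructive to try double as well):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// sumorder.cc: sum 1 followed by 10^8 values of 10^-8, in single precision,&lt;br /&gt;
// forward and in reverse order, and compare with the exact answer 2.&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
    const int n = 100000000;            // 10^8 small terms&lt;br /&gt;
&lt;br /&gt;
    float forward = 1.0f;               // start with the large term&lt;br /&gt;
    for (int i = 0; i &amp;lt; n; i++)&lt;br /&gt;
        forward += 1.0e-8f;&lt;br /&gt;
&lt;br /&gt;
    float reverse = 0.0f;               // accumulate the small terms first&lt;br /&gt;
    for (int i = 0; i &amp;lt; n; i++)&lt;br /&gt;
        reverse += 1.0e-8f;&lt;br /&gt;
    reverse += 1.0f;&lt;br /&gt;
&lt;br /&gt;
    std::cout &amp;lt;&amp;lt; &amp;quot;forward sum = &amp;quot; &amp;lt;&amp;lt; forward &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
    std::cout &amp;lt;&amp;lt; &amp;quot;reverse sum = &amp;quot; &amp;lt;&amp;lt; reverse &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
    std::cout &amp;lt;&amp;lt; &amp;quot;exact sum   = 2&amp;quot; &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;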
&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
&lt;br /&gt;
* Implement a linear congruential generator with a = 106, c = 1283, m = 6075 that generates random numbers in 0..1 (a minimal sketch follows this list).&lt;br /&gt;
* Using that and the Mersenne Twister (MT): generate 10,000 pairs (dx, dy) with dx and dy each in -0.1 .. +0.1. Generate histograms of dx and dy (say 200 bins). Do they look okay? What would you expect the variation to be?&lt;br /&gt;
* For 10,000 points: take random walks from (x,y)=(0,0) until they exceed a radius of 2, then stop. Plot a histogram of the final angles for the two pseudo-random number generators. What do you see?&lt;br /&gt;
* Submit makefile, code, plots, git log.&lt;br /&gt;
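&lt;br /&gt;
A minimal sketch of the linear congruential generator (the seed is arbitrary, and its very short period, at most m, is part of what the exercise is meant to expose; for the MT generator you could use, e.g., the GSL or std::mt19937 from C++11's &amp;lt;random&amp;gt;):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// lcg.cc: linear congruential generator x_{k+1} = (a x_k + c) mod m&lt;br /&gt;
// with a = 106, c = 1283, m = 6075, scaled to uniform numbers in [0,1).&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unsigned int lcg_state = 1234;          // seed; the value is arbitrary&lt;br /&gt;
&lt;br /&gt;
double lcg_uniform()&lt;br /&gt;
{&lt;br /&gt;
    const unsigned int a = 106, c = 1283, m = 6075;&lt;br /&gt;
    lcg_state = (a*lcg_state + c) % m;&lt;br /&gt;
    return double(lcg_state)/m;         // uniform in [0,1)&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
    // print a few samples; map u in [0,1) to a step dx in [-0.1,0.1) as 0.2*u - 0.1&lt;br /&gt;
    for (int i = 0; i &amp;lt; 10; i++) {&lt;br /&gt;
        double u = lcg_uniform();&lt;br /&gt;
        std::cout &amp;lt;&amp;lt; u &amp;lt;&amp;lt; &amp;quot;  &amp;quot; &amp;lt;&amp;lt; 0.2*u - 0.1 &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;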
&lt;br /&gt;
Both assignments due on Thursday Feb 28th, 2013, at 9:00 am. Email the files to rzon@scinethpc.ca and ljdursi@scinethpc.ca.&lt;br /&gt;
&lt;br /&gt;
===HW2===&lt;br /&gt;
&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
&lt;br /&gt;
* Compute numerically (using the GSL):&lt;br /&gt;
&lt;br /&gt;
::&amp;amp;int;&amp;lt;sub&amp;gt;0&amp;lt;/sub&amp;gt;&amp;lt;sup&amp;gt;3&amp;lt;/sup&amp;gt; f(x) &amp;amp;nbsp;dx&lt;br /&gt;
&lt;br /&gt;
:(that is the integral of f(x) from x=0 to x=3)&lt;br /&gt;
&lt;br /&gt;
:with&lt;br /&gt;
&lt;br /&gt;
::f(x) = ln(x) sin(x) e&amp;lt;sup&amp;gt;-x&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:using three different methods:&lt;br /&gt;
# Extended Simpson's rule&lt;br /&gt;
# Gauss-Legendre quadrature&lt;br /&gt;
# Monte Carlo sampling &lt;br /&gt;
&lt;br /&gt;
*Hint: what is f(0)?&lt;br /&gt;
&lt;br /&gt;
*Compare the convergence of these methods by increasing the number of function evaluations (a minimal GSL sketch for the Gauss-Legendre method follows this list).&lt;br /&gt;
&lt;br /&gt;
*Submit makefile, code, plots, version control log. &lt;br /&gt;
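&lt;br /&gt;
A minimal sketch for the Gauss-Legendre method, using the GSL's fixed-order routines (link with -lgsl -lgslcblas); the extended Simpson's rule and the Monte Carlo sampling are left for you to set up.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// glfixed.cc: integrate f(x) = ln(x) sin(x) exp(-x) on [0,3] with&lt;br /&gt;
// fixed-order Gauss-Legendre quadrature from the GSL.&lt;br /&gt;
// Note: f(0) is singular, but the Gauss-Legendre nodes avoid the endpoints.&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;gsl/gsl_integration.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
double f(double x, void *params)&lt;br /&gt;
{&lt;br /&gt;
    return std::log(x)*std::sin(x)*std::exp(-x);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
    gsl_function F;&lt;br /&gt;
    F.function = &amp;amp;f;&lt;br /&gt;
    F.params   = 0;&lt;br /&gt;
&lt;br /&gt;
    // compare the result for increasing numbers of quadrature points&lt;br /&gt;
    for (size_t n = 2; n &amp;lt;= 128; n *= 2) {&lt;br /&gt;
        gsl_integration_glfixed_table *t = gsl_integration_glfixed_table_alloc(n);&lt;br /&gt;
        double result = gsl_integration_glfixed(&amp;amp;F, 0.0, 3.0, t);&lt;br /&gt;
        std::cout &amp;lt;&amp;lt; n &amp;lt;&amp;lt; &amp;quot; points: &amp;quot; &amp;lt;&amp;lt; result &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
        gsl_integration_glfixed_table_free(t);&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;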
&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
&lt;br /&gt;
* Using an adaptive 4th order Runge-Kutta approach, with a relative accuracy of 1e-4, compute the solution for t in [0,100] of the following set of coupled ODEs (the Lorenz oscillator):&lt;br /&gt;
&lt;br /&gt;
::dx/dt = &amp;amp;sigma;(y - x)&lt;br /&gt;
&lt;br /&gt;
::dy/dt = (&amp;amp;rho;-z)x-y&lt;br /&gt;
&lt;br /&gt;
::dz/dt = xy - &amp;amp;beta;z&lt;br /&gt;
&lt;br /&gt;
:with &amp;amp;sigma;=10; &amp;amp;beta;=8/3; &amp;amp;rho; = 28, and with initial conditions&lt;br /&gt;
&lt;br /&gt;
::x(0) = 10&lt;br /&gt;
&lt;br /&gt;
::y(0) = 20&lt;br /&gt;
&lt;br /&gt;
::z(0) = 30&lt;br /&gt;
&lt;br /&gt;
* Hint: study the GSL documentation (a minimal sketch using the GSL ODE solver follows this list).&lt;br /&gt;
&lt;br /&gt;
*Submit makefile, code, plots, version control log.&lt;br /&gt;
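&lt;br /&gt;
A minimal sketch using the GSL's ODE solver with the adaptive embedded Runge-Kutta-Fehlberg (4,5) stepper (link with -lgsl -lgslcblas; the number of output points is arbitrary):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// lorenz.cc: integrate the Lorenz oscillator with an adaptive&lt;br /&gt;
// Runge-Kutta-Fehlberg (4,5) stepper from the GSL, rel. accuracy 1e-4.&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;gsl/gsl_errno.h&amp;gt;&lt;br /&gt;
#include &amp;lt;gsl/gsl_odeiv2.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int lorenz(double t, const double y[], double dydt[], void *params)&lt;br /&gt;
{&lt;br /&gt;
    const double sigma = 10.0, beta = 8.0/3.0, rho = 28.0;&lt;br /&gt;
    dydt[0] = sigma*(y[1] - y[0]);&lt;br /&gt;
    dydt[1] = (rho - y[2])*y[0] - y[1];&lt;br /&gt;
    dydt[2] = y[0]*y[1] - beta*y[2];&lt;br /&gt;
    return GSL_SUCCESS;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
    gsl_odeiv2_system sys = {lorenz, 0, 3, 0};   // no jacobian needed, 3 equations&lt;br /&gt;
&lt;br /&gt;
    // driver: initial step 1e-3, absolute accuracy 0, relative accuracy 1e-4&lt;br /&gt;
    gsl_odeiv2_driver *d =&lt;br /&gt;
        gsl_odeiv2_driver_alloc_y_new(&amp;amp;sys, gsl_odeiv2_step_rkf45, 1e-3, 0.0, 1e-4);&lt;br /&gt;
&lt;br /&gt;
    double t = 0.0;&lt;br /&gt;
    double y[3] = {10.0, 20.0, 30.0};            // initial conditions&lt;br /&gt;
&lt;br /&gt;
    for (int i = 1; i &amp;lt;= 1000; i++) {            // 1000 output points up to t = 100&lt;br /&gt;
        double ti = i*0.1;&lt;br /&gt;
        if (gsl_odeiv2_driver_apply(d, &amp;amp;t, ti, y) != GSL_SUCCESS) break;&lt;br /&gt;
        std::cout &amp;lt;&amp;lt; t &amp;lt;&amp;lt; &amp;quot; &amp;quot; &amp;lt;&amp;lt; y[0] &amp;lt;&amp;lt; &amp;quot; &amp;quot; &amp;lt;&amp;lt; y[1] &amp;lt;&amp;lt; &amp;quot; &amp;quot; &amp;lt;&amp;lt; y[2] &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
    }&lt;br /&gt;
    gsl_odeiv2_driver_free(d);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;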
&lt;br /&gt;
Both assignments due on Thursday Mar 7th, 2013, at 9:00 am. Email the files to rzon@scinethpc.ca and ljdursi@scinethpc.ca.&lt;br /&gt;
&lt;br /&gt;
===HW3===&lt;br /&gt;
&lt;br /&gt;
Part 1:&lt;br /&gt;
&lt;br /&gt;
The time-explicit formulation of the 1d diffusion equation looks like this:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{eqnarray*}&lt;br /&gt;
q^{n+1} &amp;amp; = &amp;amp; q^n + \frac{D \Delta t}{\Delta x^2} &lt;br /&gt;
\left (&lt;br /&gt;
\begin{matrix}&lt;br /&gt;
-2 &amp;amp; 1 \\&lt;br /&gt;
1 &amp;amp; -2 &amp;amp; 1 \\&lt;br /&gt;
&amp;amp; 1 &amp;amp; -2 &amp;amp; 1 \\&lt;br /&gt;
&amp;amp;  &amp;amp;  &amp;amp; \cdots &amp;amp; \\&lt;br /&gt;
&amp;amp;  &amp;amp;  &amp;amp; 1 &amp;amp; -2 &amp;amp; 1 \\&lt;br /&gt;
&amp;amp;  &amp;amp;  &amp;amp; &amp;amp; 1 &amp;amp; -2 \\&lt;br /&gt;
\end{matrix}&lt;br /&gt;
\right ) q^n \\&lt;br /&gt;
&amp;amp; = &amp;amp; \left ( 1 + \frac{D \Delta t}{\Delta x^2} A \right ) q^n&lt;br /&gt;
\end{eqnarray*}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
What are the eigenvalues of the matrix A?   What modes would we expect to be amplified or damped by this operator?&lt;br /&gt;
&lt;br /&gt;
* Consider 100 points in the discretization (e.g., A is 100x100)&lt;br /&gt;
* Calculate the eigenvalues and eigenvectors (using D__EV; which sort of matrix are we using here?)&lt;br /&gt;
* Plot the modes with the largest and smallest absolute values of the eigenvalues, and explain their physical significance&lt;br /&gt;
* The numerical method becomes unstable when one eigenmode $v$ begins to grow uncontrollably whenever it is present, e.g.&lt;br /&gt;
$ \frac{D \Delta t}{\Delta x^2} A v = \frac{D \Delta t}{\Delta x^2} \lambda v &amp;gt; v$.   In a timestepping solution, the only way to avoid this for a given set of physical parameters and grid size is to reduce the timestep, $\Delta t$.   Use the eigenvalue with the largest absolute value to place a constraint on $\Delta t$ for stability.&lt;br /&gt;
&lt;br /&gt;
Part 2:&lt;br /&gt;
&lt;br /&gt;
Using the above constraint on $\Delta t$, for a 1d grid of size 100 (e.g., a 100x100 matrix A), evolve this PDE using LAPACK/BLAS. Plot and explain the results.&lt;br /&gt;
&lt;br /&gt;
* Have an initial condition of $q(x=0,t=0) = 1$, and $q(t=0)$ everywhere else being zero (e.g., a hot plate just turned on at the left)&lt;br /&gt;
* Take ~100 timesteps and plot the evolution of $q(x,t)$ at 5 times over that period.&lt;br /&gt;
* You’ll want to use a BLAS call to compute the matrix-vector multiply ( http://www.gnu.org/software/gsl/manual/html_node/Level-2-GSL-BLAS-Interface.html). Do the multiply in double precision (D__MV). Which one should you use?&lt;br /&gt;
* The GSL has a cblas interface, http://www.gnu.org/software/gsl/manual/html_node/Level-2-GSL-BLAS-Interface.html ; an example of its use can be found here http://www.gnu.org/software/gsl/manual/html_node/GSL-CBLAS-Examples.html&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Important things to know about LAPACK:&lt;br /&gt;
* If you are using an nxn array, the “leading dimension” of the array is n. (This argument is there so that you could work on sub-matrices if you wanted.)&lt;br /&gt;
* You have to make sure the 2d array is a contiguous block of memory.&lt;br /&gt;
* You'll (presumably) want to use the C bindings for LAPACK - [http://www.netlib.org/lapack/lapacke.html lapacke].  Note that the usual C arrays are row-major.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Here's a simple example of calling a LAPACKE routine; note that how the matrix is described (here with a pointer to the data, a leading dimension, and the number of rows and columns) will vary with different types of matrix:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;mkl_lapacke.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
double **matrix(int n,int m);&lt;br /&gt;
void free_matrix(double **a);&lt;br /&gt;
&lt;br /&gt;
int main (int argc, const char * argv[])&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
   const int n=5;             // number of rows, columns of the matrix&lt;br /&gt;
   const int m = n;           // nrows&lt;br /&gt;
   const int leading_dim_A=n; // leading dimension (# of cols for row major);&lt;br /&gt;
                              // lets us operate on sub-matrices in principle&lt;br /&gt;
   const int leading_dim_b=n; // similarly for b&lt;br /&gt;
   double **A;&lt;br /&gt;
   double *b;&lt;br /&gt;
&lt;br /&gt;
   b = new double[leading_dim_b];&lt;br /&gt;
   A = matrix(n,leading_dim_A);&lt;br /&gt;
&lt;br /&gt;
   for (int i=0; i&amp;lt;n; i++)&lt;br /&gt;
       for (int j=0; j&amp;lt;leading_dim_A; j++)&lt;br /&gt;
            A[i][j] = 0.;&lt;br /&gt;
&lt;br /&gt;
   // let's do a trivial solve&lt;br /&gt;
   // It should be pretty clear that the solution to this system&lt;br /&gt;
   // is x = {0,1,2...n-1}&lt;br /&gt;
&lt;br /&gt;
   for (int i=0; i&amp;lt;leading_dim_A; i++) {&lt;br /&gt;
        A[i][i] = 2.;&lt;br /&gt;
   }&lt;br /&gt;
&lt;br /&gt;
   for (int i=0; i&amp;lt;leading_dim_b; i++) {&lt;br /&gt;
        b[i]    = 2*i;&lt;br /&gt;
   }&lt;br /&gt;
&lt;br /&gt;
   const char transpose='N';     //solve Ax=b, not A^T x = b&lt;br /&gt;
   const int  nrhs = 1;          //  we're only solving 1 right hand side&lt;br /&gt;
   int info;&lt;br /&gt;
&lt;br /&gt;
   // Call DGELS; b will be overwritten with the value of x.&lt;br /&gt;
   info = LAPACKE_dgels(LAPACK_COL_MAJOR,transpose,m,n,nrhs,&lt;br /&gt;
                          &amp;amp;(A[0][0]),leading_dim_A, &amp;amp;(b[0]),leading_dim_b);&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
   // print results&lt;br /&gt;
   for(int i=0;i&amp;lt;n;i++)&lt;br /&gt;
   {&lt;br /&gt;
      if (i != n/2)&lt;br /&gt;
        std::cout &amp;lt;&amp;lt; &amp;quot;    &amp;quot; &amp;lt;&amp;lt; b[i] &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
      else&lt;br /&gt;
        std::cout &amp;lt;&amp;lt; &amp;quot;x = &amp;quot; &amp;lt;&amp;lt; b[i] &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
   }&lt;br /&gt;
   free_matrix(A);     // release the matrix storage&lt;br /&gt;
   delete[] b;         // and the right-hand side vector&lt;br /&gt;
&lt;br /&gt;
   return(info);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
double **matrix(int n,int m) {&lt;br /&gt;
   double **a = new double * [n];&lt;br /&gt;
   a[0] = new double [n*m];&lt;br /&gt;
&lt;br /&gt;
   for (int i=1; i&amp;lt;n; i++)&lt;br /&gt;
         a[i] = &amp;amp;a[0][i*m];&lt;br /&gt;
&lt;br /&gt;
   return a;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
void free_matrix(double **a) {&lt;br /&gt;
   delete[] a[0];&lt;br /&gt;
   delete[] a;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===HW4===&lt;br /&gt;
&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
&lt;br /&gt;
Trigonometric interpolation uses an n-point Fourier series to find values at intermediate points. It is one way of downscaling data onto a finer grid, and was a motivation for Gauss's early work on the fast Fourier transform, applied to planetary motion.&lt;br /&gt;
&lt;br /&gt;
The way it works is:&lt;br /&gt;
&lt;br /&gt;
# Fourier-transform your data.&lt;br /&gt;
# Add frequencies above the Nyquist frequency (in absolute value), but set the amplitudes of all the new frequencies to zero.&lt;br /&gt;
# Note that the frequencies are stored such that, e.g., f&amp;lt;sub&amp;gt;n-1&amp;lt;/sub&amp;gt; corresponds to the low frequency -1.&lt;br /&gt;
# The resulting 2n array can be transformed back, and now gives an interpolated signal.&lt;br /&gt;
&lt;br /&gt;
For this assignment, write an application that reads in an image from a binary file into a 2d double-precision array (this will require converting from bytes to doubles), and creates an image twice the size in all directions using trigonometric interpolation. Use a real-to-half-complex version of the FFTW (note: in 2d, this version of the FFTW mixes Fourier components with the same physical magnitude of their wave number k, so this will work). A one-dimensional sketch of the interpolation step is given below.&lt;br /&gt;
You can process the red, green and blue values separately. &lt;br /&gt;
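&lt;br /&gt;
Here is a minimal one-dimensional sketch of the interpolation step with FFTW 3's real-to-half-complex transforms (link with -lfftw3); the test signal and sizes are made up, and extending the bookkeeping to 2d is part of the assignment.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// triginterp1d.cc: trigonometric interpolation of a real signal of length n&lt;br /&gt;
// onto 2n points, using FFTW's real-to-half-complex (r2r) transforms.&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;fftw3.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main()&lt;br /&gt;
{&lt;br /&gt;
    const int n = 8, m = 2*n;&lt;br /&gt;
    const double pi = 4*std::atan(1.0);&lt;br /&gt;
    double in[n], hc[n];                // signal and its half-complex transform&lt;br /&gt;
    double HC[m], out[m];               // zero-padded transform, interpolated signal&lt;br /&gt;
&lt;br /&gt;
    for (int i = 0; i &amp;lt; n; i++)         // a made-up test signal&lt;br /&gt;
        in[i] = std::sin(2*pi*i/n) + 0.5*std::cos(2*pi*2*i/n);&lt;br /&gt;
&lt;br /&gt;
    fftw_plan fwd = fftw_plan_r2r_1d(n, in, hc, FFTW_R2HC, FFTW_ESTIMATE);&lt;br /&gt;
    fftw_plan bwd = fftw_plan_r2r_1d(m, HC, out, FFTW_HC2R, FFTW_ESTIMATE);&lt;br /&gt;
&lt;br /&gt;
    fftw_execute(fwd);&lt;br /&gt;
&lt;br /&gt;
    // half-complex layout: hc[k] = Re X_k for k=0..n/2, hc[n-k] = Im X_k for k=1..n/2-1.&lt;br /&gt;
    // Copy into the length-2n array, leaving the new high frequencies at zero.&lt;br /&gt;
    for (int i = 0; i &amp;lt; m; i++) HC[i] = 0.0;&lt;br /&gt;
    for (int k = 0; k &amp;lt;= n/2; k++) HC[k]   = hc[k];      // real parts&lt;br /&gt;
    for (int k = 1; k &amp;lt;  n/2; k++) HC[m-k] = hc[n-k];    // imaginary parts&lt;br /&gt;
    HC[n/2] *= 0.5;   // the old Nyquist bin becomes an interior frequency;&lt;br /&gt;
                      // splitting it keeps the interpolation exact&lt;br /&gt;
&lt;br /&gt;
    fftw_execute(bwd);&lt;br /&gt;
&lt;br /&gt;
    // FFTW transforms are unnormalized; dividing by n gives out[2*i] = in[i]&lt;br /&gt;
    for (int j = 0; j &amp;lt; m; j++)&lt;br /&gt;
        std::cout &amp;lt;&amp;lt; j/2.0 &amp;lt;&amp;lt; &amp;quot; &amp;quot; &amp;lt;&amp;lt; out[j]/n &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
&lt;br /&gt;
    fftw_destroy_plan(fwd);&lt;br /&gt;
    fftw_destroy_plan(bwd);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;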
&lt;br /&gt;
&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
&lt;br /&gt;
Write an application which reads an image and performs a low-pass filter on the image, i.e., any Fourier components with magnitude k larger than n/8 are set to zero, after which the inverse Fourier transform is taken and the image is written out to disk again. Use the same FFT technique as in the first assignment.&lt;br /&gt;
&lt;br /&gt;
'''Input image'''&lt;br /&gt;
&lt;br /&gt;
Use [[Media:gauss256.tgz|this image of Gauss]].&lt;br /&gt;
&lt;br /&gt;
'''Image format:'''&lt;br /&gt;
&lt;br /&gt;
Use the following simple PPM format:&lt;br /&gt;
&lt;br /&gt;
First line (ascii): 'P6\n'&amp;lt;br&amp;gt;&lt;br /&gt;
Second line, in ascii, 'width height\n'&amp;lt;br&amp;gt;&lt;br /&gt;
Third line, in ascii, 'maxcolorvalue\n' (this is typically just 255)&amp;lt;br&amp;gt;&lt;br /&gt;
Following that, in binary, are byte-triplets with the red, green and blue values of each pixel.&amp;lt;br&amp;gt;&lt;br /&gt;
Note: in C/C++, the 'unsigned char' data type matches the concept of a byte best (for most machines anyway). A minimal sketch of reading this format appears below.&lt;br /&gt;
&lt;br /&gt;
In fact, between the first and second line, one can have comment lines that start with '#'.&lt;br /&gt;
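&lt;br /&gt;
A minimal sketch of reading this format into memory (error handling omitted; writing an image back out follows the same steps in reverse):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
// readppm.cc: minimal reader for the simple binary (P6) PPM format above.&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;fstream&amp;gt;&lt;br /&gt;
#include &amp;lt;string&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdlib&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main(int argc, char *argv[])&lt;br /&gt;
{&lt;br /&gt;
    if (argc &amp;lt; 2) { std::cerr &amp;lt;&amp;lt; &amp;quot;usage: readppm FILE.ppm&amp;quot; &amp;lt;&amp;lt; std::endl; return 1; }&lt;br /&gt;
&lt;br /&gt;
    std::ifstream f(argv[1], std::ios::binary);&lt;br /&gt;
    std::string magic;&lt;br /&gt;
    f &amp;gt;&amp;gt; magic;                         // should be &amp;quot;P6&amp;quot;&lt;br /&gt;
&lt;br /&gt;
    std::string word;&lt;br /&gt;
    std::vector&amp;lt;int&amp;gt; fields;            // width, height, maxcolorvalue&lt;br /&gt;
    while (fields.size() &amp;lt; 3 &amp;amp;&amp;amp; f &amp;gt;&amp;gt; word) {&lt;br /&gt;
        if (word[0] == '#') { std::getline(f, word); continue; }   // skip comment lines&lt;br /&gt;
        fields.push_back(std::atoi(word.c_str()));&lt;br /&gt;
    }&lt;br /&gt;
    int width = fields[0], height = fields[1], maxval = fields[2];&lt;br /&gt;
    f.get();                            // consume the single whitespace after maxval&lt;br /&gt;
&lt;br /&gt;
    std::vector&amp;lt;unsigned char&amp;gt; pixels(3*width*height);   // r,g,b byte triplets&lt;br /&gt;
    f.read(reinterpret_cast&amp;lt;char*&amp;gt;(&amp;amp;pixels[0]), pixels.size());&lt;br /&gt;
&lt;br /&gt;
    // convert, e.g., the red channel to doubles for further processing&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; red(width*height);&lt;br /&gt;
    for (int i = 0; i &amp;lt; width*height; i++)&lt;br /&gt;
        red[i] = double(pixels[3*i]);&lt;br /&gt;
&lt;br /&gt;
    std::cout &amp;lt;&amp;lt; magic &amp;lt;&amp;lt; &amp;quot; &amp;quot; &amp;lt;&amp;lt; width &amp;lt;&amp;lt; &amp;quot;x&amp;quot; &amp;lt;&amp;lt; height&lt;br /&gt;
              &amp;lt;&amp;lt; &amp;quot; maxval &amp;quot; &amp;lt;&amp;lt; maxval &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;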
&lt;br /&gt;
=Part 3: High Performance Scientific Computing=&lt;br /&gt;
&lt;br /&gt;
==Prerequisites==&lt;br /&gt;
&lt;br /&gt;
Part 1 or good C++ programming skills, including experience with make and the unix/linux command prompt.&lt;br /&gt;
&lt;br /&gt;
'''Software that you'll need'''&lt;br /&gt;
&lt;br /&gt;
You will need to bring a laptop with an ssh client. Hands-on parts will be done on SciNet's GPC cluster.&lt;br /&gt;
&lt;br /&gt;
For those who don't have a SciNet account yet, the instructions can be found at http://wiki.scinethpc.ca/wiki/index.php/Essentials#Accounts&lt;br /&gt;
&lt;br /&gt;
==Dates==&lt;br /&gt;
March 19, 21, 26, and 28, 2013&amp;lt;br&amp;gt;&lt;br /&gt;
April 2, 4, 9, and 11, 2013&lt;br /&gt;
&lt;br /&gt;
==Topics==&lt;br /&gt;
===''Lecture 1:'' Introduction to Parallel Programming ===&lt;br /&gt;
:::[[File:Lecture17-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture17-2013/lecture17-2013.html]]&lt;br /&gt;
:::[[Media:Lecture17-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture17-2013/lecture17-2013.mp4 Video recording]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 2:'' Parallel Computing Paradigms ===&lt;br /&gt;
&lt;br /&gt;
:::[[File:Lecture18-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture18-2013/lecture18-2013.html]]&lt;br /&gt;
:::[[Media:Lecture18-2013.pdf|Slides]] &amp;amp;nbsp;/ &amp;amp;nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture18-2013/lecture18-2013.mp4 Video recording] &amp;amp;nbsp;/ &amp;amp;nbsp; [[#HW1_3|homework 1]]&lt;br /&gt;
&lt;br /&gt;
===''Lectures 3,4:''  Shared Memory Programming with OpenMP, part 1,2===&lt;br /&gt;
&lt;br /&gt;
:::[[Media:Lecture19-2013.pdf|Slides]]&lt;br /&gt;
&lt;br /&gt;
===''Lecture 5:'' Distributed Parallel Programming with MPI, part 1===&lt;br /&gt;
&lt;br /&gt;
:::[[Media:Lecture21-2013.pdf|Slides]]&lt;br /&gt;
&lt;br /&gt;
''Lecture 6''&amp;amp;nbsp;&amp;amp;nbsp; Distributed Parallel Programming with MPI, part 2&amp;lt;br&amp;gt;&lt;br /&gt;
''Lecture 7''&amp;amp;nbsp;&amp;amp;nbsp; Distributed Parallel Programming with MPI, part 3&amp;lt;br&amp;gt;&lt;br /&gt;
''Lecture 8''&amp;amp;nbsp;&amp;amp;nbsp; Hybrid OpenMP+MPI Programming&lt;br /&gt;
&lt;br /&gt;
== Homework assignments ==&lt;br /&gt;
&lt;br /&gt;
=== HW1 ===&lt;br /&gt;
&lt;br /&gt;
* Read the SciNet tutorial (as it pertains to the GPC)&lt;br /&gt;
* Read the GPC Quick Start.&lt;br /&gt;
* Get the first set of code:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
   $ cd $SCRATCH&lt;br /&gt;
   $ git clone /scinet/course/sc3/homework1&lt;br /&gt;
   $ cd homework1&lt;br /&gt;
   $ source setup&lt;br /&gt;
   $ make&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
*This contains the threaded program 'blurppm' and 266 ppm images to be blurred. Usage:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
  blurppm INPUTPPM OUTPUTPPM BLURRADIUS NUMBEROFTHREADS&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
* Simple test:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
  $ qsub -l nodes=1:ppn=8,walltime=2:00:00 -I -X -qdebug&lt;br /&gt;
  $ cd $SCRATCH/homework1&lt;br /&gt;
  $ time blurppm 001.ppm new001.ppm 30 1&lt;br /&gt;
  real  0m52.900s&lt;br /&gt;
  user  0m52.881s&lt;br /&gt;
  sys   0m0.008s&lt;br /&gt;
  $ display 001.ppm &amp;amp;&lt;br /&gt;
  $ display new001.ppm &amp;amp;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Assignment 1''&lt;br /&gt;
* Time blurppm with a BLURRADIUS ranging from 1 to 41 in steps of 4, and for NUMBEROFTHREADS ranging from 1 to 16.  Record the (real) duration of each run.&lt;br /&gt;
* Plot the duration as a function of NUMBEROFTHREADS, as well as  the speed-up and efficiency.&lt;br /&gt;
* Submit the script and plots of the duration, speedup and efficiency as a function of NUMBEROFTHREADS.&lt;br /&gt;
''Assignment 2''&lt;br /&gt;
* Use GNU parallel to run blurppm on all 266 images with a radius of 41.&lt;br /&gt;
* Investigate different scenarios:&lt;br /&gt;
:# Have GNU parallel run 16 at a time with just 1 thread.&lt;br /&gt;
:# Have GNU parallel run 8 at a time with 2 threads.&lt;br /&gt;
:# Have GNU parallel run 4 at a time with 4 threads.&lt;br /&gt;
:# Have GNU parallel run 2 at a time with 8 threads.&lt;br /&gt;
:# Have GNU parallel run 1 at a time with 16 threads.&lt;br /&gt;
:Record the total time it takes in each of these scenarios.&lt;br /&gt;
* Repeat this with a BLURRADIUS of 3.&lt;br /&gt;
* Submit scripts, timing data  and plots.&lt;br /&gt;
&lt;br /&gt;
=Links=&lt;br /&gt;
&lt;br /&gt;
==Unix==&lt;br /&gt;
* Cygwin: http://www.cygwin.com&lt;br /&gt;
* Linux Command Line: A Primer (June 2012) [[Media:SS_IntroToShell.pdf|Slides,]] [[Media:SS_IntroToShell.tgz|Files]]&lt;br /&gt;
* Intro to unix shell from software carpentry: http://software-carpentry.org/4_0/shell&lt;br /&gt;
&lt;br /&gt;
==C/C++==&lt;br /&gt;
* [[One-Day Scientific C++ Class]] at SciNet&lt;br /&gt;
* C++ library reference: http://www.cplusplus.com/reference&lt;br /&gt;
* C preprocessor: http://www.cprogramming.com/tutorial/cpreprocessor.html&lt;br /&gt;
* Boost: http://www.boost.org&lt;br /&gt;
* Boost Python tutorial: http://www.boost.org/doc/libs/1_53_0/libs/python/doc/tutorial/doc/html/index.html&lt;br /&gt;
&lt;br /&gt;
==Git==&lt;br /&gt;
* Git: http://git-scm.com&lt;br /&gt;
* Version Control: [http://support.scinet.utoronto.ca/CourseVideo/PPPcourse/Thursday_Morning_BP_Revision_Control/Thursday_Morning_BP_Revision_Control.mp4 Video]/ [[Media:Snug_techtalk_revcontrol.pdf | Slides]]&lt;br /&gt;
* Git cheat sheet from Git Tower: http://www.git-tower.com/files/cheatsheet/Git_Cheat_Sheet_grey.pdf&lt;br /&gt;
&lt;br /&gt;
==Python==&lt;br /&gt;
* Python: http://www.python.org&lt;br /&gt;
* IPython: http://ipython.org&lt;br /&gt;
* Matplotlib: http://www.matplotlib.org&lt;br /&gt;
* Enthought python distribution: http://www.enthought.com/products/edudownload.php&amp;lt;br/&amp;gt;&lt;br /&gt;
(this gives you numpy, matplotlib and ipython all installed in one fell swoop)&lt;br /&gt;
&lt;br /&gt;
* Intro to python from software carpentry: http://software-carpentry.org/4_0/python&lt;br /&gt;
* Tutorial on matplotlib: http://conference.scipy.org/scipy2011/tutorials.php#jonathan&lt;br /&gt;
* Npy file format: https://github.com/numpy/numpy/blob/master/doc/neps/npy-format.txt&lt;br /&gt;
* Boost Python tutorial: http://www.boost.org/doc/libs/1_53_0/libs/python/doc/tutorial/doc/html/index.html&lt;br /&gt;
&lt;br /&gt;
==ODEs==&lt;br /&gt;
* Integrators for particle based ODEs (i.e. molecular dynamics): http://www.chem.utoronto.ca/~rzon/simcourse/partmd.pdf. &amp;lt;br&amp;gt;'''Focus on 4.1.4 - 4.1.6 for practical aspects.'''&lt;br /&gt;
* Numerical algorithm to solve ODEs (General) in ''Numerical Recipes for C'': http://apps.nrbook.com/c/index.html Chapter 16&lt;br /&gt;
&lt;br /&gt;
==Interpolation (2D) ==&lt;br /&gt;
* Interpolation in ''Numerical Recipes for C'': http://apps.nrbook.com/c/index.html Pages 123-128&lt;br /&gt;
* Wikipedia pages on [http://en.wikipedia.org/wiki/Bilinear_interpolation Bilinear Interpolation] and [http://en.wikipedia.org/wiki/Bicubic_interpolation Bicubic Interpolation] are not bad either.&lt;br /&gt;
&lt;br /&gt;
==BLAS==&lt;br /&gt;
* [http://www.tacc.utexas.edu/tacc-projects/gotoblas2 gotoblas]&lt;br /&gt;
* [http://math-atlas.sourceforge.net/ ATLAS]&lt;br /&gt;
&lt;br /&gt;
==LAPACK==&lt;br /&gt;
* http://www.netlib.org/lapack&lt;br /&gt;
&lt;br /&gt;
==GSL==&lt;br /&gt;
* GNU Scientific Library: http://www.gnu.org/s/gsl&lt;br /&gt;
&lt;br /&gt;
==FFT==&lt;br /&gt;
* FFTW: http://www.fftw.org&lt;br /&gt;
&lt;br /&gt;
==Top500==&lt;br /&gt;
* TOP500 Supercomputing Sites: http://top500.org&lt;br /&gt;
&lt;br /&gt;
==OpenMP==&lt;br /&gt;
* OpenMP (open multi-processing) application programming interface for shared memory programming: http://openmp.org&lt;br /&gt;
&lt;br /&gt;
==GNU parallel==&lt;br /&gt;
* Official citation: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.&lt;br /&gt;
* [[Media:Tech-talk-gnu-parallel.pdf|Slides of the SciNet TechTalk on Gnu Parallel (14 Nov 2012)]]&lt;br /&gt;
* The documentation for GNU parallel can be found at http://www.gnu.org/software/parallel/&lt;br /&gt;
* Its man page can be found here http://www.gnu.org/software/parallel/man.html&lt;br /&gt;
* The man page is also available on the GPC when the gnu-parallel module is loaded, with the command &amp;lt;code&amp;gt;$ man parallel&amp;lt;/code&amp;gt;. The man page contains options, such as how to make sure the output is not all scrambled, and examples.&lt;br /&gt;
&lt;br /&gt;
==SciNet==&lt;br /&gt;
&lt;br /&gt;
Anything on this wiki, really, but specifically:&lt;br /&gt;
* [[Essentials|SciNet Essentials]]&lt;br /&gt;
* [[GPC Quickstart]]&lt;br /&gt;
* [[Media:SciNet_Tutorial.pdf |SciNet User Tutorial]]&lt;br /&gt;
* [[Software and Libraries]]&lt;br /&gt;
&lt;br /&gt;
==Other Resources==&lt;br /&gt;
* [http://galileo.phys.virginia.edu/classes/551.jvn.fall01/goldberg.pdf What Every Computer Scientist Should Know About Floating-Point Arithmetic] - the classic (and extremely comprehensive) overview of the basics of floating point math.   The first few pages, in particular, are very useful.&lt;br /&gt;
* [http://arxiv.org/abs/1005.4117 Random Numbers In Scientific Computing: An Introduction] by Katzgraber.   A very lucid discussion of pseudo random number generators for science.&lt;/div&gt;</summary>
		<author><name>Ljdursi</name></author>
	</entry>
</feed>