<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>https://oldwiki.scinet.utoronto.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Cneale</id>
	<title>oldwiki.scinet.utoronto.ca - User contributions [en-gb]</title>
	<link rel="self" type="application/atom+xml" href="https://oldwiki.scinet.utoronto.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Cneale"/>
	<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php/Special:Contributions/Cneale"/>
	<updated>2026-05-18T12:42:57Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.35.12</generator>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=FAQ&amp;diff=5327</id>
		<title>FAQ</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=FAQ&amp;diff=5327"/>
		<updated>2012-10-22T13:08:59Z</updated>

		<summary type="html">&lt;p&gt;Cneale: added gnuplot to the extras module faq listing so that this faq is returned on a gnuplot search&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__TOC__&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==The Basics==&lt;br /&gt;
===Who do I contact for support?===&lt;br /&gt;
&lt;br /&gt;
Who do I contact if I have problems or questions about how to use the SciNet systems?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
E-mail [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;]  &lt;br /&gt;
&lt;br /&gt;
In your email, please include the following information:&lt;br /&gt;
&lt;br /&gt;
* your username on SciNet&lt;br /&gt;
* the cluster that your question pertains to (GPC or TCS; SciNet is not a cluster!)&lt;br /&gt;
* any relevant error messages&lt;br /&gt;
* the commands you typed before the errors occurred&lt;br /&gt;
* the path to your code (if applicable)&lt;br /&gt;
* the location of the job scripts (if applicable)&lt;br /&gt;
* the directory from which it was submitted (if applicable)&lt;br /&gt;
* a description of what it is supposed to do (if applicable)&lt;br /&gt;
* if your problem is about connecting to SciNet, the type of computer you are connecting from.&lt;br /&gt;
&lt;br /&gt;
Note that your password should never, never, never be sent to us, even if your question is about your account.&lt;br /&gt;
&lt;br /&gt;
Try to avoid sending email only to specific individuals at SciNet. Your chances of a quick reply increase significantly if you email our team!&lt;br /&gt;
&lt;br /&gt;
===What does ''code scaling'' mean?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Introduction_To_Performance#Parallel_Speedup|A Performance Primer]]&lt;br /&gt;
&lt;br /&gt;
===What do you mean by ''throughput''?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Introduction_To_Performance#Throughput|A Performance Primer]].&lt;br /&gt;
&lt;br /&gt;
Here is a simple example:&lt;br /&gt;
&lt;br /&gt;
Suppose you need to do 10 computations.  Say each of these runs for&lt;br /&gt;
1 day on 8 cores, but they take &amp;quot;only&amp;quot; 18 hours on 16 cores.  What is the&lt;br /&gt;
fastest way to get all 10 computations done - as 8-core jobs or as&lt;br /&gt;
16-core jobs?  Let us assume you have 2 nodes (16 cores in total) at your disposal.&lt;br /&gt;
With 8-core jobs, two run side by side, so the 10 jobs finish in&lt;br /&gt;
5 rounds of 1 day each: 5 days.  With 16-core jobs, only one runs at a time,&lt;br /&gt;
so the 10 jobs take 10 x 18 hours = 180 hours, or 7.5 days.  Draw your own conclusions...&lt;br /&gt;
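&lt;br /&gt;
The arithmetic above can be checked with a few lines of bash. This is only a sketch: it assumes that the 2 nodes have 8 cores each, and the names &amp;lt;tt&amp;gt;total_cores&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;njobs&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;wall_hours&amp;lt;/tt&amp;gt; are made up for this illustration. All times are in hours.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Throughput comparison for the example above (all times in hours).
# Assumption: the 2 nodes have 8 cores each, i.e. 16 cores in total.
total_cores=16
njobs=10

# wall_hours CORES_PER_JOB HOURS_PER_JOB
# Prints the total wall-clock hours needed to get through all jobs.
wall_hours() {
    local concurrent=$(( total_cores / $1 ))                   # jobs that fit at once
    local rounds=$(( (njobs + concurrent - 1) / concurrent ))  # ceiling division
    echo $(( rounds * $2 ))
}

echo "8-core jobs:  $(wall_hours 8 24) hours"
echo "16-core jobs: $(wall_hours 16 18) hours"
```
&lt;br /&gt;
This should report 120 hours (5 days) for the 8-core jobs and 180 hours (7.5 days) for the 16-core jobs.&lt;br /&gt;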
&lt;br /&gt;
===I changed my .bashrc/.bash_profile and now nothing works===&lt;br /&gt;
&lt;br /&gt;
The default startup scripts provided by SciNet, and guidelines for them, can be found [[Important_.bashrc_guidelines|here]].  Certain things - like sourcing &amp;lt;tt&amp;gt;/etc/profile&amp;lt;/tt&amp;gt;&lt;br /&gt;
and &amp;lt;tt&amp;gt;/etc/bashrc&amp;lt;/tt&amp;gt; - are ''required'' for various SciNet routines to work!   &lt;br /&gt;
&lt;br /&gt;
If the situation is so bad that you cannot even log in, please send email [mailto:support@scinet.utoronto.ca support].&lt;br /&gt;
&lt;br /&gt;
===Could I have my login shell changed to (t)csh?===&lt;br /&gt;
&lt;br /&gt;
The login shell used on our systems is bash. While tcsh is available on the GPC and the TCS, we do not support it as the default login shell at present.  So &amp;quot;chsh&amp;quot; will not work, but you can always run tcsh interactively. Also, csh scripts will be executed correctly provided that they have the correct &amp;quot;shebang&amp;quot; &amp;lt;tt&amp;gt;#!/bin/tcsh&amp;lt;/tt&amp;gt; at the top.&lt;br /&gt;
&lt;br /&gt;
===How can I run Matlab / IDL / Gaussian / my favourite commercial software at SciNet?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Because SciNet serves such a disparate group of user communities, there is just no way we can buy licenses for everyone's commercial package.   The only commercial software we have purchased is that which in principle can benefit everyone -- fast compilers and math libraries (Intel's on GPC, and IBM's on TCS).&lt;br /&gt;
&lt;br /&gt;
If your research group requires a commercial package that you already have or are willing to buy licenses for, contact us at [mailto:support@scinet.utoronto.ca support@scinet] and we can work together to find out if it is feasible to implement the package's licensing arrangement on the SciNet clusters, and if so, what is the best way to do it.&lt;br /&gt;
&lt;br /&gt;
Note that it is important that you contact us before installing commercially licensed software on SciNet machines, even if you have a way to do it in your own directory without requiring sysadmin intervention.   It puts us in a very awkward position if someone is found to be running unlicensed or invalidly licensed software on our systems, so we need to be aware of what is being installed where.&lt;br /&gt;
&lt;br /&gt;
===Do you have a recommended ssh program that will allow scinet access from Windows machines?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The [[Ssh#SSH_for_Windows_Users | SSH for Windows users]] programs we recommend are:&lt;br /&gt;
&lt;br /&gt;
* [http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY]  - this is a terminal for Windows that connects via ssh.  It is a quick install and will get you up and running quickly.&amp;lt;br&amp;gt;To set up your passphrase protected ssh key with PuTTY, see [http://the.earth.li/~sgtatham/putty/0.61/htmldoc/Chapter8.html#pubkey here].&lt;br /&gt;
* [http://www.cygwin.com/ CygWin] - this is a whole Linux-like environment for Windows, which also includes an X window server so that you can display remote windows on your desktop.  Make sure you include the openssh and X window system in the installation for full functionality.  This is recommended if you will be doing a lot of work on Linux machines, as it makes a very similar environment available on your computer.&amp;lt;br&amp;gt;To set up your ssh keys, follow the Linux instructions on the [[Ssh keys]] page.&lt;br /&gt;
* [http://mobaxterm.mobatek.net/en/ MobaXterm] is a tabbed ssh client with some Cygwin tools, including ssh and X, all wrapped up into one executable.&amp;lt;br&amp;gt;To set up your ssh keys, follow the Linux instructions on the [[Ssh keys]] page.&lt;br /&gt;
&lt;br /&gt;
===My ssh key does not work! WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
[[Ssh_keys#Testing_Your_Key | Testing Your Key]]&lt;br /&gt;
&lt;br /&gt;
* If this doesn't work, you should be able to log in using your password, and investigate the problem. For example, if during a login session you get a message similar to the one below, just follow the instructions and delete the offending key on line 3 of the known_hosts file (you can use vi to jump to that line with ESC plus : plus 3). The message only means that you have logged in from your home computer to SciNet in the past, and that the stored host key is now obsolete.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh USERNAME@login.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @&lt;br /&gt;
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!&lt;br /&gt;
Someone could be eavesdropping on you right now (man-in-the-middle&lt;br /&gt;
attack)!&lt;br /&gt;
It is also possible that the RSA host key has just been changed.&lt;br /&gt;
The fingerprint for the RSA key sent by the remote host is&lt;br /&gt;
53:f9:60:71:a8:0b:5d:74:83:52:fe:ea:1a:9e:cc:d3.&lt;br /&gt;
Please contact your system administrator.&lt;br /&gt;
Add correct host key in /home/&amp;lt;user&amp;gt;/.ssh/known_hosts to get rid of&lt;br /&gt;
this message.&lt;br /&gt;
Offending key in /home/&amp;lt;user&amp;gt;/.ssh/known_hosts:3&lt;br /&gt;
RSA host key for login.scinet.utoronto.ca has&lt;br /&gt;
changed and you have requested&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* If you get the message below, you may need to log out of your GNOME session and log back in, since the ssh-agent needs to be restarted with the new passphrase-protected ssh key.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh USERNAME@login.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
Agent admitted failure to sign using the key.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
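&lt;br /&gt;
For the first case above (an obsolete host key), the offending line can also be removed without opening an editor. The sketch below is one way to do it; &amp;lt;tt&amp;gt;drop_known_host&amp;lt;/tt&amp;gt; is a hypothetical helper name, not a SciNet tool, and it keeps a backup copy of the file before changing it.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# drop_known_host FILE LINE
# Delete one line (the "offending key") from FILE, keeping a backup
# copy in FILE.bak. The helper name is made up for this sketch.
drop_known_host() {
    local file=$1 line=$2
    cp "$file" "$file.bak"
    sed "${line}d" "$file.bak" > "$file"
}

# Example for the message above (offending key on line 3):
#   drop_known_host ~/.ssh/known_hosts 3
# OpenSSH can also remove every stored key for a host by name:
#   ssh-keygen -R login.scinet.utoronto.ca
```
&lt;br /&gt;
The next time you connect, ssh will ask you to confirm the new host key.&lt;br /&gt;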
&lt;br /&gt;
===Can't forward X:  &amp;quot;Warning: No xauth data; using fake authentication data&amp;quot;, or &amp;quot;X11 connection rejected because of wrong authentication.&amp;quot;===&lt;br /&gt;
&lt;br /&gt;
I used to be able to forward X11 windows from SciNet to my home machine, but now I'm getting these messages; what's wrong?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This very likely means that ssh/xauth can't update your ${HOME}/.Xauthority file. &lt;br /&gt;
&lt;br /&gt;
The simplest possible reason for this is that you've filled your 10GB /home quota and so can't write anything to your home directory.   Use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load extras&lt;br /&gt;
$ diskUsage&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
to check how close you are to your disk quota on ${HOME}.&lt;br /&gt;
&lt;br /&gt;
Alternatively, this could mean your .Xauthority file has somehow become broken or corrupted, in which case you can delete that file. When you next log in you'll get a similar warning message about creating .Xauthority, but things should work.&lt;br /&gt;
&lt;br /&gt;
===How come I can not login to TCS?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
A SciNet account doesn't automatically entitle you to TCS access. At a minimum, TCS jobs need to run on at least 32 cores (64 preferred because of Simultaneous Multi Threading - [[TCS_Quickstart#Node_configuration|SMT]] - on these nodes) and need the large memory (4GB/core) and bandwidth on the system. Essentially you need to be able to explain why the work can't be done on the GPC.&lt;br /&gt;
&lt;br /&gt;
===How can I reset the password for my Compute Canada account?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
You can reset your password for your Compute Canada account here:&lt;br /&gt;
&lt;br /&gt;
https://ccdb.computecanada.org/security/forgot&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===How can I change or reset the password for my SciNet account?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
To reset your password at SciNet please e-mail [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
If you know your old password and want to change it, that can be done here:&lt;br /&gt;
&lt;br /&gt;
https://portal.scinet.utoronto.ca/&lt;br /&gt;
&lt;br /&gt;
===Why am I getting the error &amp;quot;Permission denied (publickey,gssapi-with-mic,password)&amp;quot;?===&lt;br /&gt;
&lt;br /&gt;
This error can pop up in a variety of situations: when trying to log in, or after a job has finished, when the error and output files fail to be copied (there are other possible reasons for this failure as well -- see [[FAQ#My_GPC_job_died.2C_telling_me_.60Copy_Stageout_Files_Failed.27|My GPC job died, telling me:Copy Stageout Files Failed]]).&lt;br /&gt;
In most cases, the &amp;quot;Permission denied&amp;quot; error is caused by incorrect permissions on the (hidden) .ssh directory. Ssh is used for logging in as well as for copying the standard error and output files after a job. &lt;br /&gt;
&lt;br /&gt;
For security reasons, &lt;br /&gt;
the directory .ssh should only be writable and readable to you, but yours &lt;br /&gt;
has read permission for everybody, and thus it fails.  You can change &lt;br /&gt;
this by&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   chmod 700 ~/.ssh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
And to be sure, also do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   chmod 600 ~/.ssh/id_rsa ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
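&lt;br /&gt;
The two commands above can be wrapped in a small helper that only touches the key files that actually exist. This is a minimal sketch, assuming the default key file names; &amp;lt;tt&amp;gt;fix_ssh_perms&amp;lt;/tt&amp;gt; is a made-up name.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# fix_ssh_perms [DIR]
# Restrict DIR (default ~/.ssh) and the usual key files to the owner only.
fix_ssh_perms() {
    local dir=${1:-$HOME/.ssh} f
    chmod 700 "$dir"
    # Only touch the key files that are present in this directory.
    for f in id_rsa id_rsa.pub authorized_keys; do
        if [ -f "$dir/$f" ]; then
            chmod 600 "$dir/$f"
        fi
    done
    ls -ld "$dir"      # the listing should now start with drwx------
}

# Usage: fix_ssh_perms
```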
&lt;br /&gt;
===ERROR:102: Tcl command execution failed? when loading modules ===&lt;br /&gt;
Modules sometimes require other modules to be loaded first.&lt;br /&gt;
The module command will tell you if you have not loaded them.&lt;br /&gt;
For example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module purge&lt;br /&gt;
$ module load python&lt;br /&gt;
python/2.6.2(11):ERROR:151: Module 'python/2.6.2' depends on one of the module(s) 'gcc/4.4.0'&lt;br /&gt;
python/2.6.2(11):ERROR:102: Tcl command execution failed: prereq gcc/4.4.0&lt;br /&gt;
$ module load gcc python&lt;br /&gt;
$&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Compiling your Code==&lt;br /&gt;
&lt;br /&gt;
===How can I get g77 to work?===&lt;br /&gt;
&lt;br /&gt;
The Fortran 77 compilers on the GPC are ifort and gfortran. We have dropped support for g77; this was a conscious decision. g77 (and the associated library libg2c) was completely replaced six years ago (Apr 2005) by the gcc 4.x branch, and has not received any updates at all, even bug fixes, for over five years.  &lt;br /&gt;
If we installed g77 and libg2c, we would have to deal with the inevitable confusion caused when users accidentally link against the old, broken, wrong versions of the gcc libraries instead of the correct current versions.   &lt;br /&gt;
&lt;br /&gt;
If your code for some reason specifically requires five-plus-year-old libraries,  availability, compatibility, and unfixed-known-bug problems are only going to get worse for you over time, and this might be as good an opportunity as any to address those issues. &lt;br /&gt;
&lt;br /&gt;
''A note on porting to gfortran or ifort:''&lt;br /&gt;
&lt;br /&gt;
While gfortran and ifort are rather compatible with g77, one &lt;br /&gt;
important difference is that by default, gfortran does not preserve &lt;br /&gt;
local variables between function calls, while g77 does.   Preserved &lt;br /&gt;
local variables are for instance often used in implementations of quasi-random number &lt;br /&gt;
generators.  Proper Fortran requires such variables to be declared with the SAVE attribute, &lt;br /&gt;
but not all old code does this.&lt;br /&gt;
Luckily, you can change gfortran's default behavior with the flag &lt;br /&gt;
&amp;lt;tt&amp;gt;-fno-automatic&amp;lt;/tt&amp;gt;.   For ifort, the corresponding flag is &amp;lt;tt&amp;gt;-noautomatic&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Where is libg2c.so?===&lt;br /&gt;
&lt;br /&gt;
libg2c.so is part of the g77 compiler, for which we dropped support. See [[#How can I get g77 to work?]] for our reasons.&lt;br /&gt;
&lt;br /&gt;
===Autoparallelization does not work!===&lt;br /&gt;
&lt;br /&gt;
I compiled my code with the &amp;lt;tt&amp;gt;-qsmp=omp,auto&amp;lt;/tt&amp;gt; option, and then I specified that it should be run with 64 threads - with &lt;br /&gt;
 export OMP_NUM_THREADS=64&lt;br /&gt;
&lt;br /&gt;
However, when I check the load using &amp;lt;tt&amp;gt;llq1 -n&amp;lt;/tt&amp;gt;, it shows a load on the node of 1.37.  Why?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Using the autoparallelization will only get you so far.  In fact, it usually does not do too much.  What is helpful is to run the compiler with the &amp;lt;tt&amp;gt;-qreport&amp;lt;/tt&amp;gt; option, and then read the output listing carefully to see where the compiler thought it could parallelize, where it could not, and the reasons for this.  Then you can go back to your code and carefully try to address each of the issues brought up by the compiler.&lt;br /&gt;
We ''emphasize'' that this is just a rough first guide, and that the compilers are still not magical!   For more sophisticated approaches to parallelizing your code, email us at [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;]  to set up an appointment with one&lt;br /&gt;
of our technical analysts.&lt;br /&gt;
&lt;br /&gt;
===How do I link against the Intel Math Kernel Library?===&lt;br /&gt;
&lt;br /&gt;
If you need to link in the Intel Math Kernel Library (MKL) libraries, you are well advised to use the Intel(R) Math Kernel Library Link Line Advisor: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ for help in devising the list of libraries to link with your code.&lt;br /&gt;
&lt;br /&gt;
'''''Note that this gives the link line for the command line. When using it in Makefiles, replace $MKLPATH by ${MKLPATH}.'''''&lt;br /&gt;
&lt;br /&gt;
'''''Note too that, unless the integer arguments you will be passing to the MKL libraries are actually 64-bit integers, rather than the normal int or INTEGER types, you want to specify 32-bit integers (lp64).'''''&lt;br /&gt;
&lt;br /&gt;
===Can the compilers on the login nodes be disabled to prevent accidentally using them?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
You can accomplish this by modifying your .bashrc to not load the compiler modules. See [[Important .bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
===&amp;quot;relocation truncated to fit: R_X86_64_PC32&amp;quot;: Huh?===&lt;br /&gt;
&lt;br /&gt;
What does this mean, and why can't I compile this code?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Welcome to the joys of the x86 architecture!  You're probably having trouble building arrays larger than 2GB, individually or together.   Generally, you have to try to use the medium or large x86 `memory model'.   For the intel compilers, this is specified with the compile options&lt;br /&gt;
&lt;br /&gt;
  -mcmodel=medium -shared-intel&lt;br /&gt;
&lt;br /&gt;
===&amp;quot;feupdateenv is not implemented and will always fail&amp;quot;===&lt;br /&gt;
&lt;br /&gt;
How do I get rid of this and what does it mean?&lt;br /&gt;
 &lt;br /&gt;
'''Answer:'''&lt;br /&gt;
First note that, as ominous as it sounds, this is really just a warning, and has to do with the Intel math library. You can ignore it (unless you really are trying to manually change the exception handlers for floating point exceptions such as divide by zero), or take the safe road and get rid of it by linking with the Intel math functions library:&amp;lt;pre&amp;gt;-limf&amp;lt;/pre&amp;gt;See also [[#How do I link against the Intel Math Kernel Library?]]&lt;br /&gt;
&lt;br /&gt;
===Cannot find rdmacm library when compiling on GPC===&lt;br /&gt;
&lt;br /&gt;
I get the following error building my code on GPC: &amp;quot;&amp;lt;tt&amp;gt;ld: cannot find -lrdmacm&amp;lt;/tt&amp;gt;&amp;quot;.  Where can I find this library?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This library is part of the MPI libraries; if your compiler is having problems picking it up, it probably means you are mistakenly trying to compile on the login nodes (scinet01..scinet04).  The login nodes aren't part of the GPC; they are for logging into the data centre only.  From there you must go to the GPC or TCS development nodes to do any real work.&lt;br /&gt;
&lt;br /&gt;
=== Why do I get this error when I try to compile: &amp;quot;icpc: error #10001: could not find directory in which /usr/bin/g++41 resides&amp;quot; ?===&lt;br /&gt;
&lt;br /&gt;
You are trying to compile on the login nodes.   As described in the wiki ( https://support.scinet.utoronto.ca/wiki/index.php/GPC_Quickstart#Login ), or in the user's guide you would have received with your account, SciNet supports two main clusters, with very different architectures.  Compilation must be done on the development nodes of the appropriate cluster (in this case, gpc01-04).   Thus, log into gpc01, gpc02, gpc03, or gpc04, and compile from there.&lt;br /&gt;
&lt;br /&gt;
==Testing your Code==&lt;br /&gt;
&lt;br /&gt;
=== Can I run something for a short time on the development nodes? ===&lt;br /&gt;
&lt;br /&gt;
I am in the process of playing around with the mpi calls in my code to get it to work. I do a lot of tests and each of them takes a couple of seconds only.  Can I do this on the development nodes?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Yes, as long as it's very brief (a few minutes).   People use the development nodes&lt;br /&gt;
for their work, so you should not bog them down, and testing real&lt;br /&gt;
code can use far more resources than compiling.    The procedures differ&lt;br /&gt;
depending on what machine you're using.&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
On the TCS you can run small MPI jobs on the tcs02 node, which is meant for &lt;br /&gt;
development use.  But even for this test run on one node, you'll need a host file --&lt;br /&gt;
a list of hosts (in this case, all tcs-f11n06, which is the `real' name of tcs02)&lt;br /&gt;
that the job will run on.  Create a file called `hostfile' containing the following:&lt;br /&gt;
&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
 tcs-f11n06&lt;br /&gt;
&lt;br /&gt;
for a 4-task run.  When you invoke &amp;quot;poe&amp;quot; or &amp;quot;mpirun&amp;quot;, there are runtime&lt;br /&gt;
arguments that you specify pointing to this file.  You can also specify it&lt;br /&gt;
in an environment variable MP_HOSTFILE, so, if your file is in your /scratch directory, say &lt;br /&gt;
${SCRATCH}/hostfile, then you would do&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 export MP_HOSTFILE=${SCRATCH}/hostfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
in your shell.  You will also need to create a &amp;lt;tt&amp;gt;.rhosts&amp;lt;/tt&amp;gt; file in your &lt;br /&gt;
home directory, again listing &amp;lt;tt&amp;gt;tcs-f11n06&amp;lt;/tt&amp;gt; so that &amp;lt;tt&amp;gt;poe&amp;lt;/tt&amp;gt;&lt;br /&gt;
can start jobs.   After that you can simply run your program.  You can use&lt;br /&gt;
mpiexec:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 mpiexec -n 4 my_test_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
adding &amp;lt;tt&amp;gt; -hostfile /path/to/my/hostfile&amp;lt;/tt&amp;gt; if you did not set the environment&lt;br /&gt;
variable above.  Alternatively, you can run it with the poe command (do a &amp;quot;man poe&amp;quot; for details), or even by&lt;br /&gt;
just directly running it.  In this case the number of MPI processes will by default&lt;br /&gt;
be the number of entries in your hostfile.&lt;br /&gt;
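&lt;br /&gt;
The hostfile described above could be generated rather than typed, which is handy when the number of tasks changes. A minimal sketch follows; &amp;lt;tt&amp;gt;make_hostfile&amp;lt;/tt&amp;gt; is a hypothetical helper name, not part of poe or MPI.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# make_hostfile HOST NTASKS
# Print HOST once per MPI task, one entry per line.
make_hostfile() {
    local host=$1 ntasks=$2 i
    for i in $(seq 1 "$ntasks"); do
        echo "$host"
    done
}

# A 4-task run on tcs02 (real name tcs-f11n06), as in the text above:
#   make_hostfile tcs-f11n06 4 > ${SCRATCH}/hostfile
#   export MP_HOSTFILE=${SCRATCH}/hostfile
```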
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
On the GPC one can run short test jobs on the GPC [[GPC_Quickstart#Compile.2FDevel_Nodes | development nodes ]]&amp;lt;tt&amp;gt;gpc01&amp;lt;/tt&amp;gt;..&amp;lt;tt&amp;gt;gpc04&amp;lt;/tt&amp;gt;;&lt;br /&gt;
if they are single-node jobs (which they should be) they don't need a hostfile.  Even better, though, is to request an [[ Moab#Interactive | interactive ]] job and run the tests either in the regular batch queue or in the short, high-availability [[ Moab#debug | debug ]] queue that is reserved for this purpose.&lt;br /&gt;
&lt;br /&gt;
=== How do I run a longer (but still shorter than an hour) test job quickly ? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer'''&lt;br /&gt;
&lt;br /&gt;
On the GPC there is a high turnover short queue called [[ Moab#debug | debug ]] that is designed for&lt;br /&gt;
this purpose.  You can use it by adding &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -q debug&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to your submission script.&lt;br /&gt;
&lt;br /&gt;
==Running your jobs==&lt;br /&gt;
&lt;br /&gt;
===My job can't write to /home===&lt;br /&gt;
&lt;br /&gt;
My code works fine when I test on the development nodes, but when I submit a job, or even run interactively in the development queue on GPC, it fails.  What's wrong?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
As [[Data_Management#Home_Disk_Space | discussed]] [https://support.scinet.utoronto.ca/wiki/images/5/54/SciNet_Tutorial.pdf elsewhere], &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is mounted read-only on the compute nodes; you can only write to &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; from the login nodes and devel nodes.  (The [[GPC_Quickstart#128Glargemem | largemem nodes]] on GPC, in this respect, are more like devel nodes than compute nodes).   In general, to run jobs you can read from &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; but you'll have to write to &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; (or, if you were allocated space through the LRAC/NRAC process, on &amp;lt;tt&amp;gt;/project&amp;lt;/tt&amp;gt;).  More information on SciNet filesystems can be found on our [[Data_Management | Data Management]] page.&lt;br /&gt;
&lt;br /&gt;
===OpenMP on the TCS===&lt;br /&gt;
&lt;br /&gt;
How do I run an OpenMP job on the TCS?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please look at the [[TCS_Quickstart#Submission_Script_for_an_OpenMP_Job | TCS Quickstart ]] page.&lt;br /&gt;
&lt;br /&gt;
===Can I use hybrid codes consisting of MPI and OpenMP on the GPC?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Yes. Please look at the [[GPC_Quickstart#Hybrid_MPI.2FOpenMP_jobs | GPC Quickstart ]] page.&lt;br /&gt;
&lt;br /&gt;
===How do I run serial jobs on GPC?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''':&lt;br /&gt;
&lt;br /&gt;
So it should be said first that SciNet is a parallel computing resource, &lt;br /&gt;
and our priority will always be parallel jobs.   Having said that, if &lt;br /&gt;
you can make efficient use of the resources using serial jobs and get &lt;br /&gt;
good science done, that's good too, and we're happy to help you.&lt;br /&gt;
&lt;br /&gt;
The GPC nodes each have 8 processing cores, and making efficient use of these &lt;br /&gt;
nodes means using all eight cores.  As a result, we'd like to have the &lt;br /&gt;
users take up whole nodes (eg, run multiples of 8 jobs) at a time.  &lt;br /&gt;
&lt;br /&gt;
It depends on the nature of your job what the best strategy is. Several approaches are presented on the [[User_Serial|serial run wiki page]].&lt;br /&gt;
&lt;br /&gt;
===Why can't I request only a single cpu for my job on GPC?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''':&lt;br /&gt;
&lt;br /&gt;
On GPC, computers are allocated by the node - that is, in chunks of 8 processors.   If you want to run jobs that each require only one processor, you need to bundle them into groups of 8, so that the other 7 processors are not wasted for up to 48 hours. See [[User_Serial|serial run wiki page]].&lt;br /&gt;
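&lt;br /&gt;
As a rough illustration of bundling, the sketch below starts 8 copies of a command at once and returns when all of them have finished. It relies on the -P option of xargs, which GNU and BSD xargs provide but POSIX does not require, and &amp;lt;tt&amp;gt;run_bundle&amp;lt;/tt&amp;gt; is a hypothetical name; the approaches on the serial run wiki page remain the ones to follow.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# run_bundle COMMAND [ARGS...]
# Run "COMMAND ARGS 1" ... "COMMAND ARGS 8" concurrently, one per core,
# and return once all 8 have finished.
# Assumption: xargs supports -P (true for GNU and BSD xargs).
run_bundle() {
    seq 1 8 | xargs -P 8 -n 1 "$@"
}

# Example with a stand-in command (echo); replace it with your own program.
run_bundle echo serial task
```
&lt;br /&gt;
The 8 lines of output may appear in any order, since the tasks run at the same time.&lt;br /&gt;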
&lt;br /&gt;
===How do I run serial jobs on TCS?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''': You don't.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===But in the queue I found a user who is running jobs on GPC, each of which is using only one processor, so why can't I?===&lt;br /&gt;
&lt;br /&gt;
'''Answer''':&lt;br /&gt;
&lt;br /&gt;
The pradat* and atlaspt* jobs, amongst others, are jobs of the ATLAS high energy physics project. That they are reported as single cpu jobs is an artifact of the moab scheduler. They are in fact being automatically bundled into 8-job bundles but have to run individually to be compatible with their international grid-based systems.&lt;br /&gt;
&lt;br /&gt;
===How do I use the ramdisk on GPC?===&lt;br /&gt;
&lt;br /&gt;
To use the ramdisk, create files in /dev/shm/ and read from / write to them just as you would in (eg) ${SCRATCH}. Only the amount of RAM needed to store the files will be taken up by the temporary file system; thus if you have 8 serial jobs each requiring 1 GB of RAM, and 1 GB is taken up by various OS services, you would still have approximately 7 GB available to use as ramdisk on a 16 GB node. However, if you were to write 8 GB of data to the ramdisk, this would exceed available memory and your job would likely crash.&lt;br /&gt;
&lt;br /&gt;
It is very important to delete your files from ram disk at the end of your job. If you do not do this, the next user to use that node will have less RAM available than they might expect, and this might kill their jobs.&lt;br /&gt;
&lt;br /&gt;
''More details on how to setup your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].''&lt;br /&gt;
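&lt;br /&gt;
The create, use and delete pattern described above might look like the following sketch. The RAMDISK_ROOT variable is a hypothetical override added for this illustration, and the fallback to a temporary directory is only there so that the sketch can be tried on a machine without /dev/shm. The trap removes the files when the script exits, so the next user of the node gets the RAM back.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Assumption: RAMDISK_ROOT is a hypothetical override for this sketch;
# on a GPC compute node the ramdisk is /dev/shm.
ramdisk_root=${RAMDISK_ROOT:-/dev/shm}
if [ ! -w "$ramdisk_root" ]; then
    ramdisk_root=${TMPDIR:-/tmp}   # fallback so the sketch runs anywhere
fi

# Private working directory on the ramdisk for this job.
workdir=$(mktemp -d "$ramdisk_root/job.XXXXXX")

# Delete the directory when the script exits, even if it fails part-way.
trap 'rm -rf "$workdir"' EXIT

# Stand-in for the real work: write a temporary file and read it back.
echo "intermediate data" > "$workdir/tmp.dat"
cat "$workdir/tmp.dat"

# Copy anything worth keeping to ${SCRATCH} here, before the script ends.
```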
&lt;br /&gt;
===How can I automatically resubmit a job?===&lt;br /&gt;
&lt;br /&gt;
Commonly you may have a job that you know will take longer to run than what is &lt;br /&gt;
permissible in the queue.  As long as your program contains [[Checkpoints|checkpoint]] or &lt;br /&gt;
restart capability, you can have one job automatically submit the next. In&lt;br /&gt;
the following example it is assumed that the program finishes before &lt;br /&gt;
the 48 hour limit and then resubmits itself by logging into one&lt;br /&gt;
of the development nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque example submission script for auto resubmission&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=48:00:00&lt;br /&gt;
#PBS -N my_job&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# YOUR CODE HERE&lt;br /&gt;
./run_my_code&lt;br /&gt;
&lt;br /&gt;
# RESUBMIT 10 TIMES HERE&lt;br /&gt;
num=${NUM:-0}   # resubmission counter; starts at 0 if NUM was not passed in&lt;br /&gt;
if [ $num -lt 10 ]; then&lt;br /&gt;
      num=$(($num+1))&lt;br /&gt;
      ssh gpc01 &amp;quot;cd $PBS_O_WORKDIR; qsub ./script_name.sh -v NUM=$num&amp;quot;;&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub script_name.sh -v NUM=0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can alternatively use [[ Moab#Job_Dependencies | Job dependencies ]] through the queuing system which will not start one job until another job has completed.&lt;br /&gt;
&lt;br /&gt;
If your job can't be made to automatically stop before the 48 hour queue window, but it does write out checkpoints, you can use the timeout command to stop the program while you still have time to resubmit; for instance&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
    timeout 2850m ./run_my_code argument1 argument2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will run the program for 47.5 hours (2850 minutes), and then send it SIGTERM to exit the program.&lt;br /&gt;
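&lt;br /&gt;
To tell a run that was stopped by timeout apart from one that finished by itself, you can look at the exit status: GNU timeout exits with status 124 when the time limit was reached. The sketch below shows the idea; &amp;lt;tt&amp;gt;run_with_limit&amp;lt;/tt&amp;gt; is a hypothetical helper name.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# run_with_limit LIMIT COMMAND [ARGS...]
# Run COMMAND under GNU timeout and report whether it finished on its
# own or was stopped at the limit.
run_with_limit() {
    local limit=$1 status
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout exits with status 124 when the limit was reached.
        echo "stopped at limit"
    elif [ "$status" -eq 0 ]; then
        echo "finished"
    else
        echo "failed with status $status"
    fi
}

# In a job script (resubmit only in the "stopped at limit" case):
#   run_with_limit 2850m ./run_my_code argument1 argument2
```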
&lt;br /&gt;
===How can I pass in arguments to my submission script?===&lt;br /&gt;
&lt;br /&gt;
If you wish to make your scripts more generic, you can use qsub's ability &lt;br /&gt;
to pass in environment variables as arguments to your script.&lt;br /&gt;
The following example shows a case where an input and an output &lt;br /&gt;
file are passed in on the qsub line. Multiple variables can be &lt;br /&gt;
passed in as a comma-delimited list after the qsub &amp;quot;-v&amp;quot; option. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque example of passing in arguments&lt;br /&gt;
# SciNet GPC&lt;br /&gt;
# &lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=48:00:00&lt;br /&gt;
#PBS -N my_job&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# YOUR CODE HERE&lt;br /&gt;
./run_my_code -f $INFILE -o $OUTFILE&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub script_name.sh -v INFILE=input.txt,OUTFILE=outfile.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
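&lt;br /&gt;
If the script may also be submitted without "-v", one possible safeguard is to give the variables fallback values. The sketch below is only an illustration: the helper name resolve_args and the default file names are assumptions, not part of the SciNet setup.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch only: fall back to default file names when INFILE or OUTFILE
# were not passed in with "qsub -v". Names and defaults are hypothetical.
resolve_args() {
    INFILE=${INFILE:-input.txt}
    OUTFILE=${OUTFILE:-outfile.txt}
}

resolve_args
# The real job script would run the program here; this only prints the command.
echo "would run: ./run_my_code -f $INFILE -o $OUTFILE"
```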
&lt;br /&gt;
=== How can I run a job longer than 48 hours? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The SciNet queues have a queue limit of 48 hours.   This is pretty typical for systems of its size in Canada and elsewhere, and larger systems commonly have shorter limits.   The limits are there to ensure that every user gets a fair share of the system (so that no one user ties up lots of nodes for a long time), and for safety (so that if one memory board in one node fails in the middle of a very long job, you haven't lost a month's worth of work).&lt;br /&gt;
&lt;br /&gt;
Since many of us have simulations that require more than that much time, most widely-used scientific applications have &amp;quot;checkpoint-restart&amp;quot; functionality, where every so often the complete state of the calculation is stored as a checkpoint file, and one can restart a simulation from one of these.   In fact, these restart files tend to be quite useful for a number of purposes.&lt;br /&gt;
&lt;br /&gt;
If your job will take longer, you will have to submit your job in multiple parts, restarting from a checkpoint each time.  In this way, one can run a simulation much longer than the queue limit.  In fact, one can even write job scripts which automatically re-submit themselves until a run is completed, using [[FAQ#How_can_I_automatically_resubmit_a_job.3F | automatic resubmission]].&lt;br /&gt;
&lt;br /&gt;
=== Why did showstart say it would take 3 hours for my job to start before, and now it says my job will start in 10 hours? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[FAQ#How_do_priorities_work.2Fwhy_did_that_job_jump_ahead_of_mine_in_the_queue.3F | How do priorities work/why did that job jump ahead of mine in the queue? ]] below.&lt;br /&gt;
&lt;br /&gt;
===How do priorities work/why did that job jump ahead of mine in the queue?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The [[Moab | queueing system]] used on SciNet machines is a [http://en.wikipedia.org/wiki/Priority_queue Priority Queue].  Jobs enter the queue at the back of the queue, and slowly make their way to the front as those ahead of them are run; but a job that enters the queue with a higher priority can `cut in line'.&lt;br /&gt;
&lt;br /&gt;
The main factor which determines priority is whether or not the user (or their PI) has an [http://www.scinet.utoronto.ca/support/Account_Allocations.htm LRAC or NRAC allocation].  These are competitively allocated grants of computer time; there is a call for proposals towards the end of every calendar year.    Users with an allocation have high priorities in an attempt to make sure that they can use the amount of computer time the committees granted them.   Their priority decreases as they approach their allotted usage over the current window of time; by the time that they have exhausted that allotted usage, their priority is the same as users with no allocation (unallocated, or `default' users).    Unallocated users have a fixed, low, priority.&lt;br /&gt;
&lt;br /&gt;
This priority system is called `fairshare'; the scheduler attempts to make sure everyone has their fair share of the machines, where the share that's fair has been determined by the allocation committee.    The fairshare window is a rolling window of two weeks; that is, any time you have a job in the queue, the fairshare calculation of its priority is given by how much of your allocation of the machine has been used in the last 14 days.&lt;br /&gt;
&lt;br /&gt;
A particular allocation might have some fraction of GPC - say 4% of the machine (if the PI had been allocated 10 million CPU hours on GPC). The allocations have labels (called `Resource Allocation Proposal Identifiers', or RAPIs); they look something like&lt;br /&gt;
&lt;br /&gt;
  abc-123-ab&lt;br /&gt;
&lt;br /&gt;
where abc-123 is the PI's CCRI, and the suffix specifies which of the allocations granted to the PI is to be used.  These can be specified on a job-by-job basis.  On GPC, add the line&lt;br /&gt;
 #PBS -A RAPI&lt;br /&gt;
to your script; on TCS, use&lt;br /&gt;
 # @ account_no = RAPI&lt;br /&gt;
If the allocation to charge isn't specified, a default is used; each user has such a default, which can be changed at the same portal where one changes one's password: &lt;br /&gt;
&lt;br /&gt;
 https://portal.scinet.utoronto.ca/&lt;br /&gt;
&lt;br /&gt;
A job's priority is determined primarily by the fairshare priority of the allocation it is being charged to; the previous 14 days' worth of use under that allocation is calculated and compared to the allocated fraction (here, 4%) of the machine over that window (here, 14 days).   The fairshare priority decreases as the allocation left in the window is used up; if there is no allocation left (e.g., jobs running under that allocation have already used 379,038 CPU hours in the past 14 days), the priority is the same as that of a user with no granted allocation.   (This last part has been the topic of some debate; as the machine gets more utilized, it will probably be the case that we allow RAC users who have greatly overused their quota to have their priorities drop below that of unallocated users, to give the unallocated users some chance to run on our increasingly crowded system; this would have no undue effect on our allocated users as they still would be able to use the amount of resources they had been allocated by the committees.)   Note that all jobs charging the same allocation get the same fairshare priority.&lt;br /&gt;
&lt;br /&gt;
There are other factors that go into calculating priority, but fairshare is the most significant.   Other factors include&lt;br /&gt;
* Amount of time waiting in the queue (measured in units of the requested runtime).   A job that requests 1 hour of runtime and has been waiting in the queue for 2 days will get a larger bump in its priority than a job that requests 2 days and has been waiting the same time.&lt;br /&gt;
* User adjustment of priorities (see below).&lt;br /&gt;
&lt;br /&gt;
The major effect of these subdominant terms is to shuffle the order of jobs running under the same allocation.&lt;br /&gt;
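&lt;br /&gt;
To illustrate the queue-time factor from the list above, the waiting time can be expressed in units of the requested runtime. This is only a sketch of that ratio; the helper name queue_time_units is made up, and how the scheduler actually weights the result is not stated here.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch only: waiting time divided by requested walltime, both in hours.
# queue_time_units is a hypothetical helper; it uses integer division.
queue_time_units() {
    echo $(( $1 / $2 ))
}

queue_time_units 48 1    # 1-hour request that has waited 2 days
queue_time_units 48 48   # 2-day request that has waited 2 days
```
&lt;br /&gt;
The short job has accumulated 48 units against 1 unit for the long job, which is why it receives the larger bump.&lt;br /&gt;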
&lt;br /&gt;
===How do we manage job priorities within our research group?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Obviously, managing shared resources within a large group - whether it &lt;br /&gt;
is conference funding or CPU time - takes some doing.   &lt;br /&gt;
&lt;br /&gt;
It's important to note that the fairshare periods are intentionally kept &lt;br /&gt;
quite short - just two weeks long.   (These exact numbers are subject to &lt;br /&gt;
change as the year goes on and we better understand use patterns, but &lt;br /&gt;
they're unlikely to change radically.)   So, for example, let us say that in your resource &lt;br /&gt;
allocation you have about 10% of the machine.   Then for someone to use &lt;br /&gt;
up the whole two-week amount of time in 2 days, they'd have to use 70% &lt;br /&gt;
of the machine in those two days - which is unlikely to happen by &lt;br /&gt;
accident.  If that does happen,  &lt;br /&gt;
those using the same allocation as the person who used 70% of the &lt;br /&gt;
machine over the two days will suffer by having much lower priority for &lt;br /&gt;
their jobs, but only for the next 12 days - and even then, if there are &lt;br /&gt;
idle CPUs they'll still be able to compute.&lt;br /&gt;
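&lt;br /&gt;
The arithmetic behind the 70% figure can be checked with a few lines of shell. The variable names are chosen for this illustration only.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Worked example: a 10% share over a 14-day window, used up in 2 days.
share_percent=10   # allocated share of the machine, in percent
window_days=14     # length of the fairshare window
burst_days=2       # days in which the whole two-week share is consumed

burst_percent=$(( share_percent * window_days / burst_days ))
remaining_days=$(( window_days - burst_days ))

echo "machine fraction needed during the burst: ${burst_percent}%"
echo "days of reduced priority afterwards: ${remaining_days}"
```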
&lt;br /&gt;
There will be online tools for seeing how the allocation is being used, &lt;br /&gt;
and those people who are in charge in your group will be able to use &lt;br /&gt;
that information to manage the users, telling them to dial it down or &lt;br /&gt;
up.   We know that managing a large research group is hard, and we want &lt;br /&gt;
to make sure we provide you the information you need to do your job &lt;br /&gt;
effectively.&lt;br /&gt;
&lt;br /&gt;
One way for users within a group to manage their priorities within the group&lt;br /&gt;
is with [[Moab#Adjusting_Job_Priority | user-adjusted priorities]]; this is&lt;br /&gt;
described in more detail on the [[Moab | Scheduling System]] page.&lt;br /&gt;
&lt;br /&gt;
=== How do I charge jobs to my NRAC/LRAC allocation? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see the [[Moab#Accounting|accounting section of the Moab page]].&lt;br /&gt;
&lt;br /&gt;
=== How does one check the amount of used CPU-hours in a project, and how does one get statistics for each user in the project? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This information is available on the SciNet portal, https://portal.scinet.utoronto.ca. See also [[SciNet Usage Reports]].&lt;br /&gt;
&lt;br /&gt;
=== How does the Infiniband Upgrade affect my 2012 NRAC allocation ?===&lt;br /&gt;
&lt;br /&gt;
The NRAC allocations for the current (2012) year that were based on ethernet and infiniband will carry over; however, the allocation will be on the full GPC, not just the subsection.  So if you were allocated 500 hours on Infiniband, your fairshare allocation will still be 500 hours, just 500 out of 30,000 instead of 500 out of 7,000.  If you received two allocations, one on gigE and one on IB, they will simply be combined. This should benefit all users, as the desegregation of the GPC provides a greater pool of nodes, increasing the chance that your job will run.&lt;br /&gt;
&lt;br /&gt;
==Monitoring jobs in the queue==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Why hasn't my job started?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Use the Moab command &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
checkjob -v jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The last couple of lines of its output should explain why the job hasn't started.  &lt;br /&gt;
&lt;br /&gt;
Please see [[Moab| Job Scheduling System (Moab) ]] for more detailed information.&lt;br /&gt;
&lt;br /&gt;
===How do I figure out when my job will run?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Moab#Available_Resources| Job Scheduling System (Moab) ]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ===My GPC job is Held, and checkjob says &amp;quot;Batch:PolicyViolation&amp;quot; ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
When this happens, you'll see your job stuck in a BatchHold state.  &lt;br /&gt;
This happens because the job you've submitted breaks one of the rules of the queues, and is being held until you modify it or kill it and re-submit a conforming job.  The most common problems are:&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===I submit my GPC job, and I get an email saying it was rejected===&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This happens because the job you've submitted breaks one of the rules of the queues and is rejected. An email&lt;br /&gt;
is sent with the JOBID, JOBNAME, and the reason it was rejected.  The following is an example where a job&lt;br /&gt;
requests more than 48 hours and was rejected.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PBS Job Id: 3462493.gpc-sched&lt;br /&gt;
Job Name:   STDIN&lt;br /&gt;
job deleted&lt;br /&gt;
Job deleted at request of root@gpc-sched&lt;br /&gt;
MOAB_INFO:  job was rejected - job violates class configuration 'wclimit too high for class 'batch_ib' (345600 &amp;gt; 172800)'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Jobs on the TCS or GPC may only run for 48 hours at a time; this restriction greatly increases responsiveness of the queue and queue throughput for all our users.  If your computation requires longer than that, as many do, you will have to [[ Checkpoints | checkpoint ]] your job and restart it after each 48-hour queue window.   You can manually re-submit jobs, or if you can have your job cleanly exit before the 48 hour window, there are ways to [[ FAQ#How_can_I_automatically_resubmit_a_job.3F | automatically resubmit jobs ]].&lt;br /&gt;
&lt;br /&gt;
Other rejections return a more cryptic error saying &amp;quot;job violates class configuration&amp;quot;, such as the following:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PBS Job Id: 3462409.gpc-sched&lt;br /&gt;
Job Name:   STDIN&lt;br /&gt;
job deleted&lt;br /&gt;
Job deleted at request of root@gpc-sched&lt;br /&gt;
MOAB_INFO:  job was rejected - job violates class configuration 'user required by class 'batch''&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The most common problems that result in this error are:&lt;br /&gt;
&lt;br /&gt;
* '''Incorrect number of processors per node''': Jobs on the GPC are scheduled per-node, not per-core, and since each node has 8 processor cores (ppn=8), the smallest job allowed is one node with 8 cores (nodes=1:ppn=8).  Serial jobs must be bundled or batched together in groups of 8. See [[ FAQ#How_do_I_run_serial_jobs_on_GPC.3F | How do I run serial jobs on GPC? ]]&lt;br /&gt;
* '''No number of nodes specified''': Jobs submitted to the main queue must request a specific number of nodes, either in the submission script (with a line like &amp;lt;tt&amp;gt;#PBS -l nodes=2:ppn=8&amp;lt;/tt&amp;gt;) or on the command line (eg, &amp;lt;tt&amp;gt;qsub -l nodes=2:ppn=8,walltime=5:00:00 script.pbs&amp;lt;/tt&amp;gt;).  Note that for the debug queue, you can get away without specifying a number of nodes and a default of one will be assigned; for both technical and policy reasons, we do not enforce such a default for the main (&amp;quot;batch&amp;quot;) queue.&lt;br /&gt;
* '''There is a 15 minute walltime minimum''' on all queues except debug; if you request a walltime shorter than this, your job will be rejected.&lt;br /&gt;
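&lt;br /&gt;
The walltime limits mentioned above (at least 15 minutes outside the debug queue, at most 48 hours, which is the 172800 seconds shown in the rejection message) could be checked before submitting. The following is a minimal sketch, assuming the walltime is written as HH:MM:SS; the helper name walltime_ok is hypothetical and is not a SciNet or Moab command.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch only: check a requested walltime (HH:MM:SS) against the limits
# described in the text. walltime_ok is a hypothetical helper name.
walltime_ok() {
    local total
    # Convert HH:MM:SS to seconds; awk reads the fields as decimal numbers.
    total=$(echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }')
    [ "$total" -ge 900 ] || return 1      # 15 minute minimum
    [ "$total" -le 172800 ] || return 1   # 48 hour maximum
    return 0
}

for w in 0:10:00 48:00:00 96:00:00; do
    if walltime_ok "$w"; then
        echo "$w looks acceptable"
    else
        echo "$w would be rejected"
    fi
done
```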
&lt;br /&gt;
===How can I monitor my running jobs on TCS?===&lt;br /&gt;
&lt;br /&gt;
How can I monitor the load of TCS jobs?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
You can get more information with the command &lt;br /&gt;
 /xcat/tools/tcs-scripts/LL/jobState.sh&lt;br /&gt;
which can be aliased as:&lt;br /&gt;
 alias llq1='/xcat/tools/tcs-scripts/LL/jobState.sh'&lt;br /&gt;
If you run &amp;quot;llq1 -n&amp;quot; you will see a listing of jobs together with a lot of information, including the load.&lt;br /&gt;
&lt;br /&gt;
==Errors in running jobs==&lt;br /&gt;
&lt;br /&gt;
===On GPC, `Job cannot be executed'===&lt;br /&gt;
&lt;br /&gt;
I get error messages like this trying to run on GPC:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PBS Job Id: 30414.gpc-sched&lt;br /&gt;
Job Name:   namd&lt;br /&gt;
Exec host:  gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0&lt;br /&gt;
Aborted by PBS Server &lt;br /&gt;
Job cannot be executed&lt;br /&gt;
See Administrator for help&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
PBS Job Id: 30414.gpc-sched&lt;br /&gt;
Job Name:   namd&lt;br /&gt;
Exec host:  gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0&lt;br /&gt;
An error has occurred processing your job, see below.&lt;br /&gt;
request to copy stageout files failed on node 'gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0' for job 30414.gpc-sched&lt;br /&gt;
&lt;br /&gt;
Unable to copy file 30414.gpc-sched.OU to USER@gpc-f101n084.scinet.local:/scratch/G/GROUP/USER/projects/sim-performance-test/runtime/l/namd/8/namd.o30414&lt;br /&gt;
*** error from copy&lt;br /&gt;
30414.gpc-sched.OU: No such file or directory&lt;br /&gt;
*** end error output&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Try doing the following:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir ${SCRATCH}/.pbs_spool&lt;br /&gt;
ln -s ${SCRATCH}/.pbs_spool ~/.pbs_spool&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is how all new accounts are set up on SciNet.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; on GPC for compute jobs is mounted as a read-only file system.   &lt;br /&gt;
PBS by default tries to spool its output  files to &amp;lt;tt&amp;gt;${HOME}/.pbs_spool&amp;lt;/tt&amp;gt;&lt;br /&gt;
which fails as it tries to write to a read-only file  &lt;br /&gt;
system.    New accounts at SciNet  get around this by having ${HOME}/.pbs_spool  &lt;br /&gt;
point to somewhere appropriate on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;, but if you've deleted that link&lt;br /&gt;
or directory, or had an old account, you will see errors like the above.&lt;br /&gt;
&lt;br /&gt;
'''On Feb 24 2011, the input/output mechanism was reconfigured to use a local ramdisk as the temporary location, which means that .pbs_spool is no longer needed and this error should not occur anymore.'''&lt;br /&gt;
&lt;br /&gt;
=== I couldn't find the  .o output file in the .pbs_spool directory as I used to ===&lt;br /&gt;
&lt;br /&gt;
On Feb 24 2011, the temporary location of standard input and output files was moved from the shared file system ${SCRATCH}/.pbs_spool to the&lt;br /&gt;
node-local directory /var/spool/torque/spool (which resides in RAM). The final location after a job has finished is unchanged,&lt;br /&gt;
but to check the output/error of running jobs, users will now have to ssh into the (first) node assigned to the job and look in&lt;br /&gt;
/var/spool/torque/spool.&lt;br /&gt;
&lt;br /&gt;
This alleviates access contention to the temporary directory, especially for those users that are running a lot of jobs, and  reduces the burden on the file system in general.&lt;br /&gt;
&lt;br /&gt;
Note that it is good practice to redirect output to a file rather than to count on the scheduler to do this for you.&lt;br /&gt;
&lt;br /&gt;
=== My GPC job died, telling me `Copy Stageout Files Failed' ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
When a job runs on GPC, the script's standard output and error are redirected to &lt;br /&gt;
&amp;lt;tt&amp;gt;$PBS_JOBID.gpc-sched.OU&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;$PBS_JOBID.gpc-sched.ER&amp;lt;/tt&amp;gt; in&lt;br /&gt;
/var/spool/torque/spool on the (first) node on which your job is running.  At the end of the job, those .OU and .ER files are copied to where the batch script tells them to be copied, by default &amp;lt;tt&amp;gt;$PBS_JOBNAME.o$PBS_JOBID&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;$PBS_JOBNAME.e$PBS_JOBID&amp;lt;/tt&amp;gt;.   (You can set those filenames to be something clearer with the -e and -o options in your PBS script.)&lt;br /&gt;
&lt;br /&gt;
When you get errors like this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
An error has occurred processing your job, see below.&lt;br /&gt;
request to copy stageout files failed on node&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
it means that the copying-back process has failed in some way.  There could be a few reasons for this. The first thing to do is to '''make sure that your .bashrc does not produce any output''', as the output-stageout is performed by bash and further output can cause this to fail.&lt;br /&gt;
But it also could have just been a random filesystem error, or it could be that your job failed spectacularly enough to short-circuit the normal job-termination process and those files just never got copied.&lt;br /&gt;
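&lt;br /&gt;
One common way to keep a .bashrc quiet in batch jobs is to print messages only when the shell is interactive. The sketch below shows that idea; the function name greet and the message text are placeholders.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch only, for use in ~/.bashrc: print a message in interactive shells
# and stay silent otherwise. The name and message are placeholders.
greet() {
    case $- in
        *i*) echo "Welcome back" ;;   # "i" in $- marks an interactive shell
        *)   : ;;                     # non-interactive (e.g. batch job): no output
    esac
}

greet
```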
&lt;br /&gt;
Write to [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;] if your input/output files got lost, as we will probably be able to retrieve them for you (please supply at least the jobid, and any other information that may be relevant). &lt;br /&gt;
&lt;br /&gt;
Note that it is good practice to redirect output to a file rather than depending on the job scheduler to do this for you.&lt;br /&gt;
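&lt;br /&gt;
A minimal sketch of such a redirection is shown here. The helper name run_logged and the log file names are assumptions made for this example; in a real job script the command would be your own program.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch only: write a command's standard output and standard error to
# files chosen by the user. run_logged is a hypothetical helper name.
run_logged() {
    local logbase=$1
    shift
    "$@" > "$logbase.out" 2> "$logbase.err"
}

# In a job script this might read:
#   cd $PBS_O_WORKDIR
#   run_logged "my_job.$PBS_JOBID" ./run_my_code
# Short demonstration writing to a temporary directory:
demo_dir=$(mktemp -d)
run_logged "$demo_dir/demo_job" echo "hello from the job"
cat "$demo_dir/demo_job.out"
```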
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&lt;br /&gt;
===Another transport will be used instead===&lt;br /&gt;
&lt;br /&gt;
I get error messages like the following when running on the GPC at the start of the run, although the job seems to proceed OK.   Is this a problem?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
[[45588,1],0]: A high-performance Open MPI point-to-point messaging module&lt;br /&gt;
was unable to find any relevant network interfaces:&lt;br /&gt;
&lt;br /&gt;
Module: OpenFabrics (openib)&lt;br /&gt;
  Host: gpc-f101n005&lt;br /&gt;
&lt;br /&gt;
Another transport will be used instead, although this may result in&lt;br /&gt;
lower performance.&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Everything's fine.   The two MPI libraries scinet provides work for both the InifiniBand and the Gigabit Ethernet interconnects, and will always try to use the fastest interconnect available.   In this case, you ran on normal gigabit GPC nodes with no infiniband; but the MPI libraries have no way of knowing this, and try the infiniband first anyway.  This is just a harmless `failover' message; it tried to use the infiniband, which doesn't exist on this node, then fell back on using Gigabit ethernet (`another transport').&lt;br /&gt;
&lt;br /&gt;
With OpenMPI, this can be avoided by not looking for infiniband; eg, by using the option&lt;br /&gt;
&lt;br /&gt;
--mca btl ^openib&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===IB Memory Errors, eg &amp;lt;tt&amp;gt; reg_mr Cannot allocate memory &amp;lt;/tt&amp;gt;===&lt;br /&gt;
&lt;br /&gt;
Infiniband requires more memory than ethernet; it can use RDMA (remote direct memory access) transport for which it sets aside registered memory to transfer data.&lt;br /&gt;
&lt;br /&gt;
In our current network configuration, it requires a ''lot'' more memory, particularly as you go to larger process counts; unfortunately, that means you can't get around the &amp;quot;I need more memory&amp;quot; problem the usual way, by running on more nodes.   Machines with different memory or &lt;br /&gt;
network configurations may exhibit this problem at higher or lower MPI &lt;br /&gt;
task counts.&lt;br /&gt;
&lt;br /&gt;
Right now, the best workaround is to reduce the number and size of OpenIB queues, using XRC: with OpenMPI, add the following options to your mpirun command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32 -mca btl_openib_max_send_size 12288&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With Intel MPI, you should be able to do&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load intelmpi/4.0.3.008&lt;br /&gt;
mpirun -genv I_MPI_FABRICS=shm:ofa  -genv I_MPI_OFA_USE_XRC=1 -genv I_MPI_OFA_DYNAMIC_QPS=1 -genv I_MPI_DEBUG=5 -np XX ./mycode&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
to the same end.  &lt;br /&gt;
&lt;br /&gt;
For more information see [[GPC MPI Versions]].&lt;br /&gt;
&lt;br /&gt;
===My compute job fails, saying &amp;lt;tt&amp;gt;libpng12.so.0: cannot open shared object file&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;libjpeg.so.62: cannot open shared object file&amp;lt;/tt&amp;gt;===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
To maximize the amount of memory available for compute jobs, the compute nodes have a less complete system image than the development nodes.   In particular, since graphics libraries like matplotlib and gnuplot are usually used interactively, the libraries they depend on are included in the devel nodes' image but not in the compute nodes' image.&lt;br /&gt;
&lt;br /&gt;
Many of these extra libraries are, however, available in the &amp;quot;extras&amp;quot; module.   So adding a &amp;quot;module load extras&amp;quot; to your job submission  script - or, for overkill, to your .bashrc - should enable these scripts to run on the compute nodes.&lt;br /&gt;
&lt;br /&gt;
==Data on SciNet disks==&lt;br /&gt;
&lt;br /&gt;
=== When will the 2011 NRAC disk space allocation be ready? ===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
We're still working on expanding our storage capacity to meet the 2011 NRAC requirements. It may take a few more months, but when it becomes available we'll make an announcement.&lt;br /&gt;
&lt;br /&gt;
===How do I find out my disk usage?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The standard unix/linux utilities for finding the amount of disk space used by a directory are very slow, and notoriously inefficient on the GPFS filesystems that we run on the SciNet systems.  There are utilities that very quickly report your disk usage:&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''/scinet/gpc/bin/diskUsage'''&amp;lt;/tt&amp;gt; command, available on the login nodes, datamovers and the GPC devel nodes, provides information in a number of ways on the home, scratch, and project file systems. For instance, it can report how much disk space is being used by yourself and your group (with the -a option) or how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), and it can generate plots of your usage over time.&lt;br /&gt;
This information is only updated hourly!&lt;br /&gt;
&lt;br /&gt;
More information about these filesystems is available on the [[Data_Management | Data Management]] page.&lt;br /&gt;
&lt;br /&gt;
===How do I transfer data to/from SciNet?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
All incoming connections to SciNet go through relatively low-speed connections to the &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; gateways, so using scp to copy files the same way you ssh in is not an effective way to move lots of data.  Better tools are described in our page on [[Data_Management#Data_Transfer | Data Transfer]].&lt;br /&gt;
&lt;br /&gt;
===My group works with data files of size 1-2 GB.  Is this too large to  transfer by scp to login.scinet.utoronto.ca ?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Generally, occasional transfers of less than 10GB of data are perfectly acceptable to do through the login nodes. See [[Data_Management#Data_Transfer | Data Transfer]].&lt;br /&gt;
&lt;br /&gt;
===How can I check if I have files in /scratch that are scheduled for automatic deletion?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Storage_Quickstart#Scratch_Disk_Purging_Policy | Storage At SciNet]]&lt;br /&gt;
&lt;br /&gt;
===How to allow my supervisor to manage files for me using ACL-based commands?===&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Please see [[Data_Management#File.2FOwnership_Management_.28ACL.29 | File/Ownership Management]]&lt;br /&gt;
&lt;br /&gt;
==Keep 'em Coming!==&lt;br /&gt;
&lt;br /&gt;
===Next question, please===&lt;br /&gt;
&lt;br /&gt;
Send your question to [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;];  we'll answer it asap!&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Infiniband_Upgrade&amp;diff=4696</id>
		<title>Infiniband Upgrade</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Infiniband_Upgrade&amp;diff=4696"/>
		<updated>2012-04-22T01:49:48Z</updated>

		<summary type="html">&lt;p&gt;Cneale: Added a note that the :compute-eth: parameter is no longer accepted&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On Apr 19 2012, SciNet upgraded the GPC to be fully Infiniband connected. This new Infiniband network replaces the former Ethernet-connected section of the GPC with 5:1 QDR Infiniband, while the existing GPC 1:1 DDR Infiniband remains. The GPC now consists of 840 nodes of DDR (6,720 cores) and 3,024 nodes of QDR (24,192 cores).  The Infiniband sections are connected, but in general operation, jobs will remain in one section or the other. The two GPC Infiniband sections (QDR and DDR) are fully compatible in terms of hardware and software, so no recompilation or different MPI flags are required.  Neither is recompilation required to run jobs that used the Ethernet section of the GPC.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
For most users, existing job submission scripts will still work as expected. All jobs now run on Infiniband, so the :ib: parameter used to ask for Infiniband nodes is no longer necessary, although it is still accepted. By default, a user's job will go to whichever network section best accommodates it, typically smaller jobs to the QDR and larger jobs to the DDR. A user can override this by adding the flag &amp;quot;ddr&amp;quot; or &amp;quot;qdr&amp;quot; to the job resource request. Old scripts containing the :compute-eth: parameter will never run (although these jobs will enter the queue in the Idle state).&lt;br /&gt;
&lt;br /&gt;
For example, to request two nodes anywhere on the GPC (QDR or DDR), use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -l nodes=2:ppn=8,walltime=1:00:00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
in your job submission script.&lt;br /&gt;
&lt;br /&gt;
For two nodes using DDR, use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -l nodes=2:ddr:ppn=8,walltime=1:00:00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To get two nodes using QDR, instead, you would say&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -l nodes=2:qdr:ppn=8,walltime=1:00:00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Parameters for mpirun ==&lt;br /&gt;
&lt;br /&gt;
No special MPI parameters are required to run a job using Infiniband. It will be used by default, so most users should just use the basic mpirun commands as outlined in [[GPC_MPI_Versions]]. For more detailed information please see the recent [[Media:Snug_techtalk_Infiniband.pdf | Techtalk on IB]].&lt;br /&gt;
&lt;br /&gt;
== NRAC Allocations and Fairshare ==&lt;br /&gt;
&lt;br /&gt;
The NRAC allocations for the current year that were based on ethernet and infiniband will carry over; however, the allocation will be on the full GPC, not just the subsection.  So if you were allocated 500 hours on Infiniband, your fairshare allocation will still be 500 hours, just 500 out of 30,000 instead of 500 out of 7,000.  If you received two allocations, one on gigE and one on IB, they will simply be combined. This should benefit all users, as the desegregation of the GPC provides a greater pool of nodes, increasing the chance that your job will run.&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3840</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3840"/>
		<updated>2011-07-28T16:30:40Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* HTAR */  added information about the error code for htar when the filename is too long&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase started in Jun/2011 with a select group of users. Instructions on this wiki are still a work in progress.)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with ftp-like functionality that can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar-formatted archives directly in HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by 50+ facilities in the [http://www.top500.org “Top 500”] HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~12 TB/day, Recall: ~24 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rational.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with tarballs of size around 100GB.&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions &lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
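&lt;br /&gt;
To judge how much of a directory tree falls under the ~200MB guideline before archiving, a sketch along the following lines may help. It assumes GNU find (for -printf) and awk are available. The function name small_file_report and the default threshold of 200000000 bytes (a decimal reading of ~200MB) are illustrative assumptions, not SciNet tools or settings.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper (not a SciNet tool): count the regular files under DIR
# that are smaller than THRESHOLD bytes, and sum their sizes.
# Usage: small_file_report DIR [THRESHOLD_BYTES]
small_file_report() {
    local dir=$1
    local threshold=${2:-200000000}   # assumed decimal reading of "~200MB"

    # GNU find prints one size in bytes per regular file; awk does the tally.
    find "$dir" -type f -printf '%s\n' |
        awk -v limit="$threshold" '
            (limit + 0) > ($1 + 0) { n++; bytes += $1 }
            END {
                printf "%.0f files below %.0f bytes, %.0f bytes in total\n",
                       n, limit, bytes
            }
        '
}
```

For example, small_file_report /scratch/$(whoami)/workarea reports how many files in that tree are candidates for grouping into tarballs, and how much data they hold together.&lt;br /&gt;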
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the HPSS should be scripted into jobs and submitted to the ''archive'' queue. See generic example below.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create_tarball_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
 &lt;br /&gt;
cd /scratch/$(whoami)/workarea/ &lt;br /&gt;
htar -cpf /archive/$(id -gn)/$(whoami)/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency (see the [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
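&lt;br /&gt;
The same chaining can be written in explicit steps, which makes it easier to stop when the recall job could not be submitted. This is only a sketch: the helper names depend_flag and submit_after_recall are hypothetical, and it assumes, as the awk command above does, that qsub prints a job identifier whose numeric part precedes the first dot (for instance 12345.some-server, a made-up value).&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: turn the identifier printed by qsub into a
# dependency flag. Assumes the job number is the text before the first dot.
depend_flag() {
    local qsub_output=$1
    local jobid=${qsub_output%%.*}    # strip everything from the first dot on
    case $jobid in
        ''|*[!0-9]*)
            echo "unexpected qsub output: $qsub_output" >/dev/stderr
            return 1 ;;
    esac
    echo "-W depend=afterok:$jobid"
}

# Submit the recall job, then the analysis job that must wait for it.
# The script names are the ones used in the example above.
submit_after_recall() {
    local recall_id flag
    recall_id=$(qsub data-recall.sh) || return 1
    flag=$(depend_flag "$recall_id") || return 1
    # $flag is left unquoted on purpose so that it splits into two arguments.
    qsub $flag job-to-work-on-recalled-data.sh
}
```

Calling submit_after_recall on a node where qsub is available submits both jobs. If the first submission fails or prints something unexpected, the analysis job is not submitted at all.&lt;br /&gt;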
&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into a single archive. It uses a multithreaded buffering scheme to write files directly from GPFS into HPSS, which gives it a high rate of performance, and the archive file it creates conforms to the POSIX TAR specification. HTAR does not do gzip compression; however, it has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
* Files whose pathnames are longer than 100 characters will be skipped, so as to conform with the TAR format (POSIX 1003.1 USTAR). Note that in this case the HTAR exit code will erroneously indicate success. For now, you can check for this type of error by running &amp;quot;grep Warning my.output&amp;quot; on the job output after the job has completed.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
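&lt;br /&gt;
Both limits above can be checked before a job is submitted. The sketch below is a hypothetical pre-flight check, not part of HTAR. It assumes GNU find and awk, and it takes the 68 GB limit as 68000000000 bytes, which is an assumed decimal reading; adjust the default if the exact limit on your system differs.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of HTAR): list the members of DIR
# that the cautions above say HTAR would reject or skip.
# Usage: htar_precheck DIR [MAX_BYTES] [MAX_PATH_CHARS]
# Run it from the directory where htar will run, with the same relative DIR,
# so that path lengths match the names stored in the archive.
htar_precheck() {
    local dir=$1
    local max_bytes=${2:-68000000000}   # assumed decimal reading of "68 GB"
    local max_path=${3:-100}            # pathname limit quoted above

    # GNU find prints "SIZE PATH" per regular file. Paths containing
    # newlines are not handled by this sketch.
    find "$dir" -type f -printf '%s %p\n' |
        awk -v max_bytes="$max_bytes" -v max_path="$max_path" '
            {
                size = $1 + 0
                path = substr($0, length($1) + 2)
            }
            size > max_bytes + 0        { print "TOO LARGE: " path; bad = 1 }
            length(path) > max_path + 0 { print "PATH TOO LONG: " path; bad = 1 }
            END { exit bad }
        '
}
```

A non-zero return status means at least one member was flagged. Oversize files can then be transferred with HSI, and long pathnames shortened, before the HTAR job is submitted.&lt;br /&gt;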
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf /archive/$(id -gn)/$(whoami)/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files under the ''project1/src'' directory of the archive file called ''proj1.tar'' in HPSS into the current GPFS directory, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xpmf proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create_tarball_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$(whoami)/workarea/ &lt;br /&gt;
htar -cpf /archive/$(id -gn)/$(whoami)/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_list_tarball_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
 &lt;br /&gt;
htar -tvf /archive/$(id -gn)/$(whoami)/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'LISTING SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_extract_tarball_from_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
 &lt;br /&gt;
cd /scratch/$(whoami)/recalled-from-hpss&lt;br /&gt;
htar -xpmf /archive/$(id -gn)/$(whoami)/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI may be the primary client with which some users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in HPSS&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a local copy does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are 3 peculiarities of HSI that you should keep in mind, as they can cause some confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directories on-the-fly during transfers. The cput/cget syntax may therefore not work as one would expect in some scenarios, and workarounds are required.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; that separates the GPFSpath and HPSSpath. It must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first and HPSS second, for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using an excerpt such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir LargeFilesDir&lt;br /&gt;
cd LargeFilesDir&lt;br /&gt;
cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
lcd /scratch/$(whoami)/LargeFilesDir2/&lt;br /&gt;
cput -Rup *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and keep the contents of HPSS organized. The default HSI directory is /archive/$(id -gn)/$(whoami)/:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput /scratch/$(whoami)/tarball : /archive/$(id -gn)/$(whoami)/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput /scratch/$(whoami)/tarball1 : /archive/$(id -gn)/$(whoami)/tarball2&lt;br /&gt;
    hsi cget /scratch/$(whoami)/tarball3 : /archive/$(id -gn)/$(whoami)/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no issues transferring directory trees and all their contents recursively (as in rsync), provided that you keep the same directory name on GPFS and HPSS. You may use the '-u' option to resume a previously disrupted session, and the '-p' option to preserve timestamps.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Rup /scratch/$(whoami)/LargeFilesDir : /archive/$(id -gn)/$(whoami)/LargeFilesDir&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Rup /scratch/$(whoami)/LargeFilesDir : /archive/$(id -gn)/$(whoami)/LargeFilesDir&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, the syntax forms below will fail, because they try to rename the directory during the transfer.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Rup /scratch/$(whoami)/LargeFilesDir : /archive/$(id -gn)/$(whoami)/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Rup /scratch/$(whoami)/LargeFilesDir/* : /archive/$(id -gn)/$(whoami)/LargeFilesDir2  (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Rup /scratch/$(whoami)/LargeFilesDir : /archive/$(id -gn)/$(whoami)/LargeFilesDir &lt;br /&gt;
   hsi mv /archive/$(id -gn)/$(whoami)/LargeFilesDir /archive/$(id -gn)/$(whoami)/LargeFilesDir2   &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFS directory first, then transfer its contents:&lt;br /&gt;
    lcd GPFSpath, cput -R ...&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
lcd /scratch/$(whoami)/LargeFilesDir&lt;br /&gt;
mkdir /archive/$(id -gn)/$(whoami)/LargeFilesDir2&lt;br /&gt;
cd /archive/$(id -gn)/$(whoami)/LargeFilesDir2&lt;br /&gt;
cput -Rup *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code, in case of multiple operations in the same hsi session. You may use '/scinet/gpc/bin/exit2msg $status' to translate those codes into intelligible messages&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$(whoami)/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$(whoami)/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A simple way to list the contents of HPSS is to submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_ls&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The index is placed in the directory /home/$(whoami)/.ish_register, which can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/gpc/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
gpc-f104n084-$ /scinet/gpc/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_files&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$(whoami)/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$(whoami)/recalled-from-hpss/Jan-2010-jobs.tar.gz : /archive/$(id -gn)/$(whoami)/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$(whoami)/recalled-from-hpss/Feb-2010-jobs.tar.gz : /archive/$(id -gn)/$(whoami)/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate cgets) allows HSI to optimize the recall, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_files_optimized&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p /scratch/$(whoami)/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd /scratch/$(whoami)/recalled-from-hpss/&lt;br /&gt;
cd /archive/$(id -gn)/$(whoami)/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Rup /scratch/$(whoami)/LargeFiles-recalled : /archive/$(id -gn)/$(whoami)/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The workaround is:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_directories&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$(whoami)/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd /scratch/$(whoami)/LargeFiles-recalled&lt;br /&gt;
cd /archive/$(id -gn)/$(whoami)/LargeFiles&lt;br /&gt;
cget -Rup *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample '''verify checksum''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
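thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
# base name of the file, used below to name the checksum file&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;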
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum (the status of this pipeline is that of md5sum -c)&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'Checksum verification failed.'&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'CHECKSUM VERIFIED'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
 &lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3839</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3839"/>
		<updated>2011-07-28T16:21:15Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Sample data list */  Directory structure added to run ish in example listing (it is not in the $PATH by default on the devel nodes or on the archive queue nodes)&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(The pilot usage phase started in Jun/2011 with a select group of users. Instructions on this wiki are still a work in progress.)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with ftp-like functionality that can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar-formatted archives directly in HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by 50+ facilities in the [http://www.top500.org “Top 500”] HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~12 TB/day, Recall: ~24 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rational.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with tarballs of size around 100GB.&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions &lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
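&lt;br /&gt;
To see which files fall under the ~200MB guideline before archiving, a minimal sketch along the following lines may help. It assumes GNU find is available; the function name list_small_files and its arguments are illustrative and are not part of any SciNet tooling.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Hypothetical helper: list regular files smaller than a size limit&lt;br /&gt;
# (default 200 MiB) so that they can be grouped into a tarball.&lt;br /&gt;
list_small_files() {&lt;br /&gt;
    local dir=$1&lt;br /&gt;
    local limit_mib=${2:-200}&lt;br /&gt;
    # with the c suffix, find compares the file size in bytes&lt;br /&gt;
    find &amp;quot;$dir&amp;quot; -type f -size &amp;quot;-$((limit_mib * 1024 * 1024))c&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Usage: list_small_files /scratch/$(whoami)/workarea&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
The files it prints are candidates for aggregation with tar or htar rather than for individual transfer.&lt;br /&gt;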
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archive'' queue. See the generic example below.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create_tarball_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
 &lt;br /&gt;
cd /scratch/$(whoami)/workarea/ &lt;br /&gt;
htar -cpf /archive/$(id -gn)/$(whoami)/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap abnormal terminations of your jobs, and be sure to return the exit code.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency (see the [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
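&lt;br /&gt;
The one-liner above can also be split into two steps, which makes it easier to confirm that the recall job was accepted before the analysis job is submitted. The sketch below assumes that qsub prints a job identifier made of a number, a dot and a server name; the function name depend_flag and the identifier 123456.example-server are illustrative only.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Hypothetical helper: turn a job identifier such as 123456.example-server&lt;br /&gt;
# into the matching qsub dependency flag.&lt;br /&gt;
depend_flag() {&lt;br /&gt;
    local jobid=$1&lt;br /&gt;
    # keep only the part before the first dot, as the awk one-liner does&lt;br /&gt;
    echo &amp;quot;-W depend=afterok:${jobid%%.*}&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Usage (on a login node):&lt;br /&gt;
#   recall_id=$(qsub data-recall.sh) || exit 1&lt;br /&gt;
#   qsub $(depend_flag &amp;quot;$recall_id&amp;quot;) job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;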
&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into a single archive. It uses a multithreaded buffering scheme to write files directly from GPFS into HPSS at a high transfer rate, and the resulting archive file conforms to the POSIX TAR specification. HTAR does not do gzip compression, but it does have a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail and you will get a notification listing all those files, so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR format (POSIX 1003.1 USTAR).&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
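&lt;br /&gt;
The first two cautions can be checked before a job is submitted. The following is only a sketch, assuming GNU find and awk; the function name htar_precheck is illustrative, and the default limits are assumptions taken from the text above (68GB read as 68 * 10^9 bytes, pathnames longer than 100 characters), so adjust them if the actual HTAR limits at your site differ.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Hypothetical pre-check: print files that HTAR would reject or skip.&lt;br /&gt;
# Both limits are parameters; the defaults are assumptions (see above).&lt;br /&gt;
htar_precheck() {&lt;br /&gt;
    local dir=$1&lt;br /&gt;
    local max_bytes=${2:-68000000000}&lt;br /&gt;
    local max_path=${3:-100}&lt;br /&gt;
    # files larger than the size limit&lt;br /&gt;
    find &amp;quot;$dir&amp;quot; -type f -size &amp;quot;+${max_bytes}c&amp;quot; | sed 's/^/TOO LARGE: /'&lt;br /&gt;
    # pathnames longer than the length limit&lt;br /&gt;
    find &amp;quot;$dir&amp;quot; | awk -v max=&amp;quot;$max_path&amp;quot; 'length($0) &amp;gt; max { print &amp;quot;PATH TOO LONG: &amp;quot; $0 }'&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Usage: cd /scratch/$(whoami)/workarea; htar_precheck finished-job1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Run it from the same directory from which htar will be invoked, so that the reported pathnames match the ones htar would store.&lt;br /&gt;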
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf /archive/$(id -gn)/$(whoami)/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files under the ''project1/src'' directory from the archive file called ''proj1.tar'' in HPSS into the current directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xpmf proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online.&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create_tarball_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$(whoami)/workarea/ &lt;br /&gt;
htar -cpf /archive/$(id -gn)/$(whoami)/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_list_tarball_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
 &lt;br /&gt;
htar -tvf /archive/$(id -gn)/$(whoami)/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_extract_tarball_from_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
 &lt;br /&gt;
cd /scratch/$(whoami)/recalled-from-hpss&lt;br /&gt;
htar -xpmf /archive/$(id -gn)/$(whoami)/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI may be the primary client with which some users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in HPSS&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a local copy does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are 3 distinctions about HSI that you should keep in mind, as they can cause some confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directories on-the-fly during transfers, so the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same for both cput and cget, GPFS first and HPSS second:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using an excerpt such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd /scratch/$(whoami)/LargeFilesDir2/&lt;br /&gt;
      cput -Rup *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and keep the contents of HPSS organized. The default HSI directory is /archive/$(id -gn)/$(whoami)/:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput /scratch/$(whoami)/tarball : /archive/$(id -gn)/$(whoami)/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput /scratch/$(whoami)/tarball1 : /archive/$(id -gn)/$(whoami)/tarball2&lt;br /&gt;
    hsi cget /scratch/$(whoami)/tarball3 : /archive/$(id -gn)/$(whoami)/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no issues transferring directory trees and all their contents recursively (as in rsync), provided that you keep the same directory name on GPFS and HPSS. You may use the '-u' option to resume a previously disrupted session, and the '-p' option to preserve timestamps.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Rup /scratch/$(whoami)/LargeFilesDir : /archive/$(id -gn)/$(whoami)/LargeFilesDir&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Rup /scratch/$(whoami)/LargeFilesDir : /archive/$(id -gn)/$(whoami)/LargeFilesDir&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However the syntax forms below will fail.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Rup /scratch/$(whoami)/LargeFilesDir : /archive/$(id -gn)/$(whoami)/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Rup /scratch/$(whoami)/LargeFilesDir/* : /archive/$(id -gn)/$(whoami)/LargeFilesDir2  (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Rup /scratch/$(whoami)/LargeFilesDir : /archive/$(id -gn)/$(whoami)/LargeFilesDir &lt;br /&gt;
   hsi mv /archive/$(id -gn)/$(whoami)/LargeFilesDir /archive/$(id -gn)/$(whoami)/LargeFilesDir2   &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do a &amp;quot;cd&amp;quot; in GPFS first (lcd GPFSpath, followed by cput -R or cget -R from inside the directory):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd /scratch/$(whoami)/LargeFilesDir&lt;br /&gt;
      mkdir /archive/$(id -gn)/$(whoami)/LargeFilesDir2&lt;br /&gt;
      cd /archive/$(id -gn)/$(whoami)/LargeFilesDir2&lt;br /&gt;
      cput -Rup *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even be already familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' When multiple operations are run in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/gpc/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$(whoami)/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$(whoami)/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
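&lt;br /&gt;
Because the same check is repeated after every transfer, it could also be wrapped in a small shell function. The sketch below is one possible form and not part of the SciNet sample scripts; report_status is an illustrative name, and the call to /scinet/gpc/bin/exit2msg is guarded so that the function still works where that helper is not installed.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Hypothetical helper: report the outcome of an HSI or HTAR call.&lt;br /&gt;
# It returns the status it was given, so the caller decides whether to exit.&lt;br /&gt;
report_status() {&lt;br /&gt;
    local tool=$1&lt;br /&gt;
    local status=$2&lt;br /&gt;
    if [ &amp;quot;$status&amp;quot; -ne 0 ]; then&lt;br /&gt;
        echo &amp;quot;$tool returned non-zero code.&amp;quot;&lt;br /&gt;
        # exit2msg is the helper used in the samples above; skip it if absent&lt;br /&gt;
        if [ -x /scinet/gpc/bin/exit2msg ]; then&lt;br /&gt;
            /scinet/gpc/bin/exit2msg &amp;quot;$status&amp;quot;&lt;br /&gt;
        fi&lt;br /&gt;
        return &amp;quot;$status&amp;quot;&lt;br /&gt;
    fi&lt;br /&gt;
    echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Usage right after an hsi session:&lt;br /&gt;
#   status=$?&lt;br /&gt;
#   report_status HSI $status || exit $status&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;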
&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A simple way to list the contents of HPSS is to submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_ls&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p &amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/gpc/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
gpc-f104n084-$ /scinet/gpc/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_files&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$(whoami)/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$(whoami)/recalled-from-hpss/Jan-2010-jobs.tar.gz : /archive/$(id -gn)/$(whoami)/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$(whoami)/recalled-from-hpss/Feb-2010-jobs.tar.gz : /archive/$(id -gn)/$(whoami)/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate cgets) allows HSI to optimize the retrieval, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_files_optimized&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p /scratch/$(whoami)/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd /scratch/$(whoami)/recalled-from-hpss/&lt;br /&gt;
cd /archive/$(id -gn)/$(whoami)/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Rup /scratch/$(whoami)/LargeFiles-recalled : /archive/$(id -gn)/$(whoami)/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The workaround is:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_directories&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$(whoami)/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd /scratch/$(whoami)/LargeFiles-recalled&lt;br /&gt;
cd /archive/$(id -gn)/$(whoami)/LargeFiles&lt;br /&gt;
cget -Rup *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample '''verify checksum''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
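thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
# base name of the file, used below to name the checksum file&lt;br /&gt;
fname=$(basename &amp;quot;$thefile&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
# Generate the checksum on the fly using a named pipe, so that the file is only read from GPFS once&lt;br /&gt;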
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
# the status here comes from md5sum -c, the last command in the pipeline&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'Checksum verification returned non-zero code.'&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'CHECKSUM VERIFIED'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
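&lt;br /&gt;
The single-read technique used above can be tried locally without HPSS. In the sketch below a plain output redirection stands in for the hsi put, so it only illustrates the named pipe and tee mechanics; copy_with_md5 is an illustrative name, and the pipe gets a unique name from mktemp -u instead of the fixed /tmp/NPIPE, which is an assumption made here to avoid collisions between concurrent jobs.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Hypothetical local demonstration: copy a file and print its md5 checksum&lt;br /&gt;
# while reading the source only once.&lt;br /&gt;
copy_with_md5() {&lt;br /&gt;
    local src=$1&lt;br /&gt;
    local dest=$2&lt;br /&gt;
    local pipe&lt;br /&gt;
    pipe=$(mktemp -u)      # unique name for the named pipe&lt;br /&gt;
    mkfifo &amp;quot;$pipe&amp;quot;&lt;br /&gt;
    # writer: read the source once, feeding both the pipe and the copy&lt;br /&gt;
    tee &amp;quot;$pipe&amp;quot; &amp;lt; &amp;quot;$src&amp;quot; &amp;gt; &amp;quot;$dest&amp;quot; &amp;amp;&lt;br /&gt;
    local pid=$!&lt;br /&gt;
    # reader: checksum the same byte stream&lt;br /&gt;
    md5sum &amp;lt; &amp;quot;$pipe&amp;quot; | awk '{print $1}'&lt;br /&gt;
    rm -f &amp;quot;$pipe&amp;quot;&lt;br /&gt;
    # propagate the exit code of the writer&lt;br /&gt;
    wait &amp;quot;$pid&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Usage: copy_with_md5 /scratch/$(whoami)/tarball /scratch/$(whoami)/tarball.copy&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;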
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
 &lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3540</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3540"/>
		<updated>2011-06-29T17:30:05Z</updated>

		<summary type="html">&lt;p&gt;Cneale: removed the section on interactive HSI&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with ftp-like functionality that can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar-formatted archives directly in HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by 50+ facilities in the “Top 500” HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~12 TB/day, Recall: ~24 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rational.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape, a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers and any tarball creation process.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the HPSS should be scripted into jobs and submitted to the ''archive'' queue. See HSI example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with HPSS. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in HPSS&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* More complex sequences can be performed using a script such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
Other sites that have nicely arranged HSI documentation:&lt;br /&gt;
* https://computing.llnl.gov/LCdocs/hsi/&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
# Note that upon executing hsi, your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
A convenient way to explore the contents of HPSS is with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the file /home/$USER/HPSSdm/hsi.igz that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSSdm&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]];then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=$HOME/HPSSdm&lt;br /&gt;
ish hindex&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f104n084-$ ish ~/HPSSdm/hsi.igz &lt;br /&gt;
[ish]hsi.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - Note that a single ''cget'' of multiple files (rather than several separate gets, as in this example) allows HSI to optimize the recall.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-hpss&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage is very similar to that of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : hpss_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;hpss_file1&amp;quot; into HPSS, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive of C source programs and header files on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Put a subdirectory ''LargeFiles'' and all its contents recursively. You may use '-u' option to resume a previously disrupted session.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   cput -R -u LargeFiles&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
* verify checksum&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
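thefile=&lt;localpath&gt;&lt;br /&gt;
storedfile=&lt;hpsspath&gt;&lt;br /&gt;
# Base name of the local file, used to name the temporary checksum file&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;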
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm     /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ];then&lt;br /&gt;
  echo &amp;quot;File transfer failed&amp;quot;&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ]; then&lt;br /&gt;
  echo '!!! Job Failed !!!'&lt;br /&gt;
  echo 'error=' $sc&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Strange HSI nuances === &lt;br /&gt;
&lt;br /&gt;
* During interactive use, even though it appears that the keyboard up arrow will retrieve previous HSI commands, this does not work as expected and should be avoided.&lt;br /&gt;
* Tab completion does not work. Be careful with combinations of tab completion and the &amp;quot;*&amp;quot; character!&lt;br /&gt;
* echo $? does not work as expected from within HSI. Here is what happens:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;echo $?&lt;br /&gt;
echo turned on&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;echo $?&lt;br /&gt;
echo turned off&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Thus, you must avoid the use of echo when checking the output of HSI commands. Note that $? is a bash variable, not an HSI variable, so this would not work in any event, but one might expect it to work, hence the warning.&lt;br /&gt;
&lt;br /&gt;
* A few more are listed here: https://computing.llnl.gov/LCdocs/hsi/index.jsp?show=s2.98&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that aggregates a set of files and directories into a single archive file that conforms to the POSIX TAR specification. It uses a sophisticated multithreaded buffering scheme to write files from the local filesystem directly into HPSS, thereby achieving a high rate of performance. &lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive (you'll get an error message for the whole operation)&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
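&lt;br /&gt;
The first two limits can be checked before htar is started. The sketch below is an illustration under stated assumptions, not part of HTAR: htar_preflight is a hypothetical name, the -size +68G test relies on GNU find and counts in units of GiB, and file names containing newlines would be miscounted.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical pre-flight check for a directory about to be archived with
# htar. The limits are the ones listed in the cautions above.
htar_preflight() {
    local dir="$1"
    local max_files=1000000
    local nfiles nbig
    # Count regular files; the arithmetic expansion strips any padding
    # that wc may add.
    nfiles=$(( $(find "$dir" -type f | wc -l) ))
    # Count files above the per-file size limit (GNU find syntax, assumed).
    nbig=$(( $(find "$dir" -type f -size +68G | wc -l) ))
    if [ "$nfiles" -gt "$max_files" ]; then
        echo "too many files for one htar archive: $nfiles"
        return 1
    fi
    if [ "$nbig" -gt 0 ]; then
        echo "$nbig file(s) larger than 68 GB"
        return 1
    fi
    echo "ok: $nfiles file(s)"
}

# Possible use before creating an archive (comment only, not executed here):
#   htar_preflight subdirA || exit 1
```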
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3539</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3539"/>
		<updated>2011-06-29T17:18:40Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Strange HSI nuances */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by 50+ facilities in the “Top 500” HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~12 TB/day, Recall: ~24 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rational.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers and any tarball creation process.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the HPSS should be scripted into jobs and submitted to the ''archive'' queue. See HSI example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with HPSS. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in HPSS&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* More complex sequences can be performed using a script such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
Other sites that have nicely arranged HSI documentation:&lt;br /&gt;
* https://computing.llnl.gov/LCdocs/hsi/&lt;br /&gt;
&lt;br /&gt;
=== Your First Time Using HSI ===&lt;br /&gt;
Once you are comfortable with HSI, you will script the process and submit it non-interactively to the queue. On your first tests, however, we suggest that you use an interactive queue to get familiar with the system. In the remainder of this subsection, you will find an example of an interactive test that you can run to become familiar with using HSI.&lt;br /&gt;
&lt;br /&gt;
==== Setup ====&lt;br /&gt;
&lt;br /&gt;
To begin the process, we create a directory structure to use for the tests:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir BASE&lt;br /&gt;
echo 1 &amp;gt; BASE/file.1&lt;br /&gt;
echo 2 &amp;gt; BASE/file.2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Accessing HSI ====&lt;br /&gt;
&lt;br /&gt;
Now we obtain an interactive job on the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=1:ppn=8,walltime=2:00:00 -q archive -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we start HSI:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/bin/hsi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And that provides us with the following prompt, where your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Verify that the current directory on HSI (/archive/group/user) is empty by typing:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also note that the current directory on the disk contains the directory structure that we created at the beginning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Where the output should contain the &amp;quot;BASE&amp;quot; directory.&lt;br /&gt;
&lt;br /&gt;
==== Offload ====&lt;br /&gt;
&lt;br /&gt;
Now we offload the BASE directory and all of its contents to the HPSS system. Running interactively, it is easy to learn that the -R flag is necessary to offload a directory. We thus use:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput -R BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The output appears as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput  'BASE/file.1' : 'file.1' ( 2 bytes, 0.2 KBS (cos=1300))&lt;br /&gt;
cput  'BASE/file.2' : 'file.2' ( 2 bytes, 0.5 KBS (cos=1300))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And now when we look at the files on the HPSS&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
The output appears as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/archive/group/user/BASE:&lt;br /&gt;
file.1  file.2  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Recall ====&lt;br /&gt;
&lt;br /&gt;
We now recall the data from HPSS back to the GPFS disk. Once this entire simple test is complete, we suggest that you run through this test again using a real directory of your data. Thus, we do not delete the original directory on disk at this point. Instead, we create a new directory and recall the data from HPSS to this new directory where it can be checked for congruency to the original data if desired.&lt;br /&gt;
&lt;br /&gt;
First, we need to create a new directory on the disk. We will do this from within HSI, but you could also exit HSI (using the exit command or control-c) to make the directory changes and then run HSI again. Thus, continuing from above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lmkdir RECALL&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
now lists both the BASE and RECALL directories on disk.&lt;br /&gt;
&lt;br /&gt;
We now recall the data from HPSS to GPFS:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lcd RECALL&lt;br /&gt;
cget -R BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And the output of the cget command is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cget  '/project/.../TEST/RECALL/BASE/file.1' : '/archive/.../BASE/file.1' (2011/06/29 12:14:21 2 bytes, 4.0 KBS )&lt;br /&gt;
cget  '/project/.../TEST/RECALL/BASE/file.2' : '/archive/.../file.2' (2011/06/29 12:14:22 2 bytes, 3.7 KBS )&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We now exit HSI (using the exit command or control-c) and verify the existence of the directory that was brought back to GPFS.&lt;br /&gt;
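&lt;br /&gt;
The congruency check mentioned above can be done with a recursive diff once you are back at the shell prompt. This is a minimal sketch; same_tree is a hypothetical helper name, and the example call assumes you are in the directory that contains both BASE and RECALL.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical check: compare the original directory tree with the copy
# recalled from HPSS. diff -r prints any differences it finds.
same_tree() {
    if diff -r "$1" "$2"; then
        echo "recalled copy matches the original"
    else
        echo "recalled copy DIFFERS from the original"
        return 1
    fi
}

# Possible use after leaving HSI (comment only, not executed here):
#   same_tree BASE RECALL/BASE
```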
&lt;br /&gt;
=== How does this procedure change when I start doing this non-interactively? ===&lt;br /&gt;
&lt;br /&gt;
* A major change will be the detection of errors. The bash $? variable returns the exit code of the last command. Hence, $? should be captured after every execution of HSI, as outlined at other places on this page.&lt;br /&gt;
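&lt;br /&gt;
One way to make this capture routine is a small wrapper function. The sketch below is illustrative only: run_checked is a hypothetical name, and it works for any command, not just HSI.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical wrapper: run a command, capture its exit code straight away,
# and report a failure in the job log.
run_checked() {
    "$@"
    local status=$?
    if [ "$status" -ne 0 ]; then
        echo "!!! FAILED (exit code $status): $*"
    fi
    return "$status"
}

# Possible use in a job script (comment only, not executed here):
#   run_checked /usr/local/bin/hsi -v "cput -R BASE" || exit $?
```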
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
# Note that upon executing hsi, your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
A convenient way to explore the contents of HPSS is with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the file /home/$USER/HPSSdm/hsi.igz that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSSdm&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]];then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=$HOME/HPSSdm&lt;br /&gt;
ish hindex&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f104n084-$ ish ~/HPSSdm/hsi.igz &lt;br /&gt;
[ish]hsi.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - Note that a single ''cget'' of multiple files (rather than several separate gets, as in this example) allows HSI to optimize the recall.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-hpss&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage is very similar to that of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : hpss_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;hpss_file1&amp;quot; into HPSS, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive of C source programs and header files on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the tar file stored above (source.tar) and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Put a subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   cput -R -u LargeFiles&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
* Transfer a file and verify its checksum:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
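thefile=&amp;lt;localpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;hpsspath&amp;gt;&lt;br /&gt;
# Base name of the local file, used below to name the checksum file in /tmp&lt;br /&gt;
fname=$(basename &amp;quot;$thefile&amp;quot;)&lt;br /&gt;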
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm     /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ];then&lt;br /&gt;
  echo &amp;quot;File transfer failed&amp;quot;&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ]; then&lt;br /&gt;
  echo '!!! Job Failed !!!'&lt;br /&gt;
  echo 'error=' $sc&lt;br /&gt;
  # Pass the failure on to the queue system&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Strange HSI nuances === &lt;br /&gt;
&lt;br /&gt;
* During interactive use, even though it appears that the keyboard up arrow will retrieve previous HSI commands, this does not work as expected and should be avoided.&lt;br /&gt;
* Tab completion does not work. Be careful with combinations of tab completion and the &amp;quot;*&amp;quot; character!&lt;br /&gt;
* echo $? does not work as expected from within HSI. Here is what happens:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;echo $?&lt;br /&gt;
echo turned on&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;echo $?&lt;br /&gt;
echo turned off&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Thus, you must avoid the use of echo when checking the output of HSI commands. Note that $? is a bash variable, not an HSI variable, so this would not work in any event, but one might expect it to work, hence the warning.&lt;br /&gt;
&lt;br /&gt;
* A few more are listed here: https://computing.llnl.gov/LCdocs/hsi/index.jsp?show=s2.98&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that aggregates a set of files and directories into a single archive. It uses a multithreaded buffering scheme to write files from the local filesystem directly into HPSS, which gives a high transfer rate, and the resulting archive conforms to the POSIX TAR specification.&lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive (you'll get an error message for the whole operation)&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the directory ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3538</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3538"/>
		<updated>2011-06-29T17:17:08Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* HSI Documentation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with ftp-like functionality that can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar-formatted archives directly in HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by 50+ facilities in the “Top 500” HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~12 TB/day, Recall: ~24 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rational.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers and any tarball creation process.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
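&lt;br /&gt;
As a rough illustration of the small-file guideline above, the sketch below counts the files under a directory that fall below a size threshold, which may help decide whether they should be bundled into a tarball first. The function name ''count_small_files'' and the default of 104857600 bytes (100 MiB, one possible reading of ~100MB) are assumptions made for this example and are not part of any SciNet tool.&lt;br /&gt;

```shell
# Hypothetical helper: count regular files smaller than a byte threshold.
# The 104857600-byte default (100 MiB) is an assumed reading of "~100MB".
count_small_files() {
  dir="$1"
  limit="${2:-104857600}"
  # find's "c" suffix measures size in bytes; "-N" means strictly less than N.
  find "$dir" -type f -size "-${limit}c" | wc -l
}

# Demonstration on a throwaway directory holding two tiny files.
demo=$(mktemp -d)
echo 1 > "$demo/file.1"
echo 2 > "$demo/file.2"
count_small_files "$demo"
rm -r "$demo"
```

A large count suggests grouping those files with tar or htar before they are sent to HPSS.&lt;br /&gt;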
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers into and out of HPSS should be scripted as jobs and submitted to the ''archive'' queue. See the HSI example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
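&lt;br /&gt;
The one-liner above can be hard to read, so here is a minimal sketch of the same idea split into a small function. It assumes that qsub prints a job identifier of the form NUMBER.HOSTNAME; the function name ''depend_flag'' and the sample identifier are made up for illustration, and no job is actually submitted.&lt;br /&gt;

```shell
# Hypothetical helper: turn a qsub job identifier into a dependency flag.
# Assumes the identifier has the form NUMBER.HOSTNAME.
depend_flag() {
  jobid="$1"
  # Drop everything from the first dot onwards, keeping only the job number.
  printf '%s\n' "-W depend=afterok:${jobid%%.*}"
}

# Example with a made-up identifier (no real job is submitted here).
depend_flag "123456.gpc-sched"
```

In a real session the argument would be the text printed by the first qsub call, and the resulting flag would be passed to the second qsub call.&lt;br /&gt;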
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with HPSS. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in HPSS&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* More complex sequences can be performed using a script such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
Other sites that have nicely arranged HSI documentation:&lt;br /&gt;
* https://computing.llnl.gov/LCdocs/hsi/&lt;br /&gt;
&lt;br /&gt;
=== Your First Time Using HSI ===&lt;br /&gt;
Once you are comfortable with HSI, you will script the process and submit it non-interactively to the queue. On your first tests, however, we suggest that you use an interactive queue to get familiar with the system. In the remainder of this subsection, you will find an example of an interactive test that you can run to become familiar with using HSI.&lt;br /&gt;
&lt;br /&gt;
==== Setup ====&lt;br /&gt;
&lt;br /&gt;
To begin the process, we create a directory structure to use for the tests:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir BASE&lt;br /&gt;
echo 1 &amp;gt; BASE/file.1&lt;br /&gt;
echo 2 &amp;gt; BASE/file.2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Accessing HSI ====&lt;br /&gt;
&lt;br /&gt;
Now we obtain an interactive job on the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=1:ppn=8,walltime=2:00:00 -q archive -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we start HSI:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/bin/hsi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And that provides us with the following prompt, where your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Verify that the current directory on HSI (/archive/group/user) is empty by typing:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also note that the current directory on the disk contains the directory structure that we created at the beginning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The output should contain the &amp;quot;BASE&amp;quot; directory.&lt;br /&gt;
&lt;br /&gt;
==== Offload ====&lt;br /&gt;
&lt;br /&gt;
Now we offload the BASE directory and all of its contents to the HPSS system. Running interactively, it is easy to learn that the -R flag is necessary to offload a directory. We thus use:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput -R BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The output appears as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput  'BASE/file.1' : 'file.1' ( 2 bytes, 0.2 KBS (cos=1300))&lt;br /&gt;
cput  'BASE/file.2' : 'file.2' ( 2 bytes, 0.5 KBS (cos=1300))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And now when we look at the files on HPSS:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
The output appears as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/archive/group/user/BASE:&lt;br /&gt;
file.1  file.2  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Recall ====&lt;br /&gt;
&lt;br /&gt;
We now recall the data from HPSS back to the GPFS disk. Once this entire simple test is complete, we suggest that you run through this test again using a real directory of your data. Thus, we do not delete the original directory on disk at this point. Instead, we create a new directory and recall the data from HPSS to this new directory where it can be checked for congruency to the original data if desired.&lt;br /&gt;
&lt;br /&gt;
First, we need to create a new directory on the disk. We will do this from within HSI, but you could also exit HSI (using the exit command or control-c) to make the directory changes and then run HSI again. Thus, continuing from above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lmkdir RECALL&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The ''lls'' command now lists both the BASE and RECALL directories on disk.&lt;br /&gt;
&lt;br /&gt;
We now recall the data from HPSS to GPFS:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lcd RECALL&lt;br /&gt;
cget -R BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And the output of the cget command is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cget  '/project/.../TEST/RECALL/BASE/file.1' : '/archive/.../BASE/file.1' (2011/06/29 12:14:21 2 bytes, 4.0 KBS )&lt;br /&gt;
cget  '/project/.../TEST/RECALL/BASE/file.2' : '/archive/.../file.2' (2011/06/29 12:14:22 2 bytes, 3.7 KBS )&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We now exit HSI (using the exit command or control-c) and verify the existence of the directory that was brought back to GPFS.&lt;br /&gt;
&lt;br /&gt;
=== How does this procedure change when I start doing this non-interactively? ===&lt;br /&gt;
&lt;br /&gt;
* A major change will be the detection of errors. The bash $? variable returns the exit code of the last command. Hence, $? should be captured after every execution of HSI, as outlined at other places on this page.&lt;br /&gt;
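&lt;br /&gt;
A minimal sketch of that pattern follows. The wrapper ''run_transfer'' is hypothetical, and the commands ''true'' and ''false'' merely stand in for a successful and a failed HSI call; the point is only that the exit code is captured on the line immediately after the command, before anything else overwrites it.&lt;br /&gt;

```shell
# Hypothetical wrapper: run a transfer command and report its exit status.
# "$@" stands in for a real hsi invocation; nothing here is HSI-specific.
run_transfer() {
  "$@"
  status=$?    # capture immediately; any later command overwrites $?
  if [ "$status" -ne 0 ]; then
    echo "TRANSFER FAILED (exit code $status)"
  fi
  return "$status"
}

run_transfer true     # stand-in for a transfer that succeeds
run_transfer false || echo "keep the original files and investigate"
```

Returning the captured status lets a job script decide whether it is safe to continue, for example before deleting anything from the active filesystems.&lt;br /&gt;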
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
# Note that upon executing hsi, your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
A convenient way to explore the contents of HPSS is with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the file /home/$USER/HPSSdm/hsi.igz that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSSdm&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]];then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=$HOME/HPSSdm&lt;br /&gt;
ish hindex&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f104n084-$ ish ~/HPSSdm/hsi.igz &lt;br /&gt;
[ish]hsi.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - This example should be modified to emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do optimization.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-hpss&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information, adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : hpss_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;hpss_file1&amp;quot; into HPSS, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive of C source programs and header files on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the tar file stored above (source.tar) and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Put a subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   cput -R -u LargeFiles&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
* Transfer a file and verify its checksum:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
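thefile=&amp;lt;localpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;hpsspath&amp;gt;&lt;br /&gt;
# Base name of the local file, used below to name the checksum file in /tmp&lt;br /&gt;
fname=$(basename &amp;quot;$thefile&amp;quot;)&lt;br /&gt;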
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm     /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ];then&lt;br /&gt;
  echo &amp;quot;File transfer failed&amp;quot;&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ]; then&lt;br /&gt;
  echo '!!! Job Failed !!!'&lt;br /&gt;
  echo 'error=' $sc&lt;br /&gt;
  # Pass the failure on to the queue system&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
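&lt;br /&gt;
The verification step of the script above relies on md5sum reading from standard input when the checksum file names the file as a single dash. The sketch below isolates that step using only local files, so it can be tried without access to HPSS. The function name ''verify_copy'' is hypothetical, ''cat'' stands in for the HSI transfer, and GNU md5sum is assumed.&lt;br /&gt;

```shell
# Hypothetical helper: check that a copy matches the original by MD5 checksum.
# cat stands in for the HSI put/get streams; GNU md5sum is assumed.
verify_copy() {
  original="$1"
  copy="$2"
  sumfile=$(mktemp)
  # Reading from a pipe makes md5sum record the file name as "-" (stdin).
  cat "$original" | md5sum > "$sumfile"
  # With "-" as the recorded name, "md5sum -c" checks whatever arrives on stdin.
  cat "$copy" | md5sum -c "$sumfile"
  sc=$?
  rm -f "$sumfile"
  return "$sc"
}

# Demonstration with a throwaway file and an identical copy.
workdir=$(mktemp -d)
printf 'example data\n' > "$workdir/file1"
cp "$workdir/file1" "$workdir/file1.bak"
verify_copy "$workdir/file1" "$workdir/file1.bak"
rm -r "$workdir"
```

Unlike the job script, this sketch reads the original file a second time for simplicity; the named pipe in the script exists precisely to avoid that extra read from GPFS.&lt;br /&gt;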
&lt;br /&gt;
=== Strange HSI nuances === &lt;br /&gt;
&lt;br /&gt;
* During interactive use, even though it appears that the keyboard up arrow will retrieve previous HSI commands, this does not work as expected and should be avoided.&lt;br /&gt;
* Tab completion does not work. Be careful with combinations of tab completion and the &amp;quot;*&amp;quot; character!&lt;br /&gt;
* echo $? does not work as expected from within HSI. Here is what happens:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;echo $?&lt;br /&gt;
echo turned on&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;echo $?&lt;br /&gt;
echo turned off&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Thus, you must avoid the use of echo when checking the output of HSI commands. Note that $? is a bash variable, not an HSI variable, so this would not work in any event, but one might expect it to work, hence the warning.&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that aggregates a set of files and directories into a single archive. It uses a multithreaded buffering scheme to write files from the local filesystem directly into HPSS, which gives a high transfer rate, and the resulting archive conforms to the POSIX TAR specification.&lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive (you'll get an error message for the whole operation)&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
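&lt;br /&gt;
The first two limits above can be checked before an archive is built. The following is a sketch only: the function name ''htar_preflight'' is hypothetical, it counts regular files (HTAR may count archive members differently), and it treats 68 GB as 68 GiB in the units of find, which is an assumption.&lt;br /&gt;

```shell
# Hypothetical pre-flight check before building an htar archive.
# Default limits come from the caution list; reading "68 GB" as 68 GiB is assumed.
htar_preflight() {
  dir="$1"
  max_files="${2:-1000000}"
  max_size="${3:-68G}"
  nfiles=$(find "$dir" -type f | wc -l)
  nbig=$(find "$dir" -type f -size "+${max_size}" | wc -l)
  if [ "$nfiles" -gt "$max_files" ]; then
    echo "too many files: $nfiles (limit $max_files)"
    return 1
  fi
  if [ "$nbig" -gt 0 ]; then
    echo "$nbig file(s) larger than $max_size"
    return 1
  fi
  echo "ok: $nfiles file(s), none larger than $max_size"
}

# Demonstration on a throwaway directory holding two tiny files.
demo=$(mktemp -d)
echo 1 > "$demo/file.1"
echo 2 > "$demo/file.2"
htar_preflight "$demo"
rm -r "$demo"
```

A passing check does not replace inspecting the HTAR exit code and log file afterwards; it only catches the two size limits early.&lt;br /&gt;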
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the directory ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3537</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3537"/>
		<updated>2011-06-29T16:46:36Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Your First Time Using HSI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with ftp-like functionality that can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar-formatted archives directly in HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by 50+ facilities in the “Top 500” HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~12 TB/day, Recall: ~24 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rational.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape, a medium that is not suited to storing small files. Files smaller than ~100MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers and any tarball creation process.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
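&lt;br /&gt;
The size guidelines above can be checked before a transfer is submitted. The following is a minimal sketch, not a SciNet tool: the function name classify_size is hypothetical, and decimal units (1 MB = 10^6 bytes) are an assumption, since this page does not say which units it uses.&lt;br /&gt;

```shell
# Classify a file size (in bytes) against the guideline thresholds.
# Assumption: decimal units, 1 MB = 10^6 bytes.
classify_size() {
  local bytes=$1
  local mb=$((1000 * 1000))
  local gb=$((1000 * mb))
  local tb=$((1000 * gb))
  if [ "$bytes" -lt $((100 * mb)) ]; then
    echo "bundle"       # smaller than ~100 MB: group into a tarball first
  elif [ "$bytes" -le $((100 * gb)) ]; then
    echo "ok"           # within the range recommended for performance
  elif [ "$bytes" -le "$tb" ]; then
    echo "allowed"      # transferable, but larger than the recommended size
  else
    echo "too-large"    # exceeds the 1 TB limit
  fi
}

classify_size 2
classify_size 5000000000
```

The size of a file in bytes can be obtained with, for example, GNU stat (stat -c %s file) and passed to the function.&lt;br /&gt;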
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the HPSS should be scripted into jobs and submitted to the ''archive'' queue. See HSI example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
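&lt;br /&gt;
The same dependency can be built in two steps, which is easier to read in a script. This is a sketch only: the helper name depend_flag and the job identifier 12345.gpc-sched are made up for illustration, and the qsub calls are left as comments because they need the queue system.&lt;br /&gt;

```shell
# Build the qsub dependency flag from the identifier printed by an earlier qsub.
# The identifier "12345.gpc-sched" used below is a made-up example.
depend_flag() {
  local jobid=${1%%.*}   # keep only the part before the first dot, as the awk command does
  printf '%s\n' "-W depend=afterok:${jobid}"
}

# Intended use on a login node (commented out because it needs the queue system):
#   recall_id=$(qsub data-recall.sh)
#   qsub $(depend_flag "$recall_id") job-to-work-on-recalled-data.sh
depend_flag "12345.gpc-sched"
```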
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with HPSS. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in HPSS&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* More complex sequences can be performed using a script such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Your First Time Using HSI ===&lt;br /&gt;
Once you are comfortable with HSI, you will script the process and submit it non-interactively to the queue. On your first tests, however, we suggest that you use an interactive queue to get familiar with the system. In the remainder of this subsection, you will find an example of an interactive test that you can run to become familiar with using HSI.&lt;br /&gt;
&lt;br /&gt;
==== Setup ====&lt;br /&gt;
&lt;br /&gt;
To begin the process, we create a directory structure to use for the tests:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir BASE&lt;br /&gt;
echo 1 &amp;gt; BASE/file.1&lt;br /&gt;
echo 2 &amp;gt; BASE/file.2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Accessing HSI ====&lt;br /&gt;
&lt;br /&gt;
Now we obtain an interactive job on the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=1:ppn=8,walltime=2:00:00 -q archive -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we start HSI:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/bin/hsi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And that provides us with the following prompt, where your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Verify that the current directory in HPSS (/archive/group/user) is empty by typing:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also note that the current directory on the disk contains the directory structure that we created at the beginning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Where the output should contain the &amp;quot;BASE&amp;quot; directory.&lt;br /&gt;
&lt;br /&gt;
==== Offload ====&lt;br /&gt;
&lt;br /&gt;
Now we offload the BASE directory and all of its contents to the HPSS system. Running interactively, it is easy to learn that the -R flag is necessary to offload a directory. We thus use:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput -R BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The output appears as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput  'BASE/file.1' : 'file.1' ( 2 bytes, 0.2 KBS (cos=1300))&lt;br /&gt;
cput  'BASE/file.2' : 'file.2' ( 2 bytes, 0.5 KBS (cos=1300))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can now list the files in HPSS:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
The output appears as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/archive/group/user/BASE:&lt;br /&gt;
file.1  file.2  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Recall ====&lt;br /&gt;
&lt;br /&gt;
We now recall the data from HPSS back to the GPFS disk. Once this entire simple test is complete, we suggest that you run through this test again using a real directory of your data. Thus, we do not delete the original directory on disk at this point. Instead, we create a new directory and recall the data from HPSS to this new directory where it can be checked for congruency to the original data if desired.&lt;br /&gt;
&lt;br /&gt;
First, we need to create a new directory on the disk. We will do this from within HSI, but you could also exit HSI (using the exit command or control-c) to make the directory changes and then run HSI again. Thus, continuing from above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lmkdir RECALL&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The ''lls'' command now lists both the BASE and RECALL directories on disk.&lt;br /&gt;
&lt;br /&gt;
We now recall the data from HPSS to GPFS:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lcd RECALL&lt;br /&gt;
cget -R BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And the output of the cget command is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cget  '/project/.../TEST/RECALL/BASE/file.1' : '/archive/.../BASE/file.1' (2011/06/29 12:14:21 2 bytes, 4.0 KBS )&lt;br /&gt;
cget  '/project/.../TEST/RECALL/BASE/file.2' : '/archive/.../file.2' (2011/06/29 12:14:22 2 bytes, 3.7 KBS )&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We now exit HSI (using the exit command or control-c) and verify the existence of the directory that was brought back to GPFS.&lt;br /&gt;
&lt;br /&gt;
=== How does this procedure change when I start doing this non-interactively? ===&lt;br /&gt;
&lt;br /&gt;
* A major change will be the detection of errors. The bash $? variable holds the exit code of the last command, so it should be captured immediately after every execution of HSI, as shown in the scripts elsewhere on this page.&lt;br /&gt;
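&lt;br /&gt;
One way to avoid forgetting the check is to wrap each call in a small helper. This is a minimal sketch under our own assumptions: the name run_checked is hypothetical, and the standard commands true and false stand in for HSI so that the example can be run anywhere.&lt;br /&gt;

```shell
# Run a command, report a failure, and pass its exit code on to the caller.
run_checked() {
  local status=0
  "$@" || status=$?     # capture the exit code straight away
  if [ "$status" -ne 0 ]; then
    echo "!!! TRANSFER FAILED (exit code $status) !!!"
  fi
  return "$status"
}

# Stand-ins for HSI calls: "true" succeeds, "false" fails with exit code 1.
run_checked true
run_checked false || echo "failure detected"

# In a job script the wrapped command would be an HSI call, for example:
#   run_checked /usr/local/bin/hsi -v "cput example_data.tgz"
```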
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
# Note that upon executing hsi, your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
A convenient way to explore the contents of HPSS is with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the file /home/$USER/HPSSdm/hsi.igz that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSSdm&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]];then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=$HOME/HPSSdm&lt;br /&gt;
ish hindex&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f104n084-$ ish ~/HPSSdm/hsi.igz &lt;br /&gt;
[ish]hsi.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - Note: requesting several files in a single ''cget'' (rather than issuing several separate gets) allows HSI to optimize the recall. The example below has not yet been modified to do so.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-hpss&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
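&lt;br /&gt;
To request several files with one ''cget'', the command line can be assembled before HSI is called. The sketch below only builds and prints the command text. The function name build_cget and the path /scratch/user are hypothetical, and separating the pairs with spaces is our reading of the statement on this page that multiple pairs may be given on a single command line, so please confirm it against the HSI man page.&lt;br /&gt;

```shell
# Assemble one cget command that names every file to recall.
# Arguments: local destination directory, HPSS source directory, then file names.
build_cget() {
  local dest=$1 src=$2
  shift 2
  local cmd="cget"
  local name
  for name in "$@"; do
    cmd="$cmd $dest/$name : $src/$name"   # one "local : hpss" pair per file
  done
  printf '%s\n' "$cmd"
}

build_cget /scratch/user/recalled-from-hpss put-away-on-2010 \
  Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz
```

The printed line could then be placed in the input that the job script passes to hsi.&lt;br /&gt;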
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : hpss_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;hpss_file1&amp;quot; into HPSS, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* With FTP, by contrast, the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive of C source programs and header files on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the tar file ''source.tar'' stored above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Put a subdirectory ''LargeFiles'' and all its contents recursively. You may use '-u' option to resume a previously disrupted session.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   cput -R -u LargeFiles&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
* verify checksum&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
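thefile=&amp;lt;localpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;hpsspath&amp;gt;&lt;br /&gt;
# Base name of the local file, used below to name the checksum file&lt;br /&gt;
fname=$(basename &amp;quot;$thefile&amp;quot;)&lt;br /&gt;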
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm     /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ];then&lt;br /&gt;
  echo &amp;quot;File transfer failed&amp;quot;&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ]; then&lt;br /&gt;
  echo '!!! Job Failed !!!'&lt;br /&gt;
  echo 'error=' $sc&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
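&lt;br /&gt;
The verification step relies on md5sum -c reading standard input when the checksum file names its input as a dash. The following sketch shows that behaviour with a local file only. It assumes GNU md5sum, and cat stands in for the HSI get command, so no HPSS access is needed. The variable names are ours.&lt;br /&gt;

```shell
# Local stand-in for the verification step; no HPSS access is needed.
workdir=$(mktemp -d)
printf 'example data\n' > "$workdir/data"

# Reading from standard input makes md5sum record "-" as the file name
cat "$workdir/data" | md5sum > "$workdir/data.md5"

# Stand-in for: hsi -q get - : $storedfile | md5sum -c /tmp/$fname.md5
verify_status=0
cat "$workdir/data" | md5sum -c "$workdir/data.md5" || verify_status=$?
echo "verification exit code: $verify_status"
rm -r "$workdir"
```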
&lt;br /&gt;
=== Strange HSI nuances === &lt;br /&gt;
&lt;br /&gt;
* During interactive use, even though it appears that the keyboard up arrow will retrieve previous HSI commands, this does not work as expected and should be avoided.&lt;br /&gt;
* Tab completion does not work. Be careful with combinations of tab completion and the &amp;quot;*&amp;quot; character!&lt;br /&gt;
* echo $? does not work as expected from within HSI. Here is what happens:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;echo $?&lt;br /&gt;
echo turned on&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;echo $?&lt;br /&gt;
echo turned off&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Thus, you must avoid the use of echo when checking the output of HSI commands. Note that $? is a bash variable, not an HSI variable, so this would not work in any event, but one might expect it to work, hence the warning.&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into an archive file that conforms to the POSIX TAR specification. It uses a multithreaded buffering scheme to write files from the local filesystem directly into HPSS, thereby achieving a high rate of performance. &lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive (you'll get an error message for the whole operation)&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
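&lt;br /&gt;
Both limits can be checked on the local directory before htar is run. This is a rough sketch rather than part of HTAR: the function name htar_precheck is hypothetical, and it assumes GNU find, whose G suffix counts in units of 2^30 bytes, so the size test only approximates the 68 GB limit stated above.&lt;br /&gt;

```shell
# Check a directory against the two htar limits listed above.
htar_precheck() {
  local dir=$1
  local count big
  # Number of regular files, and number of files above the size limit
  count=$(( $(find "$dir" -type f | wc -l) ))
  big=$(( $(find "$dir" -type f -size +68G | wc -l) ))
  if [ "$count" -gt 1000000 ] || [ "$big" -gt 0 ]; then
    echo "not suitable for a single htar archive: $count files, $big above the size limit"
    return 1
  fi
  echo "ok: $count files, none above the size limit"
}

# Small demonstration on a temporary directory with two empty files
demo=$(mktemp -d)
touch "$demo/file1" "$demo/file2"
htar_precheck "$demo"
rm -r "$demo"
```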
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3536</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3536"/>
		<updated>2011-06-29T16:41:39Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Strange HSI nuances */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar-formatted archives directly in HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by 50+ facilities in the “Top 500” HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~12 TB/day, Recall: ~24 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rational.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape, a medium that is not suited to storing small files. Files smaller than ~100MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers and any tarball creation process.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the HPSS should be scripted into jobs and submitted to the ''archive'' queue. See HSI example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with HPSS. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in HPSS&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* More complex sequences can be performed using a script such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Your First Time Using HSI ===&lt;br /&gt;
Once you are comfortable with HSI, you will script the process and submit it non-interactively to the queue. On your first tests, however, we suggest that you use an interactive queue to get familiar with the system. In the remainder of this subsection, you will find an example of an interactive test that you can run to become familiar with using HSI.&lt;br /&gt;
&lt;br /&gt;
==== Setup ====&lt;br /&gt;
&lt;br /&gt;
To begin the process, we create a directory structure to use for the tests:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir BASE&lt;br /&gt;
echo 1 &amp;gt; BASE/file.1&lt;br /&gt;
echo 2 &amp;gt; BASE/file.2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Accessing HSI ====&lt;br /&gt;
&lt;br /&gt;
Now we obtain an interactive job on the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=1:ppn=8,walltime=2:00:00 -q archive -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we start HSI:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/bin/hsi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And that provides us with the following prompt, where your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Verify that the current directory in HPSS (/archive/group/user) is empty by typing:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also note that the current directory on the disk contains the directory structure that we created at the beginning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Where the output should contain the &amp;quot;BASE&amp;quot; directory.&lt;br /&gt;
&lt;br /&gt;
==== Offload ====&lt;br /&gt;
&lt;br /&gt;
Now we offload the BASE directory and all of its contents to the HPSS system. Running interactively, it is easy to learn that the -R flag is necessary to offload a directory. We thus use:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput -R BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The output appears as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput  'BASE/file.1' : 'file.1' ( 2 bytes, 0.2 KBS (cos=1300))&lt;br /&gt;
cput  'BASE/file.2' : 'file.2' ( 2 bytes, 0.5 KBS (cos=1300))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can now list the files in HPSS:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
The output appears as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/archive/group/user/BASE:&lt;br /&gt;
file.1  file.2  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Recall ====&lt;br /&gt;
&lt;br /&gt;
We now recall the data from HPSS back to the GPFS disk. Once this entire simple test is complete, we suggest that you run through this test again using a real directory of your data. Thus, we do not delete the original directory on disk at this point. Instead, we create a new directory and recall the data from HPSS to this new directory where it can be checked for congruency to the original data if desired.&lt;br /&gt;
&lt;br /&gt;
First, we need to create a new directory on the disk. We will do this from within HSI, but you could also exit HSI (using the exit command or control-c) to make the directory changes and then run HSI again. Thus, continuing from above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lmkdir RECALL&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The lls command now lists both the BASE and RECALL directories on disk.&lt;br /&gt;
&lt;br /&gt;
We now recall the data from HPSS to GPFS:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lcd RECALL&lt;br /&gt;
cget -R BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And the output of the cget command is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cget  '/project/.../TEST/RECALL/BASE/file.1' : '/archive/.../BASE/file.1' (2011/06/29 12:14:21 2 bytes, 4.0 KBS )&lt;br /&gt;
cget  '/project/.../TEST/RECALL/BASE/file.2' : '/archive/.../file.2' (2011/06/29 12:14:22 2 bytes, 3.7 KBS )&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We now exit HSI (using the exit command or control-c) and verify the existence of the directory that was brought back to GPFS.&lt;br /&gt;
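&lt;br /&gt;
One way to check the recalled copy against the original is a recursive diff. The following is a minimal sketch, not part of the HSI tooling: the function name check_recall is hypothetical, and it assumes you run it from the directory that holds both BASE and RECALL after exiting HSI. It compares file contents only, not timestamps or permissions.&lt;br /&gt;

```shell
#!/bin/bash
# Compare an original directory tree with its recalled copy.
# Returns 0 when every file matches, non-zero otherwise.
check_recall() {
    local original=$1
    local recalled=$2
    if diff -r "$original" "$recalled" >/dev/null; then
        echo "MATCH: $recalled is identical to $original"
    else
        echo "MISMATCH: $recalled differs from $original"
        return 1
    fi
}

# Run from the directory that holds BASE and RECALL (hypothetical call):
# check_recall BASE RECALL/BASE
```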
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
# Note that upon executing hsi, your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
A convenient way to explore the contents of HPSS is with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the file /home/$USER/HPSSdm/hsi.igz that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSSdm&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]];then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=$HOME/HPSSdm&lt;br /&gt;
ish hindex&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f104n084-$ ish ~/HPSSdm/hsi.igz &lt;br /&gt;
[ish]hsi.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - Note: a single ''cget'' naming multiple files (rather than several separate gets, as in this example) allows HSI to optimize the recall.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-hpss&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : hpss_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;hpss_file1&amp;quot; into HPSS, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In contrast, FTP would use the following syntax:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive of C source programs and header files on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Put a subdirectory ''LargeFiles'' and all its contents recursively. You may use '-u' option to resume a previously disrupted session.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   cput -R -u LargeFiles&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
* verify checksum&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;localpath&amp;gt;&lt;br /&gt;
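storedfile=&amp;lt;hpsspath&amp;gt;&lt;br /&gt;
# name used for the temporary checksum file&lt;br /&gt;
fname=$(basename &amp;quot;$thefile&amp;quot;)&lt;br /&gt;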
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm     /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ];then&lt;br /&gt;
  echo &amp;quot;File transfer failed&amp;quot;&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ]; then&lt;br /&gt;
  echo '!!! Job Failed !!!'&lt;br /&gt;
  echo 'error=' $sc&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
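&lt;br /&gt;
The verification step in the script above relies on md5sum treating the file name - as standard input, both when writing and when checking a checksum. The sketch below isolates that behaviour with ordinary pipes so it can be tried without HPSS; the function names and the paths in the comments are hypothetical, and a local file stands in for the data that hsi would stream.&lt;br /&gt;

```shell
#!/bin/bash
# Record the MD5 of a stream; md5sum labels data read from stdin as "-".
record_stream_md5() {
    local sumfile=$1
    md5sum > "$sumfile"
}

# Re-check a stream against the recorded checksum; "-" again means stdin.
verify_stream_md5() {
    local sumfile=$1
    md5sum -c "$sumfile"
}

# Example with a local file standing in for the HPSS copy (hypothetical paths):
# cat data.tar | record_stream_md5 /tmp/data.md5
# cat data.tar | verify_stream_md5 /tmp/data.md5
```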
&lt;br /&gt;
=== Strange HSI nuances === &lt;br /&gt;
&lt;br /&gt;
* During interactive use, even though it appears that the keyboard up arrow will retrieve previous HSI commands, this does not work as expected and should be avoided.&lt;br /&gt;
* Tab completion does not work. Be careful with combinations of tab completion and the &amp;quot;*&amp;quot; character!&lt;br /&gt;
* echo $? does not work as expected from within HSI. Here is what happens:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;echo $?&lt;br /&gt;
echo turned on&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;echo $?&lt;br /&gt;
echo turned off&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Thus, you must avoid the use of echo when checking the output of HSI commands. Note that $? is a bash variable, not an HSI variable, so this would not work in any event, but one might expect it to work, hence the warning.&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into a single archive. It uses a multithreaded buffering scheme to write files from the local filesystem directly into HPSS at a high transfer rate, and the resulting archive file conforms to the POSIX TAR specification.&lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive (you'll get an error message for the whole operation)&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
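&lt;br /&gt;
The first two limits can be checked before an archive is created. The sketch below is one possible pre-flight check and is not part of HTAR: the function name htar_preflight is hypothetical, and it assumes that find interprets the G suffix as 2^30 bytes, which may differ slightly from the 68 GB limit quoted above.&lt;br /&gt;

```shell
#!/bin/bash
# Check a directory against the HTAR limits listed above before archiving.
# Usage: htar_preflight DIR [MAX_FILES] [MAX_SIZE]
htar_preflight() {
    local dir=$1
    local max_files=${2:-1000000}   # file-count limit from the caution list
    local max_size=${3:-68G}        # per-file limit, in find -size units (assumed GiB)
    local count oversize

    # Count regular files under the directory.
    count=$(find "$dir" -type f | wc -l | tr -d ' ')
    if [ "$count" -gt "$max_files" ]; then
        echo "too many files: $count (limit $max_files)"
        return 1
    fi

    # Look for any single file above the size limit.
    oversize=$(find "$dir" -type f -size +"$max_size")
    if [ -n "$oversize" ]; then
        echo "files above $max_size:"
        echo "$oversize"
        return 1
    fi
    echo "ok: $count files, none above $max_size"
}
```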
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3535</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3535"/>
		<updated>2011-06-29T16:38:33Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Strange HSI nuances */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by 50+ facilities in the “Top 500” HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~12 TB/day, Recall: ~24 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rational.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers and any tarball creation process.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
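&lt;br /&gt;
To see which files fall under the ~100MB guideline, something like the following could be used. This is a sketch only: list_small_files is a hypothetical helper, and it takes 100MB to mean 10^8 bytes, which is an assumption.&lt;br /&gt;

```shell
#!/bin/bash
# List regular files under DIR that are smaller than LIMIT_BYTES.
# Usage: list_small_files DIR [LIMIT_BYTES]
list_small_files() {
    local dir=$1
    local limit=${2:-100000000}   # ~100MB, assumed here to mean 10^8 bytes
    # With the "c" suffix, find compares exact byte counts (no rounding).
    find "$dir" -type f -size -"${limit}c"
}

# Hypothetical call: list_small_files /scratch/user/workarea
```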
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archive'' queue. See the HSI example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
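&lt;br /&gt;
The awk step in the shortcut keeps only the text before the first dot of the identifier that qsub prints. The sketch below isolates that step so it can be tried without submitting a job; depend_flag is a hypothetical helper and 12345.sched-host is a made-up identifier.&lt;br /&gt;

```shell
#!/bin/bash
# Turn the job identifier printed by qsub into a dependency flag.
depend_flag() {
    local jobid=$1
    # Split on "." and keep the leading job number.
    echo "$jobid" | awk -F '.' '{print "-W depend=afterok:"$1}'
}

# "12345.sched-host" is a made-up identifier used only for illustration.
depend_flag 12345.sched-host
```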
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with HPSS. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in HPSS&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* More complex sequences can be performed using a script such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Your First Time Using HSI ===&lt;br /&gt;
Once you are comfortable with HSI, you will script the process and submit it non-interactively to the queue. On your first tests, however, we suggest that you use an interactive queue to get familiar with the system. In the remainder of this subsection, you will find an example of an interactive test that you can run to become familiar with using HSI.&lt;br /&gt;
&lt;br /&gt;
==== Setup ====&lt;br /&gt;
&lt;br /&gt;
To begin the process, we create a directory structure to use for the tests:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir BASE&lt;br /&gt;
echo 1 &amp;gt; BASE/file.1&lt;br /&gt;
echo 2 &amp;gt; BASE/file.2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Accessing HSI ====&lt;br /&gt;
&lt;br /&gt;
Now we obtain an interactive job on the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=1:ppn=8,walltime=2:00:00 -q archive -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we start HSI:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/bin/hsi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And that provides us with the following prompt, where your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Verify that the current HPSS directory (/archive/group/user) is empty by typing:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also note that the current directory on the disk contains the directory structure that we created at the beginning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The output should contain the &amp;quot;BASE&amp;quot; directory.&lt;br /&gt;
&lt;br /&gt;
==== Offload ====&lt;br /&gt;
&lt;br /&gt;
Now we offload the BASE directory and all of its contents to the HPSS system. Running interactively, it is easy to learn that the -R flag is necessary to offload a directory. We thus use:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput -R BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The output appears as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput  'BASE/file.1' : 'file.1' ( 2 bytes, 0.2 KBS (cos=1300))&lt;br /&gt;
cput  'BASE/file.2' : 'file.2' ( 2 bytes, 0.5 KBS (cos=1300))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we list the files on HPSS:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
The output appears as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/archive/group/user/BASE:&lt;br /&gt;
file.1  file.2  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Recall ====&lt;br /&gt;
&lt;br /&gt;
We now recall the data from HPSS back to the GPFS disk. Once this entire simple test is complete, we suggest that you run through this test again using a real directory of your data. Thus, we do not delete the original directory on disk at this point. Instead, we create a new directory and recall the data from HPSS to this new directory where it can be checked for congruency to the original data if desired.&lt;br /&gt;
&lt;br /&gt;
First, we need to create a new directory on the disk. We will do this from within HSI, but you could also exit HSI (using the exit command or control-c) to make the directory changes and then run HSI again. Thus, continuing from above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lmkdir RECALL&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The lls command now lists both the BASE and RECALL directories on disk.&lt;br /&gt;
&lt;br /&gt;
We now recall the data from HPSS to GPFS:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lcd RECALL&lt;br /&gt;
cget -R BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And the output of the cget command is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cget  '/project/.../TEST/RECALL/BASE/file.1' : '/archive/.../BASE/file.1' (2011/06/29 12:14:21 2 bytes, 4.0 KBS )&lt;br /&gt;
cget  '/project/.../TEST/RECALL/BASE/file.2' : '/archive/.../file.2' (2011/06/29 12:14:22 2 bytes, 3.7 KBS )&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We now exit HSI (using the exit command or control-c) and verify the existence of the directory that was brought back to GPFS.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
# Note that upon executing hsi, your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
A convenient way to explore the contents of HPSS is with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the file /home/$USER/HPSSdm/hsi.igz that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSSdm&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]];then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=$HOME/HPSSdm&lt;br /&gt;
ish hindex&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f104n084-$ ish ~/HPSSdm/hsi.igz &lt;br /&gt;
[ish]hsi.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - Note: a single ''cget'' naming multiple files (rather than several separate gets, as in this example) allows HSI to optimize the recall.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-hpss&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
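&lt;br /&gt;
According to the HSI vs. FTP notes, several local : hpss pairs may appear on one command line, so the two recalls above could be combined into a single ''cget''. The sketch below only builds and prints such a command; build_cget is a hypothetical helper, the path /scratch/user is a placeholder, and passing the printed command to hsi is left to the reader.&lt;br /&gt;

```shell
#!/bin/bash
# Print one cget command that names every file as a "local : hpss" pair.
# Usage: build_cget LOCAL_DIR HPSS_DIR FILE...
build_cget() {
    local dest_dir=$1
    local hpss_dir=$2
    shift 2
    local cmd="cget"
    local f
    for f in "$@"; do
        # The ":" separator must be surrounded by whitespace.
        cmd="$cmd $dest_dir/$f : $hpss_dir/$f"
    done
    echo "$cmd"
}

# Placeholder paths, mirroring the file names in the script above.
build_cget /scratch/user/recalled-from-hpss put-away-on-2010 Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz
```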
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : hpss_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;hpss_file1&amp;quot; into HPSS, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In contrast, FTP would use the following syntax:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive of C source programs and header files on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Put the subdirectory ''LargeFiles'' and all its contents into HPSS recursively. You may use the '-u' option to resume a previously disrupted session.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   cput -R -u LargeFiles&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
* verify checksum&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
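thefile=&amp;lt;localpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;hpsspath&amp;gt;&lt;br /&gt;
# Base name of the local file, used to name the checksum file in /tmp&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate the checksum on the fly using a named pipe so that the file is only read from GPFS once&lt;br /&gt;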
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm     /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ];then&lt;br /&gt;
  echo &amp;quot;File transfer failed&amp;quot;&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ]; then&lt;br /&gt;
  echo '!!! Job Failed !!!'&lt;br /&gt;
  echo 'error=' $sc&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Strange HSI nuances === &lt;br /&gt;
&lt;br /&gt;
* During interactive use, even though it appears that the keyboard up arrow will retrieve previous HSI commands, this does not work as expected and should be avoided.&lt;br /&gt;
* Tab completion does not work. Be careful with combinations of tab completion and the &amp;quot;*&amp;quot; character!&lt;br /&gt;
* echo $? does not work as expected from within HSI. Here is what happens:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;echo $?&lt;br /&gt;
echo turned on&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;echo $?&lt;br /&gt;
echo turned off&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Thus, you must avoid the use of echo when checking the output of HSI commands.&lt;br /&gt;
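&lt;br /&gt;
Because echo $? is not usable inside HSI, one option is to check the exit status from the calling shell instead. The sketch below is a hypothetical helper, run_and_check (not part of HSI), that runs any command, reports a non-zero exit code and passes it on. It assumes only a POSIX shell and is demonstrated with standard commands.&lt;br /&gt;

```shell
# Hypothetical helper (not part of HSI): run a command and report its exit status.
run_and_check() {
  "$@"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed with exit code $status"
  fi
  return "$status"
}

# Demonstration with standard commands; an hsi call would be passed the same way.
run_and_check true
run_and_check false || echo "continuing after failure"
```

For example, run_and_check hsi -q ls BASE would report a failed listing, assuming that hsi returns a non-zero exit code on error.&lt;br /&gt;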
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into a single archive. It uses a multithreaded buffering scheme to write files from the local filesystem directly into HPSS, which gives a high rate of performance, and the resulting archive file conforms to the POSIX TAR specification. &lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive (you'll get an error message for the whole operation)&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
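&lt;br /&gt;
The two limits above can be checked before running htar. The following is a minimal sketch, not part of HTAR: htar_precheck is a hypothetical helper name, GNU find is assumed, and 68 GB is interpreted as 68 GiB, which is an assumption.&lt;br /&gt;

```shell
# Hypothetical pre-flight check (not part of HTAR) for the limits listed above.
# Assumes GNU find; 68 GB is taken here as 68 GiB, which is an assumption.
htar_precheck() {
  dir=$1
  # Count all regular files under the directory.
  nfiles=$(find "$dir" -type f | wc -l | tr -d ' ')
  # Count regular files larger than 68 GiB.
  nbig=$(find "$dir" -type f -size +68G | wc -l | tr -d ' ')
  echo "files: $nfiles, larger than 68G: $nbig"
  if [ "$nfiles" -gt 1000000 ] || [ "$nbig" -gt 0 ]; then
    echo "not suitable for a single htar archive"
    return 1
  fi
  echo "within the stated htar limits"
}
```

For example, htar_precheck subdirA could be run before htar -cf subdirA.tar subdirA.&lt;br /&gt;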
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3534</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3534"/>
		<updated>2011-06-29T16:35:10Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Strange HSI nuances */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by 50+ facilities in the “Top 500” HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~12 TB/day, Recall: ~24 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rational.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers and any tarball creation process.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
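&lt;br /&gt;
To see which files fall under the ~100MB guideline above, a sketch with find follows. small_files is a hypothetical helper name, and 100MB is taken as 104857600 bytes, which is an assumption about the intended threshold.&lt;br /&gt;

```shell
# Hypothetical helper: list regular files smaller than 104857600 bytes
# (100 MiB, an assumed reading of the ~100MB guideline) under a directory.
# These are the candidates for grouping into a tarball before archiving.
small_files() {
  find "$1" -type f -size -104857600c
}
```

For example, small_files /scratch/$USER/workarea lists the files in that directory tree that are better grouped with tar or htar before being stored.&lt;br /&gt;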
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archive'' queue. See the HSI example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
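&lt;br /&gt;
The shortcut above can also be split into two readable steps. The sketch below uses a hypothetical helper, depend_flag, which keeps the part of the qsub output before the first dot, as the awk call does. The job identifier format 12345.some-server is an assumed example.&lt;br /&gt;

```shell
# Hypothetical helper: turn the job identifier printed by qsub
# (for example 12345.some-server, an assumed format) into a dependency flag.
depend_flag() {
  jobid=${1%%.*}   # keep the part before the first dot, as the awk call does
  printf '%s\n' "-W depend=afterok:$jobid"
}

# Demonstration with a made-up job identifier.
depend_flag 12345.some-server
```

With this helper, the submission could be written as qsub $(depend_flag "$(qsub data-recall.sh)") job-to-work-on-recalled-data.sh.&lt;br /&gt;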
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with HPSS. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in HPSS&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* More complex sequences can be performed using a script such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Your First Time Using HSI ===&lt;br /&gt;
Once you are comfortable with HSI, you will script the process and submit it non-interactively to the queue. On your first tests, however, we suggest that you use an interactive queue to get familiar with the system. In the remainder of this subsection, you will find an example of an interactive test that you can run to become familiar with using HSI.&lt;br /&gt;
&lt;br /&gt;
==== Setup ====&lt;br /&gt;
&lt;br /&gt;
To begin the process, we create a directory structure to use for the tests:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir BASE&lt;br /&gt;
echo 1 &amp;gt; BASE/file.1&lt;br /&gt;
echo 2 &amp;gt; BASE/file.2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Accessing HSI ====&lt;br /&gt;
&lt;br /&gt;
Now we obtain an interactive job on the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=1:ppn=8,walltime=2:00:00 -q archive -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we start HSI:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/bin/hsi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And that provides us with the following prompt, where your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Verify that the current directory in HSI (/archive/group/user) is empty by typing:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also note that the current directory on the disk contains the directory structure that we created at the beginning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The output should contain the &amp;quot;BASE&amp;quot; directory.&lt;br /&gt;
&lt;br /&gt;
==== Offload ====&lt;br /&gt;
&lt;br /&gt;
Now we offload the BASE directory and all of its contents to the HPSS system. Running interactively, it is easy to learn that the -R flag is necessary to offload a directory. We thus use:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput -R BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The output appears as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput  'BASE/file.1' : 'file.1' ( 2 bytes, 0.2 KBS (cos=1300))&lt;br /&gt;
cput  'BASE/file.2' : 'file.2' ( 2 bytes, 0.5 KBS (cos=1300))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, when we look at the files in HPSS:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
The output appears as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/archive/group/user/BASE:&lt;br /&gt;
file.1  file.2  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Recall ====&lt;br /&gt;
&lt;br /&gt;
We now recall the data from HPSS back to the GPFS disk. Once this entire simple test is complete, we suggest that you run through this test again using a real directory of your data. Thus, we do not delete the original directory on disk at this point. Instead, we create a new directory and recall the data from HPSS to this new directory where it can be checked for congruency to the original data if desired.&lt;br /&gt;
&lt;br /&gt;
First, we need to create a new directory on the disk. We will do this from within HSI, but you could also exit HSI (using the exit command or control-c) to make the directory changes and then run HSI again. Thus, continuing from above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lmkdir RECALL&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The lls command now lists both the BASE and RECALL directories on disk.&lt;br /&gt;
&lt;br /&gt;
We now recall the data from HPSS to GPFS:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lcd RECALL&lt;br /&gt;
cget -R BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And the output of the cget command is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cget  '/project/.../TEST/RECALL/BASE/file.1' : '/archive/.../BASE/file.1' (2011/06/29 12:14:21 2 bytes, 4.0 KBS )&lt;br /&gt;
cget  '/project/.../TEST/RECALL/BASE/file.2' : '/archive/.../file.2' (2011/06/29 12:14:22 2 bytes, 3.7 KBS )&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We now exit HSI (using the exit command or control-c) and verify the existence of the directory that was brought back to GPFS.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
# Note that upon executing hsi, your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
A convenient way to explore the contents of HPSS is with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the file /home/$USER/HPSSdm/hsi.igz that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSSdm&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]];then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=$HOME/HPSSdm&lt;br /&gt;
ish hindex&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f104n084-$ ish ~/HPSSdm/hsi.igz &lt;br /&gt;
[ish]hsi.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - Note that a single ''cget'' of multiple files (rather than several separate gets, as written below) allows HSI to optimize the recall.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-hpss&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage is very similar to that of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : hpss_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;hpss_file1&amp;quot; into HPSS, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive of C source programs and header files on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the ''source.tar'' file stored above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Put the subdirectory ''LargeFiles'' and all its contents into HPSS recursively. You may use the '-u' option to resume a previously disrupted session.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   cput -R -u LargeFiles&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
* verify checksum&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
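thefile=&amp;lt;localpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;hpsspath&amp;gt;&lt;br /&gt;
# Base name of the local file, used to name the checksum file in /tmp&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate the checksum on the fly using a named pipe so that the file is only read from GPFS once&lt;br /&gt;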
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm     /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ];then&lt;br /&gt;
  echo &amp;quot;File transfer failed&amp;quot;&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ]; then&lt;br /&gt;
  echo '!!! Job Failed !!!'&lt;br /&gt;
  echo 'error=' $sc&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
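&lt;br /&gt;
The verification step of the script above relies on md5sum checking a stream read from stdin against a checksum file whose file name entry is -. This can be rehearsed locally without HPSS. The sketch below assumes GNU md5sum; record_sum and check_sum are hypothetical helper names.&lt;br /&gt;

```shell
# Sketch of the stdin-based check used above, with local data instead of HPSS.
# Assumes GNU md5sum; record_sum and check_sum are hypothetical helper names.

# Read a stream from stdin and write its checksum to the file named in $1.
# With no file argument, md5sum records the file name as "-" (stdin).
record_sum() {
  md5sum | tee "$1"
}

# Read a stream from stdin and compare it with the checksum stored in $1.
check_sum() {
  md5sum -c "$1"
}

# Demonstration with a short string standing in for the archived file.
sumfile=$(mktemp)
printf 'example data\n' | record_sum "$sumfile"
printf 'example data\n' | check_sum "$sumfile"
rm "$sumfile"
```

In the script above, the first stream comes from the named pipe and the second from hsi -q get, so no sed step would be needed if the checksum were recorded from stdin in this way.&lt;br /&gt;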
&lt;br /&gt;
=== Strange HSI nuances === &lt;br /&gt;
&lt;br /&gt;
* During interactive use, even though it appears that the keyboard up arrow will retrieve previous HSI commands, this does not work as expected and should be avoided.&lt;br /&gt;
* Tab completion does not work. Be careful with combinations of tab completion and the &amp;quot;*&amp;quot; character!&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into a single archive. It uses a multithreaded buffering scheme to write files from the local filesystem directly into HPSS, which gives a high rate of performance, and the resulting archive file conforms to the POSIX TAR specification. &lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive (you'll get an error message for the whole operation)&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3533</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3533"/>
		<updated>2011-06-29T16:34:02Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Your First Time Using HSI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by 50+ facilities in the “Top 500” HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~12 TB/day, Recall: ~24 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rational.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers and any tarball creation process.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archive'' queue. See the HSI example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
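&lt;br /&gt;
The same dependency can also be set up in two steps, which keeps the job number of the recall job available for later use. This is only a sketch; the variable name RECALL_ID is arbitrary, and it assumes, as the one-liner above does, that qsub prints the job identifier with the job number before the first dot.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Submit the recall job and keep only the job number (the text before the first dot)&lt;br /&gt;
RECALL_ID=$(qsub data-recall.sh | awk -F '.' '{print $1}')&lt;br /&gt;
# Submit the analysis job so that it starts only after the recall job finishes successfully&lt;br /&gt;
qsub -W depend=afterok:$RECALL_ID job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;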
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client through which users interact with HPSS. It provides an ftp-like interface for archiving and retrieving files. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in HPSS&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands: like cd and ls, but acting on the local filesystem rather than on HPSS.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* More complex sequences can be performed using a script such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Your First Time Using HSI ===&lt;br /&gt;
Once you are comfortable with HSI, you will script the process and submit it non-interactively to the queue. On your first tests, however, we suggest that you use an interactive queue to get familiar with the system. In the remainder of this subsection, you will find an example of an interactive test that you can run to become familiar with using HSI.&lt;br /&gt;
&lt;br /&gt;
==== Setup ====&lt;br /&gt;
&lt;br /&gt;
To begin the process, we create a directory structure to use for the tests:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir BASE&lt;br /&gt;
echo 1 &amp;gt; BASE/file.1&lt;br /&gt;
echo 2 &amp;gt; BASE/file.2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Accessing HSI ====&lt;br /&gt;
&lt;br /&gt;
Now we obtain an interactive job on the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=1:ppn=8,walltime=2:00:00 -q archive -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we start HSI:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/bin/hsi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And that provides us with the following prompt, where your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Verify that the current directory in HSI (/archive/group/user) is empty by typing:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also note that the current directory on the disk contains the directory structure that we created at the beginning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The output should contain the &amp;quot;BASE&amp;quot; directory.&lt;br /&gt;
&lt;br /&gt;
==== Offload ====&lt;br /&gt;
&lt;br /&gt;
Now we offload the BASE directory and all of its contents to the HPSS system. Running interactively, it is easy to learn that the -R flag is necessary to offload a directory. We thus use:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput -R BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The output appears as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput  'BASE/file.1' : 'file.1' ( 2 bytes, 0.2 KBS (cos=1300))&lt;br /&gt;
cput  'BASE/file.2' : 'file.2' ( 2 bytes, 0.5 KBS (cos=1300))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we look at the files in HPSS:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
The output appears as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/archive/group/user/BASE:&lt;br /&gt;
file.1  file.2  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Recall ====&lt;br /&gt;
&lt;br /&gt;
We now recall the data from HPSS back to the GPFS disk. Once this entire simple test is complete, we suggest that you run through this test again using a real directory of your data. Thus, we do not delete the original directory on disk at this point. Instead, we create a new directory and recall the data from HPSS to this new directory where it can be checked for congruency to the original data if desired.&lt;br /&gt;
&lt;br /&gt;
First, we need to create a new directory on the disk. We will do this from within HSI, but you could also exit HSI (using the exit command or control-c) to make the directory changes and then run HSI again. Thus, continuing from above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lmkdir RECALL&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The output of ''lls'' now lists both the BASE and RECALL directories on disk.&lt;br /&gt;
&lt;br /&gt;
We now recall the data from HPSS to GPFS:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lcd RECALL&lt;br /&gt;
cget -R BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And the output of the cget command is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cget  '/project/.../TEST/RECALL/BASE/file.1' : '/archive/.../BASE/file.1' (2011/06/29 12:14:21 2 bytes, 4.0 KBS )&lt;br /&gt;
cget  '/project/.../TEST/RECALL/BASE/file.2' : '/archive/.../file.2' (2011/06/29 12:14:22 2 bytes, 3.7 KBS )&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We now exit HSI (using the exit command or control-c) and verify the existence of the directory that was brought back to GPFS.&lt;br /&gt;
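&lt;br /&gt;
One way to carry out the congruency check mentioned above is a recursive ''diff'' between the original directory and the recalled copy. This is a minimal sketch that assumes you are back in the directory where BASE and RECALL were created:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Compare the original directory with the copy recalled from HPSS&lt;br /&gt;
diff -r BASE RECALL/BASE&lt;br /&gt;
# diff exits with status 0 only if no differences were found&lt;br /&gt;
echo $?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;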
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
# Note that upon executing hsi, your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
A convenient way to explore the contents of HPSS is with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the file /home/$USER/HPSSdm/hsi.igz that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSSdm&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]]; then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=$HOME/HPSSdm&lt;br /&gt;
ish hindex&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f104n084-$ ish ~/HPSSdm/hsi.igz &lt;br /&gt;
[ish]hsi.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or the built-in help.&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - Note: a single ''cget'' of multiple files (rather than several separate gets, as in this simple example) allows HSI to optimize the recall.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-hpss&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
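&lt;br /&gt;
Following the note above, the two recalls could be requested with a single ''cget''. This is a sketch rather than a tested script; it relies on the statement from the HSI man page (see HSI vs. FTP) that multiple &amp;quot;local_file : hpss_file&amp;quot; pairs may be specified on a single command line:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz /scratch/$USER/recalled-from-hpss/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;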
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : hpss_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;hpss_file1&amp;quot; into HPSS, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* By contrast, FTP would use the following syntax:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive of C source programs and header files on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the source.tar file stored above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Put a subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   cput -R -u LargeFiles&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
* verify checksum&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
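thefile=&amp;lt;localpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;hpsspath&amp;gt;&lt;br /&gt;
# fname names the checksum file in /tmp; here it is taken from the local file name&lt;br /&gt;
fname=$(basename &amp;quot;$thefile&amp;quot;)&lt;br /&gt;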
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm     /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ];then&lt;br /&gt;
  echo &amp;quot;File transfer failed&amp;quot;&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ]; then&lt;br /&gt;
  echo '!!! Job Failed !!!'&lt;br /&gt;
  echo 'error=' $sc&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Strange HSI nuances === &lt;br /&gt;
&lt;br /&gt;
* During interactive use, even though it appears that the keyboard up arrow will retrieve previous HSI commands, this does not work as expected and should be avoided.&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into an archive file that conforms to the POSIX TAR specification. It uses a sophisticated multithreaded buffering scheme to write files from the local filesystem directly into HPSS, thereby achieving a high rate of performance. &lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive (you'll get an error message for the whole operation)&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the directory ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
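&lt;br /&gt;
As with HSI, an HTAR call is normally placed in a job script for the ''archive'' queue, with the exit code checked as the caution above advises. The following is a sketch modelled on the HSI job scripts on this page; the job name ''htar_create'' and the directory /scratch/$USER/workarea are placeholders:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
# subdirA is assumed to live in this (placeholder) directory&lt;br /&gt;
cd /scratch/$USER/workarea&lt;br /&gt;
htar -cf subdirA.tar subdirA&lt;br /&gt;
status=$?&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
   echo '!!! HTAR FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;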
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3532</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3532"/>
		<updated>2011-06-29T16:21:19Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Your First Time Using HSI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by 50+ facilities in the “Top 500” HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~12 TB/day, Recall: ~24 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rational.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape, a medium that is not suited to storing small files. Files smaller than ~100MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers and any tarball creation process.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers into and out of HPSS should be scripted into jobs and submitted to the ''archive'' queue. See the HSI example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client through which users interact with HPSS. It provides an ftp-like interface for archiving and retrieving files. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in HPSS&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands: like cd and ls, but acting on the local filesystem rather than on HPSS.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* More complex sequences can be performed using a script such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Your First Time Using HSI ===&lt;br /&gt;
Once you are comfortable with HSI, you will script the process and submit it non-interactively to the queue. On your first tests, however, we suggest that you use an interactive queue to get familiar with the system. In the remainder of this subsection, you will find an example of an interactive test that you can run to become familiar with using HSI.&lt;br /&gt;
&lt;br /&gt;
==== Setup ====&lt;br /&gt;
&lt;br /&gt;
To begin the process, we create a directory structure to use for the tests:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir BASE&lt;br /&gt;
echo 1 &amp;gt; BASE/file.1&lt;br /&gt;
echo 2 &amp;gt; BASE/file.2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Accessing HSI ====&lt;br /&gt;
&lt;br /&gt;
Now we obtain an interactive job on the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=1:ppn=8,walltime=2:00:00 -q archive -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we start HSI:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/bin/hsi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And that provides us with the following prompt, where your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Verify that the current directory in HSI (/archive/group/user) is empty by typing:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also note that the current directory on the disk contains the directory structure that we created at the beginning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The output should contain the &amp;quot;BASE&amp;quot; directory.&lt;br /&gt;
&lt;br /&gt;
==== Upload ====&lt;br /&gt;
&lt;br /&gt;
Now we upload the BASE directory and all of its contents to the HPSS system. Running interactively, it is easy to learn that the -R flag is necessary to upload a directory. We thus use:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput -R BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The output appears as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput  'BASE/file.1' : 'file.1' ( 2 bytes, 0.2 KBS (cos=1300))&lt;br /&gt;
cput  'BASE/file.2' : 'file.2' ( 2 bytes, 0.5 KBS (cos=1300))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we look at the files in HPSS:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
The output appears as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/archive/group/user/BASE:&lt;br /&gt;
file.1  file.2  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Download ====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
# Note that upon executing hsi, your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
A convenient way to explore the contents of HPSS is with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the file /home/$USER/HPSSdm/hsi.igz that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSSdm&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]]; then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=$HOME/HPSSdm&lt;br /&gt;
ish hindex&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f104n084-$ ish ~/HPSSdm/hsi.igz &lt;br /&gt;
[ish]hsi.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or the built-in help.&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - Note: a single ''cget'' of multiple files (rather than several separate gets, as in this simple example) allows HSI to optimize the recall.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-hpss&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
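&lt;br /&gt;
As the note above suggests, the two recalls can be combined into a single ''cget''. The following is only a sketch: it reuses the paths from the script above and relies on the ''lcd'' and ''cd'' commands and on HSI accepting several files in one command. Check the exit code and the log file as usual.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
# create the local target directory, then recall both files with one cget&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-hpss&lt;br /&gt;
hsi -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd /scratch/$USER/recalled-from-hpss&lt;br /&gt;
cd put-away-on-2010&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;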
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : hpss_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;hpss_file1&amp;quot; into HPSS, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive of C source programs and header files on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the tar file stored above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Put a subdirectory ''LargeFiles'' and all its contents recursively. You may use '-u' option to resume a previously disrupted session.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   cput -R -u LargeFiles&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
* verify checksum&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
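thefile=&amp;lt;localpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;hpsspath&amp;gt;&lt;br /&gt;
fname=$(basename &amp;quot;$thefile&amp;quot;)   # base name of the local file, used to name the checksum file&lt;br /&gt;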
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm     /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ];then&lt;br /&gt;
  echo &amp;quot;File transfer failed&amp;quot;&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ]; then&lt;br /&gt;
  echo '!!! Job Failed !!!'&lt;br /&gt;
  echo 'error=' $sc&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Strange HSI nuances === &lt;br /&gt;
&lt;br /&gt;
* During interactive use, even though it appears that the keyboard up arrow will retrieve previous HSI commands, this does not work as expected and should be avoided.&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into a single archive. It uses a multithreaded buffering scheme to write files from the local filesystem directly into HPSS at a high transfer rate, and the resulting archive file conforms to the POSIX TAR specification.&lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive (you'll get an error message for the whole operation)&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the directory ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
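&lt;br /&gt;
Like the HSI transfers above, HTAR commands can be wrapped in a job for the ''archive'' queue so that the exit code is checked before anything is removed. The script below is a sketch only: the directory ''finished-job1'' under /scratch/$USER/workarea and the job name are hypothetical.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
# finished-job1 is a hypothetical directory to be archived&lt;br /&gt;
cd /scratch/$USER/workarea&lt;br /&gt;
htar -cf finished-job1.tar finished-job1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
   echo '!!! HTAR FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;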
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3531</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3531"/>
		<updated>2011-06-29T16:18:22Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Your First Time Using HSI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by 50+ facilities in the “Top 500” HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~12 TB/day, Recall: ~24 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rational.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape, a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers and any tarball creation process.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the HPSS should be scripted into jobs and submitted to the ''archive'' queue. See HSI example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
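&lt;br /&gt;
The same dependency can be set up in two steps, which may be easier to read inside a script. This sketch assumes, as the one-liner does, that qsub prints a job ID whose numeric part precedes the first dot; RECALL_ID is only an illustrative variable name.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# submit the recall job and keep the job ID that qsub prints&lt;br /&gt;
RECALL_ID=$(qsub data-recall.sh)&lt;br /&gt;
# drop everything from the first dot onwards, as the awk command above does&lt;br /&gt;
qsub -W depend=afterok:${RECALL_ID%%.*} job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;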
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with HPSS. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in HPSS&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* More complex sequences can be performed using a script such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Your First Time Using HSI ===&lt;br /&gt;
Once you are comfortable with HSI, you will script the process and submit it non-interactively to the queue. On your first tests, however, we suggest that you use an interactive queue to get familiar with the system. In the remainder of this subsection, you will find an example of an interactive test that you can run to become familiar with using HSI.&lt;br /&gt;
&lt;br /&gt;
To begin the process, we create a directory structure to use for the tests:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir BASE&lt;br /&gt;
echo 1 &amp;gt; BASE/file.1&lt;br /&gt;
echo 2 &amp;gt; BASE/file.2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we obtain an interactive job on the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=1:ppn=8,walltime=2:00:00 -q archive -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we start HSI:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/bin/hsi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And that provides us with the following prompt, where your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Verify that the current directory on HSI (/archive/group/user) is empty by typing:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also note that the current directory on the disk contains the directory structure that we created at the beginning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Where the output should contain the &amp;quot;BASE&amp;quot; directory.&lt;br /&gt;
&lt;br /&gt;
Now we upload the BASE directory and all of its contents to the HPSS system. Running interactively, it is easy to learn that the -R flag is necessary to upload a directory. We thus use:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput -R BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The output appears as:&lt;br /&gt;
cput  'BASE/file.1' : 'file.1' ( 2 bytes, 0.2 KBS (cos=1300))&lt;br /&gt;
cput  'BASE/file.2' : 'file.2' ( 2 bytes, 0.5 KBS (cos=1300))&lt;br /&gt;
&lt;br /&gt;
And now when we look at the files on the HPSS:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
The output appears as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/archive/group/user/BASE:&lt;br /&gt;
file.1  file.2  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
# Note that upon executing hsi, your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
A convenient way to explore the contents of HPSS is with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the file /home/$USER/HPSSdm/hsi.igz that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSSdm&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]]; then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=$HOME/HPSSdm&lt;br /&gt;
ish hindex&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f104n084-$ ish ~/HPSSdm/hsi.igz &lt;br /&gt;
[ish]hsi.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - Note: a single ''cget'' of multiple files (rather than several separate gets, as in this example) allows HSI to optimize the recall.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-hpss&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : hpss_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;hpss_file1&amp;quot; into HPSS, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive of C source programs and header files on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the tar file stored above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Put a subdirectory ''LargeFiles'' and all its contents recursively. You may use '-u' option to resume a previously disrupted session.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   cput -R -u LargeFiles&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
* verify checksum&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
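thefile=&amp;lt;localpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;hpsspath&amp;gt;&lt;br /&gt;
fname=$(basename &amp;quot;$thefile&amp;quot;)   # base name of the local file, used to name the checksum file&lt;br /&gt;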
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm     /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ];then&lt;br /&gt;
  echo &amp;quot;File transfer failed&amp;quot;&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ]; then&lt;br /&gt;
  echo '!!! Job Failed !!!'&lt;br /&gt;
  echo 'error=' $sc&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Strange HSI nuances === &lt;br /&gt;
&lt;br /&gt;
* During interactive use, even though it appears that the keyboard up arrow will retrieve previous HSI commands, this does not work as expected and should be avoided.&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into a single archive. It uses a multithreaded buffering scheme to write files from the local filesystem directly into HPSS at a high transfer rate, and the resulting archive file conforms to the POSIX TAR specification.&lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive (you'll get an error message for the whole operation)&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
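&lt;br /&gt;
The first two limits can be checked before an htar job is submitted. The function below is only a sketch built on the standard find and wc tools: the name check_htar_input is made up for this example, and 68 GB is taken to mean 68 x 10^9 bytes, which is an assumption on the cautious side.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# check_htar_input DIR: hypothetical helper, returns 1 if DIR breaks an HTAR limit&lt;br /&gt;
check_htar_input() {&lt;br /&gt;
  dir=$1&lt;br /&gt;
  # assumed interpretation of the 68 GB member-file limit, in bytes&lt;br /&gt;
  limit=68000000000&lt;br /&gt;
  # number of regular files larger than the limit&lt;br /&gt;
  nbig=$(find &amp;quot;$dir&amp;quot; -type f -size +&amp;quot;${limit}c&amp;quot; | wc -l)&lt;br /&gt;
  # total number of regular files&lt;br /&gt;
  nfiles=$(find &amp;quot;$dir&amp;quot; -type f | wc -l)&lt;br /&gt;
  if [ &amp;quot;$nbig&amp;quot; -gt 0 ] || [ &amp;quot;$nfiles&amp;quot; -gt 1000000 ]; then&lt;br /&gt;
    echo &amp;quot;not suitable for one htar archive: $nbig oversized file(s), $nfiles file(s) in total&amp;quot;&lt;br /&gt;
    return 1&lt;br /&gt;
  fi&lt;br /&gt;
  echo &amp;quot;ok: $nfiles file(s), none larger than 68 GB&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
It could be called on the directory that is about to be archived, for instance a directory under /scratch/$USER, and the htar command run only if the function returns 0.&lt;br /&gt;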
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the directory ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3530</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3530"/>
		<updated>2011-06-29T16:16:36Z</updated>

		<summary type="html">&lt;p&gt;Cneale: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by 50+ facilities in the “Top 500” HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~12 TB/day, Recall: ~24 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rational.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape, a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers and any tarball creation process.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archive'' queue. See the HSI example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
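&lt;br /&gt;
The one-liner above can also be split into steps, which may be easier to read and to reuse inside a script. This is a minimal sketch: the function name depend_flag is hypothetical, and it assumes that qsub prints a job identifier of the form NUMBER.HOSTNAME, as the awk expression above implies.&lt;br /&gt;

```shell
# Hypothetical helper: build the qsub dependency flag from a job identifier.
# Assumption: the identifier printed by qsub looks like NUMBER.HOSTNAME.
depend_flag() {
  jobid=${1%%.*}   # keep only the part before the first dot
  printf '%s\n' "-W depend=afterok:$jobid"
}

# Intended use (commented out because it needs the queue system):
# recall_id=$(qsub data-recall.sh)
# qsub $(depend_flag "$recall_id") job-to-work-on-recalled-data.sh
```

Keeping the job identifier in a variable also makes it available for later monitoring or logging.&lt;br /&gt;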
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with HPSS. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in HPSS&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* More complex sequences can be performed using a script such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Your First Time Using HSI ===&lt;br /&gt;
Once you are comfortable with HSI, you will script the process and submit it non-interactively to the queue. On your first tests, however, we suggest that you use an interactive queue to get familiar with the system. In the remainder of this subsection, you will find an example of an interactive test that you can run to become familiar with using HSI.&lt;br /&gt;
&lt;br /&gt;
To begin the process, we create a directory structure to use for the tests:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir BASE&lt;br /&gt;
echo 1 &amp;gt; BASE/file.1&lt;br /&gt;
echo 2 &amp;gt; BASE/file.2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we obtain an interactive job on the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=1:ppn=8,walltime=2:00:00 -q archive -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we start HSI:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/bin/hsi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
HSI then shows the following prompt, where your initial directory will be /archive/$(groups)/$(whoami)/:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Confirm that the current directory on HSI (/archive/group/user) is empty by typing:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also note that the current directory on the disk contains the directory structure that we created at the beginning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The output should contain the &amp;quot;BASE&amp;quot; directory.&lt;br /&gt;
&lt;br /&gt;
Now we upload the BASE directory and all of its contents to the HPSS system. Running interactively, it is easy to learn that the -R flag is necessary to upload a directory. We thus use:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput -R BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should look similar to this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/pomes/cneale-&amp;gt;cput -R BASE&lt;br /&gt;
cput  'BASE/file.1' : 'file.1' ( 2 bytes, 0.2 KBS (cos=1300))&lt;br /&gt;
cput  'BASE/file.2' : 'file.2' ( 2 bytes, 0.5 KBS (cos=1300))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
# Note that upon executing hsi, your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
A convenient way to explore the contents of HPSS is with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the file /home/$USER/HPSSdm/hsi.igz that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSSdm&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]]; then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=$HOME/HPSSdm&lt;br /&gt;
ish hindex&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f104n084-$ ish ~/HPSSdm/hsi.igz &lt;br /&gt;
[ish]hsi.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or the built-in help.&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - The example below issues one ''cget'' per file. Requesting multiple files in a single ''cget'' (rather than several separate gets) allows HSI to optimize the recall.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-hpss&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : hpss_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;hpss_file1&amp;quot; into HPSS, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This differs from FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive of C source programs and header files on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the source.tar file stored above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Put a subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   cput -R -u LargeFiles&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
* Transfer a file and verify its checksum:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;localpath&amp;gt;&lt;br /&gt;
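storedfile=&amp;lt;hpsspath&amp;gt;&lt;br /&gt;
# base name used for the temporary checksum file&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;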
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm     /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ];then&lt;br /&gt;
  echo &amp;quot;File transfer failed&amp;quot;&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ]; then&lt;br /&gt;
  echo '!!! Job Failed !!!'&lt;br /&gt;
  echo 'error=' $sc&lt;br /&gt;
fi&lt;br /&gt;
# propagate the verification result as the job's exit code&lt;br /&gt;
exit $sc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Strange HSI nuances === &lt;br /&gt;
&lt;br /&gt;
* During interactive use, even though it appears that the keyboard up arrow will retrieve previous HSI commands, this does not work as expected and should be avoided.&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into a single archive. It uses a multithreaded buffering scheme to write files from the local filesystem directly into HPSS at a high rate, and the resulting archive file conforms to the POSIX TAR specification.&lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive (you'll get an error message for the whole operation)&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
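&lt;br /&gt;
The two limits above can be checked on the local directory before htar is called. The following is a rough sketch rather than a supported tool: the function name htar_preflight and its optional limit arguments are hypothetical, and 68 GB is assumed to mean 68000000000 bytes, which the page does not state.&lt;br /&gt;

```shell
# Hypothetical pre-flight check for a directory that will be passed to htar.
# Usage: htar_preflight DIR [MAX_FILES] [MAX_BYTES]
# Assumption: the 68 GB limit is read as 68000000000 bytes.
htar_preflight() {
  dir=$1
  max_files=${2:-1000000}
  max_bytes=${3:-68000000000}

  # Count regular files; tr strips any padding that wc may add
  nfiles=$(find "$dir" -type f | wc -l | tr -d ' ')
  if [ "$nfiles" -gt "$max_files" ]; then
    echo "too many files: $nfiles"
    return 1
  fi

  # Look for the first regular file larger than the per-file limit
  big=$(find "$dir" -type f -size +"${max_bytes}c" | head -n 1)
  if [ -n "$big" ]; then
    echo "file too large for htar: $big"
    return 1
  fi

  echo "ok: $nfiles files"
}
```

A job script could call this before htar and stop early when it returns a non-zero status, instead of waiting for the whole htar operation to fail.&lt;br /&gt;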
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3529</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3529"/>
		<updated>2011-06-29T16:15:28Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Your First Time Using HSI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with ftp-like functionality that can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar-formatted archives directly in HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by 50+ facilities in the “Top 500” HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~12 TB/day, Recall: ~24 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rational.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape, a medium that is not suited to storing small files. Files smaller than ~100MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers and any tarball creation process.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
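&lt;br /&gt;
The advice to check exit codes after every transfer can be wrapped in a small reusable function. This is a minimal sketch; the name check_status and the message format are hypothetical and simply mirror the style of the job scripts on this page.&lt;br /&gt;

```shell
# Hypothetical helper: report a failed step and hand back its exit code.
# Usage: check_status EXIT_CODE LABEL
check_status() {
  code=$1
  label=$2
  if [ "$code" -ne 0 ]; then
    printf '!!! %s FAILED (exit code %s) !!!\n' "$label" "$code"
  fi
  return "$code"
}

# Intended use inside a job script (commented out because it needs hsi):
# /usr/local/bin/hsi -v cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz
# check_status $? 'TRANSFER' || exit 1
```

Because the function returns the original code, the caller can still decide whether to stop or continue.&lt;br /&gt;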
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archive'' queue. See the HSI example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with HPSS. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in HPSS&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* More complex sequences can be performed using a script such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Your First Time Using HSI ===&lt;br /&gt;
Once you are comfortable with HSI, you will script the process and submit it non-interactively to the queue. On your first tests, however, we suggest that you use an interactive queue to get familiar with the system. In the remainder of this subsection, you will find an example of an interactive test that you can run to become familiar with using HSI.&lt;br /&gt;
&lt;br /&gt;
To begin the process, we create a directory structure to use for the tests:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir BASE&lt;br /&gt;
echo 1 &amp;gt; BASE/file.1&lt;br /&gt;
echo 2 &amp;gt; BASE/file.2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we obtain an interactive job on the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=1:ppn=8,walltime=2:00:00 -q archive -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we start HSI:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/bin/hsi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
HSI then shows the following prompt, where your initial directory will be /archive/$(groups)/$(whoami)/:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Confirm that the current directory on HSI (/archive/group/user) is empty by typing:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also note that the current directory on the disk contains the directory structure that we created at the beginning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The output should contain the &amp;quot;BASE&amp;quot; directory.&lt;br /&gt;
&lt;br /&gt;
Now we upload the BASE directory and all of its contents to the HPSS system. Running interactively, it is easy to learn that the -R flag is necessary to upload a directory. We thus use:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cput -R BASE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should look similar to this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/pomes/cneale-&amp;gt;cput -R BASE&lt;br /&gt;
cput  'BASE/file.1' : 'file.1' ( 2 bytes, 0.2 KBS (cos=1300))&lt;br /&gt;
cput  'BASE/file.2' : 'file.2' ( 2 bytes, 0.5 KBS (cos=1300))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
# Note that upon executing hsi, your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
A convenient way to explore the contents of HPSS is with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the file /home/$USER/HPSSdm/hsi.igz that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSSdm&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]]; then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=$HOME/HPSSdm&lt;br /&gt;
ish hindex&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f104n084-$ ish ~/HPSSdm/hsi.igz &lt;br /&gt;
[ish]hsi.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or the built-in help.&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - The example below issues one ''cget'' per file. Requesting multiple files in a single ''cget'' (rather than several separate gets) allows HSI to optimize the recall.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-hpss&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : hpss_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;hpss_file1&amp;quot; into HPSS, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With FTP, by contrast, the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
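&lt;br /&gt;
Because multiple ''local_file : hpss_file'' pairs may be given on one command line, a recall of several files can be written as a single ''cget''. The following is a minimal sketch that only builds and prints such a command line. The helper name ''build_cget'' and the directory names are assumptions made for illustration, and the printed line would still have to be passed to HSI:&lt;br /&gt;

```shell
#!/bin/bash
# build_cget is a hypothetical helper, not part of HSI: it prints one
# cget command containing a "local : hpss" pair for every file name given.
# Usage: build_cget LOCAL_DIR HPSS_DIR FILE...
build_cget() {
  local local_dir="$1" hpss_dir="$2"
  shift 2
  local cmd="cget"
  local f
  for f in "$@"; do
    # The colon separating the two paths must be surrounded by whitespace.
    cmd="$cmd $local_dir/$f : $hpss_dir/$f"
  done
  echo "$cmd"
}

# Placeholder paths modelled on the recall example.
build_cget /scratch/example/recalled put-away-on-2010 Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz
```

Building the line in one place keeps the whitespace around the colon consistent for every pair.&lt;br /&gt;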
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive of C source programs and header files on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the tar file stored above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Put a subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   cput -R -u LargeFiles&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
* verify checksum&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;localpath&amp;gt;&lt;br /&gt;
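storedfile=&amp;lt;hpsspath&amp;gt;&lt;br /&gt;
# base name used for the temporary checksum file in /tmp&lt;br /&gt;
fname=$(basename &amp;quot;$thefile&amp;quot;)&lt;br /&gt;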
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm     /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ];then&lt;br /&gt;
  echo &amp;quot;File transfer failed&amp;quot;&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ]; then&lt;br /&gt;
  echo '!!! Job Failed !!!'&lt;br /&gt;
  echo 'error=' $sc&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into a single archive. It uses a multithreaded buffering scheme to write files from the local filesystem directly into HPSS, which gives a high rate of performance, and the resulting archive file conforms to the POSIX TAR specification.&lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive (you'll get an error message for the whole operation)&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
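&lt;br /&gt;
The option letters used in the examples above (c, x, t, v, m, f) are the familiar tar ones, so the same sequence can be rehearsed locally with plain ''tar'' before touching HPSS. The sketch below uses GNU tar only and does not call htar. The file names are placeholders, and the rehearsal says nothing about htar-specific behaviour such as the index file:&lt;br /&gt;

```shell
#!/bin/bash
# Local rehearsal with plain GNU tar (not htar), in a throwaway directory.
work=$(mktemp -d)
cd "$work" || exit 1

# Placeholder content standing in for a real project tree.
mkdir -p project1/src
echo 'int main(void) { return 0; }' > project1/src/main.c

tar -cf proj1.tar project1          # create the archive
tar -tvf proj1.tar                  # list its members verbosely
rm -r project1                      # remove the originals (rehearsal only)
tar -xm -f proj1.tar project1/src   # extract one subtree; -m sets mtime to now
ls project1/src
```

On the real system, keep the originals until the HTAR exit code and log file have been checked.&lt;br /&gt;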
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3528</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3528"/>
		<updated>2011-06-29T16:09:35Z</updated>

		<summary type="html">&lt;p&gt;Cneale: Begin adding a new section titled &amp;quot;Your First Time Using HSI&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by 50+ facilities in the “Top 500” HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~12 TB/day, Recall: ~24 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rational.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers and any tarball creation process.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
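&lt;br /&gt;
One way to follow the small-file guideline is to collect everything below the size threshold into a single tarball before offloading. The following is a minimal sketch using ''find'' and GNU ''tar''. The function name ''bundle_small_files'' and the paths in the usage comment are placeholders, and treating 100 MB as 100000000 bytes is an assumption about the exact threshold:&lt;br /&gt;

```shell
#!/bin/bash
# bundle_small_files is a hypothetical helper, not a SciNet tool.
# Usage: bundle_small_files SOURCE_DIR TARBALL [MAX_BYTES]
# TARBALL must be an absolute path outside SOURCE_DIR.
# Regular files strictly smaller than MAX_BYTES are bundled.
bundle_small_files() {
  local src="$1" tarball="$2" limit="${3:-100000000}"
  # Run in a subshell so the caller's working directory is unchanged.
  ( cd "$src" || exit 1
    # NUL-separated names survive spaces and other unusual characters.
    find . -type f -size "-${limit}c" -print0 |
      tar --null --files-from=- -czf "$tarball" )
}

# Example (placeholder paths):
#   bundle_small_files /scratch/$USER/workarea /scratch/$USER/small-files.tar.gz
```

The size is given in bytes (the ''c'' suffix) because ''find'' rounds sizes up when larger units are used, which makes thresholds in those units easy to misread.&lt;br /&gt;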
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the HPSS should be scripted into jobs and submitted to the ''archive'' queue. See HSI example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
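&lt;br /&gt;
The shortcut above relies on ''qsub'' printing a job identifier whose numeric part precedes the first dot. The following is a minimal sketch of that extraction step. It is wrapped in a hypothetical helper ''depend_flag'' and fed a made-up identifier, so it can be tried without submitting anything:&lt;br /&gt;

```shell
#!/bin/bash
# depend_flag is a hypothetical helper, not part of qsub or Moab.
# It turns a job identifier such as 12345.some-host into the
# dependency flag that the analysis job is submitted with.
depend_flag() {
  # Keep only the text before the first dot (the job number).
  echo "$1" | awk -F '.' '{print "-W depend=afterok:" $1}'
}

# Made-up identifier standing in for the output of: qsub data-recall.sh
depend_flag "12345.example-sched"   # prints: -W depend=afterok:12345
```

Splitting the step out makes it easy to check the flag before it is passed to the second ''qsub''.&lt;br /&gt;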
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with HPSS. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in HPSS&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
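&lt;br /&gt;
The table describes ''cput'' and ''cget'' as transfers that happen only when the destination copy does not already exist. That behaviour can be pictured with ordinary local files. The sketch below is only an analogy: ''conditional_copy'' is a hypothetical function that uses ''cp'' and never contacts HPSS:&lt;br /&gt;

```shell
#!/bin/bash
# conditional_copy is a hypothetical local analogy for cget/cput:
# copy SOURCE to DEST only when DEST does not exist yet.
conditional_copy() {
  local src="$1" dest="$2"
  if [ -e "$dest" ]; then
    # Destination already present: leave it untouched.
    echo "skipped: $dest already exists"
    return 0
  fi
  cp -p "$src" "$dest" || return 1
  echo "copied: $src to $dest"
}
```

Running it twice with the same arguments copies the file the first time and skips it the second, which is why a conditional transfer can be repeated safely after an interrupted job.&lt;br /&gt;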
&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* More complex sequences can be performed using a script such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Your First Time Using HSI ===&lt;br /&gt;
Once you are comfortable with HSI, you will script the process and submit it non-interactively to the queue. On your first tests, however, we suggest that you use an interactive queue to get familiar with the system. In the remainder of this subsection, you will find an example of an interactive test that you can run to become familiar with using HSI.&lt;br /&gt;
&lt;br /&gt;
To begin the process, we create a directory structure to use for the tests:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir BASE&lt;br /&gt;
echo 1 &amp;gt; BASE/file.1&lt;br /&gt;
echo 2 &amp;gt; BASE/file.2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we obtain an interactive job on the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=1:ppn=8,walltime=2:00:00 -q archive -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we start HSI:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/bin/hsi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And that provides us with the following prompt, where your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Confirm that the current directory on HSI (/archive/group/user) is empty by typing:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also confirm that the current directory on the local disk contains the directory structure that we created at the beginning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
lls&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The output should contain the &amp;quot;BASE&amp;quot; directory.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
# Note that upon executing hsi, your initial directory will be: /archive/$(groups)/$(whoami)/&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
A convenient way to explore the contents of HPSS is with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the file /home/$USER/HPSSdm/hsi.igz that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSSdm&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]];then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=$HOME/HPSSdm&lt;br /&gt;
ish hindex&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f104n084-$ ish ~/HPSSdm/hsi.igz &lt;br /&gt;
[ish]hsi.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - This example should be modified to emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do optimization.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-hpss&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information, adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : hpss_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;hpss_file1&amp;quot; into HPSS, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With FTP, by contrast, the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive of C source programs and header files on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the tar file stored above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Put a subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   cput -R -u LargeFiles&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
* verify checksum&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;localpath&amp;gt;&lt;br /&gt;
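storedfile=&amp;lt;hpsspath&amp;gt;&lt;br /&gt;
# base name used for the temporary checksum file in /tmp&lt;br /&gt;
fname=$(basename &amp;quot;$thefile&amp;quot;)&lt;br /&gt;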
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm     /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ];then&lt;br /&gt;
  echo &amp;quot;File transfer failed&amp;quot;&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ]; then&lt;br /&gt;
  echo '!!! Job Failed !!!'&lt;br /&gt;
  echo 'error=' $sc&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into a single archive. It uses a multithreaded buffering scheme to write files from the local filesystem directly into HPSS, which gives a high rate of performance, and the resulting archive file conforms to the POSIX TAR specification.&lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive (you'll get an error message for the whole operation)&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
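&lt;br /&gt;
The two limits above can be checked before HTAR is run. The following is a minimal sketch using ''find''. The helper name ''htar_preflight'' is hypothetical, and the default limits simply restate the numbers in the list above. Taking 68 GB as 68000000000 bytes and counting only regular files are both assumptions:&lt;br /&gt;

```shell
#!/bin/bash
# htar_preflight is a hypothetical helper, not part of HTAR.
# Usage: htar_preflight DIR [MAX_BYTES] [MAX_FILES]
# Returns 0 if DIR fits within both limits, 1 otherwise.
htar_preflight() {
  local dir="$1" max_bytes="${2:-68000000000}" max_files="${3:-1000000}"
  local count too_big
  # Count regular files (directories and links are ignored in this sketch).
  count=$(find "$dir" -type f | wc -l)
  # Count files strictly larger than the byte limit.
  too_big=$(find "$dir" -type f -size "+${max_bytes}c" | wc -l)
  if [ "$count" -gt "$max_files" ]; then
    echo "too many files: $count"
    return 1
  fi
  if [ "$too_big" -gt 0 ]; then
    echo "$too_big file(s) exceed $max_bytes bytes"
    return 1
  fi
  echo "ok: $count files, none above $max_bytes bytes"
}
```

A job script could call this first and skip the htar step when it returns a non-zero status. The HTAR exit code and log file still need to be checked afterwards.&lt;br /&gt;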
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3516</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3516"/>
		<updated>2011-06-27T18:11:05Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* High Performance Storage System */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like interface which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by 50+ facilities in the “Top 500” HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~12 TB/day, Recall: ~24 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rational.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers and any tarball creation process.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the HPSS should be scripted into jobs and submitted to the ''archive'' queue. See HSI example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with HPSS. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in HPSS&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* More complex sequences can be performed using a script such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
A convenient way to explore the contents of HPSS is with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the file /home/$USER/HPSSdm/hsi.igz that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSSdm&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]];then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=$HOME/HPSSdm&lt;br /&gt;
ish hindex&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f104n084-$ ish ~/HPSSdm/hsi.igz &lt;br /&gt;
[ish]hsi.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or the built-in help.&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - Note: where possible, request multiple files with a single ''cget'' (rather than several separate gets), as this allows HSI to optimize the recall. The example below still issues separate commands.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-hpss&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
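&lt;br /&gt;
The two ''cget'' lines in the script above could be combined into one command. The sketch below only builds and prints such a line; it does not call hsi. The helper name &amp;lt;tt&amp;gt;build_cget&amp;lt;/tt&amp;gt; is hypothetical, and the sketch assumes that several &amp;quot;local_file : hpss_file&amp;quot; pairs on one command line are separated by whitespace, which should be checked against the HSI man page.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: build ONE cget command that names several files, so that HSI
# sees the whole request at once. build_cget is a hypothetical helper.
# Assumption: "local : hpss" pairs are separated by whitespace.
build_cget() {
    local local_dir="$1"
    local hpss_dir="$2"
    shift 2
    local line="cget"
    local name
    for name in "$@"; do
        line="${line} ${local_dir}/${name} : ${hpss_dir}/${name}"
    done
    printf '%s\n' "$line"
}

# Print the command that would go inside the hsi here-document:
build_cget "/scratch/${USER:-username}/recalled-from-hpss" put-away-on-2010 \
    Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz
```
&lt;br /&gt;
The printed line can then replace the separate ''cget'' lines between the EOF markers of the recall script.&lt;br /&gt;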
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : hpss_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;hpss_file1&amp;quot; into HPSS, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* By contrast, with FTP the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive of C source programs and header files on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the tar file ''source.tar'' stored above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Put a subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   cput -R -u LargeFiles&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into a single archive. It uses a sophisticated multithreaded buffering scheme to write files from the local filesystem directly into HPSS, thereby achieving a high rate of performance, and the archive file it creates conforms to the POSIX TAR specification.&lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive (you'll get an error message for the whole operation)&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=GPC_Quickstart&amp;diff=2501</id>
		<title>GPC Quickstart</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=GPC_Quickstart&amp;diff=2501"/>
		<updated>2011-01-08T19:48:02Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Memory Configuration */ added suggestion from Jason on how to assess time in queue&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:University_of_Tor_79284gm-a.jpg|center|300px|thumb]]&lt;br /&gt;
|name=General Purpose Cluster (GPC)&lt;br /&gt;
|installed=June 2009&lt;br /&gt;
|operatingsystem= Linux&lt;br /&gt;
|loginnode= gpc01..gpc04 (from &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt;)&lt;br /&gt;
|numberofnodes=3780&lt;br /&gt;
|rampernode=16 GB&lt;br /&gt;
|corespernode=8&lt;br /&gt;
|interconnect=1/4 on Infiniband, rest on GigE&lt;br /&gt;
|vendorcompilers=icc (C) ifort (fortran) icpc (C++)&lt;br /&gt;
|queuetype=[[Moab | Moab/Torque]]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
===Specifications===&lt;br /&gt;
The General Purpose Cluster is an extremely large cluster (ranked [http://www.top500.org/list/2009/06/100 16th] in the world at its inception, and fastest in Canada) and is where most computations are to be done at SciNet.  It is an IBM iDataPlex cluster based on Intel's Nehalem architecture (one of the [http://www.hpcwire.com/features/HPC-Vendors-Jump-On-Nehalem-42360237.html first in the world] to make use of the new chips). The GPC consists of 3,780 nodes with a total of 30,240  2.5GHz cores, with 16GB RAM per node (2GB per core). Approximately one quarter of the cluster is interconnected with non-blocking 4x-DDR InfiniBand while the rest of the nodes are connected with gigabit ethernet.  The compute nodes are accessed through a queuing system that allows jobs with a maximum wall time of 48 hours.&lt;br /&gt;
&lt;br /&gt;
===Login===&lt;br /&gt;
&lt;br /&gt;
First log in via [[Ssh | ssh]] with your SciNet account at &amp;lt;tt&amp;gt;login.scinet.utoronto.ca&amp;lt;/tt&amp;gt;; from there you can proceed to the development nodes (&amp;lt;tt&amp;gt;gpc01&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;gpc02&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;gpc03&amp;lt;/tt&amp;gt;, or &amp;lt;tt&amp;gt;gpc04&amp;lt;/tt&amp;gt;) to compile/test your code.&lt;br /&gt;
&lt;br /&gt;
===Compile/Devel Nodes===&lt;br /&gt;
&lt;br /&gt;
From a scinet login node you can ssh to &amp;lt;tt&amp;gt;gpc01&amp;lt;/tt&amp;gt;..&amp;lt;tt&amp;gt;gpc04&amp;lt;/tt&amp;gt;.  These nodes have the same hardware configuration as most of the compute nodes -- 8 Nehalem processing cores with 16GB RAM and Gigabit ethernet.  You can compile and test your codes on these nodes. To interactively test on more than 8 processors, or to test your code over an InfiniBand connection, you can submit an [[GPC_Quickstart#Submitting_an_Interactive_Job | interactive job request]].&lt;br /&gt;
&lt;br /&gt;
Your [[Storage_Quickstart | home directory]] is in &amp;lt;tt&amp;gt;/home/USER&amp;lt;/tt&amp;gt;; you have 10GB there that is backed up. '''Your home directory cannot be written to by the compute nodes!''' Thus, to run jobs, you'll use the &amp;lt;tt&amp;gt;/scratch/USER&amp;lt;/tt&amp;gt; directory. Here, there is a large amount of disk space, but it is not backed up. Thus it makes sense to keep your codes in /home, compile there, and then run them in the /scratch directory.&lt;br /&gt;
&lt;br /&gt;
===Modules and Environment Variables===&lt;br /&gt;
&lt;br /&gt;
To use most packages on the SciNet machines (including any of the compilers), you will have to use the &amp;lt;tt&amp;gt;module&amp;lt;/tt&amp;gt; command.  The command &amp;lt;tt&amp;gt;module load some-package&amp;lt;/tt&amp;gt; will set your environment variables (&amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, etc.) to include the default version of that package.   &amp;lt;tt&amp;gt;module load some-package/specific-version&amp;lt;/tt&amp;gt; will load a specific version of that package.  This makes it very easy for different users to use different versions of compilers, MPI libraries, other libraries, etc.&lt;br /&gt;
&lt;br /&gt;
Note that to use even the gcc compilers you will have to do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load gcc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
but in fact you should probably use the Intel compilers installed on this system, as they usually produce faster code (sometimes much faster).&lt;br /&gt;
&lt;br /&gt;
A list of the installed software is available in [[Software_and_Libraries | Software &amp;amp; Libraries]] and can &lt;br /&gt;
be seen on the system by typing &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module avail&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load a module (for example, the default version of the intel compilers)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load intel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload a module&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module unload intel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload all modules&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module purge&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These commands should go in your .bashrc files and/or in your submission scripts to make sure you&lt;br /&gt;
are using the correct packages.&lt;br /&gt;
&lt;br /&gt;
===Compilers===&lt;br /&gt;
&lt;br /&gt;
The intel compilers are icc/icpc/ifort for C/C++/Fortran, and are available with the default module &amp;quot;intel&amp;quot;.  The intel compilers are recommended over the GNU compilers.  Documentation about icpc is available at &lt;br /&gt;
http://software.intel.com/en-us/articles/intel-software-technical-documentation/.  The Intel compilers accept many of the options that the GNU compilers accept, but tend to produce faster programs on our system.  If, for some reason, you really need the GNU compilers, the latest version of the GNU compiler collection (currently 4.4.0) is available by loading the &amp;quot;gcc&amp;quot; module, with gcc/g++/gfortran for C/C++/Fortran.   Note that f77/g77 is not supported. &lt;br /&gt;
&lt;br /&gt;
To ensure that the intel compilers are in your &amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt; and their libraries are in your &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, use the command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load intel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This should likely go in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; file so that it will automatically be loaded.&lt;br /&gt;
&lt;br /&gt;
Optimize your code for the GPC machine using at least the following compiler flags: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   -O3 -xHost&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(or &amp;lt;tt&amp;gt;-O3 -march=native&amp;lt;/tt&amp;gt; for the GNU compilers). &lt;br /&gt;
&lt;br /&gt;
*If your program uses openmp, add &amp;lt;tt&amp;gt;-openmp&amp;lt;/tt&amp;gt; (&amp;lt;tt&amp;gt;-fopenmp&amp;lt;/tt&amp;gt; for GNU compilers).&lt;br /&gt;
*If you get the warning &amp;lt;tt&amp;gt;feupdateenv is not implemented&amp;lt;/tt&amp;gt;, add -limf to the link line.&lt;br /&gt;
*If you need to link in the MKL libraries, you are well advised to use the Intel(R) Math Kernel Library Link Line Advisor: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ for help in devising the list of libraries to link with your code. '''Note that this gives the link line for the command prompt. When using this in Makefiles, replace $MKLPATH by ${MKLPATH}.'''&lt;br /&gt;
&lt;br /&gt;
===[[ GPC_MPI_Versions | MPI]]===&lt;br /&gt;
&lt;br /&gt;
SciNet currently provides multiple MPI libraries for the GPC: [http://www.open-mpi.org/ OpenMPI] and [http://software.intel.com/en-us/intel-mpi-library/ IntelMPI].  We currently recommend OpenMPI as the default, as it quite reliably demonstrates good performance on both the InfiniBand and ethernet networks.  For full details and options see the complete [[ GPC_MPI_Versions | '''MPI''']] section.&lt;br /&gt;
&lt;br /&gt;
The MPI libraries are compiled with both the gnu compiler suite and the intel compiler suite.   To use (for instance) the intel-compiled OpenMPI libraries, which we recommend as the default (and use for most of our examples here), use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi intel&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt;.   Other combinations behave similarly.&lt;br /&gt;
&lt;br /&gt;
The MPI libraries provide mpicc/mpicxx/mpif90/mpif77 as wrappers around the appropriate compilers; these ensure that the appropriate include and library directories are used in the compilation and linking steps.&lt;br /&gt;
&lt;br /&gt;
We currently recommend the Intel + OpenMPI combination.  However, if you require the GNU compilers as well as MPI, you would want to find the most recent openmpi module available with `gcc' in the version name.  This will enable development and runtime with gcc/g++/gfortran  and OpenMPI.  You can make this your default by putting the module load line in your ~/.bashrc file.&lt;br /&gt;
&lt;br /&gt;
For mixed OpenMP/MPI code using Intel MPI, add the compilation flag -mt_mpi for full thread-safety (no such flag is necessary for OpenMPI).&lt;br /&gt;
&lt;br /&gt;
===Submitting A Batch Job===&lt;br /&gt;
&lt;br /&gt;
The SciNet machines are shared systems, and jobs that are to run on them are submitted to a queue; the&lt;br /&gt;
[[Moab | scheduler]] then orders the jobs to make the best use of the machine, and has them launched &lt;br /&gt;
when resources become available.   The intervention of the scheduler can mean that the jobs aren't&lt;br /&gt;
quite run in a  first-in first-out order.&lt;br /&gt;
&lt;br /&gt;
The maximum [[wallclock time]] for a job in the queue is 48 hours; computations that will take longer than&lt;br /&gt;
this must be broken into 48-hour chunks and run as several jobs.  The usual way to do this is with [[checkpoints]],&lt;br /&gt;
writing out the complete state of the computation every so often in such a way that a job can be restarted from&lt;br /&gt;
this state information and continue on from where it left off.  Generating [[checkpoints]] is a good idea anyway,&lt;br /&gt;
as in the unlikely event of a hardware failure during your run, it allows you to restart without having lost much work.&lt;br /&gt;
&lt;br /&gt;
There are limits to how many jobs you can submit.  If your group has a default account, up to 32 nodes at a time for 48 hours per job on the GPC cluster are allowed to be queued. This is a total limit, e.g., you could request 64 nodes for 24 hours.  Jobs of users with an LRAC or NRAC allocation will run at a higher priority than others while their resources last. Because of the group-based allocation, it is conceivable that your jobs won't run if your colleagues have already exhausted your group's limits.&lt;br /&gt;
&lt;br /&gt;
Note that scheduling big jobs greatly affects the queuer and other users, so you have to talk to us first to run massively parallel jobs (&amp;gt; 2048 cores). We will help make sure that your jobs start and run efficiently.&lt;br /&gt;
&lt;br /&gt;
If your job should run in fewer than  48 hours, specify that in your script -- your job &lt;br /&gt;
will start sooner.   (It's easier for the [[Moab | scheduler]] to fit in a short job than a long job).  On the downside, the&lt;br /&gt;
job will be killed automatically by the queue manager software at the end of the specified [[wallclock time]], so if you&lt;br /&gt;
guess wrong you might lose some work.  So the standard procedure is to estimate how long your job will take and&lt;br /&gt;
add 10% or so. &lt;br /&gt;
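&lt;br /&gt;
As an illustration of that rule of thumb, the sketch below pads an estimate by 10%, caps it at the 48-hour queue maximum, and prints it in the form used on the &amp;lt;tt&amp;gt;#PBS -l walltime=&amp;lt;/tt&amp;gt; line. The helper name &amp;lt;tt&amp;gt;walltime_with_margin&amp;lt;/tt&amp;gt; is hypothetical, and the input is assumed to be a whole number of minutes.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: pad a run-time estimate by 10% and format it for "#PBS -l walltime=".
# walltime_with_margin is a hypothetical helper; input is whole minutes.
walltime_with_margin() {
    local estimate_min="$1"
    local limit_min=$((48 * 60))        # the 48-hour queue maximum
    # Add 10%, rounding up to the next whole minute.
    local padded_min=$(( (estimate_min * 110 + 99) / 100 ))
    if [ "$padded_min" -gt "$limit_min" ]; then
        padded_min=$limit_min
    fi
    printf '%d:%02d:00\n' $((padded_min / 60)) $((padded_min % 60))
}

# A 10-hour (600-minute) estimate; prints: 11:00:00
walltime_with_margin 600
```
&lt;br /&gt;
If the padded value hits the 48-hour cap, the computation probably needs to be split into several jobs using [[checkpoints]].&lt;br /&gt;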
&lt;br /&gt;
You interact with the queuing system through the queue/resource manager, [[Moab | Moab]] and [[Moab | Torque]].  To see all the jobs in the queue use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To submit your own job, you must write a script which describes the job and how it is to be run (a sample script [[GPC_Quickstart#Submission_Script | follows]]) and submit it to the queue, using the command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub [SCRIPT-FILE-NAME]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where you will replace &amp;lt;tt&amp;gt;[SCRIPT-FILE-NAME]&amp;lt;/tt&amp;gt; with the file containing the submission script.   This will return a job ID, for example 31415, which is used to identify the jobs.  Information about a queued job can be found using&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ checkjob [JOB-ID]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and jobs can be canceled with the command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ canceljob [JOB-ID]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Again, these commands have many options, which can be read about on their man pages.&lt;br /&gt;
&lt;br /&gt;
Much more information on the queueing system is available on our [[Moab | queue]] page.&lt;br /&gt;
&lt;br /&gt;
====Batch Submission Script: MPI====&lt;br /&gt;
&lt;br /&gt;
A sample submission script for an MPI job using ethernet is shown below. The &amp;lt;tt&amp;gt;#PBS&amp;lt;/tt&amp;gt; directives are at the top; &lt;br /&gt;
the rest is what will be executed on the compute node.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (ethernet)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=2:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; -np = nodes*ppn&lt;br /&gt;
mpirun -np 16 -hostfile $PBS_NODEFILE ./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The lines that begin &amp;lt;tt&amp;gt;#PBS&amp;lt;/tt&amp;gt; are commands that are parsed and interpreted by qsub at submission time, and control administrative things about your job.   In this example, the script above requests two nodes, using 8 processors per node, for a [[wallclock time]] of one hour.  (The resources required by the job are listed on the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line.)   Other options can be given in other &amp;lt;tt&amp;gt;#PBS&amp;lt;/tt&amp;gt; lines, such as &amp;lt;tt&amp;gt;#PBS -N&amp;lt;/tt&amp;gt;, which sets the name of the job.   &lt;br /&gt;
&lt;br /&gt;
The rest of the script is run as a bash script at run time.   A bash shell on the first node of the two nodes that are requested executes these commands as a normal bash script, just as if you had run this as a shell script from the terminal.   The only difference is that PBS sets certain environment variables that you can use in the script.  &amp;lt;tt&amp;gt;$PBS_O_WORKDIR&amp;lt;/tt&amp;gt; is set to be the directory that the command was 'submitted' from - eg,  &amp;lt;tt&amp;gt;/scratch/USER/SOMEDIRECTORY&amp;lt;/tt&amp;gt; - and &amp;lt;tt&amp;gt;$PBS_NODEFILE&amp;lt;/tt&amp;gt; is the name of a file which contains all the nodes on which programs should execute.   Using these environment variables, the script then uses the &amp;lt;tt&amp;gt;mpirun&amp;lt;/tt&amp;gt; command to launch the job.   Assumed here is that the user has a line like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi intel&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
in their &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
* Note: The different versions of MPI require different commands to launch the run, and thus different scripts. The above script is  specific for the openmpi module.  For the intelmpi module on ethernet, the last line of the script should read&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -r ssh -np 16 -env I_MPI_DEVICE ssm ./a.out&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For full MPI details see [[ GPC_MPI_Versions | MPI]]&lt;br /&gt;
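&lt;br /&gt;
The value given to &amp;lt;tt&amp;gt;-np&amp;lt;/tt&amp;gt; in the script above is nodes*ppn written out by hand. A sketch of deriving it at run time follows. It assumes that the file named by &amp;lt;tt&amp;gt;$PBS_NODEFILE&amp;lt;/tt&amp;gt; contains one line per requested processor slot (each host name repeated ppn times); the helper names &amp;lt;tt&amp;gt;count_slots&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;count_nodes&amp;lt;/tt&amp;gt; are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: derive the mpirun process count from the node file instead of
# hard-coding it. count_slots and count_nodes are hypothetical helper names.
# Assumption: the node file holds one line per processor slot.
count_slots() {
    # Number of lines = number of processor slots granted to the job.
    awk 'END { print NR }' "$1"
}

count_nodes() {
    # Number of distinct host names = number of nodes granted to the job.
    sort -u "$1" | awk 'END { print NR }'
}

# Inside a job script one could then write (commented out: needs mpirun):
#   NP=$(count_slots "$PBS_NODEFILE")
#   mpirun -np "$NP" -hostfile "$PBS_NODEFILE" ./a.out
```
&lt;br /&gt;
This keeps the process count consistent with the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line when the number of nodes is changed later.&lt;br /&gt;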
&lt;br /&gt;
====Submitting Collections of Serial Jobs====&lt;br /&gt;
&lt;br /&gt;
SciNet-approved methods for running collections of serial jobs can be found on the [[User_Serial|serial run wiki page]].&lt;br /&gt;
&lt;br /&gt;
====Batch Submission Script: OpenMP====&lt;br /&gt;
&lt;br /&gt;
For running OpenMP jobs, the procedure is similar to that for MPI jobs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (OpenMP)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=8&lt;br /&gt;
./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that [[Introduction_To_Performance#Throughput | in some circumstances]] it can be more efficient to run (say) two jobs each running on four threads than one job running on eight threads.   In that case you can use the same `ampersand-and-wait' technique outlined for serial jobs (see [[User_Serial|serial run wiki page]]) for less-than-eight-core OpenMP jobs.&lt;br /&gt;
&lt;br /&gt;
====Hybrid MPI/OpenMP jobs====&lt;br /&gt;
&lt;br /&gt;
'''Using Intel MPI'''&lt;br /&gt;
&lt;br /&gt;
Here is how to run hybrid codes using intelmpi:&lt;br /&gt;
&lt;br /&gt;
http://software.intel.com/en-us/articles/hybrid-applications-intelmpi-openmp/&lt;br /&gt;
&lt;br /&gt;
Make sure you compile with the -mt_mpi option to the compilers to use the thread safe libraries. &lt;br /&gt;
Set the environment variable I_MPI_PIN_DOMAIN:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export I_MPI_PIN_DOMAIN=omp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This will set the process pinning domain size to be equal to OMP_NUM_THREADS (which you should set to the desired number of threads per MPI process). Each MPI process can therefore create $OMP_NUM_THREADS child threads that run within the corresponding domain. If OMP_NUM_THREADS is not set, each node is treated as a separate domain (which will allow as many threads per MPI process as there are cores).&lt;br /&gt;
&lt;br /&gt;
In addition, when invoking mpirun, you should add the argument &amp;quot;-ppn X&amp;quot;, where X is the number of MPI processes per node.&lt;br /&gt;
For example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -r ssh -ppn 2 -np 8 [executable]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
would start 2 mpi processes of &amp;lt;tt&amp;gt;[executable]&amp;lt;/tt&amp;gt; per node for a total of 8 processes, so mpirun will try to run mpi processes on 4 nodes&lt;br /&gt;
(OMP_NUM_THREADS is then probably best set at 4).&lt;br /&gt;
Your job script should still ask for these 4 nodes with the line&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
     #PBS -l nodes=4:ppn=8,walltime=....&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
(&amp;lt;tt&amp;gt;ppn=8&amp;lt;/tt&amp;gt; is not a mistake here; the ppn parameter has a different meaning for PBS and for mpirun)&lt;br /&gt;
&lt;br /&gt;
''The ppn parameter to ''mpirun'' is very important! Without it, all eight MPI processes would get bunched on the first node in this example, leaving 3 nodes unused.''&lt;br /&gt;
&lt;br /&gt;
NOTE: In order to pin OpenMP threads inside the domain, use the corresponding OpenMP feature by setting the KMP_AFFINITY environment variable; see [http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/lin/compiler_f/optaps/common/optaps_openmp_thread_affinity.htm#KMP_AFFINITY_Environment_Variable Intel's Compiler User and Reference Guide].&lt;br /&gt;
&lt;br /&gt;
The IntelMPI manual is referenced on the front page of our wiki:&lt;br /&gt;
&lt;br /&gt;
http://software.intel.com/sites/products/documentation/hpc/mpi/linux/reference_manual.pdf&lt;br /&gt;
&lt;br /&gt;
For the above example of a total of 8 processes on 4 nodes, you could use the following script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (hybrid job)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=4:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# SET THE NUMBER OF THREADS PER PROCESS:&lt;br /&gt;
export OMP_NUM_THREADS=4&lt;br /&gt;
&lt;br /&gt;
# PIN THE MPI DOMAINS ACCORDING TO OMP&lt;br /&gt;
export I_MPI_PIN_DOMAIN=omp&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; -np = nodes * (mpirun -ppn value) = 4*2 = 8&lt;br /&gt;
mpirun -r ssh -ppn 2 -np 8 ./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Using Open MPI'''&lt;br /&gt;
&lt;br /&gt;
For mixed MPI/OpenMP jobs using OpenMPI, which is the default for many users, the procedure is similar, but details differ.&lt;br /&gt;
&lt;br /&gt;
* Request the number of nodes in the PBS script.&lt;br /&gt;
* Set OMP_NUM_THREADS to the number of threads per MPI process.&lt;br /&gt;
* In addition to the -np parameter for mpirun, add the argument &amp;lt;tt&amp;gt;--bynode&amp;lt;/tt&amp;gt;, so that the mpi processes are not bunched up.&lt;br /&gt;
&lt;br /&gt;
So for example, to start a total of 8 processes on 4 nodes, you could use the following script&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (hybrid job)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=4:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# SET THE NUMBER OF THREADS PER PROCESS:&lt;br /&gt;
export OMP_NUM_THREADS=4&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; -np = nodes*processes_per_node; --bynode forces a round robin of nodes.&lt;br /&gt;
mpirun -np 8 --bynode -hostfile $PBS_NODEFILE ./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Submitting an Interactive (Debug) Job===&lt;br /&gt;
&lt;br /&gt;
It is sometimes convenient to run a job interactively; this can be very handy for debugging purposes.  In this case, you type a &amp;lt;tt&amp;gt;qsub&amp;lt;/tt&amp;gt; command which submits an interactive job to the queue; when the scheduler selects this job to run, then it starts a shell running on the first node of the job, which connects to your terminal.  You can then type any series of commands (for instance, the same commands listed as in the batch submission script above) to run a job interactively.&lt;br /&gt;
&lt;br /&gt;
For example, to start the same sort of job as in the batch submission script above, but interactively, one would type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -I -l nodes=2:ppn=8,walltime=1:00:00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is exactly the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line in the batch script above (which requests all 8 processors on each of 2 nodes for one hour), but with &amp;lt;tt&amp;gt;-I&amp;lt;/tt&amp;gt; (for `interactive') prepended.   When this job begins, your terminal will show you as logged in to one of the compute nodes, and you can type any shell command, run &amp;lt;tt&amp;gt;mpirun&amp;lt;/tt&amp;gt;, etc.   When you exit the shell, the job will end.  Interactive jobs can be used with any of the [[ Moab#GPC | GPC queues ]]; however, there is a short,&lt;br /&gt;
high-turnover queue called [[ Moab#debug | debug ]] which can be especially useful when the system is busy.&lt;br /&gt;
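&lt;br /&gt;
For instance, an interactive session in the debug queue could be requested as follows. The node count and walltime here are illustrative assumptions, so check the queue's actual limits before relying on them:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -I -q debug -l nodes=1:ppn=8,walltime=0:30:00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;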
&lt;br /&gt;
===Ethernet vs. Infiniband===&lt;br /&gt;
&lt;br /&gt;
About 1/4 of the GPC (862 nodes or 6896 cores) is connected with a high bandwidth low-latency fabric called&lt;br /&gt;
[http://en.wikipedia.org/wiki/InfiniBand InfiniBand].  Many jobs which require tight coupling to scale well greatly benefit from this interconnect;&lt;br /&gt;
other types of jobs, which have relatively modest communications, do not require this and run fine on Gigabit ethernet.&lt;br /&gt;
&lt;br /&gt;
Jobs which require the InfiniBand for good performance can request the nodes that have the `&amp;lt;tt&amp;gt;ib&amp;lt;/tt&amp;gt;' feature in the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line,&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#PBS -l nodes=2:ib:ppn=8,walltime=1:00:00&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Furthermore, if your mpirun command specifically requests a fabric in its options (eg. ssm), you will have to change those options as well. See [[GPC MPI Versions]].&lt;br /&gt;
&lt;br /&gt;
Because there are a limited number of these nodes, your job will start running sooner if you do not request them (e.g. if you use the scripts as shown above), as this increases the number of nodes available to run your job. In fact, the InfiniBand nodes are to be used only for jobs that are known to scale well and will benefit from this type of interconnect. As such, at least 2 nodes must be requested, as single-node jobs will not benefit from using an&lt;br /&gt;
InfiniBand node. The MPI libraries provided by SciNet automatically use the correct interconnect, InfiniBand or ethernet, depending on which nodes your job runs on.&lt;br /&gt;
&lt;br /&gt;
===HyperThreading===&lt;br /&gt;
&lt;br /&gt;
Each GPC compute node has 8 Nehalem cores (2 sockets each with a four-core Intel Xeon E5540 @ 2.53GHz).  Thus, to make full use of the computing power of a GPC node, you must be running at least 8 &amp;quot;tasks&amp;quot; -- MPI processes, or OpenMP threads.&lt;br /&gt;
&lt;br /&gt;
Under most circumstances, running exactly 8 tasks is the most efficient way to use these nodes.  However, sometimes software design (eg, having one thread for communication and one for computation) can usefully `oversubscribe' the number of physical cores, and running (say) twice as many tasks as cores can be a useful strategy.   If your code is highly memory-bandwidth bound, having one task ready to run while another waits for memory access can make more effective use of the processor.&lt;br /&gt;
&lt;br /&gt;
The Nehalem processors have hardware support for such two-way overloading of processors, through &amp;quot;HyperThreading&amp;quot;; there is an extra set of registers on each core to facilitate rapid switching between two tasks, making it look to the operating system as if there are in fact 16 cores per node.   Depending on the nature of your code, making use of these virtual extra cores may speed up or slow down your computation; you should run small test cases before running production jobs in this manner.  In most cases, the speed difference will be under 10%.  Some of our users have obtained an 8% speedup by running gromacs with 16 tasks instead of 8 on a single node (&amp;lt;tt&amp;gt;mpirun -np 16 ./gromacs/mdrun -npme 4&amp;lt;/tt&amp;gt; runs at 108% of the speed of &amp;lt;tt&amp;gt;mpirun -np 8 ./gromacs/mdrun&amp;lt;/tt&amp;gt; with &amp;lt;tt&amp;gt;-npme 2&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;-1&amp;lt;/tt&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
====HyperThreading with OpenMP====&lt;br /&gt;
&lt;br /&gt;
To use hyperthreading with an OpenMP job, one just runs twice as many threads as one would have previously; eg, if you were running 8 threads before (&amp;lt;tt&amp;gt;export OMP_NUM_THREADS=8&amp;lt;/tt&amp;gt;) you would run with 16 (&amp;lt;tt&amp;gt;export OMP_NUM_THREADS=16&amp;lt;/tt&amp;gt;).  Everything else remains the same, including the job submission script; one still uses &amp;lt;tt&amp;gt;ppn=8&amp;lt;/tt&amp;gt; in the submission of the job, as Torque has no way of knowing (or reason for caring) that you will be running on 16 `virtual' cores rather than 8 physical cores.&lt;br /&gt;
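&lt;br /&gt;
A single-node job script along these lines might look like the following sketch. The job name, walltime and executable name are placeholders, not values prescribed by this page.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (OpenMP job using HyperThreading)&lt;br /&gt;
#&lt;br /&gt;
# ppn stays at 8: Torque only knows about the 8 physical cores&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N omp-ht-test&lt;br /&gt;
&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# 16 threads on 8 physical cores, i.e. 2 threads per core&lt;br /&gt;
export OMP_NUM_THREADS=16&lt;br /&gt;
&lt;br /&gt;
./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;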
&lt;br /&gt;
====HyperThreading with MPI====&lt;br /&gt;
&lt;br /&gt;
To use hyperthreading with an MPI job, one just runs twice as many MPI processes as one would have previously; eg, if you were running on three nodes using 8 MPI tasks per node and used &amp;lt;tt&amp;gt;mpirun ... -np 24&amp;lt;/tt&amp;gt;, you could run instead with &amp;lt;tt&amp;gt;-np 48&amp;lt;/tt&amp;gt;.  Everything else remains the same, including the job submission script; one still uses &amp;lt;tt&amp;gt;ppn=8&amp;lt;/tt&amp;gt; in the submission of the job, as Torque has no way of knowing (or reason for caring) that you will be running on 16 `virtual' cores rather than 8 physical cores.&lt;br /&gt;
&lt;br /&gt;
Note that if you are using OpenMPI (as is the default), there is another consideration; OpenMPI assumes that there is no oversubscription and each task very aggressively makes full use of a core when it is waiting for a message (eg, the waits are &amp;quot;busywaits&amp;quot;).  If you find a significant slowdown when running multiple MPI tasks per core with OpenMPI, you may want to try adding the additional option to mpirun: &amp;lt;tt&amp;gt;--mca mpi_yield_when_idle 1&amp;lt;/tt&amp;gt;.  This will increase the latency of individual messages, but free up the core to do additional work while waiting.&lt;br /&gt;
&lt;br /&gt;
With IntelMPI, the problem should be less pronounced, but you can still improve things by using &amp;lt;tt&amp;gt;mpirun -genv I_MPI_SPIN_COUNT 1 ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
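&lt;br /&gt;
Putting these pieces together for Open MPI, the three-node example above might be sketched as follows. The walltime, job name and executable name are placeholders, and whether the yield option helps should be verified with a small test case.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (MPI job using HyperThreading)&lt;br /&gt;
#&lt;br /&gt;
# ppn stays at 8 even though 16 processes will run on each node&lt;br /&gt;
#PBS -l nodes=3:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N mpi-ht-test&lt;br /&gt;
&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# 3 nodes * 16 processes per node = 48 processes;&lt;br /&gt;
# mpi_yield_when_idle makes waiting processes give up the core instead of busy-waiting&lt;br /&gt;
mpirun -np 48 --mca mpi_yield_when_idle 1 ./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;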
&lt;br /&gt;
'''Examples of hyperthreading with MPI'''&lt;br /&gt;
&lt;br /&gt;
Hyperthreading using gromacs: https://support.scinet.utoronto.ca/wiki/index.php/Gromacs#Hyperthreading_with_Gromacs&lt;br /&gt;
&lt;br /&gt;
====HyperThreading with Hybrid MPI/OpenMP codes====&lt;br /&gt;
&lt;br /&gt;
With a hybrid code, one has extra flexibility in how to assign the &amp;quot;extra&amp;quot; cores -- you could run extra MPI tasks or extra OpenMP threads.  As with all hybrid codes, the combination which results in the best performance depends very strongly on the nature of your code, and you should experiment with different combinations.   In addition, with hybrid codes processor and memory affinity issues become very important; if you're unsure as to how to tune your application for best performance, please make an appointment with the SciNet technical analysts for more help.&lt;br /&gt;
&lt;br /&gt;
===Memory Configuration===&lt;br /&gt;
&lt;br /&gt;
'''16G'''&lt;br /&gt;
&lt;br /&gt;
There are 3756 nodes with 16G of memory; this is the primary configuration in the GPC. These nodes will be used by default.&lt;br /&gt;
&lt;br /&gt;
'''18G'''&lt;br /&gt;
&lt;br /&gt;
There are 24 Infiniband nodes which have 18G of memory. These nodes have a fully populated memory configuration that maximizes memory bandwidth. To&lt;br /&gt;
request these nodes use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=2:ib:m18g:ppn=8,walltime=1:00:00 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''32G'''&lt;br /&gt;
&lt;br /&gt;
There are 84 Infiniband nodes which have 32G of memory. To request these nodes use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=2:ib:m32g:ppn=8,walltime=1:00:00 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''128G'''&lt;br /&gt;
&lt;br /&gt;
There are two stand-alone large memory (128GB) nodes, &amp;lt;tt&amp;gt;gpc-lrgmem01&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;gpc-lrgmem02&amp;lt;/tt&amp;gt;, which are primarily to be used for data analysis of runs.  They have 16 cores and are Intel machines running Linux, but they do not have the same architecture (Nehalem) as the GPC compute nodes, so codes may have to be compiled separately for these machines.  They can be accessed using a specific &amp;lt;tt&amp;gt;largemem&amp;lt;/tt&amp;gt; queue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=2:ppn=8,walltime=1:00:00 -q largemem -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To estimate your time of access to these nodes, use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showq -w class=largemem&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
because showstart seems to always return &amp;quot;INFINITY&amp;quot;&lt;br /&gt;
&lt;br /&gt;
===Ram Disk===&lt;br /&gt;
&lt;br /&gt;
On the GPC nodes, there is a `ram disk' available: up to half of the memory on the node may be used as a temporary file system.  This is particularly useful in the early stages of migrating desktop-computing codes to a High Performance Computing platform such as the GPC.    It is much faster than real disk and does not require network traffic; however, each node sees its own ramdisk and cannot see files on that of other nodes.   This is a very easy way to cache writes (by writing them to fast ram disk instead of slow `real' disk); one would then periodically copy the files to /scratch or /project so that they are available after the job has completed.&lt;br /&gt;
&lt;br /&gt;
To use the ramdisk, create files in /dev/shm/.. and read from / write to them just as one would in (eg) /scratch/USER/.  Only the amount of RAM needed to store the files will be taken up by the temporary file system; thus if you have 8 serial jobs each requiring 1 GB of RAM, and 1GB is taken up by various OS services, you would still have approximately 7GB available to use as ramdisk on a 16GB node.   However, if you were to write 8 GB of data to the RAM disk, this would exceed available memory and your job would likely crash.&lt;br /&gt;
   &lt;br /&gt;
NOTE: it is very important to delete your files from ram disk at the end of your job.   If you do not do this, the next user to use that node will have less RAM available than they might expect, and this might kill their jobs.&lt;br /&gt;
&lt;br /&gt;
More details on how to set up your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].&lt;br /&gt;
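&lt;br /&gt;
As a rough sketch of the pattern described above (write to the ram disk, copy the results to /scratch, then clean up), a job script could look like this. The directory names and the assumption that the program takes its output directory as an argument are hypothetical.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N ramdisk-test&lt;br /&gt;
&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Hypothetical per-job directory on the ram disk&lt;br /&gt;
RAMDIR=/dev/shm/$USER-$PBS_JOBID&lt;br /&gt;
mkdir -p &amp;quot;$RAMDIR&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Hypothetical program that writes its output into the directory given as its argument&lt;br /&gt;
./a.out &amp;quot;$RAMDIR&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Copy the results to disk so that they survive the end of the job&lt;br /&gt;
mkdir -p /scratch/$USER/results&lt;br /&gt;
cp -r &amp;quot;$RAMDIR&amp;quot; /scratch/$USER/results/&lt;br /&gt;
&lt;br /&gt;
# Free the RAM for the next user of this node&lt;br /&gt;
rm -rf &amp;quot;$RAMDIR&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;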
&lt;br /&gt;
=== Managing jobs on the Queuing system ===&lt;br /&gt;
Information on checking available resources, starting, viewing, managing and canceling jobs on [[Moab | Moab/Torque]]&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Data_Management&amp;diff=2322</id>
		<title>Data Management</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Data_Management&amp;diff=2322"/>
		<updated>2010-12-12T16:56:17Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Selective */  added &amp;quot;or&amp;quot;'s to listing of how to use dsmmigrate and dsmrecall, because otherwise it looked like either use (a) the top 3 commands -or- (b) the bottome 2 commands in each list&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Storage Space==&lt;br /&gt;
SciNet's storage system is based on IBM's [http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] (General Parallel File System).   There are two main systems for user data: &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt;, a small, backed-up space where user home directories are located, and &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;, a large system for input or output data for jobs; data on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; is not backed up. A third storage system, &amp;lt;tt&amp;gt;/project&amp;lt;/tt&amp;gt;, exists only for groups with LRAC/NRAC allocations. Data placed on scratch will be deleted if it has not been accessed in 3 months.  SciNet does not provide long-term storage for large data sets.  &lt;br /&gt;
&lt;br /&gt;
===Overview of the different file systems===&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! {{Hl2}} | file system &lt;br /&gt;
! {{Hl2}} | purpose &lt;br /&gt;
! {{Hl2}} | quota &lt;br /&gt;
! {{Hl2}} | block size &lt;br /&gt;
! {{Hl2}} | backed up&lt;br /&gt;
! {{Hl2}} | purged&lt;br /&gt;
! {{Hl2}} | access &lt;br /&gt;
|- &lt;br /&gt;
| /home&lt;br /&gt;
| development&lt;br /&gt;
| 10 GB&lt;br /&gt;
| 256 KB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
| read-only on compute nodes (r/w on login, devel and datamover1) &lt;br /&gt;
|- &lt;br /&gt;
| /scratch&lt;br /&gt;
| computation&lt;br /&gt;
| 20 TB&lt;br /&gt;
| 4 MB&lt;br /&gt;
| no&lt;br /&gt;
| files not accessed for &amp;gt; 3 months&lt;br /&gt;
| read/write on all nodes&lt;br /&gt;
|- &lt;br /&gt;
| /project&lt;br /&gt;
| computation&lt;br /&gt;
| by allocation&lt;br /&gt;
| 256 KB&lt;br /&gt;
| no&lt;br /&gt;
| never&lt;br /&gt;
| read/write on all nodes&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Home Disk Space===&lt;br /&gt;
&lt;br /&gt;
Every SciNet user gets a 10GB directory on &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; which is regularly backed-up.   Home is visible from &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; nodes, and from the development nodes on [[GPC_Quickstart | GPC]] and the [[TCS_Quickstart | TCS]].  However, on the compute nodes of the GPC clusters -- as when jobs are running -- &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is mounted '''''read-only'''''; thus GPC jobs can read files in /home but cannot write to files there.   &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is a good place to put code, input files for runs, and anything else that needs to be kept to reproduce runs.  On the other hand, &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is not a good place to put many small files, since&lt;br /&gt;
the block size for the file system is 256KB, so you would quickly run out of disk quota and make the backup system very slow.&lt;br /&gt;
&lt;br /&gt;
If your application absolutely insists on writing material to your home account and you can't find a way to instruct it to write somewhere else, an alternative is to create a link pointing from your account under /home to a location under /scratch.&lt;br /&gt;
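&lt;br /&gt;
Such a link could be set up as in the sketch below, run on a login or devel node where /home is writable. The directory name &amp;lt;tt&amp;gt;.myapp&amp;lt;/tt&amp;gt; and the scratch path are hypothetical. Keep in mind that the link target then lives on /scratch, which is neither backed up nor exempt from purging.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Create the real directory on /scratch (hypothetical name)&lt;br /&gt;
$ mkdir -p /scratch/$USER/myapp&lt;br /&gt;
&lt;br /&gt;
# Point the location the application insists on at that directory;&lt;br /&gt;
# this assumes $HOME/.myapp does not exist yet&lt;br /&gt;
$ ln -s /scratch/$USER/myapp $HOME/.myapp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;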
&lt;br /&gt;
===Scratch Disk Space===&lt;br /&gt;
&lt;br /&gt;
Every SciNet user also gets a directory in &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.   Scratch is visible from the &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; nodes,  the development nodes on [[GPC_Quickstart | GPC]] and the [[TCS_Quickstart | TCS]], and on the compute nodes of the clusters, mounted as read-write.   Thus jobs would normally write their output somewhere in &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.  There are '''NO''' backups of anything on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
There is a large amount of space available on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; but it is purged routinely so that all users running jobs and generating large outputs will have room to store their data temporarily.  Computational results which you want to keep longer than this must be copied (using &amp;lt;tt&amp;gt;scp&amp;lt;/tt&amp;gt;) off of SciNet entirely and to your local system.   SciNet does not routinely provide long-term storage for large data sets.&lt;br /&gt;
&lt;br /&gt;
===Scratch Disk Purging Policy===&lt;br /&gt;
&lt;br /&gt;
In order to ensure that there is always significant space available for running jobs '''we automatically delete files in /scratch that have not been accessed for more than 3 months by the actual deletion day on the 15th of each month'''. This policy is subject to revision depending on its effectiveness. More details about the purging process and how users can check if their files will be deleted follow. If you have files scheduled for deletion you should move them to a more permanent location such as your departmental server or your /project space (for PIs who have either been allocated disk space by the LRAC or have bought disk space).&lt;br /&gt;
&lt;br /&gt;
On the '''first''' of each month, a list of files scheduled for purging is produced, and an email notification is sent to each user on that list. Furthermore, on or about the '''12th''' of each month a second scan produces a more current assessment and another email notification is sent. This way users can double check that they have indeed taken care of all the files they needed to relocate before the purging deadline. Those files will be automatically deleted on the '''15th''' of the same month unless they have been accessed or relocated in the interim. If you have files scheduled for deletion then they will be listed in a file in /scratch/todelete/current, which has your userid and groupid in the filename. For example, if user xxyz wants to check if they have files scheduled for deletion they can issue the following command on a system which mounts /scratch (e.g. a scinet login node): '''ls -l1 /scratch/todelete/current |grep xxyz'''. In the example below, the name of this file indicates that user xxyz is part of group abc, has 9,560 files scheduled for deletion and they take up 1.0TB of space:&lt;br /&gt;
&lt;br /&gt;
 [xxyz@scinet04 ~]$ ls -l1 /scratch/todelete/current |grep xxyz&lt;br /&gt;
 -rw-r----- 1 xxyz     root       1733059 Jan 12 11:46 10001___xxyz_______abc_________1.00T_____9560files&lt;br /&gt;
&lt;br /&gt;
The file itself contains a list of all files scheduled for deletion (in the last column) and can be viewed with standard commands like more/less/cat - e.g. '''more /scratch/todelete/current/10001___xxyz_______abc_________1.00T_____9560files'''&lt;br /&gt;
&lt;br /&gt;
Similarly, you can check on all other users in your group by using the ls command with grep on your group. For example: '''ls -l1 /scratch/todelete/current |grep abc'''. That will list all users in the same group as xxyz who have files to be purged on the 15th. Members of the same group have access to each other's contents.&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' Preparing these assessments takes several hours. If you change the access/modification time of a file in the interim, that will not be detected until the next cycle. To get immediate feedback, use the '''ls -lu''' command on the file. If the file's atime has been updated, the file will no longer be deleted on the purging date, the 15th.&lt;br /&gt;
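&lt;br /&gt;
As a sketch of such a check, the commands below show the access time of one file and then list files that have not been accessed recently. The path is hypothetical and 90 days is only an approximation of the 3-month limit; the list in /scratch/todelete/current remains the authoritative one.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Show the last access time (atime) of a single file&lt;br /&gt;
$ ls -lu /scratch/$USER/run1/output.dat&lt;br /&gt;
&lt;br /&gt;
# List all of your files on /scratch not accessed for more than 90 days&lt;br /&gt;
$ find /scratch/$USER -type f -atime +90&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;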
&lt;br /&gt;
===Project Disk Space===&lt;br /&gt;
&lt;br /&gt;
Investigators who have been granted allocations through the [http://www.scinet.utoronto.ca/resources/Account_Allocations.htm LRAC/NRAC Application Process] may have been allocated disk space in addition to compute time.   For the period of time that the allocation is granted, they will have disk space on the &amp;lt;tt&amp;gt;/project&amp;lt;/tt&amp;gt; disk system.  Space on the project system is not purged, but neither is it backed up.   All members of the investigator's group will have access to this system, which is mounted read/write everywhere.&lt;br /&gt;
&lt;br /&gt;
===How much Disk Space Do I have left?===&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''/scinet/gpc/bin/diskUsage'''&amp;lt;/tt&amp;gt; command, available on the login nodes, datamovers and the GPC devel nodes, provides information in a number of ways on the home, scratch, and project file systems. For instance, it can show how much disk space is being used by yourself and your group (with the -a option) or how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), and it can generate plots of your usage over time. Please see the usage help below for more details.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-h|-?| [-a] [-u &amp;lt;user&amp;gt;] [-de|-plot]&lt;br /&gt;
       -h|-?: help&lt;br /&gt;
       -a: list usages of all members on the group&lt;br /&gt;
       -u &amp;lt;user&amp;gt;: as another user on your group&lt;br /&gt;
       -de: include delta information&lt;br /&gt;
       -plot: create plots of disk usages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that information on usage and quota is only updated hourly!&lt;br /&gt;
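&lt;br /&gt;
Based on the usage help above, typical invocations might look like the following sketch; the output is not reproduced here, and the user name is a placeholder:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Usage of all members of your group&lt;br /&gt;
$ /scinet/gpc/bin/diskUsage -a&lt;br /&gt;
&lt;br /&gt;
# Your own usage, including delta information&lt;br /&gt;
$ /scinet/gpc/bin/diskUsage -de&lt;br /&gt;
&lt;br /&gt;
# Usage of another user in your group (xxyz is a placeholder)&lt;br /&gt;
$ /scinet/gpc/bin/diskUsage -u xxyz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;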
&lt;br /&gt;
===Performance===&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] is a high-performance filesystem which provides rapid reads and writes to large datasets in parallel from many nodes.  As a consequence of this design, however, '''the file system performs quite ''poorly'' at accessing data sets which consist of many, small files.'''  For instance, you will find that reading data in from one 4MB file is enormously faster than from 100 40KB files.   Such small files are also quite wasteful of space, as the blocksize for the filesystem is 4MB.   This is something you should keep in mind when planning your input/output strategy for runs on SciNet.&lt;br /&gt;
&lt;br /&gt;
For instance, if you run multi-process jobs, having each process write to a file of its own is not a scalable I/O solution. A directory gets locked by the first process accessing it, so all other processes have to wait for it. Not only has the code just become considerably less parallel, chances are the file system will have a time-out while waiting for your other processes, leading your program to crash mysteriously.&lt;br /&gt;
Consider using MPI-IO (part of the MPI-2 standard), which allows files to be opened simultaneously&lt;br /&gt;
by different processes, or using a dedicated process for I/O to which all other processes send their&lt;br /&gt;
data, and which subsequently writes this data to a single file.&lt;br /&gt;
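&lt;br /&gt;
For data that already exists as many small files, one common mitigation (a general technique, not something prescribed on this page) is to bundle the files into a single archive, so that the file system handles one large file instead of many small ones. The directory and archive names below are hypothetical.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Bundle the directory /scratch/$USER/inputs into a single archive&lt;br /&gt;
$ tar -cf /scratch/$USER/inputs.tar -C /scratch/$USER inputs&lt;br /&gt;
&lt;br /&gt;
# List the contents of the archive without unpacking it&lt;br /&gt;
$ tar -tf /scratch/$USER/inputs.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;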
&lt;br /&gt;
===Local Disk===&lt;br /&gt;
&lt;br /&gt;
The compute nodes on the GPC '''do not contain hard drives''' so there is no local disk available to use during your computation.  You can however use part of a compute node's RAM like a local disk ('ramdisk') but this will reduce how much memory is available for your&lt;br /&gt;
program.  This can be accessed using &amp;lt;tt&amp;gt;/dev/shm/&amp;lt;/tt&amp;gt; and is currently set to 8GB.  Anything written&lt;br /&gt;
to this location that you want to keep must be copied back to the &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; filesystem as &amp;lt;tt&amp;gt;/dev/shm&amp;lt;/tt&amp;gt; is wiped after each job and since it is in memory will not survive through a reboot of the node. More on ramdisk usage can be found [[User_Ramdisk | here]].&lt;br /&gt;
&lt;br /&gt;
Note that the absence of hard drives also means that the nodes cannot swap memory, so be sure that your computation fits within memory.&lt;br /&gt;
&lt;br /&gt;
==Data Transfer==&lt;br /&gt;
{{:Data_Transfer}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==File/Ownership Management (ACL)==&lt;br /&gt;
* By default, at SciNet, users within the same group have read permission to each other's files (not write)&lt;br /&gt;
* You may use access control list ('''ACL''') to allow your supervisor (or another user within your group) to manage files for you (i.e., create, move, rename, delete), while still retaining your access and permission as the original owner of the files/directories.&lt;br /&gt;
* '''NOTE''': We highly recommend that you never give write permission to other users on the top level of your home directory (/home/[owner]), since that would seriously compromise your privacy, in addition to disabling ssh key authentication, among other things. If necessary, make specific sub-directories under your home directory so that other users can manipulate/access files from those.&lt;br /&gt;
* If you need to set up permissions across groups [mailto:support@scinet.utoronto.ca contact us] (and the other group's supervisor!).&lt;br /&gt;
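&lt;br /&gt;
As a sketch of the sub-directory approach recommended in the note above, the following commands create a dedicated directory and grant one other user access to it only. The directory name &amp;lt;tt&amp;gt;shared&amp;lt;/tt&amp;gt; is hypothetical, and [supervisor] stands for the other user's name.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Create a dedicated sub-directory instead of opening up /home/[owner] itself&lt;br /&gt;
$ mkdir $HOME/shared&lt;br /&gt;
&lt;br /&gt;
# Give [supervisor] rwx access to that sub-directory only&lt;br /&gt;
$ /scinet/gpc/bin/setfacl -m user:[supervisor]:rwx $HOME/shared&lt;br /&gt;
&lt;br /&gt;
# Check the resulting ACL&lt;br /&gt;
$ /scinet/gpc/bin/getfacl $HOME/shared&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;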
&lt;br /&gt;
===Using  setfacl/getfacl===&lt;br /&gt;
* To allow [supervisor] to manage files in /project/group/[owner] using the '''setfacl''' and '''getfacl''' commands, follow the 3 steps below as the [owner] account from a shell:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1) $ /scinet/gpc/bin/setfacl -d -m user:[supervisor]:rwx /project/group/[owner]&lt;br /&gt;
   (every *new* file/directory inside [owner] will inherit [supervisor] ownership by default from now on)&lt;br /&gt;
&lt;br /&gt;
2) $ /scinet/gpc/bin/setfacl -d -m user:[owner]:rwx /project/group/[owner]&lt;br /&gt;
   (but will also inherit [owner] ownership, ie, ownership of both by default, for files/directories created by [supervisor])&lt;br /&gt;
&lt;br /&gt;
3) $ /scinet/gpc/bin/setfacl -Rm user:[supervisor]:rwx /project/group/[owner]&lt;br /&gt;
   (recursively modify all *existing* files/directories inside [owner] to also be rwx by [supervisor])&lt;br /&gt;
&lt;br /&gt;
   $ /scinet/gpc/bin/getfacl /project/group/[owner]&lt;br /&gt;
   (to determine the current ACL attributes)&lt;br /&gt;
&lt;br /&gt;
   $ /scinet/gpc/bin/setfacl -b /project/group/[owner]&lt;br /&gt;
   (to remove any previously set ACL)&lt;br /&gt;
&lt;br /&gt;
PS: on the datamovers getfacl, setfacl and chacl will be on your path&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For more information on using [http://linux.die.net/man/1/setfacl &amp;lt;tt&amp;gt;setfacl&amp;lt;/tt&amp;gt;] or [http://linux.die.net/man/1/getfacl &amp;lt;tt&amp;gt;getfacl&amp;lt;/tt&amp;gt;] see their man pages.&lt;br /&gt;
&lt;br /&gt;
===Using mmputacl/mmgetacl===&lt;br /&gt;
* Alternatively, you may use gpfs' native '''mmputacl''' and '''mmgetacl''' commands. The advantages are that you can set &amp;quot;control&amp;quot; permission and that [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm1160.html POSIX or NFS v4 style ACL] are supported. You will need first to create a /tmp/supervisor.acl file with the following contents:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user::rwxc&lt;br /&gt;
group::----&lt;br /&gt;
other::----&lt;br /&gt;
mask::rwxc&lt;br /&gt;
user:[owner]:rwxc&lt;br /&gt;
user:[supervisor]:rwxc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
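&lt;br /&gt;
One way to create this file from the shell is a here-document, sketched below. The variables OWNER and SUPERVISOR and their values are hypothetical stand-ins for [owner] and [supervisor].&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Hypothetical user names; replace with the real [owner] and [supervisor]&lt;br /&gt;
OWNER=xxyz&lt;br /&gt;
SUPERVISOR=abcd&lt;br /&gt;
&lt;br /&gt;
# Write the ACL entries shown above; the shell substitutes the two variables&lt;br /&gt;
cat &amp;gt; /tmp/supervisor.acl &amp;lt;&amp;lt;EOF&lt;br /&gt;
user::rwxc&lt;br /&gt;
group::----&lt;br /&gt;
other::----&lt;br /&gt;
mask::rwxc&lt;br /&gt;
user:$OWNER:rwxc&lt;br /&gt;
user:$SUPERVISOR:rwxc&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;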
&lt;br /&gt;
Then issue the following 2 commands:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1) $ mmputacl -i /tmp/supervisor.acl /project/group/[owner]&lt;br /&gt;
2) $ mmputacl -d -i /tmp/supervisor.acl /project/group/[owner]&lt;br /&gt;
   (every *new* file/directory inside [owner] will inherit [supervisor] ownership by default as well as &lt;br /&gt;
   [owner] ownership, ie, ownership of both by default, for files/directories created by [supervisor])&lt;br /&gt;
&lt;br /&gt;
   $ mmgetacl /project/group/[owner]&lt;br /&gt;
   (to determine the current ACL attributes)&lt;br /&gt;
&lt;br /&gt;
   $ mmdelacl -d /project/group/[owner]&lt;br /&gt;
   (to remove any previously set ACL)&lt;br /&gt;
&lt;br /&gt;
   $ mmeditacl /project/group/[owner]&lt;br /&gt;
   (to create or change a GPFS access control list)&lt;br /&gt;
   (for this command to work set the EDITOR environment variable: export EDITOR=/usr/bin/vi)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is no option to recursively add or remove ACL attributes using a gpfs built-in command. You'll need to use the -i option as above for each file or directory individually. [[Data_Management#bash_script_that_you_may_adapt_to_recursively_add_or_remove_ACL_attributes_using_gpfs_built-in_commands | Here is a sample bash script you may use for that purpose]]&lt;br /&gt;
&lt;br /&gt;
For more information on using [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm11120.html &amp;lt;tt&amp;gt;mmputacl&amp;lt;/tt&amp;gt;] or [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm11120.html &amp;lt;tt&amp;gt;mmgetacl&amp;lt;/tt&amp;gt;] see their man pages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Appendix===&lt;br /&gt;
====bash script that you may adapt to recursively add or remove ACL attributes using gpfs built-in commands====&lt;br /&gt;
Courtesy of Agata Disks (http://csngwinfo.in2p3.fr/mediawiki/index.php/GPFS_ACL)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# USAGE&lt;br /&gt;
#     - on one directory:     ./set_acl.sh dir_name&lt;br /&gt;
#     - on more directories:  ./set_acl.sh 'dir_nam*'&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
# Path of the file that contains the ACL&lt;br /&gt;
ACL_FILE_PATH=/agatadisks/data/acl_file.acl&lt;br /&gt;
&lt;br /&gt;
# Directories on which the ACLs have to be set&lt;br /&gt;
dirs=$1&lt;br /&gt;
&lt;br /&gt;
# Recursive function that sets ACL to files and directories&lt;br /&gt;
set_acl () {&lt;br /&gt;
  local curr_dir=$1&lt;br /&gt;
  for args in &amp;quot;$curr_dir&amp;quot;/*&lt;br /&gt;
  do&lt;br /&gt;
    if [ -f &amp;quot;$args&amp;quot; ]; then&lt;br /&gt;
      # Set ACL on a regular file (quoting keeps names with spaces intact)&lt;br /&gt;
      mmputacl -i &amp;quot;$ACL_FILE_PATH&amp;quot; &amp;quot;$args&amp;quot;&lt;br /&gt;
      if [ $? -ne 0 ]; then&lt;br /&gt;
        echo &amp;quot;ERROR: ACL not set on $args&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      echo &amp;quot;ACL set on file $args&amp;quot;&lt;br /&gt;
    fi&lt;br /&gt;
    if [ -d &amp;quot;$args&amp;quot; ]; then&lt;br /&gt;
      # Set Default ACL in directory&lt;br /&gt;
      mmputacl -i &amp;quot;$ACL_FILE_PATH&amp;quot; &amp;quot;$args&amp;quot; -d&lt;br /&gt;
      if [ $? -ne 0 ]; then&lt;br /&gt;
        echo &amp;quot;ERROR: Default ACL not set on $args&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      echo &amp;quot;Default ACL set on directory $args&amp;quot;&lt;br /&gt;
      # Set ACL in directory&lt;br /&gt;
      mmputacl -i &amp;quot;$ACL_FILE_PATH&amp;quot; &amp;quot;$args&amp;quot;&lt;br /&gt;
      if [ $? -ne 0 ]; then&lt;br /&gt;
        echo &amp;quot;ERROR: ACL not set on $args&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      echo &amp;quot;ACL set on directory $args&amp;quot;&lt;br /&gt;
      # Descend into the sub-directory&lt;br /&gt;
      set_acl &amp;quot;$args&amp;quot;&lt;br /&gt;
    fi&lt;br /&gt;
  done&lt;br /&gt;
}&lt;br /&gt;
# $dirs is deliberately left unquoted so that a quoted pattern such as 'dir_nam*' is expanded here&lt;br /&gt;
for dir in $dirs&lt;br /&gt;
do&lt;br /&gt;
  if [ ! -d &amp;quot;$dir&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;ERROR: $dir is not a directory&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
  fi&lt;br /&gt;
  set_acl &amp;quot;$dir&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
exit 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Hierarchical Storage Management (HSM)==&lt;br /&gt;
'''(a pilot project is starting in July/2010 with a select group of users)'''&lt;br /&gt;
&lt;br /&gt;
===Basic Concepts===&lt;br /&gt;
'''Hierarchical Storage Management (HSM)''' is a data storage technique which automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as hard disk drive arrays, are more expensive (per byte stored) than slower devices, such as optical discs and magnetic tape drives. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise's data on slower devices, and then copy data to faster disk drives when needed. In effect, HSM turns the fast disk drives into caches for the slower mass storage devices. The HSM system monitors the way data is used and makes best guesses as to which data can safely be moved to slower devices and which data should stay on the fast devices.&lt;br /&gt;
&lt;br /&gt;
In a typical HSM scenario, data files which are frequently used are stored on disk drives, but are eventually migrated to tape if they are not used for a certain period of time, typically a few months. If a user does reuse a file which is on tape, it is automatically moved back to disk storage. The advantage is that the total amount of stored data can be much larger than the capacity of the disk storage available, but since only rarely-used files are on tape, most users will not notice any slowdown.&lt;br /&gt;
&lt;br /&gt;
The HSM client provides both ''automatic'' and ''selective migration''. Once file ''migration'' begins, the HSM client sends a copy of your file to storage volumes on disk devices or devices that support removable media, such as tape, and replaces the original file with a ''stub file'' on the HSM-managed file system (known as the ''repository'' at SciNet).&lt;br /&gt;
&lt;br /&gt;
'''Repository''' commonly refers to a location for long-term storage, often for safety or preservation. &lt;br /&gt;
&lt;br /&gt;
'''Migration''', in the context of HSM, refers to the set of actions that move files from the front-end disk-based repository to a back-end tape library system (often invisible or inaccessible to users).&lt;br /&gt;
&lt;br /&gt;
'''Relocation''', in the context of SciNet, refers to the use of unix commands such as copy, move, tar or rsync to get data into the repository.&lt;br /&gt;
&lt;br /&gt;
'''The stub file''' is a small replacement file that makes it appear as though the original file is on the repository. It contains required metadata information to locate and recall a migrated file and to respond to specific UNIX commands without recalling the file.&lt;br /&gt;
&lt;br /&gt;
'''Automatic migration''' periodically monitors space usage and automatically migrates eligible files according to the options and settings that have been selected. The HSM client provides two types of automatic migration: ''threshold migration'' and ''demand migration''.&lt;br /&gt;
&lt;br /&gt;
'''Threshold migration''' maintains a specific level of free space on the repository file system. When disk usage reaches the high threshold percentage, eligible files are migrated to tapes automatically. When space usage drops to the low threshold set for the file system, file migration stops.&lt;br /&gt;
&lt;br /&gt;
'''Demand migration''' responds to an out-of-space condition on the repository file system. Demand migration starts automatically if the file system runs out of space (usually triggered at 90%). For HSM, as files are migrated (oldest/largest first), space becomes available on the file system, and the process or event that caused the out-of-space condition can be resumed.&lt;br /&gt;
&lt;br /&gt;
'''Selective migration''' is triggered by an HSM command, usually issued by a user, that migrates specific files from the repository at will, either in anticipation of the automatic migration or independently of the system-wide eligibility criteria. For example, if you know that you will not be using a particular group of files for an extended time, you can migrate them so as to free additional space on the repository.&lt;br /&gt;
&lt;br /&gt;
'''Reclamation''' is the process of reclaiming unused space on a tape (applies to Virtual Tapes as well). Over time, as files/directories get deleted or updated on the repository, a process will expire old data, creating gaps of unused storage on the tapes. Since tapes are sequential media, typical tape handling software can only write data to the end of the tape, so these gaps of “Empty Space” cannot be used. Reclamation periodically copies, in a rolling fashion, the active data from these &amp;quot;Swiss Cheese&amp;quot;-like tapes to unused tapes in a compacted form.&lt;br /&gt;
&lt;br /&gt;
'''Optimal environment:''' HSM should be used in an environment where the old and large files that need to be preserved are not used regularly. Files that are needed frequently should not be migrated at all; otherwise HSM would act as a cache, migrating files only to recall them again shortly afterwards. This is not advisable. The repository file system needs to be large enough to hold all regularly used files. If the file system is too small to hold all regularly needed files (**), HSM ends up perpetually recalling requested files, exceeding the high-threshold limit, migrating other files to get below the low-threshold limit, and so on.&lt;br /&gt;
&lt;br /&gt;
===Deployment at SciNet===&lt;br /&gt;
HSM is performed by dedicated IBM software made up of a number of HSM daemons running on '''datamover2'''. These daemons constantly monitor the usage of the '''/repository''' GPFS file system and, depending on a predefined set of policies, data may be automatically or manually migrated to the Tivoli Storage Management server (TSM), and kept on our library of LTO-4 tapes.&lt;br /&gt;
&lt;br /&gt;
'''/repository is a 15TB &amp;quot;transient&amp;quot; location''' accessible only from datamover2. Users may relocate data as required from /scratch or /project to /repository in a number of ways, such as copy, move, tar or rsync. &amp;quot;Transient&amp;quot; refers to the fact that /repository works like a &amp;quot;Black Hole&amp;quot;: in the background '''it is constantly being emptied''', even while you relocate data in from other file systems. What is left behind is the directory tree with the stub files (0 bytes in size at SciNet) and the metadata associated with it, which takes up about 1-2%. But even if /repository is at 1% to start with, we ask that you please do not initiate a relocation of more than a 10TB chunk at once, so that the system has time to process your data and still allow other users to migrate/recall some material before reaching 100% full.&lt;br /&gt;
&lt;br /&gt;
Inside /repository, data is segregated on a per-group basis, just as in /project. Within groups, users and group supervisors can structure materials any way they prefer. But the recommendation is that those involved spend some time designing that structure ahead of time, since you may merge data from project and/or scratch (or even home). In tests we performed, we've been able to reorganize the FS structure after migration, change the name and ownership of directories and stubs, and still recall files under the new path and ownership. HSM does seem to keep a very symbiotic relation between the metadata and the inode attributes at the file system level, without necessarily having to replicate these changes with tape recall &amp;amp; migration operations. But please don't abuse this flexibility. If possible, keep your initial layout structure somewhat fixed over time.&lt;br /&gt;
&lt;br /&gt;
We also recommend that users bundle files in tar-balls of at least 10GB before relocation, and keep a listing of those files somewhere; in fact you may use the 'tar' command to create the tar-ball directly in /repository on-the-fly. See examples below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -czvf /repository/[group]/[user]/myproject1.tar.gz /project/[group]/[user]/project1/ &amp;gt; /project/[group]/[user]/myproject1-repository-listing.txt&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
tar -czvf /repository/[group]/[user]/myscratch1.tar.gz /scratch/[user]/scratchdata1/ &amp;gt; /home/[user]/myscratch1-repository-listing.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is important to keep a listing of the files that are in each tar on a partition other than the HSM repository so that you can quickly decide which tar you need to recover. While the tar stub will always exist on the HSM disk, you will not be able to run tar --list on the stub without recalling the full tar file back from tape to disk. The redirection in the examples above accomplishes this.&lt;br /&gt;
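&lt;br /&gt;
As a rough sketch of how such listings might be used later, the helper below reports which listing file (and hence which tar-ball) mentions a given file name. The function name 'find_tarball', and the assumption that all listings sit in one directory with names ending in '-repository-listing.txt', are illustrative only; this is not a SciNet-provided tool.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: print the listing files that mention a given name.
# Usage: find_tarball LISTING_DIR PATTERN
find_tarball () {
  local listing_dir=$1
  local pattern=$2
  # -l prints only the names of matching files;
  # -F treats PATTERN as a fixed string rather than a regular expression
  grep -l -F -- "$pattern" "$listing_dir"/*-repository-listing.txt
}
```
&lt;br /&gt;
Because the search only reads the listings kept outside /repository, it does not touch any stub and so does not trigger a recall from tape.&lt;br /&gt;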
&lt;br /&gt;
The important thing is to avoid the relocation of many thousands (or millions) of small files. It's very demanding on the system to constantly scan/reconcile all these files on the file system, tapes, metadata and database. A good reference is '''average file size &amp;gt; 100MB''' in /repository. Deep directory nesting in general also increases the time required to traverse a file system and thus should be avoided where possible.&lt;br /&gt;
&lt;br /&gt;
===Performance===&lt;br /&gt;
Unlike /project or /scratch, /repository is only a 2-tier disk RAID, so don't expect transfer rates much higher than 60MB/sec, on an rsync session for example. In other words, a 10TB offload operation will typically take 2 days to complete if made up of large files. On the other hand, we have conducted experiments where we migrated only 1TB, but as 1 million individual (untarred) files, and that took nearly a day! This situation should be avoided when moving many terabytes. Performance is as much a function of the number of files as of the amount of data.&lt;br /&gt;
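&lt;br /&gt;
The 2-day figure for large files follows from simple arithmetic, sketched below. The function name 'transfer_hours' is hypothetical, 1TB is taken as 1,000,000MB, and the result ignores any per-file overhead, so it is a lower bound rather than a prediction.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Rough transfer-time estimate in whole hours.
# Usage: transfer_hours SIZE_TB RATE_MB_PER_S
transfer_hours () {
  local size_tb=$1
  local rate_mb_s=$2
  # integer arithmetic: MB to move, divided by MB/s, divided by seconds per hour
  echo $(( size_tb * 1000000 / rate_mb_s / 3600 ))
}

transfer_hours 10 60   # prints 46, i.e. roughly 2 days
```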
&lt;br /&gt;
As for the &amp;quot;ideal tar-ball size&amp;quot;, experiments have shown that an isolated 10GB tar-ball typically takes 10-15 minutes to be pulled back, considering all tape operations involved. That seems like a reasonable amount of time to wait for a group of files kept off-line for an extended period of time. Also consider that pulling back an individual tiny file could still take as long as 5-8 minutes. So, it's pretty clear that you get the best bang for the buck by tar'ing your material, and you won't tie up the tape system for too long. As for the upper limit, you can probably bundle files in 100-500GB tar-balls, provided that you're OK with waiting a couple of hours for them to be recalled at a later date; at least from SciNet's perspective, it would be a very efficient migration.&lt;br /&gt;
&lt;br /&gt;
'''Please be sure to contact us to schedule your transfers IN or OUT of the system, to avoid conflict with other users or within the system settings.''' For instance, if you recall large amounts of data at once, let's say 7.5TB (about half of /repository), we would have to adjust the high threshold accordingly for that period (to 50%), so we don't induce the never-ending migrate/recall issues (**) described under ''Optimal environment''.&lt;br /&gt;
&lt;br /&gt;
===How to migrate/recall data===&lt;br /&gt;
&lt;br /&gt;
====Automatic====&lt;br /&gt;
We have currently set up /repository with '''High and Low thresholds of 2% and 1% respectively'''. That means that the file system is monitored at regular intervals to determine if the 2% usage mark has been reached or surpassed. In that case, data is automatically migrated to tapes, oldest (or largest) first, until the file system is down to 1%, if possible (metadata is not migrated). Since data may be copied/moved/rsync'ed/tar'ed in faster than /repository can be emptied, you may sporadically observe disk usage of 80-90% (hence the 10TB chunk of data limit). For now at SciNet we migrate every file in /repository to tapes.&lt;br /&gt;
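&lt;br /&gt;
The start/stop behaviour of threshold migration can be pictured with the toy model below. It only illustrates the two-threshold logic described above and is not the HSM client's actual implementation; the function name 'migration_state' and its arguments are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Illustrative model only. Given the previous state and a new usage reading,
# print whether automatic migration would be "migrating" or "idle".
# Usage: migration_state PREVIOUS_STATE USAGE_PERCENT HIGH LOW
migration_state () {
  local prev=$1 usage=$2 high=$3 low=$4
  if [ "$usage" -ge "$high" ]; then
    # high threshold reached or surpassed: start (or keep) migrating
    echo migrating
  elif [ "$prev" = migrating ]; then
    # already migrating: continue until usage is down to the low threshold
    if [ "$usage" -le "$low" ]; then echo idle; else echo migrating; fi
  else
    # below the high threshold and not migrating: nothing to do
    echo idle
  fi
}
```
&lt;br /&gt;
With the SciNet settings, 'migration_state idle 2 2 1' prints 'migrating' and 'migration_state migrating 1 2 1' prints 'idle'.&lt;br /&gt;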
&lt;br /&gt;
To recall a file automatically all you have to do is '''access''' it. There are many ways you can do this. For example, you may view a file with 'cat', 'more', 'vi/vim', etc. You may also copy the file (or directory) from /repository to another location. '''Please be patient:''' the file will have to be pulled back from tape, and this will take some time, longer if it happens to be at the end of a tape.&lt;br /&gt;
&lt;br /&gt;
====Selective====&lt;br /&gt;
&lt;br /&gt;
Selective migration is used to override the internal priority of HSM (oldest/largest) or to migrate files/directories &amp;quot;immediately&amp;quot;. The recommendation is to '''not wait''' for the automatic migration cycle to kick in, since this could take some 6 to 12 hours at SciNet. If you relocated material to repository with the intention of having it migrated to tapes, you can just use dsmmigrate as soon as the rsync to repository has finished, for instance.&lt;br /&gt;
&lt;br /&gt;
(files won't be migrated until they have &amp;quot;aged&amp;quot; for at least 5 minutes, that is, after their last access/modification time)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
dsmmigrate [path to FILE]&lt;br /&gt;
or&lt;br /&gt;
dsmmigrate -R -D /repository/[group]/[user]/[directory]&lt;br /&gt;
or&lt;br /&gt;
dsmmigrate /repository/scinet/pinto/blahblahblah.tar.Z&lt;br /&gt;
or&lt;br /&gt;
{&lt;br /&gt;
 cd /repository/scinet/pinto/&lt;br /&gt;
 dsmmigrate blahblahblah.tar.Z&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To selectively recall data, just type:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
dsmrecall [path to FILE]&lt;br /&gt;
or&lt;br /&gt;
dsmrecall -R -D /repository/[group]/[user]/[directory]&lt;br /&gt;
or&lt;br /&gt;
dsmrecall /repository/scinet/pinto/blahblahblah.tar.Z&lt;br /&gt;
or&lt;br /&gt;
{&lt;br /&gt;
 cd /repository/scinet/pinto/&lt;br /&gt;
 dsmrecall blahblahblah.tar.Z&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' We've been finding that the search for new candidates for automatic migration takes much longer once repository is already full of files/stubs. That is to be expected, hence the recommendation to not wait and proceed with the selective migration of your own files/directories asap.&lt;br /&gt;
&lt;br /&gt;
===Disaster Recovery===&lt;br /&gt;
&lt;br /&gt;
As with any disk-based storage, repository is not immune to failures, even though it is a RAID 5 file system. We do not do regular backups, but it's possible to do a full recovery in case of catastrophic loss of repository. '''For that it's important that all files have been completely migrated to tapes''' beforehand. That puts the onus on users to ensure this migration is indeed finished (with selective migration) for the relocated material before they delete the originals from /project or /scratch.&lt;br /&gt;
&lt;br /&gt;
===Common HSM commands===&lt;br /&gt;
Some traditional unix/linux commands, such as 'ls' or 'rm', will work on the stub files as if they were the real files. For others, such as 'du' or 'df', you are better off using the HSM equivalent, which gives more meaningful information in the context of HSM. These HSM commands only work inside /repository. Some of them, such as 'dsmrm', are executable only by root, in which case you'll be notified.&lt;br /&gt;
&lt;br /&gt;
===='''dsmls'''====&lt;br /&gt;
to check status of files; used in the directory where you expect to have migrated files&lt;br /&gt;
&lt;br /&gt;
'''r''': ''resident''    (the file is on repository only)&lt;br /&gt;
&lt;br /&gt;
'''m''': ''migrated''    (only the stub of the file is on repository)&lt;br /&gt;
&lt;br /&gt;
'''p''': ''premigrated'' (the file is on repository and on tape)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmls [-Noheader] [-Recursive] [-Help] [file specs|-FIlelist=file]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-logindm02-$ dsmls -R a3&lt;br /&gt;
IBM Tivoli Storage Manager&lt;br /&gt;
Command Line Space Management Client Interface&lt;br /&gt;
  Client Version 6, Release 1, Level 0.0  &lt;br /&gt;
  Client date/time: 07/27/2010 12:06:36&lt;br /&gt;
(c) Copyright by IBM Corporation and other(s) 1990, 2009. All Rights Reserved.&lt;br /&gt;
&lt;br /&gt;
      Actual     Resident     Resident  File   File&lt;br /&gt;
        Size         Size     Blk (KB)  State  Name&lt;br /&gt;
       &amp;lt;dir&amp;gt;         8192            8   -      a3/&lt;br /&gt;
&lt;br /&gt;
/repository/scinet/pinto/a3:&lt;br /&gt;
 34008432640            0            0   m      32G-1&lt;br /&gt;
 34008432640  34008432640            0   r      32G-2&lt;br /&gt;
 34008432640  34008432640            0   p      32G-3&lt;br /&gt;
           0            0            0   r      dsmerror.log&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
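&lt;br /&gt;
Before deleting originals from /project or /scratch, it can help to confirm that nothing is still in the resident state. The filter below is a minimal sketch that reads 'dsmls' output on standard input and prints the names of files marked 'r'. It assumes the five-column layout shown in the example above and file names without spaces; the name 'list_resident' is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Print the names of files whose dsmls state column is "r"
# (resident: on repository only, not yet on tape).
list_resident () {
  # data rows have exactly five fields: actual size, resident size,
  # resident blocks, state, name; header and banner lines do not match "r"
  awk 'NF == 5 { if ($4 == "r") print $5 }'
}

# Possible use (the directory name is a placeholder):
#   dsmls -R mydir | list_resident
```
&lt;br /&gt;
An empty result suggests that every listed file is either migrated or premigrated, that is, a copy exists on tape.&lt;br /&gt;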
&lt;br /&gt;
===='''dsmdu'''==== &lt;br /&gt;
disk usage on the original files/directory&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmdu [-Allfiles] [-Summary] [-Help] [directory names]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===='''dsmdf'''====&lt;br /&gt;
disk free on the HSM file system.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmdf [-Help] [-Detail] [file systems]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===='''dsmmigrate'''====&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmmigrate [-Recursive] [-Premigrate] [-Detail] [-Help] filespecs|-FIlelist=file &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===='''dsmrecall'''====&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmrecall [-Recursive] [-Detail] [-Help] file specs|-FIlelist=file&lt;br /&gt;
   or  dsmrecall [-Detail] -offset=XXXX[kmgKMG] -size=XXXX[kmgKMG] file specs &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To have an idea of what HSM is doing on datamover2 at a given time:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[pinto@gpc-logindm02 ~]$ ps -def | grep dsm | grep -v mmfs&lt;br /&gt;
&lt;br /&gt;
root      2455 15190  0 16:26 ?        00:00:00 dsmmonitord&lt;br /&gt;
root      2456  2455  2 16:26 ?        00:05:38 dsmautomig -2 system::/repository&lt;br /&gt;
pinto    10997 10637 30 16:40 pts/3    01:14:20 dsmmigrate -R -D pinto&lt;br /&gt;
root     12857     1  0 16:15 ?        00:00:00 dsmrecalld&lt;br /&gt;
root     13013 12857  0 16:15 ?        00:00:01 dsmrecalld&lt;br /&gt;
root     13015 12857  0 16:15 ?        00:00:00 dsmrecalld&lt;br /&gt;
root     15190     1  0 16:15 ?        00:00:00 dsmmonitord&lt;br /&gt;
root     16936     1  3 16:15 ?        00:10:44 dsmscoutd&lt;br /&gt;
root     17217     1 13 16:16 ?        00:36:49 dsmrootd&lt;br /&gt;
root     18732  2456  4 17:51 ?        00:07:19 dsmautomig -2 system::/repository&lt;br /&gt;
root     18737  2456  0 17:51 ?        00:00:26 dsmautomig -2 system::/repository&lt;br /&gt;
pinto    24533 10363  0 20:48 pts/2    00:00:00 grep dsm&lt;br /&gt;
root     25090     1  0 06:42 ?        00:00:08 dsmwatchd nodetach&lt;br /&gt;
root     30840 13013  0 17:15 ?        00:00:02 dsmrecalld&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the above example, dsmmonitord, dsmrecalld, dsmscoutd, dsmrootd and dsmwatchd are the 5 typical HSM daemons, and they are always running. In addition, there are 3 streams of dsmautomig (triggered by threshold migration) and 1 stream of dsmmigrate (selective migration initiated by user pinto).&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Data_Management&amp;diff=2321</id>
		<title>Data Management</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Data_Management&amp;diff=2321"/>
		<updated>2010-12-12T16:54:52Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Deployment at SciNet */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Storage Space==&lt;br /&gt;
SciNet's storage system is based on IBM's [http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] (General Parallel File System).   There are two main systems for user data: &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt;, a small, backed-up space where user home directories are located, and &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;, a large system for input or output data for jobs; data on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; is not backed up. A third storage system, /project, exists only for groups with LRAC/NRAC allocations. Data placed on scratch will be deleted if it has not been accessed in 3 months.  SciNet does not provide long-term storage for large data sets.&lt;br /&gt;
&lt;br /&gt;
===Overview of the different file systems===&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! {{Hl2}} | file system &lt;br /&gt;
! {{Hl2}} | purpose &lt;br /&gt;
! {{Hl2}} | quota &lt;br /&gt;
! {{Hl2}} | block size &lt;br /&gt;
! {{Hl2}} | backed up&lt;br /&gt;
! {{Hl2}} | purged&lt;br /&gt;
! {{Hl2}} | access &lt;br /&gt;
|- &lt;br /&gt;
| /home&lt;br /&gt;
| development&lt;br /&gt;
| 10 GB&lt;br /&gt;
| 256 KB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
| read-only on compute nodes (r/w on login, devel and datamover1) &lt;br /&gt;
|- &lt;br /&gt;
| /scratch&lt;br /&gt;
| computation&lt;br /&gt;
| 20 TB&lt;br /&gt;
| 4 MB&lt;br /&gt;
| no&lt;br /&gt;
| files not accessed for &amp;gt; 3 months&lt;br /&gt;
| read/write on all nodes&lt;br /&gt;
|- &lt;br /&gt;
| /project&lt;br /&gt;
| computation&lt;br /&gt;
| by allocation&lt;br /&gt;
| 256 KB&lt;br /&gt;
| no&lt;br /&gt;
| never&lt;br /&gt;
| read/write on all nodes&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Home Disk Space===&lt;br /&gt;
&lt;br /&gt;
Every SciNet user gets a 10GB directory on &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; which is regularly backed up.   Home is visible from &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; nodes, and from the development nodes on [[GPC_Quickstart | GPC]] and the [[TCS_Quickstart | TCS]].  However, on the compute nodes of the GPC clusters -- as when jobs are running -- &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is mounted '''''read-only'''''; thus GPC jobs can read files in /home but cannot write to files there.   &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is a good place to put code, input files for runs, and anything else that needs to be kept to reproduce runs.  On the other hand, &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is not a good place to put many small files: since the block size for the file system is 256KB, you would quickly run out of disk quota, and you would make the backup system very slow.&lt;br /&gt;
&lt;br /&gt;
If your application absolutely insists on writing material to your home account and you can't find a way to instruct it to write somewhere else, an alternative is to create a link pointing from your account under /home to a location under /scratch.&lt;br /&gt;
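&lt;br /&gt;
A minimal sketch of such a link is shown below. The function name 'link_to_scratch' and both paths are placeholders; the link path must not already exist, so any existing directory there would have to be moved out of the way first.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Create a symbolic link at LINK_PATH that points to TARGET_DIR,
# creating TARGET_DIR first if it does not exist yet.
# Usage: link_to_scratch LINK_PATH TARGET_DIR
link_to_scratch () {
  local link_path=$1
  local target_dir=$2
  mkdir -p "$target_dir"
  ln -s "$target_dir" "$link_path"
}

# Possible use (placeholder paths):
#   link_to_scratch /home/[user]/appdata /scratch/[user]/appdata
```
&lt;br /&gt;
Anything the application then writes under the link actually lands in the scratch location, so the usual scratch caveats (no backups, purging) apply to it.&lt;br /&gt;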
&lt;br /&gt;
===Scratch Disk Space===&lt;br /&gt;
&lt;br /&gt;
Every SciNet user also gets a directory in &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.   Scratch is visible from the &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; nodes,  the development nodes on [[GPC_Quickstart | GPC]] and the [[TCS_Quickstart | TCS]], and on the compute nodes of the clusters, mounted as read-write.   Thus jobs would normally write their output somewhere in &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.  There are '''NO''' backups of anything on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
There is a large amount of space available on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; but it is purged routinely so that all users running jobs and generating large outputs will have room to store their data temporarily.  Computational results which you want to keep longer than this must be copied (using &amp;lt;tt&amp;gt;scp&amp;lt;/tt&amp;gt;) off of SciNet entirely and to your local system.   SciNet does not routinely provide long-term storage for large data sets.&lt;br /&gt;
&lt;br /&gt;
===Scratch Disk Purging Policy===&lt;br /&gt;
&lt;br /&gt;
In order to ensure that there is always significant space available for running jobs '''we automatically delete files in /scratch that have not been accessed for more than 3 months by the actual deletion day on the 15th of each month'''. This policy is subject to revision depending on its effectiveness. More details about the purging process and how users can check if their files will be deleted follow. If you have files scheduled for deletion you should move them to a more permanent location such as your departmental server or your /project space (for PIs who have either been allocated disk space by the LRAC or have bought disk space).&lt;br /&gt;
&lt;br /&gt;
On the '''first''' of each month, a list of files scheduled for purging is produced, and an email notification is sent to each user on that list. Furthermore, on or about the '''12th''' of each month a 2nd scan produces a more current assessment and another email notification is sent. This way users can double check that they have indeed taken care of all the files they needed to relocate before the purging deadline. Those files will be automatically deleted on the '''15th''' of the same month unless they have been accessed or relocated in the interim. If you have files scheduled for deletion then they will be listed in a file in /scratch/todelete/current, which has your userid and groupid in the filename. For example, if user xxyz wants to check if they have files scheduled for deletion they can issue the following command on a system which mounts /scratch (e.g. a scinet login node): '''ls -l1 /scratch/todelete/current |grep xxyz'''. In the example below, the name of this file indicates that user xxyz is part of group abc, has 9,560 files scheduled for deletion and they take up 1.0TB of space:&lt;br /&gt;
&lt;br /&gt;
 [xxyz@scinet04 ~]$ ls -l1 /scratch/todelete/current |grep xxyz&lt;br /&gt;
 -rw-r----- 1 xxyz     root       1733059 Jan 12 11:46 10001___xxyz_______abc_________1.00T_____9560files&lt;br /&gt;
&lt;br /&gt;
The file itself contains a list of all files scheduled for deletion (in the last column) and can be viewed with standard commands like more/less/cat - e.g. '''more /scratch/todelete/current/10001___xxyz_______abc_________1.00T_____9560files'''&lt;br /&gt;
&lt;br /&gt;
Similarly, you can check on all other users in your group by using the ls command with grep on your group. For example: '''ls -l1 /scratch/todelete/current |grep abc'''. That will list all users in the same group as xxyz who have files to be purged on the 15th. Members of the same group have access to each other's contents.&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' Preparing these assessments takes several hours. If you change the access/modification time of a file in the interim, that will not be detected until the next cycle. A way for you to get immediate feedback is to use the ''''ls -lu'''' command on the file. If the file's atime has been updated, it will no longer be deleted on the purging date of the 15th.&lt;br /&gt;
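&lt;br /&gt;
To check a whole directory at once rather than one file at a time, a 'find' based sketch such as the following may be convenient. The function name 'stale_files' is hypothetical, and the 90-day default only approximates the 3-month policy, so the official lists under /scratch/todelete/current remain the reference. It is best used on a modest directory tree, since traversing many files is itself demanding on the file system.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# List regular files under DIR whose last access time is more than
# DAYS days old (default 90).
# Usage: stale_files DIR [DAYS]
stale_files () {
  local dir=$1
  local days=${2:-90}
  # -atime +N matches files last accessed more than N days ago
  find "$dir" -type f -atime +"$days"
}
```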
&lt;br /&gt;
===Project Disk Space===&lt;br /&gt;
&lt;br /&gt;
Investigators who have been granted allocations through the [http://www.scinet.utoronto.ca/resources/Account_Allocations.htm LRAC/NRAC Application Process] may have been allocated disk space in addition to compute time.   For the period of time that the allocation is granted, they will have disk space on the &amp;lt;tt&amp;gt;/project&amp;lt;/tt&amp;gt; disk system.  Space on the project system is not purged, but neither is it backed up.   All members of the investigator's group will have access to this system, which will be mounted read/write everywhere.&lt;br /&gt;
&lt;br /&gt;
===How much Disk Space Do I have left?===&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''/scinet/gpc/bin/diskUsage'''&amp;lt;/tt&amp;gt; command, available on the login nodes, datamovers and the GPC devel nodes, provides information in a number of ways on the home, scratch, and project file systems. For instance, it can show how much disk space is being used by yourself and your group (with the -a option) or how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), and it can generate plots of your usage over time. Please see the usage help below for more details.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-h|-?| [-a] [-u &amp;lt;user&amp;gt;] [-de|-plot]&lt;br /&gt;
       -h|-?: help&lt;br /&gt;
       -a: list usages of all members on the group&lt;br /&gt;
       -u &amp;lt;user&amp;gt;: as another user on your group&lt;br /&gt;
       -de: include delta information&lt;br /&gt;
       -plot: create plots of disk usages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that information on usage and quota is only updated hourly!&lt;br /&gt;
&lt;br /&gt;
===Performance===&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] is a high-performance filesystem which provides rapid reads and writes to large datasets in parallel from many nodes.  As a consequence of this design, however, '''the file system performs quite ''poorly'' at accessing data sets which consist of many, small files.'''  For instance, you will find that reading data in from one 4MB file is enormously faster than from 100 40KB files.   Such small files are also quite wasteful of space, as the blocksize for the filesystem is 4MB.   This is something you should keep in mind when planning your input/output strategy for runs on SciNet.&lt;br /&gt;
&lt;br /&gt;
For instance, if you run multi-process jobs, having each process write to a file of its own is not a scalable I/O solution. A directory gets locked by the first process accessing it, so all other processes have to wait for it. Not only has the code just become considerably less parallel, chances are the file system will have a time-out while waiting for your other processes, leading your program to crash mysteriously.&lt;br /&gt;
Consider using MPI-IO (part of the MPI-2 standard), which allows files to be opened simultaneously&lt;br /&gt;
by different processes, or using a dedicated process for I/O to which all other processes send their&lt;br /&gt;
data, and which subsequently writes this data to a single file.&lt;br /&gt;
&lt;br /&gt;
===Local Disk===&lt;br /&gt;
&lt;br /&gt;
The compute nodes on the GPC '''do not contain hard drives''', so there is no local disk available to use during your computation.  You can however use part of a compute node's RAM like a local disk ('ramdisk'), but this will reduce how much memory is available for your program.  This can be accessed using &amp;lt;tt&amp;gt;/dev/shm/&amp;lt;/tt&amp;gt; and is currently set to 8GB.  Anything written to this location that you want to keep must be copied back to the &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; filesystem, as &amp;lt;tt&amp;gt;/dev/shm&amp;lt;/tt&amp;gt; is wiped after each job and, since it is in memory, will not survive a reboot of the node. More on ramdisk usage can be found [[User_Ramdisk | here]].&lt;br /&gt;
&lt;br /&gt;
Note that the absence of hard drives also means that the nodes cannot swap memory, so be sure that your computation fits within memory.&lt;br /&gt;
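&lt;br /&gt;
A job can follow the pattern described above by working in a directory under &amp;lt;tt&amp;gt;/dev/shm&amp;lt;/tt&amp;gt; and copying a single archive of the results back to &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; before it finishes. The following is a minimal sketch, not a SciNet-provided tool: the function name run_in_ramdisk, the RAMDISK_BASE variable and the archive name ramdisk-results.tar are all assumptions made for illustration.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: run a command inside a ramdisk work directory, then archive the
# results to persistent storage and clean up.
# run_in_ramdisk, RAMDISK_BASE and the archive name are hypothetical.

run_in_ramdisk () {
  local dest_dir=$1      # persistent directory, e.g. somewhere under /scratch
  shift                  # remaining arguments: the command to run
  local base=${RAMDISK_BASE:-/dev/shm}
  local work
  work=$(mktemp -d "$base/job.XXXXXX") || return 1

  # Run the command with the ramdisk directory as its working directory.
  local status=0
  ( cd "$work" || exit 1; "$@" ) || status=$?

  # Copy the results back as one archive before the ramdisk is wiped.
  mkdir -p "$dest_dir" || status=1
  tar -cf "$dest_dir/ramdisk-results.tar" -C "$work" . || status=1

  # Free the memory used by the ramdisk directory.
  rm -rf "$work"
  return "$status"
}
```
&lt;br /&gt;
For example, &amp;lt;tt&amp;gt;run_in_ramdisk /scratch/[user]/results /path/to/my_program&amp;lt;/tt&amp;gt; would run the (hypothetical) program with the ramdisk directory as its working directory. Bundling the output into one tar file also avoids writing many small files to GPFS.&lt;br /&gt;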
&lt;br /&gt;
==Data Transfer==&lt;br /&gt;
{{:Data_Transfer}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==File/Ownership Management (ACL)==&lt;br /&gt;
* By default, at SciNet, users within the same group have read permission to each other's files (not write)&lt;br /&gt;
* You may use an access control list ('''ACL''') to allow your supervisor (or another user within your group) to manage files for you (i.e., create, move, rename, delete), while still retaining your access and permission as the original owner of the files/directories.&lt;br /&gt;
* '''NOTE''': We highly recommend that you never give write permission to other users on the top level of your home directory (/home/[owner]), since that would seriously compromise your privacy, in addition to disabling ssh key authentication, among other things. If necessary, make specific sub-directories under your home directory so that other users can manipulate/access files from those.&lt;br /&gt;
* If you need to set up permissions across groups [mailto:support@scinet.utoronto.ca contact us] (and the other group's supervisor!).&lt;br /&gt;
&lt;br /&gt;
===Using  setfacl/getfacl===&lt;br /&gt;
* To allow [supervisor] to manage files in /project/group/[owner] using the '''setfacl''' and '''getfacl''' commands, follow the 3 steps below as the [owner] account from a shell:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1) $ /scinet/gpc/bin/setfacl -d -m user:[supervisor]:rwx /project/group/[owner]&lt;br /&gt;
   (every *new* file/directory inside [owner] will inherit [supervisor] ownership by default from now on)&lt;br /&gt;
&lt;br /&gt;
2) $ /scinet/gpc/bin/setfacl -d -m user:[owner]:rwx /project/group/[owner]&lt;br /&gt;
   (but will also inherit [owner] ownership, ie, ownership of both by default, for files/directories created by [supervisor])&lt;br /&gt;
&lt;br /&gt;
3) $ /scinet/gpc/bin/setfacl -Rm user:[supervisor]:rwx /project/group/[owner]&lt;br /&gt;
   (recursively modify all *existing* files/directories inside [owner] to also be rwx by [supervisor])&lt;br /&gt;
&lt;br /&gt;
   $ /scinet/gpc/bin/getfacl /project/group/[owner]&lt;br /&gt;
   (to determine the current ACL attributes)&lt;br /&gt;
&lt;br /&gt;
   $ /scinet/gpc/bin/setfacl -b /project/group/[owner]&lt;br /&gt;
   (to remove any previously set ACL)&lt;br /&gt;
&lt;br /&gt;
PS: on the datamovers getfacl, setfacl and chacl will be on your path&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For more information on using [http://linux.die.net/man/1/setfacl &amp;lt;tt&amp;gt;setfacl&amp;lt;/tt&amp;gt;] or [http://linux.die.net/man/1/getfacl &amp;lt;tt&amp;gt;getfacl&amp;lt;/tt&amp;gt;] see their man pages.&lt;br /&gt;
&lt;br /&gt;
===Using mmputacl/mmgetacl===&lt;br /&gt;
* Alternatively, you may use gpfs' native '''mmputacl''' and '''mmgetacl''' commands. The advantages are that you can set &amp;quot;control&amp;quot; permission and that [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm1160.html POSIX or NFS v4 style ACLs] are supported. You will first need to create a /tmp/supervisor.acl file with the following contents:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user::rwxc&lt;br /&gt;
group::----&lt;br /&gt;
other::----&lt;br /&gt;
mask::rwxc&lt;br /&gt;
user:[owner]:rwxc&lt;br /&gt;
user:[supervisor]:rwxc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then issue the following 2 commands:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1) $ mmputacl -i /tmp/supervisor.acl /project/group/[owner]&lt;br /&gt;
2) $ mmputacl -d -i /tmp/supervisor.acl /project/group/[owner]&lt;br /&gt;
   (every *new* file/directory inside [owner] will inherit [supervisor] ownership by default as well as &lt;br /&gt;
   [owner] ownership, ie, ownership of both by default, for files/directories created by [supervisor])&lt;br /&gt;
&lt;br /&gt;
   $ mmgetacl /project/group/[owner]&lt;br /&gt;
   (to determine the current ACL attributes)&lt;br /&gt;
&lt;br /&gt;
   $ mmdelacl -d /project/group/[owner]&lt;br /&gt;
   (to remove any previously set ACL)&lt;br /&gt;
&lt;br /&gt;
   $ mmeditacl /project/group/[owner]&lt;br /&gt;
   (to create or change a GPFS access control list)&lt;br /&gt;
   (for this command to work set the EDITOR environment variable: export EDITOR=/usr/bin/vi)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
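&lt;br /&gt;
The ACL file shown above can also be generated rather than typed by hand. This is a minimal sketch; make_acl_file is a hypothetical helper, and the user names in the example comment are placeholders.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: print an ACL file in the format shown above for a given owner and
# supervisor. make_acl_file is a hypothetical helper name.

make_acl_file () {
  local owner=$1
  local supervisor=$2
  printf '%s\n' \
    'user::rwxc' \
    'group::----' \
    'other::----' \
    'mask::rwxc' \
    "user:${owner}:rwxc" \
    "user:${supervisor}:rwxc"
}

# Example (placeholder user names): write the ACL file, then apply it with
# mmputacl as in steps 1) and 2) above.
# make_acl_file jdoe drsmith | tee /tmp/supervisor.acl
```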
&lt;br /&gt;
There is no option to recursively add or remove ACL attributes using a gpfs built-in command. You'll need to use the -i option as above for each file or directory individually. [[Data_Management#bash_script_that_you_may_adapt_to_recursively_add_or_remove_ACL_attributes_using_gpfs_built-in_commands | Here is a sample bash script you may use for that purpose]]&lt;br /&gt;
&lt;br /&gt;
For more information on using [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm11120.html &amp;lt;tt&amp;gt;mmputacl&amp;lt;/tt&amp;gt;] or [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm11120.html &amp;lt;tt&amp;gt;mmgetacl&amp;lt;/tt&amp;gt;] see their man pages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Appendix===&lt;br /&gt;
====bash script that you may adapt to recursively add or remove ACL attributes using gpfs built-in commands====&lt;br /&gt;
Courtesy of Agata Disks (http://csngwinfo.in2p3.fr/mediawiki/index.php/GPFS_ACL)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# USAGE&lt;br /&gt;
#     - on one directory:     ./set_acl.sh dir_name&lt;br /&gt;
#     - on more directories:  ./set_acl.sh 'dir_nam*'&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
# Path of the file that contains the ACL&lt;br /&gt;
ACL_FILE_PATH=/agatadisks/data/acl_file.acl&lt;br /&gt;
&lt;br /&gt;
# Directories on which the ACLs are to be set&lt;br /&gt;
dirs=$1&lt;br /&gt;
&lt;br /&gt;
# Recursive function that sets ACL to files and directories&lt;br /&gt;
set_acl () {&lt;br /&gt;
  # local variables keep each level of the recursion independent&lt;br /&gt;
  local curr_dir=$1&lt;br /&gt;
  local args&lt;br /&gt;
  for args in &amp;quot;$curr_dir&amp;quot;/*&lt;br /&gt;
  do&lt;br /&gt;
    if [ -f &amp;quot;$args&amp;quot; ]; then&lt;br /&gt;
      mmputacl -i &amp;quot;$ACL_FILE_PATH&amp;quot; &amp;quot;$args&amp;quot;&lt;br /&gt;
      if [ $? -ne 0 ]; then&lt;br /&gt;
        echo &amp;quot;ERROR: ACL not set on $args&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      echo &amp;quot;ACL set on file $args&amp;quot;&lt;br /&gt;
    fi&lt;br /&gt;
    if [ -d &amp;quot;$args&amp;quot; ]; then&lt;br /&gt;
      # Set Default ACL in directory&lt;br /&gt;
      mmputacl -d -i &amp;quot;$ACL_FILE_PATH&amp;quot; &amp;quot;$args&amp;quot;&lt;br /&gt;
      if [ $? -ne 0 ]; then&lt;br /&gt;
        echo &amp;quot;ERROR: Default ACL not set on $args&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      echo &amp;quot;Default ACL set on directory $args&amp;quot;&lt;br /&gt;
      # Set ACL in directory&lt;br /&gt;
      mmputacl -i &amp;quot;$ACL_FILE_PATH&amp;quot; &amp;quot;$args&amp;quot;&lt;br /&gt;
      if [ $? -ne 0 ]; then&lt;br /&gt;
        echo &amp;quot;ERROR: ACL not set on $args&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      echo &amp;quot;ACL set on directory $args&amp;quot;&lt;br /&gt;
      # Recurse into the sub-directory&lt;br /&gt;
      set_acl &amp;quot;$args&amp;quot;&lt;br /&gt;
    fi&lt;br /&gt;
  done&lt;br /&gt;
}&lt;br /&gt;
# $dirs is left unquoted on purpose so that a quoted pattern such as 'dir_nam*' expands here&lt;br /&gt;
for dir in $dirs&lt;br /&gt;
do&lt;br /&gt;
  if [ ! -d &amp;quot;$dir&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;ERROR: $dir is not a directory&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
  fi&lt;br /&gt;
  set_acl &amp;quot;$dir&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
exit 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Hierarchical Storage Management (HSM)==&lt;br /&gt;
'''(a pilot project is starting in July/2010 with a select group of users)'''&lt;br /&gt;
&lt;br /&gt;
===Basic Concepts===&lt;br /&gt;
'''Hierarchical Storage Management (HSM)''' is a data storage technique which automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as hard disk drive arrays, are more expensive (per byte stored) than slower devices, such as optical discs and magnetic tape drives. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise's data on slower devices, and then copy data to faster disk drives when needed. In effect, HSM turns the fast disk drives into caches for the slower mass storage devices. The HSM system monitors the way data is used and makes best guesses as to which data can safely be moved to slower devices and which data should stay on the fast devices.&lt;br /&gt;
&lt;br /&gt;
In a typical HSM scenario, data files which are frequently used are stored on disk drives, but are eventually migrated to tape if they are not used for a certain period of time, typically a few months. If a user does reuse a file which is on tape, it is automatically moved back to disk storage. The advantage is that the total amount of stored data can be much larger than the capacity of the disk storage available, but since only rarely-used files are on tape, most users will not notice any slowdown.&lt;br /&gt;
&lt;br /&gt;
The HSM client provides both ''automatic'' and ''selective migration''. Once file ''migration'' begins, the HSM client sends a copy of your file to storage volumes on disk devices or devices that support removable media, such as tape, and replaces the original file with a ''stub file'' on the HSM-managed file system (aka ''repository'' at SciNet).&lt;br /&gt;
&lt;br /&gt;
'''Repository''' commonly refers to a location for long-term storage, often for safety or preservation. &lt;br /&gt;
&lt;br /&gt;
'''Migration''', in the context of HSM, refers to the set of actions that move files from the front-end disk-based repository to a back-end tape library system (often invisible or inaccessible to users).&lt;br /&gt;
&lt;br /&gt;
'''Relocation''', in the context of SciNet, refers to the use of unix commands such as copy, move, tar or rsync to get data into the repository.&lt;br /&gt;
&lt;br /&gt;
'''The stub file''' is a small replacement file that makes it appear as though the original file is on the repository. It contains required metadata information to locate and recall a migrated file and to respond to specific UNIX commands without recalling the file.&lt;br /&gt;
&lt;br /&gt;
'''Automatic migration''' periodically monitors space usage and automatically migrates eligible files according to the options and settings that have been selected. The HSM client provides two types of automatic migration: ''threshold migration'' and ''demand migration''.&lt;br /&gt;
&lt;br /&gt;
'''Threshold migration''' maintains a specific level of free space on the repository file system. When disk usage reaches the high threshold percentage, eligible files are migrated to tapes automatically. When space usage drops to the low threshold set for the file system, file migration stops.&lt;br /&gt;
&lt;br /&gt;
'''Demand migration''' responds to an out-of-space condition on the repository file system. Demand migration starts automatically if the file system runs out of space (usually triggered at 90%). For HSM, as files are migrated (oldest/largest first), space becomes available on the file system, and the process or event that caused the out-of-space condition can be resumed.&lt;br /&gt;
&lt;br /&gt;
'''Selective migration''' is a user-issued HSM command that migrates specific files from the repository at will, in anticipation of the automatic migration, or independently of the system-wide eligibility criteria. For example, if you know that you will not be using a particular group of files for an extended time, you can migrate them, so as to free additional space on the repository.&lt;br /&gt;
&lt;br /&gt;
'''Reclamation''' is the process of reclaiming unused space on a tape (applies to Virtual Tapes as well). Over time, as files/directories get deleted or updated on the repository, a process will expire old data, creating gaps of unused storage on the tapes. Since tapes are sequential media, typical tape handling software can only write data to the end of the tape, so these gaps of “Empty Space” cannot be used. Reclamation periodically copies, in a rolling fashion, the active data from these &amp;quot;Swiss Cheese&amp;quot;-like tapes to unused tapes in a compacted form.&lt;br /&gt;
&lt;br /&gt;
'''Optimal environment:''' HSM should be used in an environment where the old and large files which need to be preserved are not used regularly. Files that are needed frequently should not be migrated at all; otherwise HSM would act as a cache, migrating files only to recall them shortly afterwards. This is not advisable. The repository file system needs to be large enough to hold all regularly used files. If the file system is too small to hold all regularly needed files (**), HSM ends up perpetually recalling requested files, exceeding the high-threshold limit, migrating other files to get below the low-threshold limit, and so on.&lt;br /&gt;
&lt;br /&gt;
===Deployment at SciNet===&lt;br /&gt;
HSM is performed by dedicated IBM software made up of a number of HSM daemons running on '''datamover2'''. These daemons constantly monitor the usage of the '''/repository''' GPFS and, depending on a predefined set of policies, data may be automatically or manually migrated to the Tivoli Storage Management server (TSM), and kept on our library of LTO-4 tapes.&lt;br /&gt;
&lt;br /&gt;
'''/repository is a 15TB &amp;quot;transient&amp;quot; location''' accessible only from datamover2. Users may relocate data as required from /scratch or /project to /repository in a number of ways, such as copy, move, tar or rsync. &amp;quot;Transient&amp;quot; refers to the fact that /repository works like a &amp;quot;Black Hole&amp;quot;: in the background '''it is constantly being emptied''', even while you relocate data in from other file systems. What is left behind is the directory tree with the stub files (0 byte in size at SciNet) and the metadata associated with it, which takes up about 1-2%. But, even if /repository is at 1% to start with, we ask that you please do not initiate a relocation of more than a 10TB chunk at once, so that the system has time to process your data and still allow other user(s) to migrate/recall some material before reaching 100% full. &lt;br /&gt;
&lt;br /&gt;
Inside /repository, data is segregated on a per-group basis, just as in /project. Within groups, users and group supervisors can structure materials any way they prefer. But the recommendation is that those involved spend some time designing that structure ahead of time, since you may merge data from project and/or scratch (or even home). In tests we performed, we've been able to reorganize the FS structure after migration, change the name and ownership of directories and stubs, and still recall files under the new path and ownership. HSM does seem to keep a very symbiotic relation between the metadata and the inode attributes at the file system level, without necessarily having to replicate these changes with tape recall &amp;amp; migration operations. But please don't abuse this flexibility. If possible, keep your initial layout structure somewhat fixed over time.&lt;br /&gt;
&lt;br /&gt;
We also recommend that users bundle files in tar-balls of at least 10GB before relocation, and keep a listing of those files somewhere; in fact you may use the 'tar' command to create the tar-ball directly in /repository on-the-fly. See examples below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -czvf /repository/[group]/[user]/myproject1.tar.gz /project/[group]/[user]/project1/ &amp;gt; /project/[group]/[user]/myproject1-repository-listing.txt&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
tar -czvf /repository/[group]/[user]/myscratch1.tar.gz /scratch/[user]/scratchdata1/ &amp;gt; /home/[user]/myscratch1-repository-listing.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is important to keep a listing of the files that are in each tar on a partition other than the HSM repository so that you can quickly decide which tar you need to recover. While the tar stub will always exist on the HSM disk, you will not be able to run tar --list on the stub without recalling the full tar file back from tape to disk. The redirection in the examples above accomplishes this.&lt;br /&gt;
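&lt;br /&gt;
If the listings are kept together outside the repository, finding the tar-ball that holds a given file is a simple text search. The following is a minimal sketch, assuming listings were saved as in the examples above; find_in_listings is a hypothetical helper name.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: report which saved tar listing(s) mention a given file name, so that
# only the matching tar-ball has to be recalled from tape.
# find_in_listings is a hypothetical helper name.

find_in_listings () {
  local pattern=$1
  shift                      # remaining arguments: listing files to search
  # -l prints only the names of the listings that contain a match;
  # -F treats the pattern as a fixed string rather than a regular expression.
  grep -l -F -- "$pattern" "$@"
}
```
&lt;br /&gt;
For example, &amp;lt;tt&amp;gt;find_in_listings run42.dat /project/[group]/[user]/*-repository-listing.txt&amp;lt;/tt&amp;gt; would print the listing files that mention the (hypothetical) file run42.dat.&lt;br /&gt;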
&lt;br /&gt;
The important thing is to avoid the relocation of many thousands (or millions) of small files. It's very demanding on the system to constantly scan/reconcile all these files on the file system, tapes, metadata and database. A good reference is '''average file size &amp;gt; 100MB''' in /repository. Deep directory nesting in general also increases the time required to traverse a file system and thus should be avoided where possible.&lt;br /&gt;
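&lt;br /&gt;
Before relocating a directory, it can help to check how many files it holds and how large they are on average. The sketch below is one way to do this with standard tools; file_stats is a hypothetical helper name, and on very large directory trees the scan itself can take a while.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: count the regular files under a directory and report their average
# size in bytes. file_stats is a hypothetical helper name.

file_stats () {
  local dir=$1
  # wc -c prints "bytes path" for each file; awk sums the first column.
  find "$dir" -type f -exec wc -c {} \; |
    awk '{ total += $1; n++ }
         END {
           if (n == 0) { print "files=0 average_bytes=0"; exit }
           printf "files=%d average_bytes=%.0f\n", n, total / n
         }'
}
```
&lt;br /&gt;
An average far below the 100MB reference suggests bundling the directory into tar-balls before relocation.&lt;br /&gt;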
&lt;br /&gt;
===Performance===&lt;br /&gt;
Unlike /project or /scratch, /repository is only a 2 tier disk raid, so don't expect transfer rates much higher than 60MB/sec on an rsync session, for example. In other words, a 10TB offload operation will typically take 2 days to complete, if made up of large files. On the other hand, we have conducted experiments where we migrated only 1TB, but with 1 million files expanded, and that took nearly a day! This situation should be avoided when many terabytes are involved. Performance is as much a function of the number of files as of the amount of data.&lt;br /&gt;
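&lt;br /&gt;
The 2-day estimate follows directly from the transfer rate. As a rough sketch of the arithmetic (transfer_hours is a hypothetical helper, decimal units are assumed, and real transfers also depend on the number of files):&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: estimate the transfer time in whole hours for a given data size and
# sustained rate. transfer_hours is a hypothetical helper name.

transfer_hours () {
  local size_gb=$1
  local rate_mb_per_s=$2
  # Decimal units are assumed (1 GB = 1000 MB); integer division rounds down.
  echo $(( size_gb * 1000 / rate_mb_per_s / 3600 ))
}
```
&lt;br /&gt;
&amp;lt;tt&amp;gt;transfer_hours 10000 60&amp;lt;/tt&amp;gt; prints 46, i.e. roughly 2 days for 10TB at 60MB/sec.&lt;br /&gt;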
&lt;br /&gt;
As for the &amp;quot;ideal tar-ball size&amp;quot;, experiments have shown that an isolated 10GB tar-ball typically takes 10-15 minutes to be pulled back, considering all tape operations involved. That seems like a reasonable amount of time to wait for a group of files kept off-line for an extended period of time. Also consider that pulling back an individual tiny file could still take as long as 5-8 minutes. So, it's pretty clear that you get the best bang for the buck by tar'ing your material, and you won't tie up the tape system for too long. As for the upper limit, you can probably bundle files in 100-500GB tar-balls, provided that you're OK with waiting a couple of hours for them to be recalled at a later date; at least from SciNet's perspective, it would be a very efficient migration.&lt;br /&gt;
&lt;br /&gt;
'''Please be sure to contact us to schedule your transfers IN or OUT of the system, to avoid conflict with other users or with the system settings.''' For instance, if you recall large amounts of data at once, let's say 7.5TB (about half of /repository), we would have to adjust the high threshold accordingly for that period (to 50%), so we don't induce the never-ending migrate/recall issues (**) described under ''Optimal environment''.&lt;br /&gt;
&lt;br /&gt;
===How to migrate/recall data===&lt;br /&gt;
&lt;br /&gt;
====Automatic====&lt;br /&gt;
We have currently set up /repository with '''High and Low thresholds of 2% and 1% respectively'''. That means that, at regular intervals, the file system is monitored to determine if the 2% usage mark has been reached or surpassed. In that case, data is automatically migrated to tapes, oldest (or largest) first, until the file system is down to 1%, if possible (metadata is not migrated). Since data may be copied/moved/rsync'ed/tar'ed in faster than /repository can be emptied, you may observe 80-90% disk usages sporadically (hence the 10TB chunk of data limit). For now at SciNet we migrate every file in /repository to tapes.&lt;br /&gt;
&lt;br /&gt;
To recall a file automatically all you have to do is '''access''' it. There are many ways you can do this. For example, you may view a file with 'cat', 'more', 'vi/vim', etc. You may also copy the file (or directory) from /repository to another location. '''Please be patient:''' the file will have to be pulled back from tape, and this will take some time, longer if it happens to be at the end of a tape.&lt;br /&gt;
&lt;br /&gt;
====Selective====&lt;br /&gt;
&lt;br /&gt;
Selective migration is used to override the internal priority of HSM (oldest/largest) or to migrate files/directories &amp;quot;immediately&amp;quot;. The recommendation is to '''not wait''' for the automatic migration cycle to kick in, since this could take some 6 to 12 hours at SciNet. If you already know that you relocated material to repository with the intention of having it migrated to tapes, you can just use dsmmigrate as soon as the rsync to repository has finished, for instance.&lt;br /&gt;
&lt;br /&gt;
(files won't be migrated until they have &amp;quot;aged&amp;quot; for at least 5 minutes, that is, after their last access/modification time)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
dsmmigrate [path to FILE]&lt;br /&gt;
dsmmigrate -R -D /repository/[group]/[user]/[directory]&lt;br /&gt;
dsmmigrate /repository/scinet/pinto/blahblahblah.tar.Z&lt;br /&gt;
or&lt;br /&gt;
cd /repository/scinet/pinto/&lt;br /&gt;
dsmmigrate blahblahblah.tar.Z&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To selectively recall data, just type:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
dsmrecall [path to FILE]&lt;br /&gt;
dsmrecall -R -D /repository/[group]/[user]/[directory]&lt;br /&gt;
dsmrecall /repository/scinet/pinto/blahblahblah.tar.Z&lt;br /&gt;
or&lt;br /&gt;
cd /repository/scinet/pinto/&lt;br /&gt;
dsmrecall blahblahblah.tar.Z&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' We've been finding that the search for new candidates for automatic migration takes much longer once repository is already full of files/stubs. That is to be expected, hence the recommendation to not wait and proceed with the selective migration of your own files/directories asap.&lt;br /&gt;
&lt;br /&gt;
===Disaster Recovery===&lt;br /&gt;
&lt;br /&gt;
As with any disk-based storage, although it's a raid 5 file system, repository is not immune to failures. We do not do regular backups, but it's possible to do a full recovery in case of catastrophic loss of repository. '''For that it's important that all files have been completely migrated to tapes''' beforehand. That puts the onus on users to ensure this migration is indeed finished (with selective migration) for the relocated material before they delete the originals from /project or /scratch.&lt;br /&gt;
&lt;br /&gt;
===Common HSM commands===&lt;br /&gt;
Some traditional unix/linux commands, such as 'ls' or 'rm', work on the stub file as they would on the real file. For others, such as 'du' or 'df', it is better to use the HSM equivalent, which gives more meaningful information in the context of HSM. The HSM commands only work inside /repository. Some of them, such as 'dsmrm', are executable only by root, in which case you'll be notified.&lt;br /&gt;
&lt;br /&gt;
===='''dsmls'''====&lt;br /&gt;
to check status of files; used in the directory where you expect to have migrated files&lt;br /&gt;
&lt;br /&gt;
'''r''': ''resident''    (the file is on repository only)&lt;br /&gt;
&lt;br /&gt;
'''m''': ''migrated''    (only the stub of the file is on repository)&lt;br /&gt;
&lt;br /&gt;
'''p''': ''premigrated'' (the file is on repository and on tape)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmls [-Noheader] [-Recursive] [-Help] [file specs|-FIlelist=file]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-logindm02-$ dsmls -R a3&lt;br /&gt;
IBM Tivoli Storage Manager&lt;br /&gt;
Command Line Space Management Client Interface&lt;br /&gt;
  Client Version 6, Release 1, Level 0.0  &lt;br /&gt;
  Client date/time: 07/27/2010 12:06:36&lt;br /&gt;
(c) Copyright by IBM Corporation and other(s) 1990, 2009. All Rights Reserved.&lt;br /&gt;
&lt;br /&gt;
      Actual     Resident     Resident  File   File&lt;br /&gt;
        Size         Size     Blk (KB)  State  Name&lt;br /&gt;
       &amp;lt;dir&amp;gt;         8192            8   -      a3/&lt;br /&gt;
&lt;br /&gt;
/repository/scinet/pinto/a3:&lt;br /&gt;
 34008432640            0            0   m      32G-1&lt;br /&gt;
 34008432640  34008432640            0   r      32G-2&lt;br /&gt;
 34008432640  34008432640            0   p      32G-3&lt;br /&gt;
           0            0            0   r      dsmerror.log&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
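&lt;br /&gt;
The state column can be used to check which files exist only on disk and are therefore not yet on tape. The sketch below filters dsmls output for files in state r; not_yet_migrated is a hypothetical helper, and it assumes the column layout shown above and file names without spaces.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: list the files that dsmls reports as resident only (state r).
# not_yet_migrated is a hypothetical helper name.

not_yet_migrated () {
  # Reads dsmls output on standard input. Data rows have five columns:
  # actual size, resident size, resident blocks, state, name.
  # Header and directory rows are skipped because their first column
  # is not a plain number.
  awk 'NF == 5 { if ($1 ~ /^[0-9]+$/) { if ($4 == "r") print $5 } }'
}
```
&lt;br /&gt;
A typical call would be &amp;lt;tt&amp;gt;dsmls -R [directory] | not_yet_migrated&amp;lt;/tt&amp;gt;. For the example output above it would print 32G-2 and dsmerror.log.&lt;br /&gt;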
&lt;br /&gt;
===='''dsmdu'''==== &lt;br /&gt;
disk usage on the original files/directory&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmdu [-Allfiles] [-Summary] [-Help] [directory names]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===='''dsmdf'''====&lt;br /&gt;
disk free on the HSM file system.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmdf [-Help] [-Detail] [file systems]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===='''dsmmigrate'''====&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmmigrate [-Recursive] [-Premigrate] [-Detail] [-Help] filespecs|-FIlelist=file &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===='''dsmrecall'''====&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmrecall [-Recursive] [-Detail] [-Help] file specs|-FIlelist=file&lt;br /&gt;
   or  dsmrecall [-Detail] -offset=XXXX[kmgKMG] -size=XXXX[kmgKMG] file specs &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To have an idea of what HSM is doing on datamover2 at a given time:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[pinto@gpc-logindm02 ~]$ ps -def | grep dsm | grep -v mmfs&lt;br /&gt;
&lt;br /&gt;
root      2455 15190  0 16:26 ?        00:00:00 dsmmonitord&lt;br /&gt;
root      2456  2455  2 16:26 ?        00:05:38 dsmautomig -2 system::/repository&lt;br /&gt;
pinto    10997 10637 30 16:40 pts/3    01:14:20 dsmmigrate -R -D pinto&lt;br /&gt;
root     12857     1  0 16:15 ?        00:00:00 dsmrecalld&lt;br /&gt;
root     13013 12857  0 16:15 ?        00:00:01 dsmrecalld&lt;br /&gt;
root     13015 12857  0 16:15 ?        00:00:00 dsmrecalld&lt;br /&gt;
root     15190     1  0 16:15 ?        00:00:00 dsmmonitord&lt;br /&gt;
root     16936     1  3 16:15 ?        00:10:44 dsmscoutd&lt;br /&gt;
root     17217     1 13 16:16 ?        00:36:49 dsmrootd&lt;br /&gt;
root     18732  2456  4 17:51 ?        00:07:19 dsmautomig -2 system::/repository&lt;br /&gt;
root     18737  2456  0 17:51 ?        00:00:26 dsmautomig -2 system::/repository&lt;br /&gt;
pinto    24533 10363  0 20:48 pts/2    00:00:00 grep dsm&lt;br /&gt;
root     25090     1  0 06:42 ?        00:00:08 dsmwatchd nodetach&lt;br /&gt;
root     30840 13013  0 17:15 ?        00:00:02 dsmrecalld&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the above example, dsmmonitord, dsmrecalld, dsmscoutd, dsmrootd and dsmwatchd are the 5 typical HSM daemons, and they are always running. In addition, there are 3 streams of dsmautomig (triggered by threshold migration) and 1 stream of dsmmigrate (selective migration initiated by user pinto).&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Data_Management&amp;diff=2320</id>
		<title>Data Management</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Data_Management&amp;diff=2320"/>
		<updated>2010-12-12T16:51:38Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Deployment at SciNet */  quick description of why to keep a listing of files inside tars on a partition outside of HSM&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Storage Space==&lt;br /&gt;
SciNet's storage system is based on IBM's [http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] (General Parallel File System).   There are two main systems for user data: &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt;, a small, backed-up space where user home directories are located, and &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;, a large system for input or output data for jobs; data on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; is not backed up (a third storage system, /project, exists only for groups with LRAC/NRAC allocations). Data placed on scratch will be deleted if it has not been accessed in 3 months.  SciNet does not provide long-term storage for large data sets.  &lt;br /&gt;
&lt;br /&gt;
===Overview of the different file systems===&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! {{Hl2}} | file system &lt;br /&gt;
! {{Hl2}} | purpose &lt;br /&gt;
! {{Hl2}} | quota &lt;br /&gt;
! {{Hl2}} | block size &lt;br /&gt;
! {{Hl2}} | backed up&lt;br /&gt;
! {{Hl2}} | purged&lt;br /&gt;
! {{Hl2}} | access &lt;br /&gt;
|- &lt;br /&gt;
| /home&lt;br /&gt;
| development&lt;br /&gt;
| 10 GB&lt;br /&gt;
| 256 KB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
| read-only on compute nodes (r/w on login, devel and datamover1) &lt;br /&gt;
|- &lt;br /&gt;
| /scratch&lt;br /&gt;
| computation&lt;br /&gt;
| 20 TB&lt;br /&gt;
| 4 MB&lt;br /&gt;
| no&lt;br /&gt;
| files not accessed for &amp;gt; 3 months&lt;br /&gt;
| read/write on all nodes&lt;br /&gt;
|- &lt;br /&gt;
| /project&lt;br /&gt;
| computation&lt;br /&gt;
| by allocation&lt;br /&gt;
| 256 KB&lt;br /&gt;
| no&lt;br /&gt;
| never&lt;br /&gt;
| read/write on all nodes&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Home Disk Space===&lt;br /&gt;
&lt;br /&gt;
Every SciNet user gets a 10GB directory on &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; which is regularly backed-up.   Home is visible from &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; nodes, and from the development nodes on [[GPC_Quickstart | GPC]] and the [[TCS_Quickstart | TCS]].  However, on the compute nodes of the GPC clusters -- as when jobs are running -- &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is mounted '''''read-only'''''; thus GPC jobs can read files in /home but cannot write to files there.   &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is a good place to put code, input files for runs, and anything else that needs to be kept to reproduce runs.  On the other hand, &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is not a good place to put many small files, since&lt;br /&gt;
the block size for the file system is 256KB, so you would quickly run out of disk quota and you will make the backup system very slow.&lt;br /&gt;
&lt;br /&gt;
If your application absolutely insists on writing material to your home account and you can't find a way to instruct it to write somewhere else, an alternative is to create a link pointing from your account under /home to a location under /scratch.&lt;br /&gt;
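&lt;br /&gt;
A minimal sketch of creating such a link follows; link_to_scratch and both paths are hypothetical. Since &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is read-only on the compute nodes, the link has to be created from a login or development node.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: make a path under /home point at a directory under /scratch.
# link_to_scratch is a hypothetical helper name.

link_to_scratch () {
  local home_path=$1       # path the application insists on writing to
  local scratch_path=$2    # real location under /scratch
  mkdir -p "$scratch_path" || return 1
  # Refuse to replace anything that already exists at the /home path.
  if [ -e "$home_path" ] || [ -L "$home_path" ]; then
    echo "link_to_scratch: $home_path already exists"
    return 1
  fi
  ln -s "$scratch_path" "$home_path"
}
```
&lt;br /&gt;
For example, &amp;lt;tt&amp;gt;link_to_scratch /home/[user]/app_output /scratch/[user]/app_output&amp;lt;/tt&amp;gt;. Keep in mind that anything written through the link lives on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; and is therefore not backed up.&lt;br /&gt;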
&lt;br /&gt;
===Scratch Disk Space===&lt;br /&gt;
&lt;br /&gt;
Every SciNet user also gets a directory in &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.   Scratch is visible from the &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; nodes,  the development nodes on [[GPC_Quickstart | GPC]] and the [[TCS_Quickstart | TCS]], and on the compute nodes of the clusters, mounted as read-write.   Thus jobs would normally write their output somewhere in &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.  There are '''NO''' backups of anything on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
There is a large amount of space available on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; but it is purged routinely so that all users running jobs and generating large outputs will have room to store their data temporarily.  Computational results which you want to keep longer than this must be copied (using &amp;lt;tt&amp;gt;scp&amp;lt;/tt&amp;gt;) off of SciNet entirely and to your local system.   SciNet does not routinely provide long-term storage for large data sets.&lt;br /&gt;
&lt;br /&gt;
===Scratch Disk Purging Policy===&lt;br /&gt;
&lt;br /&gt;
In order to ensure that there is always significant space available for running jobs '''we automatically delete files in /scratch that have not been accessed for more than 3 months by the actual deletion day on the 15th of each month'''. This policy is subject to revision depending on its effectiveness. More details about the purging process and how users can check if their files will be deleted follow. If you have files scheduled for deletion you should move them to a more permanent location such as your departmental server or your /project space (for PIs who have either been allocated disk space by the LRAC or have bought disk space).&lt;br /&gt;
&lt;br /&gt;
On the '''first''' of each month, a list of files scheduled for purging is produced, and an email notification is sent to each user on that list. Furthermore, on or about the '''12th''' of each month a second scan produces a more current assessment and another email notification is sent. This way users can double check that they have indeed taken care of all the files they needed to relocate before the purging deadline. Those files will be automatically deleted on the '''15th''' of the same month unless they have been accessed or relocated in the interim. If you have files scheduled for deletion then they will be listed in a file in /scratch/todelete/current, which has your userid and groupid in the filename. For example, if user xxyz wants to check if they have files scheduled for deletion they can issue the following command on a system which mounts /scratch (e.g. a scinet login node): '''ls -l1 /scratch/todelete/current |grep xxyz'''. In the example below, the name of this file indicates that user xxyz is part of group abc, has 9,560 files scheduled for deletion and they take up 1.0TB of space:&lt;br /&gt;
&lt;br /&gt;
 [xxyz@scinet04 ~]$ ls -l1 /scratch/todelete/current |grep xxyz&lt;br /&gt;
 -rw-r----- 1 xxyz     root       1733059 Jan 12 11:46 10001___xxyz_______abc_________1.00T_____9560files&lt;br /&gt;
&lt;br /&gt;
The file itself contains a list of all files scheduled for deletion (in the last column) and can be viewed with standard commands like more/less/cat - e.g. '''more /scratch/todelete/current/10001___xxyz_______abc_________1.00T_____9560files'''&lt;br /&gt;
&lt;br /&gt;
Similarly, you can check on all other users in your group by using the ls command with grep on your group. For example: '''ls -l1 /scratch/todelete/current |grep abc'''. That will list all users in the same group as xxyz who have files to be purged on the 15th. Members of the same group have access to each other's contents.&lt;br /&gt;
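&lt;br /&gt;
The name of a file in /scratch/todelete/current can also be taken apart in a script. The sketch below assumes that the fields are separated by runs of underscores and appear in the order seen in the example above (a leading number, user, group, total size, file count); the helper name parse_todelete_name is made up, and the approach would not work for user or group names that themselves contain underscores.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: split a /scratch/todelete/current file name into its parts.
# Assumption: fields are separated by runs of underscores, in the order
# shown in the example (number, user, group, total size, file count).
parse_todelete_name() {
  echo "$1" | awk -F'_+' '{ sub(/files$/, "", $5); print "user=" $2, "group=" $3, "size=" $4, "files=" $5 }'
}

parse_todelete_name 10001___xxyz_______abc_________1.00T_____9560files
```
&lt;br /&gt;
For the example name this prints user=xxyz group=abc size=1.00T files=9560.&lt;br /&gt;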
&lt;br /&gt;
'''NOTE:''' Preparing these assessments takes several hours. If you change the access/modification time of a file in the interim, that will not be detected until the next cycle. A way for you to get immediate feedback is to use the ''''ls -lu'''' command on the file. If the file's atime has been updated, the file will no longer be deleted when the purging date on the 15th arrives.&lt;br /&gt;
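&lt;br /&gt;
To get a rough idea ahead of the official scan of which files have not been accessed recently, the access time can also be checked with find. This is a sketch only: list_stale_files is a made-up helper, treating "3 months" as 90 days is an assumption, GNU find and GNU touch are assumed, and the result is not guaranteed to match the list produced by the purging scan.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: list files whose last access time is older than a number of days.
# list_stale_files is a hypothetical helper; 90 days is an assumed stand-in
# for the "3 months" of the purging policy.
list_stale_files() {
  local dir="$1"
  local days="${2:-90}"
  find "$dir" -type f -atime +"$days" -print
}

# Demo: one file with an old access time and one fresh file (GNU touch syntax).
demo=$(mktemp -d)
touch -a -d '120 days ago' "$demo/old_results.dat"
touch "$demo/new_results.dat"
list_stale_files "$demo" 90
```
&lt;br /&gt;
In the demo only old_results.dat is listed. On SciNet the first argument would be a directory under /scratch.&lt;br /&gt;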
&lt;br /&gt;
===Project Disk Space===&lt;br /&gt;
&lt;br /&gt;
Investigators who have been granted allocations through the [http://www.scinet.utoronto.ca/resources/Account_Allocations.htm LRAC/NRAC Application Process] may have been allocated disk space in addition to compute time.   For the period of time that the allocation is granted, they will have disk space on the &amp;lt;tt&amp;gt;/project&amp;lt;/tt&amp;gt; disk system.  Space on the project system is not purged, but neither is it backed up.   All members of the investigator's group will have access to this system, which will be mounted read/write everywhere.&lt;br /&gt;
&lt;br /&gt;
===How much Disk Space Do I have left?===&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''/scinet/gpc/bin/diskUsage'''&amp;lt;/tt&amp;gt; command, available on the login nodes, datamovers and the GPC devel nodes, provides information in a number of ways on the home, scratch, and project file systems. For instance, it can report how much disk space is being used by yourself and your group (with the -a option) or how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), and it can generate plots of your usage over time. Please see the usage help below for more details.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-h|-?| [-a] [-u &amp;lt;user&amp;gt;] [-de|-plot]&lt;br /&gt;
       -h|-?: help&lt;br /&gt;
       -a: list usages of all members on the group&lt;br /&gt;
       -u &amp;lt;user&amp;gt;: as another user on your group&lt;br /&gt;
       -de: include delta information&lt;br /&gt;
       -plot: create plots of disk usages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that information on usage and quota is only updated hourly!&lt;br /&gt;
&lt;br /&gt;
===Performance===&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] is a high-performance filesystem which provides rapid reads and writes to large datasets in parallel from many nodes.  As a consequence of this design, however, '''the file system performs quite ''poorly'' at accessing data sets which consist of many small files.'''  For instance, you will find that reading data in from one 4MB file is enormously faster than from 100 40KB files.   Such small files are also quite wasteful of space, as the blocksize for the filesystem is 4MB.   This is something you should keep in mind when planning your input/output strategy for runs on SciNet.&lt;br /&gt;
&lt;br /&gt;
For instance, if you run multi-process jobs, having each process write to a file of its own is not a scalable I/O solution. A directory gets locked by the first process accessing it, so all other processes have to wait for it. Not only has the code just become considerably less parallel, chances are the file system will time out while waiting for your other processes, leading your program to crash mysteriously.&lt;br /&gt;
Consider using MPI-IO (part of the MPI-2 standard), which allows files to be opened simultaneously&lt;br /&gt;
by different processes, or using a dedicated process for I/O to which all other processes send their&lt;br /&gt;
data, and which subsequently writes this data to a single file.&lt;br /&gt;
&lt;br /&gt;
===Local Disk===&lt;br /&gt;
&lt;br /&gt;
The compute nodes on the GPC '''do not contain hard drives''' so there is no local disk available to use during your computation.  You can however use part of a compute node's RAM like a local disk ('ramdisk') but this will reduce how much memory is available for your program.  This can be accessed using &amp;lt;tt&amp;gt;/dev/shm/&amp;lt;/tt&amp;gt; and is currently set to 8GB.  Anything written to this location that you want to keep must be copied back to the &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; filesystem, as &amp;lt;tt&amp;gt;/dev/shm&amp;lt;/tt&amp;gt; is wiped after each job and, since it is in memory, will not survive a reboot of the node. More on ramdisk usage can be found [[User_Ramdisk | here]].&lt;br /&gt;
&lt;br /&gt;
Note that the absence of hard drives also means that the nodes cannot swap memory, so be sure that your computation fits within memory.&lt;br /&gt;
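&lt;br /&gt;
Inside a job script, the ramdisk workflow described above might look roughly like this. It is a sketch under assumptions: the variable names RAMDISK and RESULTS_DIR and the file names are made up, GNU mktemp is assumed, and a temporary directory is used in place of /scratch (and in place of /dev/shm where that is not available) so that the example can be tried on any Linux machine.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: use the ramdisk as temporary working space inside a job, then
# copy the results back to permanent storage before the job ends.
# RAMDISK and RESULTS_DIR are placeholders; on the GPC they would be
# /dev/shm and a directory under /scratch.
RAMDISK="${RAMDISK:-/dev/shm}"
RESULTS_DIR="${RESULTS_DIR:-$(mktemp -d)}"

# Fall back to a temporary directory when trying this without a usable /dev/shm.
if [ ! -d "$RAMDISK" ] || [ ! -w "$RAMDISK" ]; then
  RAMDISK=$(mktemp -d)
fi

workdir=$(mktemp -d -p "$RAMDISK" job.XXXXXX) || exit 1

# Stand-in for the real computation, which writes its files into $workdir.
echo "intermediate data" | tee "$workdir/output.dat"

# Copy back what should be kept, then free the memory used by the ramdisk.
cp -r "$workdir"/. "$RESULTS_DIR"/ || exit 1
rm -rf "$workdir"
ls "$RESULTS_DIR"
```
&lt;br /&gt;
Whatever is stored in the ramdisk counts against the memory of the node, so removing the working directory as soon as the results are copied back leaves more room for the program itself.&lt;br /&gt;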
&lt;br /&gt;
==Data Transfer==&lt;br /&gt;
{{:Data_Transfer}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==File/Ownership Management (ACL)==&lt;br /&gt;
* By default, at SciNet, users within the same group have read permission to each other's files (not write)&lt;br /&gt;
* You may use an access control list ('''ACL''') to allow your supervisor (or another user within your group) to manage files for you (i.e., create, move, rename, delete), while still retaining your access and permission as the original owner of the files/directories.&lt;br /&gt;
* '''NOTE''': We highly recommend that you never give write permission to other users on the top level of your home directory (/home/[owner]), since that would seriously compromise your privacy, in addition to disabling ssh key authentication, among other things. If necessary, make specific sub-directories under your home directory so that other users can manipulate/access files from those.&lt;br /&gt;
* If you need to set up permissions across groups [mailto:support@scinet.utoronto.ca contact us] (and the other group's supervisor!).&lt;br /&gt;
&lt;br /&gt;
===Using  setfacl/getfacl===&lt;br /&gt;
* To allow [supervisor] to manage files in /project/group/[owner] using the '''setfacl''' and '''getfacl''' commands, follow the 3 steps below as the [owner] account from a shell:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1) $ /scinet/gpc/bin/setfacl -d -m user:[supervisor]:rwx /project/group/[owner]&lt;br /&gt;
   (every *new* file/directory inside [owner] will inherit [supervisor] ownership by default from now on)&lt;br /&gt;
&lt;br /&gt;
2) $ /scinet/gpc/bin/setfacl -d -m user:[owner]:rwx /project/group/[owner]&lt;br /&gt;
   (but will also inherit [owner] ownership, ie, ownership of both by default, for files/directories created by [supervisor])&lt;br /&gt;
&lt;br /&gt;
3) $ /scinet/gpc/bin/setfacl -Rm user:[supervisor]:rwx /project/group/[owner]&lt;br /&gt;
   (recursively modify all *existing* files/directories inside [owner] to also be rwx by [supervisor])&lt;br /&gt;
&lt;br /&gt;
   $ /scinet/gpc/bin/getfacl /project/group/[owner]&lt;br /&gt;
   (to determine the current ACL attributes)&lt;br /&gt;
&lt;br /&gt;
   $ /scinet/gpc/bin/setfacl -b /project/group/[owner]&lt;br /&gt;
   (to remove any previously set ACL)&lt;br /&gt;
&lt;br /&gt;
PS: on the datamovers getfacl, setfacl and chacl will be on your path&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For more information on using [http://linux.die.net/man/1/setfacl &amp;lt;tt&amp;gt;setfacl&amp;lt;/tt&amp;gt;] or [http://linux.die.net/man/1/getfacl &amp;lt;tt&amp;gt;getfacl&amp;lt;/tt&amp;gt;] see their man pages.&lt;br /&gt;
&lt;br /&gt;
===Using mmputacl/mmgetacl===&lt;br /&gt;
* Alternatively, you may use gpfs' native '''mmputacl''' and '''mmgetacl''' commands. The advantages are that you can set &amp;quot;control&amp;quot; permission and that [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm1160.html POSIX or NFS v4 style ACL] are supported. You will need first to create a /tmp/supervisor.acl file with the following contents:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user::rwxc&lt;br /&gt;
group::----&lt;br /&gt;
other::----&lt;br /&gt;
mask::rwxc&lt;br /&gt;
user:[owner]:rwxc&lt;br /&gt;
user:[supervisor]:rwxc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then issue the following 2 commands:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1) $ mmputacl -i /tmp/supervisor.acl /project/group/[owner]&lt;br /&gt;
2) $ mmputacl -d -i /tmp/supervisor.acl /project/group/[owner]&lt;br /&gt;
   (every *new* file/directory inside [owner] will inherit [supervisor] ownership by default as well as &lt;br /&gt;
   [owner] ownership, ie, ownership of both by default, for files/directories created by [supervisor])&lt;br /&gt;
&lt;br /&gt;
   $ mmgetacl /project/group/[owner]&lt;br /&gt;
   (to determine the current ACL attributes)&lt;br /&gt;
&lt;br /&gt;
   $ mmdelacl -d /project/group/[owner]&lt;br /&gt;
   (to remove any previously set ACL)&lt;br /&gt;
&lt;br /&gt;
   $ mmeditacl /project/group/[owner]&lt;br /&gt;
   (to create or change a GPFS access control list)&lt;br /&gt;
   (for this command to work set the EDITOR environment variable: export EDITOR=/usr/bin/vi)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is no option to recursively add or remove ACL attributes using a gpfs built-in command. You'll need to use the -i option as above for each file or directory individually. [[Data_Management#bash_script_that_you_may_adapt_to_recursively_add_or_remove_ACL_attributes_using_gpfs_built-in_commands | Here is a sample bash script you may use for that purpose]]&lt;br /&gt;
&lt;br /&gt;
For more information on using [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm11120.html &amp;lt;tt&amp;gt;mmputacl&amp;lt;/tt&amp;gt;] or [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm11120.html &amp;lt;tt&amp;gt;mmgetacl&amp;lt;/tt&amp;gt;] see their man pages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Appendix===&lt;br /&gt;
====bash script that you may adapt to recursively add or remove ACL attributes using gpfs built-in commands====&lt;br /&gt;
Courtesy of Agata Disks (http://csngwinfo.in2p3.fr/mediawiki/index.php/GPFS_ACL)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# USAGE&lt;br /&gt;
#     - on one directory:     ./set_acl.sh dir_name&lt;br /&gt;
#     - on more directories:  ./set_acl.sh 'dir_nam*'&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
# Path of the file that contains the ACL&lt;br /&gt;
ACL_FILE_PATH=/agatadisks/data/acl_file.acl&lt;br /&gt;
&lt;br /&gt;
# Directories on which the ACLs are to be set&lt;br /&gt;
# (left unquoted in the loop below so that a quoted pattern such as 'dir_nam*' is expanded)&lt;br /&gt;
dirs=$1&lt;br /&gt;
&lt;br /&gt;
# Recursive function that sets the ACL on files and directories&lt;br /&gt;
set_acl () {&lt;br /&gt;
  # Local variables, so that recursive calls do not overwrite each other's values&lt;br /&gt;
  local curr_dir=&amp;quot;$1&amp;quot;&lt;br /&gt;
  local args&lt;br /&gt;
  for args in &amp;quot;$curr_dir&amp;quot;/*&lt;br /&gt;
  do&lt;br /&gt;
    if [ -f &amp;quot;$args&amp;quot; ]; then&lt;br /&gt;
      mmputacl -i &amp;quot;$ACL_FILE_PATH&amp;quot; &amp;quot;$args&amp;quot;&lt;br /&gt;
      if [ $? -ne 0 ]; then&lt;br /&gt;
        echo &amp;quot;ERROR: ACL not set on $args&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      echo &amp;quot;ACL set on file $args&amp;quot;&lt;br /&gt;
    fi&lt;br /&gt;
    if [ -d &amp;quot;$args&amp;quot; ]; then&lt;br /&gt;
      # Set the default ACL on the directory&lt;br /&gt;
      mmputacl -d -i &amp;quot;$ACL_FILE_PATH&amp;quot; &amp;quot;$args&amp;quot;&lt;br /&gt;
      if [ $? -ne 0 ]; then&lt;br /&gt;
        echo &amp;quot;ERROR: Default ACL not set on $args&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      echo &amp;quot;Default ACL set on directory $args&amp;quot;&lt;br /&gt;
      # Set the access ACL on the directory&lt;br /&gt;
      mmputacl -i &amp;quot;$ACL_FILE_PATH&amp;quot; &amp;quot;$args&amp;quot;&lt;br /&gt;
      if [ $? -ne 0 ]; then&lt;br /&gt;
        echo &amp;quot;ERROR: ACL not set on $args&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      echo &amp;quot;ACL set on directory $args&amp;quot;&lt;br /&gt;
      # Descend into the sub-directory&lt;br /&gt;
      set_acl &amp;quot;$args&amp;quot;&lt;br /&gt;
    fi&lt;br /&gt;
  done&lt;br /&gt;
}&lt;br /&gt;
for dir in $dirs&lt;br /&gt;
do&lt;br /&gt;
  if [ ! -d &amp;quot;$dir&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;ERROR: $dir is not a directory&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
  fi&lt;br /&gt;
  set_acl &amp;quot;$dir&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
exit 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Hierarchical Storage Management (HSM)==&lt;br /&gt;
'''(a pilot project is starting in July/2010 with a select group of users)'''&lt;br /&gt;
&lt;br /&gt;
===Basic Concepts===&lt;br /&gt;
'''Hierarchical Storage Management (HSM)''' is a data storage technique which automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as hard disk drive arrays, are more expensive (per byte stored) than slower devices, such as optical discs and magnetic tape drives. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise's data on slower devices, and then copy data to faster disk drives when needed. In effect, HSM turns the fast disk drives into caches for the slower mass storage devices. The HSM system monitors the way data is used and makes best guesses as to which data can safely be moved to slower devices and which data should stay on the fast devices.&lt;br /&gt;
&lt;br /&gt;
In a typical HSM scenario, data files which are frequently used are stored on disk drives, but are eventually migrated to tape if they are not used for a certain period of time, typically a few months. If a user does reuse a file which is on tape, it is automatically moved back to disk storage. The advantage is that the total amount of stored data can be much larger than the capacity of the disk storage available, but since only rarely-used files are on tape, most users will not notice any slowdown.&lt;br /&gt;
&lt;br /&gt;
The HSM client provides both ''automatic'' and ''selective migration''. Once file ''migration'' begins, the HSM client sends a copy of your file to storage volumes on disk devices or devices that support removable media, such as tape, and replaces the original file with a ''stub file'' on the HSM-managed file system (aka ''repository'' at SciNet).&lt;br /&gt;
&lt;br /&gt;
'''Repository''' commonly refers to a location for long-term storage, often for safety or preservation. &lt;br /&gt;
&lt;br /&gt;
'''Migration''', in the context of HSM, refers to the set of actions that move files from the front-end disk-based repository to a back-end tape library system (often invisible or inaccessible to users).&lt;br /&gt;
&lt;br /&gt;
'''Relocation''', in the context of SciNet, refers to the use of unix commands such as copy, move, tar or rsync to get data into the repository.&lt;br /&gt;
&lt;br /&gt;
'''The stub file''' is a small replacement file that makes it appear as though the original file is on the repository. It contains required metadata information to locate and recall a migrated file and to respond to specific UNIX commands without recalling the file.&lt;br /&gt;
&lt;br /&gt;
'''Automatic migration''' periodically monitors space usage and automatically migrates eligible files according to the options and settings that have been selected. The HSM client provides two types of automatic migration: ''threshold migration'' and ''demand migration''.&lt;br /&gt;
&lt;br /&gt;
'''Threshold migration''' maintains a specific level of free space on the repository file system. When disk usage reaches the high threshold percentage, eligible files are migrated to tapes automatically. When space usage drops to the low threshold set for the file system, file migration stops.&lt;br /&gt;
&lt;br /&gt;
'''Demand migration''' responds to an out-of-space condition on the repository file system. Demand migration starts automatically if the file system runs out of space (usually triggered at 90%). For HSM, as files are migrated (oldest/largest first), space becomes available on the file system, and the process or event that caused the out-of-space condition can be resumed.&lt;br /&gt;
&lt;br /&gt;
'''Selective migration''' is usually performed with a user-issued HSM command that migrates specific files from the repository at will, in anticipation of the automatic migration, or independently of the system-wide eligibility criteria. For example, if you know that you will not be using a particular group of files for an extended time, you can migrate them, so as to free additional space on the repository.&lt;br /&gt;
&lt;br /&gt;
'''Reclamation''' is the process of reclaiming unused space on a tape (applies to Virtual Tapes as well). Over time, as files/directories get deleted or updated on the repository, a process will expire old data, creating gaps of unused storage on the tapes. Since tapes are sequential media, typical tape handling software can only write data to the end of the tape, so these gaps of “Empty Space” cannot be used. The process entails periodically, and in a rolling fashion, copying active data from the &amp;quot;Swiss Cheese&amp;quot;-like tapes to unused tapes in a compacted form.&lt;br /&gt;
&lt;br /&gt;
'''Optimal environment:''' HSM should be used in an environment where the old and large files which need to be preserved are not used regularly. Files that are needed frequently should not be migrated at all; otherwise HSM would act as a cache, migrating files only to recall the same files shortly afterwards. This is not advisable. The repository file system needs to be large enough to hold all regularly used files. If the file system is too small and cannot hold all regularly needed files (**), HSM ends up permanently recalling requested files, going beyond the high-threshold limit, migrating other files to get below the low-threshold limit, and so on.&lt;br /&gt;
&lt;br /&gt;
===Deployment at SciNet===&lt;br /&gt;
HSM is performed by dedicated IBM software made up of a number of HSM daemons running on '''datamover2'''. These daemons constantly monitor the usage of the '''/repository''' GPFS and, depending on a predefined set of policies, data may be automatically or manually migrated to the Tivoli Storage Manager (TSM) server, and kept on our library of LTO-4 tapes.&lt;br /&gt;
&lt;br /&gt;
'''/repository is a 15TB &amp;quot;transient&amp;quot; location''' accessible only from datamover2. Users may relocate data as required from /scratch or /project to /repository in a number of ways, such as copy, move, tar or rsync. &amp;quot;Transient&amp;quot; refers to the fact that /repository works like a &amp;quot;Black Hole&amp;quot;: in the background '''it is constantly being emptied''', even while you relocate data in from other file systems. What is left behind is the directory tree with the stub files (0 byte in size at SciNet) and the metadata associated with it, which takes up about 1-2%. But, even if /repository is at 1% to start with, we ask that you please do not initiate a relocation of more than a 10TB chunk at once, so that the system has time to process your data and still allow other user(s) to migrate/recall some material before reaching 100% full. &lt;br /&gt;
&lt;br /&gt;
Inside /repository, data is segregated on a per group basis, just as in /project. Within groups, users and group supervisors can structure materials any way they prefer. But the recommendation is that those involved spend some time designing that structure ahead of time, since you may merge data from project and/or scratch (or even home). In tests we performed, we've been able to reorganize the FS structure after migration, change the name and ownership of directories and stubs, and still recall files under the new path and ownership. HSM does seem to keep a very symbiotic relation between the metadata and the inode attributes at the file system level, without necessarily having to replicate these changes with tape recall &amp;amp; migration operations. But please don't abuse this flexibility. If possible, keep your initial layout structure somewhat fixed over time.&lt;br /&gt;
&lt;br /&gt;
We also recommend that users bundle files in tar-balls of at least 10GB before relocation, and keep a listing of those files somewhere; in fact you may use the 'tar' command to create the tar-ball directly in /repository on-the-fly. See examples below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -czvf /repository/[group]/[user]/myproject1.tar.gz /project/[group]/[user]/project1/ &amp;gt; /project/[group]/[user]/myproject1-repository-listing.txt&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
tar -czvf /repository/[group]/[user]/myscratch1.tar.gz /scratch/[user]/scratchdata1/ &amp;gt; /home/[user]/myscratch1-repository-listing.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is important to keep a listing of the files that are in each tar on a partition other than the HSM repository so that you can quickly decide which tar you need to recover. While the tar stub will always exist on the HSM disk, you will not be able to run tar --list on the stub without recalling the full tar file from tape to disk. The redirection in the examples above saves such a listing.&lt;br /&gt;
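&lt;br /&gt;
The same idea can be wrapped in a small helper, and the saved listings can later be searched to find the tar-ball that holds a given file. This is a sketch, not a SciNet-provided tool: bundle_with_listing and all paths are made up, temporary directories stand in for /scratch, /repository and /home, and the listing is written with tee instead of the redirection used above (the helper also prints the number of archived entries).&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: bundle a directory into a tar-ball and keep a listing of its
# contents outside the archive, so the listing can be searched later
# without recalling the tar-ball from tape.
# bundle_with_listing and all paths are hypothetical placeholders.
bundle_with_listing() {
  local src_dir="$1"   # directory to bundle
  local tarball="$2"   # destination, e.g. somewhere under /repository
  local listing="$3"   # listing file, kept on /project or /home
  tar -czvf "$tarball" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" | tee "$listing" | wc -l
}

# Demo with temporary directories standing in for the real file systems.
demo=$(mktemp -d)
mkdir -p "$demo/scratch/run1" "$demo/repository" "$demo/home"
touch "$demo/scratch/run1/a.dat" "$demo/scratch/run1/b.dat"
bundle_with_listing "$demo/scratch/run1" "$demo/repository/run1.tar.gz" "$demo/home/run1-listing.txt"

# Later: find out which tar-ball holds a given file by searching the listings.
grep -l 'run1/b.dat' "$demo"/home/*-listing.txt
```
&lt;br /&gt;
Using -C stores paths relative to the parent of the bundled directory rather than absolute paths; this is a choice made for the sketch and differs from the examples above.&lt;br /&gt;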
&lt;br /&gt;
The important thing is to avoid the relocation of many thousands (or millions) of small files. It's very demanding on the system to constantly scan/reconcile all these files on the file system, tapes, metadata and database. A good reference is '''average file size &amp;gt; 100MB''' in /repository. Deep directory nesting in general also increases the time required to traverse a file system and thus should be avoided where possible.&lt;br /&gt;
&lt;br /&gt;
===Performance===&lt;br /&gt;
Unlike /project or /scratch, /repository is only a 2 tier disk raid, so don't expect transfer rates much higher than 60MB/sec on an rsync session, for example. In other words, a 10TB offload operation will typically take 2 days to complete, if made up of large files. On the other hand, we have conducted experiments where we migrated only 1TB, but with 1 million files expanded, and that took nearly a day! This situation should be avoided when moving many terabytes. Performance is as much a function of the number of files as of the amount of data.&lt;br /&gt;
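&lt;br /&gt;
The 2-day figure follows directly from the transfer rate, and the same arithmetic can be used to plan other transfers of large files. In this sketch the helper name transfer_hours is made up, decimal units are assumed (1 GB = 1000 MB), and integer arithmetic rounds the result down; it says nothing about transfers dominated by large numbers of small files.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: rough transfer time for a given amount of data at a sustained rate.
# transfer_hours is a hypothetical helper; it uses decimal units
# (1 GB = 1000 MB) and integer arithmetic, so results are rounded down.
transfer_hours() {
  local size_gb="$1"
  local rate_mb_per_s="$2"
  echo $(( size_gb * 1000 / rate_mb_per_s / 3600 ))
}

# 10TB at 60MB/sec
transfer_hours 10000 60
```
&lt;br /&gt;
This prints 46, i.e. about 46 hours or just under 2 days.&lt;br /&gt;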
&lt;br /&gt;
As for the &amp;quot;ideal tar-ball size&amp;quot;, experiments have shown that an isolated 10GB tar-ball typically takes 10-15 minutes to be pulled back, considering all tape operations involved. That seems like a reasonable amount of time to wait for a group of files kept off-line for an extended period of time. Also consider that pulling back an individual tiny file could still take as long as 5-8 minutes. So, it's pretty clear that you get the best bang for the buck by tar'ing your material, and you won't tie up the tape system for too long. As for the upper limit, you can probably bundle files in 100-500GB tar-balls, provided that you're OK with waiting a couple of hours for them to be recalled at a later date; at least from SciNet's perspective, it would be a very efficient migration.&lt;br /&gt;
&lt;br /&gt;
'''Please be sure to contact us to schedule your transfers IN or OUT of the system, to avoid conflict with other users or within the system settings.''' For instance, if you recall large amounts of data at once, let's say 7.5TB (about half of /repository), we would have to adjust the high threshold accordingly for that period (to 50%), so we don't induce the never-ending migrate/recall issues (**) described under ''Optimal environment''.&lt;br /&gt;
&lt;br /&gt;
===How to migrate/recall data===&lt;br /&gt;
&lt;br /&gt;
====Automatic====&lt;br /&gt;
We have currently set up /repository with '''High and Low thresholds of 2% and 1% respectively'''. That means, at regular intervals the file system is monitored to determine if the 2% usage mark has been reached or surpassed. In that case, data is automatically migrated to tapes, oldest (or largest) first, until the file system is down to 1%, if possible (metadata is not migrated). Since data may be copied/moved/rsync'ed/tar'ed in faster than /repository can be emptied, you may observe 80-90% disk usages sporadically (hence the 10TB chunk of data limit). For now at SciNet we migrate every file in /repository to tapes.&lt;br /&gt;
&lt;br /&gt;
To recall a file automatically all you have to do is '''access''' it. There are many ways you can do this. For example, you may view a file with 'cat', 'more', 'vi/vim', etc. You may also copy the file (or directory) from /repository to another location. '''Please be patient:''' the file will have to be pulled back from tape, and this will take some time, longer if it happens to be at the end of a tape.&lt;br /&gt;
&lt;br /&gt;
====Selective====&lt;br /&gt;
&lt;br /&gt;
Selective migration is used to override the internal priority of HSM (oldest/largest) or to migrate files/directories &amp;quot;immediately&amp;quot;. The recommendation is to '''not wait''' for the automatic migration cycle to kick in, since this could take some 6 to 12 hours at SciNet. If you already know that you relocated material to repository with the intention of having it migrated to tapes, you can just use dsmmigrate as soon as the rsync to repository has finished, for instance.&lt;br /&gt;
&lt;br /&gt;
(files won't be migrated until they have &amp;quot;aged&amp;quot; for at least 5 minutes, that is, after their last access/modification time)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
dsmmigrate [path to FILE]&lt;br /&gt;
dsmmigrate -R -D /repository/[group]/[user]/[directory]&lt;br /&gt;
dsmmigrate /repository/scinet/pinto/blahblahblah.tar.Z&lt;br /&gt;
or&lt;br /&gt;
cd /repository/scinet/pinto/&lt;br /&gt;
dsmmigrate blahblahblah.tar.Z&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To selectively recall data, just type:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
dsmrecall [path to FILE]&lt;br /&gt;
dsmrecall -R -D /repository/[group]/[user]/[directory]&lt;br /&gt;
dsmrecall /repository/scinet/pinto/blahblahblah.tar.Z&lt;br /&gt;
or&lt;br /&gt;
cd /repository/scinet/pinto/&lt;br /&gt;
dsmrecall blahblahblah.tar.Z&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' We've been finding that the search for new candidates for automatic migration takes much longer once repository is already full of files/stubs. That is to be expected, hence the recommendation to not wait and proceed with the selective migration of your own files/directories asap.&lt;br /&gt;
&lt;br /&gt;
===Disaster Recovery===&lt;br /&gt;
&lt;br /&gt;
As with any disk based storage, although it's a raid 5 file system, repository is not immune to failures. We do not do regular backups, but it's possible to do a full recovery in case of catastrophic loss of repository. '''For that it's important that all files have been completely migrated to tapes''' beforehand. That puts the onus on users to ensure this migration is indeed finished (with selective migration) for the relocated material before they delete the originals from /project or /scratch.&lt;br /&gt;
&lt;br /&gt;
===Common HSM commands===&lt;br /&gt;
Some traditional unix/linux commands, such as 'ls' or 'rm' for instance, will work on the stub files as if they were the real files. But for others, such as 'du' or 'df', you should use the HSM equivalent, which will give you more meaningful information in the context of HSM. The HSM commands only work inside /repository. Some of them, such as 'dsmrm', are executable only by root, in which case you'll be notified.&lt;br /&gt;
&lt;br /&gt;
===='''dsmls'''====&lt;br /&gt;
to check status of files; used in the directory where you expect to have migrated files&lt;br /&gt;
&lt;br /&gt;
'''r''': ''resident''    (the file is on repository only)&lt;br /&gt;
&lt;br /&gt;
'''m''': ''migrated''    (only the stub of the file is on repository)&lt;br /&gt;
&lt;br /&gt;
'''p''': ''premigrated'' (the file is on repository and on tape)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmls [-Noheader] [-Recursive] [-Help] [file specs|-FIlelist=file]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-logindm02-$ dsmls -R a3&lt;br /&gt;
IBM Tivoli Storage Manager&lt;br /&gt;
Command Line Space Management Client Interface&lt;br /&gt;
  Client Version 6, Release 1, Level 0.0  &lt;br /&gt;
  Client date/time: 07/27/2010 12:06:36&lt;br /&gt;
(c) Copyright by IBM Corporation and other(s) 1990, 2009. All Rights Reserved.&lt;br /&gt;
&lt;br /&gt;
      Actual     Resident     Resident  File   File&lt;br /&gt;
        Size         Size     Blk (KB)  State  Name&lt;br /&gt;
       &amp;lt;dir&amp;gt;         8192            8   -      a3/&lt;br /&gt;
&lt;br /&gt;
/repository/scinet/pinto/a3:&lt;br /&gt;
 34008432640            0            0   m      32G-1&lt;br /&gt;
 34008432640  34008432640            0   r      32G-2&lt;br /&gt;
 34008432640  34008432640            0   p      32G-3&lt;br /&gt;
           0            0            0   r      dsmerror.log&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
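&lt;br /&gt;
Since originals should only be deleted once everything is on tape, it can help to filter the dsmls output for files that are still resident only. The sketch below is not an HSM tool: resident_only is a made-up helper that assumes the five-column layout of the example above and file names without spaces, and the demo feeds it lines copied from that example rather than calling dsmls.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: from dsmls output, print the names of files that are still
# resident only (state r), i.e. not yet on tape.
# Assumption: the five-column layout of the example output
# (actual size, resident size, resident blocks, state, name) and
# file names without spaces.
resident_only() {
  awk 'NF == 5 { if ($4 == "r") print $5 }'
}

# Demo on lines copied from the example output; on datamover2 the input
# would come from something like: dsmls -R a3 | resident_only
printf '%s\n' \
  ' 34008432640            0            0   m      32G-1' \
  ' 34008432640  34008432640            0   r      32G-2' \
  ' 34008432640  34008432640            0   p      32G-3' \
  '           0            0            0   r      dsmerror.log' | resident_only
```
&lt;br /&gt;
For the example this prints 32G-2 and dsmerror.log, the two entries that exist on repository only.&lt;br /&gt;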
&lt;br /&gt;
===='''dsmdu'''==== &lt;br /&gt;
disk usage on the original files/directory&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmdu [-Allfiles] [-Summary] [-Help] [directory names]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===='''dsmdf'''====&lt;br /&gt;
disk free on the HSM file system.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmdf [-Help] [-Detail] [file systems]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===='''dsmmigrate'''====&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmmigrate [-Recursive] [-Premigrate] [-Detail] [-Help] filespecs|-FIlelist=file &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===='''dsmrecall'''====&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmrecall [-Recursive] [-Detail] [-Help] file specs|-FIlelist=file&lt;br /&gt;
   or  dsmrecall [-Detail] -offset=XXXX[kmgKMG] -size=XXXX[kmgKMG] file specs &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To get an idea of what HSM is doing on datamover2 at a given time:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[pinto@gpc-logindm02 ~]$ ps -def | grep dsm | grep -v mmfs&lt;br /&gt;
&lt;br /&gt;
root      2455 15190  0 16:26 ?        00:00:00 dsmmonitord&lt;br /&gt;
root      2456  2455  2 16:26 ?        00:05:38 dsmautomig -2 system::/repository&lt;br /&gt;
pinto    10997 10637 30 16:40 pts/3    01:14:20 dsmmigrate -R -D pinto&lt;br /&gt;
root     12857     1  0 16:15 ?        00:00:00 dsmrecalld&lt;br /&gt;
root     13013 12857  0 16:15 ?        00:00:01 dsmrecalld&lt;br /&gt;
root     13015 12857  0 16:15 ?        00:00:00 dsmrecalld&lt;br /&gt;
root     15190     1  0 16:15 ?        00:00:00 dsmmonitord&lt;br /&gt;
root     16936     1  3 16:15 ?        00:10:44 dsmscoutd&lt;br /&gt;
root     17217     1 13 16:16 ?        00:36:49 dsmrootd&lt;br /&gt;
root     18732  2456  4 17:51 ?        00:07:19 dsmautomig -2 system::/repository&lt;br /&gt;
root     18737  2456  0 17:51 ?        00:00:26 dsmautomig -2 system::/repository&lt;br /&gt;
pinto    24533 10363  0 20:48 pts/2    00:00:00 grep dsm&lt;br /&gt;
root     25090     1  0 06:42 ?        00:00:08 dsmwatchd nodetach&lt;br /&gt;
root     30840 13013  0 17:15 ?        00:00:02 dsmrecalld&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the above example, dsmmonitord, dsmrecalld, dsmscoutd, dsmrootd and dsmwatchd are the 5 typical HSM daemons, and they are always running. In addition, there are 3 streams of dsmautomig (triggered by threshold migration) and 1 stream of dsmmigrate (selective migration initiated by user pinto).&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Data_Management&amp;diff=2319</id>
		<title>Data Management</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Data_Management&amp;diff=2319"/>
		<updated>2010-12-10T15:25:50Z</updated>

		<summary type="html">&lt;p&gt;Cneale: moved 5.1 to section 3 since it is counterintuitive that the ACL information is split around HSM and got me confused&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Storage Space==&lt;br /&gt;
SciNet's storage system is based on IBM's [http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] (General Parallel File System).   There are two main systems for user data: &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt;, a small, backed-up space where user home directories are located, and &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;, a large system for input or output data for jobs. A third storage system, &amp;lt;tt&amp;gt;/project&amp;lt;/tt&amp;gt;, exists only for groups with LRAC/NRAC allocations. Data on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; is not backed up and is also purged: data placed on scratch will be deleted if it has not been accessed in 3 months.  SciNet does not provide long-term storage for large data sets.  &lt;br /&gt;
&lt;br /&gt;
===Overview of the different file systems===&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! {{Hl2}} | file system &lt;br /&gt;
! {{Hl2}} | purpose &lt;br /&gt;
! {{Hl2}} | quota &lt;br /&gt;
! {{Hl2}} | block size &lt;br /&gt;
! {{Hl2}} | backed up&lt;br /&gt;
! {{Hl2}} | purged&lt;br /&gt;
! {{Hl2}} | access &lt;br /&gt;
|- &lt;br /&gt;
| /home&lt;br /&gt;
| development&lt;br /&gt;
| 10 GB&lt;br /&gt;
| 256 KB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
| read-only on compute nodes (r/w on login, devel and datamover1) &lt;br /&gt;
|- &lt;br /&gt;
| /scratch&lt;br /&gt;
| computation&lt;br /&gt;
| 20 TB&lt;br /&gt;
| 4 MB&lt;br /&gt;
| no&lt;br /&gt;
| files not accessed for &amp;gt; 3 months&lt;br /&gt;
| read/write on all nodes&lt;br /&gt;
|- &lt;br /&gt;
| /project&lt;br /&gt;
| computation&lt;br /&gt;
| by allocation&lt;br /&gt;
| 256 KB&lt;br /&gt;
| no&lt;br /&gt;
| never&lt;br /&gt;
| read/write on all nodes&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Home Disk Space===&lt;br /&gt;
&lt;br /&gt;
Every SciNet user gets a 10GB directory on &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; which is regularly backed up.   Home is visible from &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; nodes, and from the development nodes on [[GPC_Quickstart | GPC]] and the [[TCS_Quickstart | TCS]].  However, on the compute nodes of the GPC clusters -- as when jobs are running -- &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is mounted '''''read-only'''''; thus GPC jobs can read files in /home but cannot write to files there.   &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is a good place to put code, input files for runs, and anything else that needs to be kept to reproduce runs.  On the other hand, &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is not a good place to put many small files: since&lt;br /&gt;
the block size for the file system is 256KB, you would quickly run out of disk quota and make the backup system very slow.&lt;br /&gt;
&lt;br /&gt;
If your application absolutely insists on writing material to your home account and you can't find a way to instruct it to write somewhere else, an alternative is to create a link pointing from your account under /home to a location under /scratch.&lt;br /&gt;
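&lt;br /&gt;
A minimal sketch of that approach follows. The function name link_to_scratch and the paths in the usage comment are hypothetical; adapt them to whatever location your application insists on. Create the link from a login or devel node, where /home is writable.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: make link_path (under /home) a symbolic link to
# target_dir (under /scratch), so that writes to link_path land on /scratch.
link_to_scratch () {
  local target_dir=$1 link_path=$2
  # Refuse to clobber anything that already exists at the link location.
  if [ -e "$link_path" ] || [ -L "$link_path" ]; then
    echo "ERROR: $link_path already exists" > /dev/stderr
    return 1
  fi
  mkdir -p "$target_dir" || return 1
  ln -s "$target_dir" "$link_path"
}

# Example (not run here; paths are placeholders):
#   link_to_scratch /scratch/$USER/myapp-output $HOME/myapp-output
```

Keep in mind that anything reached through such a link lives on /scratch and is therefore neither backed up nor exempt from purging.&lt;br /&gt;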
&lt;br /&gt;
===Scratch Disk Space===&lt;br /&gt;
&lt;br /&gt;
Every SciNet user also gets a directory in &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.   Scratch is visible from the &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; nodes,  the development nodes on [[GPC_Quickstart | GPC]] and the [[TCS_Quickstart | TCS]], and on the compute nodes of the clusters, mounted as read-write.   Thus jobs would normally write their output somewhere in &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.  There are '''NO''' backups of anything on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
There is a large amount of space available on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; but it is purged routinely so that all users running jobs and generating large outputs will have room to store their data temporarily.  Computational results which you want to keep longer than this must be copied (using &amp;lt;tt&amp;gt;scp&amp;lt;/tt&amp;gt;) off of SciNet entirely and to your local system.   SciNet does not routinely provide long-term storage for large data sets.&lt;br /&gt;
&lt;br /&gt;
===Scratch Disk Purging Policy===&lt;br /&gt;
&lt;br /&gt;
In order to ensure that there is always significant space available for running jobs '''we automatically delete files in /scratch that have not been accessed for more than 3 months by the actual deletion day on the 15th of each month'''. This policy is subject to revision depending on its effectiveness. More details about the purging process and how users can check if their files will be deleted follow. If you have files scheduled for deletion you should move them to a more permanent location such as your departmental server or your /project space (for PIs who have either been allocated disk space by the LRAC or have bought disk space).&lt;br /&gt;
&lt;br /&gt;
On the '''first''' of each month, a list of files scheduled for purging is produced, and an email notification is sent to each user on that list. Furthermore, on or about the '''12th''' of each month a second scan produces a more current assessment and another email notification is sent. This way users can double check that they have indeed taken care of all the files they needed to relocate before the purging deadline. Those files will be automatically deleted on the '''15th''' of the same month unless they have been accessed or relocated in the interim. If you have files scheduled for deletion then they will be listed in a file in /scratch/todelete/current, which has your userid and groupid in the filename. For example, if user xxyz wants to check if they have files scheduled for deletion they can issue the following command on a system which mounts /scratch (e.g. a scinet login node): '''ls -l1 /scratch/todelete/current |grep xxyz'''. In the example below, the name of this file indicates that user xxyz is part of group abc, has 9,560 files scheduled for deletion and they take up 1.0TB of space:&lt;br /&gt;
&lt;br /&gt;
 [xxyz@scinet04 ~]$ ls -l1 /scratch/todelete/current |grep xxyz&lt;br /&gt;
 -rw-r----- 1 xxyz     root       1733059 Jan 12 11:46 10001___xxyz_______abc_________1.00T_____9560files&lt;br /&gt;
&lt;br /&gt;
The file itself contains a list of all files scheduled for deletion (in the last column) and can be viewed with standard commands like more/less/cat - e.g. '''more /scratch/todelete/current/10001___xxyz_______abc_________1.00T_____9560files'''&lt;br /&gt;
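&lt;br /&gt;
The information encoded in the file name can also be pulled apart in a script. This is a sketch only: the function name parse_todelete_name is hypothetical, and the field layout (a leading number, then user, group, size and file count separated by runs of underscores) is inferred from the single example above, so it will misparse user or group names that themselves contain underscores.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: split a /scratch/todelete/current file name into fields.
# Assumes the layout of the example above: a leading number, user, group,
# size and file count, separated by runs of underscores.
parse_todelete_name () {
  basename "$1" | awk -F'_+' '{
    sub(/files$/, "", $5)   # strip the "files" suffix from the count
    printf "user=%s group=%s size=%s files=%s\n", $2, $3, $4, $5
  }'
}

# Example with the file name shown above:
#   parse_todelete_name /scratch/todelete/current/10001___xxyz_______abc_________1.00T_____9560files
```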
&lt;br /&gt;
Similarly, you can also check on all other users in your group by using the ls command with grep on your group name. For example: '''ls -l1 /scratch/todelete/current |grep abc'''. That will list all users in the same group as xxyz who have files to be purged on the 15th. Members of the same group have access to each other's contents.&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' Preparing these assessments takes several hours. If you change the access/modification time of a file in the interim, that will not be detected until the next cycle. A way for you to get immediate feedback is to use the ''''ls -lu'''' command on the file, which shows its access time (atime). If the file's atime has been updated, the file will no longer be deleted on the purging date of the 15th.&lt;br /&gt;
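&lt;br /&gt;
To check a whole directory rather than one file at a time, the access times can be tested with find. This is a rough sketch, not the script SciNet uses for purging: the function name list_stale_files is hypothetical, and the 90-day default is only an approximation of the 3-month window. The lists in /scratch/todelete/current remain the authoritative source.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: list regular files under a directory whose last access
# time is more than a given number of days ago (default 90, an approximation
# of the 3-month window).
list_stale_files () {
  local dir=$1 days=${2:-90}
  find "$dir" -type f -atime +"$days" -print
}

# Example (not run here; the path is a placeholder):
#   list_stale_files /scratch/$USER/old-run
```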
&lt;br /&gt;
===Project Disk Space===&lt;br /&gt;
&lt;br /&gt;
Investigators who have been granted allocations through the [http://www.scinet.utoronto.ca/resources/Account_Allocations.htm LRAC/NRAC Application Process] may have been allocated disk space in addition to compute time.   For the period of time that the allocation is granted, they will have disk space on the &amp;lt;tt&amp;gt;/project&amp;lt;/tt&amp;gt; disk system.  Space on the project system is not purged, but neither is it backed up.   All members of the investigator's group will have access to this system, which will be mounted read/write everywhere.&lt;br /&gt;
&lt;br /&gt;
===How much Disk Space Do I have left?===&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''/scinet/gpc/bin/diskUsage'''&amp;lt;/tt&amp;gt; command, available on the login nodes, datamovers and the GPC devel nodes, provides information in a number of ways on the home, scratch, and project file systems. For instance, how much disk space is being used by yourself and your group (with the -a option), or how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;) or you may generate plots of your usage over time. Please see the usage help below for more details.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-h|-?| [-a] [-u &amp;lt;user&amp;gt;] [-de|-plot]&lt;br /&gt;
       -h|-?: help&lt;br /&gt;
       -a: list usages of all members on the group&lt;br /&gt;
       -u &amp;lt;user&amp;gt;: as another user on your group&lt;br /&gt;
       -de: include delta information&lt;br /&gt;
       -plot: create plots of disk usages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that information on usage and quota is only updated hourly!&lt;br /&gt;
&lt;br /&gt;
===Performance===&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] is a high-performance filesystem which provides rapid reads and writes to large datasets in parallel from many nodes.  As a consequence of this design, however, '''the file system performs quite ''poorly'' at accessing data sets which consist of many, small files.'''  For instance, you will find that reading data in from one 4MB file is enormously faster than from 100 40KB files.   Such small files are also quite wasteful of space, as the blocksize for the filesystem is 4MB.   This is something you should keep in mind when planning your input/output strategy for runs on SciNet.&lt;br /&gt;
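&lt;br /&gt;
One simple way to avoid keeping many small files on the file system is to pack them into a single archive and unpack them only where they are needed. The following is a minimal sketch using standard tar; the function names bundle_dir and unbundle are hypothetical.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: pack a directory of small files into one tar archive,
# so the file system handles a single large file instead of many small ones.
bundle_dir () {
  local src_dir=$1 archive=$2
  # -C makes the paths inside the archive relative to the parent of src_dir.
  tar -cf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Hypothetical counterpart: unpack the archive under dest_dir.
unbundle () {
  local archive=$1 dest_dir=$2
  mkdir -p "$dest_dir" || return 1
  tar -xf "$archive" -C "$dest_dir"
}

# Example (not run here; paths are placeholders):
#   bundle_dir /scratch/$USER/inputs /scratch/$USER/inputs.tar
#   unbundle /scratch/$USER/inputs.tar /dev/shm/$USER
```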
&lt;br /&gt;
For instance, if you run multi-process jobs, having each process write to a file of its own is not a scalable I/O solution. A directory gets locked by the first process accessing it, so all other processes have to wait for it. Not only has the code just become considerably less parallel, chances are the file system will have a time-out while waiting for your other processes, leading your program to crash mysteriously.&lt;br /&gt;
Consider using MPI-IO (part of the MPI-2 standard), which allows files to be opened simultaneously&lt;br /&gt;
by different processes, or using a dedicated process for I/O to which all other processes send their&lt;br /&gt;
data, and which subsequently writes this data to a single file.&lt;br /&gt;
&lt;br /&gt;
===Local Disk===&lt;br /&gt;
&lt;br /&gt;
The compute nodes on the GPC '''do not contain hard drives''' so there is no local disk available to use during your computation.  You can however use part of a compute node's RAM like a local disk ('ramdisk') but this will reduce how much memory is available for your&lt;br /&gt;
program.  This can be accessed using &amp;lt;tt&amp;gt;/dev/shm/&amp;lt;/tt&amp;gt; and is currently set to 8GB.  Anything written&lt;br /&gt;
to this location that you want to keep must be copied back to the &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; filesystem as &amp;lt;tt&amp;gt;/dev/shm&amp;lt;/tt&amp;gt; is wiped after each job and since it is in memory will not survive through a reboot of the node. More on ramdisk usage can be found [[User_Ramdisk | here]].&lt;br /&gt;
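&lt;br /&gt;
The copy-back step can be sketched as below. This is an illustration under stated assumptions: the function name stage_out, the directory names and the program name in the usage comment are all hypothetical.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: copy results from a ramdisk working directory back to
# permanent storage, then free the memory the ramdisk copy was using.
stage_out () {
  local work_dir=$1 dest_dir=$2
  mkdir -p "$dest_dir" || return 1
  # Only delete the ramdisk copy if the copy back succeeded.
  if cp -r "$work_dir"/. "$dest_dir"/; then
    rm -rf "$work_dir"
  else
    echo "ERROR: copy to $dest_dir failed; leaving $work_dir in place" > /dev/stderr
    return 1
  fi
}

# Sketch of use inside a job script (paths and program name are placeholders):
#   work=/dev/shm/$USER/run1
#   mkdir -p "$work"
#   my_program "$work"
#   stage_out "$work" /scratch/$USER/run1-results
```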
&lt;br /&gt;
Note that the absence of hard drives also means that the nodes cannot swap memory, so be sure that your computation fits within memory.&lt;br /&gt;
&lt;br /&gt;
==Data Transfer==&lt;br /&gt;
{{:Data_Transfer}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==File/Ownership Management (ACL)==&lt;br /&gt;
* By default, at SciNet, users within the same group have read permission to each other's files (not write)&lt;br /&gt;
* You may use access control list ('''ACL''') to allow your supervisor (or another user within your group) to manage files for you (i.e., create, move, rename, delete), while still retaining your access and permission as the original owner of the files/directories.&lt;br /&gt;
* '''NOTE''': We highly recommend that you never give write permission to other users on the top level of your home directory (/home/[owner]), since that would seriously compromise your privacy, in addition to disabling ssh key authentication, among other things. If necessary, make specific sub-directories under your home directory so that other users can manipulate/access files from those.&lt;br /&gt;
* If you need to set up permissions across groups [mailto:support@scinet.utoronto.ca contact us] (and the other group's supervisor!).&lt;br /&gt;
&lt;br /&gt;
===Using  setfacl/getfacl===&lt;br /&gt;
* To allow [supervisor] to manage files in /project/group/[owner] using '''setfacl''' and '''getfacl''' commands, follow the 3 steps below as the [owner] account from a shell:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1) $ /scinet/gpc/bin/setfacl -d -m user:[supervisor]:rwx /project/group/[owner]&lt;br /&gt;
   (every *new* file/directory inside [owner] will grant [supervisor] rwx access by default from now on)&lt;br /&gt;
&lt;br /&gt;
2) $ /scinet/gpc/bin/setfacl -d -m user:[owner]:rwx /project/group/[owner]&lt;br /&gt;
   (but will also grant [owner] rwx access, i.e., access for both by default, including for files/directories created by [supervisor])&lt;br /&gt;
&lt;br /&gt;
3) $ /scinet/gpc/bin/setfacl -Rm user:[supervisor]:rwx /project/group/[owner]&lt;br /&gt;
   (recursively modify all *existing* files/directories inside [owner] to also be rwx by [supervisor])&lt;br /&gt;
&lt;br /&gt;
   $ /scinet/gpc/bin/getfacl /project/group/[owner]&lt;br /&gt;
   (to determine the current ACL attributes)&lt;br /&gt;
&lt;br /&gt;
   $ /scinet/gpc/bin/setfacl -b /project/group/[owner]&lt;br /&gt;
   (to remove any previously set ACL)&lt;br /&gt;
&lt;br /&gt;
PS: on the datamovers getfacl, setfacl and chacl will be on your path&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For more information on using [http://linux.die.net/man/1/setfacl &amp;lt;tt&amp;gt;setfacl&amp;lt;/tt&amp;gt;] or [http://linux.die.net/man/1/getfacl &amp;lt;tt&amp;gt;getfacl&amp;lt;/tt&amp;gt;] see their man pages.&lt;br /&gt;
&lt;br /&gt;
===Using mmputacl/mmgetacl===&lt;br /&gt;
* Alternatively, you may use gpfs' native '''mmputacl''' and '''mmgetacl''' commands. The advantages are that you can set &amp;quot;control&amp;quot; permission and that [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm1160.html POSIX or NFS v4 style ACL] are supported. You will need first to create a /tmp/supervisor.acl file with the following contents:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user::rwxc&lt;br /&gt;
group::----&lt;br /&gt;
other::----&lt;br /&gt;
mask::rwxc&lt;br /&gt;
user:[owner]:rwxc&lt;br /&gt;
user:[supervisor]:rwxc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
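&lt;br /&gt;
If you prefer to generate that file from a script, something along these lines would do. It is a sketch only: the function name write_acl_file and the user names in the usage comment are hypothetical, and the entries written are exactly the six lines listed above.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: write an ACL file with the entries listed above for the
# given owner and supervisor user names.
write_acl_file () {
  local owner=$1 supervisor=$2 acl_file=$3
  printf '%s\n' \
    'user::rwxc' \
    'group::----' \
    'other::----' \
    'mask::rwxc' \
    "user:${owner}:rwxc" \
    "user:${supervisor}:rwxc" > "$acl_file"
}

# Example (user names are placeholders):
#   write_acl_file jdoe asmith /tmp/supervisor.acl
```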
&lt;br /&gt;
Then issue the following 2 commands:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1) $ mmputacl -i /tmp/supervisor.acl /project/group/[owner]&lt;br /&gt;
2) $ mmputacl -d -i /tmp/supervisor.acl /project/group/[owner]&lt;br /&gt;
   (every *new* file/directory inside [owner] will grant rwxc access to both [supervisor] and &lt;br /&gt;
   [owner] by default, including files/directories created by [supervisor])&lt;br /&gt;
&lt;br /&gt;
   $ mmgetacl /project/group/[owner]&lt;br /&gt;
   (to determine the current ACL attributes)&lt;br /&gt;
&lt;br /&gt;
   $ mmdelacl -d /project/group/[owner]&lt;br /&gt;
   (to remove a previously set default ACL; without -d, mmdelacl removes the access ACL instead)&lt;br /&gt;
&lt;br /&gt;
   $ mmeditacl /project/group/[owner]&lt;br /&gt;
   (to create or change a GPFS access control list)&lt;br /&gt;
   (for this command to work set the EDITOR environment variable: export EDITOR=/usr/bin/vi)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is no option to recursively add or remove ACL attributes using a gpfs built-in command. You'll need to use the -i option as above for each file or directory individually. [[Data_Management#bash_script_that_you_may_adapt_to_recursively_add_or_remove_ACL_attributes_using_gpfs_built-in_commands | Here is a sample bash script you may use for that purpose]]&lt;br /&gt;
&lt;br /&gt;
For more information on using [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm11120.html &amp;lt;tt&amp;gt;mmputacl&amp;lt;/tt&amp;gt;] or [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm11120.html &amp;lt;tt&amp;gt;mmgetacl&amp;lt;/tt&amp;gt;] see their man pages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Appendix===&lt;br /&gt;
====bash script that you may adapt to recursively add or remove ACL attributes using gpfs built-in commands====&lt;br /&gt;
Courtesy of Agata Disks (http://csngwinfo.in2p3.fr/mediawiki/index.php/GPFS_ACL)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# USAGE&lt;br /&gt;
#     - on one directory:     ./set_acl.sh dir_name&lt;br /&gt;
#     - on more directories:  ./set_acl.sh 'dir_nam*'&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
# Path of the file that contains the ACL&lt;br /&gt;
ACL_FILE_PATH=/agatadisks/data/acl_file.acl&lt;br /&gt;
&lt;br /&gt;
# Directories on which the ACLs have to be set&lt;br /&gt;
dirs=$1&lt;br /&gt;
&lt;br /&gt;
# Recursive function that sets ACL to files and directories&lt;br /&gt;
set_acl () {&lt;br /&gt;
  # Local variables keep each level of the recursion independent&lt;br /&gt;
  local curr_dir=$1&lt;br /&gt;
  local entry&lt;br /&gt;
  for entry in &amp;quot;$curr_dir&amp;quot;/*&lt;br /&gt;
  do&lt;br /&gt;
    if [ -f &amp;quot;$entry&amp;quot; ]; then&lt;br /&gt;
      # Set ACL on a regular file&lt;br /&gt;
      if ! mmputacl -i &amp;quot;$ACL_FILE_PATH&amp;quot; &amp;quot;$entry&amp;quot;; then&lt;br /&gt;
        echo &amp;quot;ERROR: ACL not set on $entry&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      echo &amp;quot;ACL set on file $entry&amp;quot;&lt;br /&gt;
    fi&lt;br /&gt;
    if [ -d &amp;quot;$entry&amp;quot; ]; then&lt;br /&gt;
      # Set Default ACL in directory&lt;br /&gt;
      if ! mmputacl -d -i &amp;quot;$ACL_FILE_PATH&amp;quot; &amp;quot;$entry&amp;quot;; then&lt;br /&gt;
        echo &amp;quot;ERROR: Default ACL not set on $entry&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      echo &amp;quot;Default ACL set on directory $entry&amp;quot;&lt;br /&gt;
      # Set ACL in directory&lt;br /&gt;
      if ! mmputacl -i &amp;quot;$ACL_FILE_PATH&amp;quot; &amp;quot;$entry&amp;quot;; then&lt;br /&gt;
        echo &amp;quot;ERROR: ACL not set on $entry&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      echo &amp;quot;ACL set on directory $entry&amp;quot;&lt;br /&gt;
      # Recurse into the sub-directory&lt;br /&gt;
      set_acl &amp;quot;$entry&amp;quot;&lt;br /&gt;
    fi&lt;br /&gt;
  done&lt;br /&gt;
}&lt;br /&gt;
# $dirs is deliberately left unquoted so that a quoted pattern such as 'dir_nam*' expands here&lt;br /&gt;
for dir in $dirs&lt;br /&gt;
do&lt;br /&gt;
  if [ ! -d &amp;quot;$dir&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;ERROR: $dir is not a directory&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
  fi&lt;br /&gt;
  set_acl &amp;quot;$dir&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
exit 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Hierarchical Storage Management (HSM)==&lt;br /&gt;
'''(a pilot project is starting in July/2010 with a select group of users)'''&lt;br /&gt;
&lt;br /&gt;
===Basic Concepts===&lt;br /&gt;
'''Hierarchical Storage Management (HSM)''' is a data storage technique which automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as hard disk drive arrays, are more expensive (per byte stored) than slower devices, such as optical discs and magnetic tape drives. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise's data on slower devices, and then copy data to faster disk drives when needed. In effect, HSM turns the fast disk drives into caches for the slower mass storage devices. The HSM system monitors the way data is used and makes best guesses as to which data can safely be moved to slower devices and which data should stay on the fast devices.&lt;br /&gt;
&lt;br /&gt;
In a typical HSM scenario, data files which are frequently used are stored on disk drives, but are eventually migrated to tape if they are not used for a certain period of time, typically a few months. If a user does reuse a file which is on tape, it is automatically moved back to disk storage. The advantage is that the total amount of stored data can be much larger than the capacity of the disk storage available, but since only rarely-used files are on tape, most users will not notice any slowdown.&lt;br /&gt;
&lt;br /&gt;
The HSM client provides both ''automatic'' and ''selective migration''. Once file ''migration'' begins, the HSM client sends a copy of your file to storage volumes on disk devices or devices that support removable media, such as tape, and replaces the original file with a ''stub file'' on the HSM-managed file system (aka ''repository'' at SciNet).&lt;br /&gt;
&lt;br /&gt;
'''Repository''' commonly refers to a location for long-term storage, often for safety or preservation. &lt;br /&gt;
&lt;br /&gt;
'''Migration''', in the context of HSM, refers to the set of actions that move files from the front-end disk based repository to a back-end tape library system (often invisible or inaccessible to users).&lt;br /&gt;
&lt;br /&gt;
'''Relocation''', in the context of SciNet, refers to the use of unix commands such as copy, move, tar or rsync to get data into the repository.&lt;br /&gt;
&lt;br /&gt;
'''The stub file''' is a small replacement file that makes it appear as though the original file is on the repository. It contains required metadata information to locate and recall a migrated file and to respond to specific UNIX commands without recalling the file.&lt;br /&gt;
&lt;br /&gt;
'''Automatic migration''' periodically monitors space usage and automatically migrates eligible files according to the options and settings that have been selected. The HSM client provides two types of automatic migration: ''threshold migration'' and ''demand migration''.&lt;br /&gt;
&lt;br /&gt;
'''Threshold migration''' maintains a specific level of free space on the repository file system. When disk usage reaches the high threshold percentage, eligible files are migrated to tapes automatically. When space usage drops to the low threshold set for the file system, file migration stops.&lt;br /&gt;
&lt;br /&gt;
'''Demand migration''' responds to an out-of-space condition on the repository file system. Demand migration starts automatically if the file system runs out of space (usually triggered at 90%). For HSM, as files are migrated (oldest/largest first), space becomes available on the file system, and the process or event that caused the out-of-space condition can be resumed.&lt;br /&gt;
&lt;br /&gt;
'''Selective migration''' is a user-issued HSM command that migrates specific files from the repository at will, in anticipation of the automatic migration, or independently of the system-wide eligibility criteria. For example, if you know that you will not be using a particular group of files for an extended time, you can migrate them, so as to free additional space on the repository.&lt;br /&gt;
&lt;br /&gt;
'''Reclamation''' is the process of reclaiming unused space on a tape (applies to Virtual Tapes as well). Over time, as files/directories get deleted or updated on the repository, a process will expire old data, creating gaps of unused storage on the tapes. Since tapes are sequential media, typical tape handling software can only write data to the end of the tape, so these gaps of “Empty Space” cannot be used. Reclamation entails periodically copying, in a rolling fashion, the active data from the &amp;quot;Swiss Cheese&amp;quot; like tapes to unused tapes in a compacted form.&lt;br /&gt;
&lt;br /&gt;
'''Optimal environment:''' HSM should be used in an environment where the old and large files which need to be preserved are not used regularly. Files that are needed frequently should not be migrated at all; otherwise HSM would act as a cache, migrating files only to recall them shortly afterwards. This is not advisable. The repository file system needs to be large enough to hold all regularly used files. If the file system is too small to hold all regularly needed files (**), HSM ends up permanently recalling requested files, going beyond the high-threshold limit, migrating other files to get below the low-threshold limit, and so on.&lt;br /&gt;
&lt;br /&gt;
===Deployment at SciNet===&lt;br /&gt;
HSM is performed by dedicated IBM software made up of a number of HSM daemons running on '''datamover2'''. These daemons constantly monitor the usage of the '''/repository''' GPFS and, depending on a predefined set of policies, data may be automatically or manually migrated to the Tivoli Storage Management server (TSM), and kept on our library of LTO-4 tapes.&lt;br /&gt;
&lt;br /&gt;
'''/repository is a 15TB &amp;quot;transient&amp;quot; location''' accessible only from datamover2. Users may relocate data as required from /scratch or /project to /repository in a number of ways, such as copy, move, tar or rsync. &amp;quot;Transient&amp;quot; refers to the fact that /repository works like a &amp;quot;Black Hole&amp;quot;: in the background '''it is constantly being emptied''', even while you relocate data in from other file systems. What is left behind is the directory tree with the stub files (0 byte in size at SciNet) and the metadata associated with it, which takes up about 1-2%. But, even if /repository is at 1% to start with, we ask that you please do not initiate a relocation of more than a 10TB chunk at once, so that the system has time to process your data and still allow other user(s) to migrate/recall some material before reaching 100% full. &lt;br /&gt;
&lt;br /&gt;
Inside /repository, data is segregated on a per group basis, just as in /project. Within groups, users and group supervisors can structure materials any way they prefer. But the recommendation is that those involved spend some time designing that structure ahead of time, since you may merge data from project and/or scratch (or even home). In tests we performed, we've been able to reorganize the FS structure after migration, change the name and ownership of directories and stubs, and still recall files under the new path and ownership. HSM does seem to keep a very symbiotic relation between the metadata and the inode attributes at the file system level, without necessarily having to replicate these changes with tape recall &amp;amp; migration operations. But please don't abuse this flexibility. If possible, keep your initial layout structure somewhat fixed over time.&lt;br /&gt;
&lt;br /&gt;
We also recommend that users bundle files in tar-balls of at least 10GB before relocation, and keep a listing of those files somewhere; in fact you may use the 'tar' command to create the tar-ball directly in /repository on-the-fly. See examples below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -czvf /repository/[group]/[user]/myproject1.tar.gz /project/[group]/[user]/project1/ &amp;gt; /project/[group]/[user]/myproject1-repository-listing.txt&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
tar -czvf /repository/[group]/[user]/myscratch1.tar.gz /scratch/[user]/scratchdata1/ &amp;gt; /home/[user]/myscratch1-repository-listing.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The important thing is to avoid relocating many thousands (or millions) of small files. It's very demanding on the system to constantly scan/reconcile all these files on the file system, tapes, metadata and database. A good reference is '''average file size &amp;gt; 100MB''' in /repository. Deep directory nesting in general also increases the time required to traverse a file system and thus should be avoided where possible.&lt;br /&gt;
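&lt;br /&gt;
Before relocating a directory, its file count and average file size can be checked against that reference. The sketch below is an illustration only: the function name average_file_size_mb is hypothetical, it relies on the -printf option of GNU find, and it takes 1 MB to mean 1048576 bytes.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: print the number of regular files under a directory and
# their average size in MB (1 MB = 1048576 bytes), using GNU find.
average_file_size_mb () {
  find "$1" -type f -printf '%s\n' | awk '
    { total += $1; count++ }
    END {
      if (count == 0) { print "files=0 average_mb=0"; exit }
      printf "files=%d average_mb=%d\n", count, total / count / 1048576
    }
  '
}

# Example (not run here; the path is a placeholder):
#   average_file_size_mb /scratch/$USER/scratchdata1
```

An average well below 100 suggests bundling the directory into tar-balls before relocating it.&lt;br /&gt;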
&lt;br /&gt;
===Performance===&lt;br /&gt;
Unlike /project or /scratch, /repository is only a 2 tier disk raid, so don't expect transfer rates much higher than 60MB/sec on a rsync session for example. In other words, a 10TB offload operation will typically take 2 days to complete, if made up of large files. On the other hand, we have conducted experiments where we migrated only 1TB, but with 1 million files expanded, and that took nearly a day! This situation should be avoided when relocating many terabytes. Performance is as much a function of the number of files as of the amount of data.&lt;br /&gt;
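&lt;br /&gt;
The 2-day figure follows directly from the transfer rate, as this small calculation shows. It is a sketch under stated assumptions: the function name transfer_hours is hypothetical, 1 TB is taken as 1024 * 1024 MB, and the rate is assumed to be sustained for the whole transfer, which will not hold for data made up of many small files.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: estimate whole hours needed to move size_tb terabytes
# at rate_mb megabytes per second (assumes 1 TB = 1024 * 1024 MB and a
# sustained rate).
transfer_hours () {
  local size_tb=$1 rate_mb=$2
  echo $(( size_tb * 1024 * 1024 / (rate_mb * 3600) ))
}

transfer_hours 10 60   # prints 48, i.e. about 2 days
```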
&lt;br /&gt;
As for the &amp;quot;ideal tar-ball size&amp;quot;, experiments have shown that an isolated 10GB tar-ball typically takes 10-15 minutes to be pulled back, considering all tape operations involved. That seems like a reasonable amount of time to wait for a group of files kept off-line for an extended period of time. Also consider that pulling back an individual tiny file could still take as long as 5-8 minutes. So, it's pretty clear that you get the best bang for the buck by tar'ing your material, and you won't tie up the tape system for too long. As for the upper limit, you can probably bundle files in 100-500GB tar-balls, provided that you're OK with waiting a couple of hours for them to be recalled at a later date; at least from SciNet's perspective, it would be a very efficient migration.&lt;br /&gt;
&lt;br /&gt;
'''Please be sure to contact us to schedule your transfers IN or OUT of the system, to avoid conflict with other users or with the system settings.''' For instance, if you recall a large amount of data at once, let's say 7.5TB (about half of /repository), we would have to adjust the high threshold accordingly for that period (to 50%), so that we don't induce the never-ending migrate/recall issues (**) described in ''Optimal environment''.&lt;br /&gt;
&lt;br /&gt;
===How to migrate/recall data===&lt;br /&gt;
&lt;br /&gt;
====Automatic====&lt;br /&gt;
We currently have /repository set up with '''High and Low thresholds of 2% and 1% respectively'''. That means the file system is monitored at regular intervals to determine whether the 2% usage mark has been reached or surpassed. In that case, data is automatically migrated to tapes, oldest (or largest) first, until the file system is down to 1%, if possible (metadata is not migrated). Since data may be copied/moved/rsync'ed/tar'ed in faster than /repository can be emptied, you may sporadically observe 80-90% disk usage (hence the 10TB limit on a chunk of data). For now at SciNet we migrate every file in /repository to tapes.&lt;br /&gt;
&lt;br /&gt;
To recall a file automatically all you have to do is '''access''' it. There are many ways you can do this. For example, you may view a file with 'cat', 'more', 'vi/vim', etc. You may also copy the file (or directory) from /repository to another location. '''Please be patient:''' the file will have to be pulled back from tape, and this will take some time, longer if it happens to be at the end of a tape.&lt;br /&gt;
&lt;br /&gt;
====Selective====&lt;br /&gt;
&lt;br /&gt;
Selective migration is used to override the internal priority of HSM (oldest/largest) or to migrate files/directories &amp;quot;immediately&amp;quot;. The recommendation is to '''not wait''' for the automatic migration cycle to kick in, since this could take some 6 to 12 hours at SciNet. If you relocated material to repository with the intention of having it migrated to tapes, you can run dsmmigrate as soon as the rsync to repository has finished, for instance.&lt;br /&gt;
&lt;br /&gt;
(Files won't be migrated until they have &amp;quot;aged&amp;quot; for at least 5 minutes after their last access/modification time.)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
dsmmigrate [path to FILE]&lt;br /&gt;
dsmmigrate -R -D /repository/[group]/[user]/[directory]&lt;br /&gt;
dsmmigrate /repository/scinet/pinto/blahblahblah.tar.Z&lt;br /&gt;
or&lt;br /&gt;
cd /repository/scinet/pinto/&lt;br /&gt;
dsmmigrate blahblahblah.tar.Z&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To selectively recall data, just type:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
dsmrecall [path to FILE]&lt;br /&gt;
dsmrecall -R -D /repository/[group]/[user]/[directory]&lt;br /&gt;
dsmrecall /repository/scinet/pinto/blahblahblah.tar.Z&lt;br /&gt;
or&lt;br /&gt;
cd /repository/scinet/pinto/&lt;br /&gt;
dsmrecall blahblahblah.tar.Z&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' We've been finding that the search for new candidates for automatic migration takes much longer once repository is already full of files/stubs. That is to be expected, hence the recommendation to not wait and proceed with the selective migration of your own files/directories asap.&lt;br /&gt;
&lt;br /&gt;
===Disaster Recovery===&lt;br /&gt;
&lt;br /&gt;
As with any disk-based storage, repository is not immune to failures, even though it is a RAID 5 file system. We do not do regular backups, but it's possible to do a full recovery in case of catastrophic loss of repository. '''For that, it's important that all files have been completely migrated to tapes''' beforehand. That puts the onus on users to ensure (with selective migration) that this migration has indeed finished for the relocated material before they delete the originals from /project or /scratch.&lt;br /&gt;
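&lt;br /&gt;
A minimal sketch of that sequence, using only the commands described on this page. The archive name run1.tar.gz and the directory run1 are hypothetical placeholders, and the bracketed path components must be replaced with your own:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# bundle the material into a single tar-ball on /repository&lt;br /&gt;
tar -czf /repository/[group]/[user]/run1.tar.gz /scratch/[user]/run1/&lt;br /&gt;
# request selective migration (the file must have aged at least 5 minutes)&lt;br /&gt;
dsmmigrate -D /repository/[group]/[user]/run1.tar.gz&lt;br /&gt;
# check the file state reported by dsmls&lt;br /&gt;
dsmls /repository/[group]/[user]/run1.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Going by the dsmls states listed below, a state of '''m''' or '''p''' indicates that a copy is on tape, while '''r''' means the file exists only on repository, so the originals should not be deleted yet.&lt;br /&gt;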
&lt;br /&gt;
===Common HSM commands===&lt;br /&gt;
Some traditional unix/linux commands, such as 'ls' or 'rm', work on stub files just as they do on real files. For others, such as 'du' or 'df', you are better off using the HSM equivalent, which gives more meaningful information in the context of HSM. The HSM commands only work inside /repository. Some of them, such as 'dsmrm', can be executed only by root, in which case you'll be notified.&lt;br /&gt;
&lt;br /&gt;
===='''dsmls'''====&lt;br /&gt;
to check status of files; used in the directory where you expect to have migrated files&lt;br /&gt;
&lt;br /&gt;
'''r''': ''resident''    (the file is on repository only)&lt;br /&gt;
&lt;br /&gt;
'''m''': ''migrated''    (only the stub of the file is on repository)&lt;br /&gt;
&lt;br /&gt;
'''p''': ''premigrated'' (the file is on repository and on tape)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmls [-Noheader] [-Recursive] [-Help] [file specs|-FIlelist=file]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-logindm02-$ dsmls -R a3&lt;br /&gt;
IBM Tivoli Storage Manager&lt;br /&gt;
Command Line Space Management Client Interface&lt;br /&gt;
  Client Version 6, Release 1, Level 0.0  &lt;br /&gt;
  Client date/time: 07/27/2010 12:06:36&lt;br /&gt;
(c) Copyright by IBM Corporation and other(s) 1990, 2009. All Rights Reserved.&lt;br /&gt;
&lt;br /&gt;
      Actual     Resident     Resident  File   File&lt;br /&gt;
        Size         Size     Blk (KB)  State  Name&lt;br /&gt;
       &amp;lt;dir&amp;gt;         8192            8   -      a3/&lt;br /&gt;
&lt;br /&gt;
/repository/scinet/pinto/a3:&lt;br /&gt;
 34008432640            0            0   m      32G-1&lt;br /&gt;
 34008432640  34008432640            0   r      32G-2&lt;br /&gt;
 34008432640  34008432640            0   p      32G-3&lt;br /&gt;
           0            0            0   r      dsmerror.log&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===='''dsmdu'''==== &lt;br /&gt;
disk usage on the original files/directory&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmdu [-Allfiles] [-Summary] [-Help] [directory names]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===='''dsmdf'''====&lt;br /&gt;
disk free on the HSM file system.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmdf [-Help] [-Detail] [file systems]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===='''dsmmigrate'''====&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmmigrate [-Recursive] [-Premigrate] [-Detail] [-Help] filespecs|-FIlelist=file &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===='''dsmrecall'''====&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmrecall [-Recursive] [-Detail] [-Help] file specs|-FIlelist=file&lt;br /&gt;
   or  dsmrecall [-Detail] -offset=XXXX[kmgKMG] -size=XXXX[kmgKMG] file specs &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To get an idea of what HSM is doing on datamover2 at a given time:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[pinto@gpc-logindm02 ~]$ ps -def | grep dsm | grep -v mmfs&lt;br /&gt;
&lt;br /&gt;
root      2455 15190  0 16:26 ?        00:00:00 dsmmonitord&lt;br /&gt;
root      2456  2455  2 16:26 ?        00:05:38 dsmautomig -2 system::/repository&lt;br /&gt;
pinto    10997 10637 30 16:40 pts/3    01:14:20 dsmmigrate -R -D pinto&lt;br /&gt;
root     12857     1  0 16:15 ?        00:00:00 dsmrecalld&lt;br /&gt;
root     13013 12857  0 16:15 ?        00:00:01 dsmrecalld&lt;br /&gt;
root     13015 12857  0 16:15 ?        00:00:00 dsmrecalld&lt;br /&gt;
root     15190     1  0 16:15 ?        00:00:00 dsmmonitord&lt;br /&gt;
root     16936     1  3 16:15 ?        00:10:44 dsmscoutd&lt;br /&gt;
root     17217     1 13 16:16 ?        00:36:49 dsmrootd&lt;br /&gt;
root     18732  2456  4 17:51 ?        00:07:19 dsmautomig -2 system::/repository&lt;br /&gt;
root     18737  2456  0 17:51 ?        00:00:26 dsmautomig -2 system::/repository&lt;br /&gt;
pinto    24533 10363  0 20:48 pts/2    00:00:00 grep dsm&lt;br /&gt;
root     25090     1  0 06:42 ?        00:00:08 dsmwatchd nodetach&lt;br /&gt;
root     30840 13013  0 17:15 ?        00:00:02 dsmrecalld&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the above example, dsmmonitord, dsmrecalld, dsmscoutd, dsmrootd and dsmwatchd are the 5 typical HSM daemons, and they are always running. In addition, there are 3 streams of dsmautomig (triggered by threshold migration) and 1 stream of dsmmigrate (selective migration initiated by user pinto).&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Scheduler&amp;diff=2207</id>
		<title>Scheduler</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Scheduler&amp;diff=2207"/>
		<updated>2010-11-22T22:18:20Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Multiple Job Submissions */  typo node should be nodes in #PBS command&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The queueing system used at SciNet is based around Cluster Resources [http://www.clusterresources.com/products/moab-cluster-suite/workload-manager.php Moab Workload Manager].&lt;br /&gt;
Moab is used on both the GPC and TCS; however, [http://www.clusterresources.com/products/torque/docs/index.shtml Torque] is used as the backend resource manager on the GPC and IBM's [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp LoadLeveler] is used on the TCS.&lt;br /&gt;
&lt;br /&gt;
This page outlines some of the most common Moab commands. Full Moab documentation is available [http://www.clusterresources.com/products/mwm/docs/a.gcommandoverview.shtml here], and full documentation of the Torque (PBS) commands is [http://www.clusterresources.com/products/torque/docs/a.acommands.shtml here].&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
===== batch =====&lt;br /&gt;
&lt;br /&gt;
The batch queue is the default queue on the GPC, giving the user access to all the &lt;br /&gt;
resources for jobs of up to 48 hours.  If no queue is specified with the &amp;lt;tt&amp;gt;-q&amp;lt;/tt&amp;gt; flag,&lt;br /&gt;
the job is submitted to the batch queue.&lt;br /&gt;
&lt;br /&gt;
===== debug =====&lt;br /&gt;
&lt;br /&gt;
A debug queue has been set up primarily for code developers to quickly test&lt;br /&gt;
and evaluate their codes and configurations without having to wait in the batch queue.  There are 10 nodes&lt;br /&gt;
currently reserved for the debug queue.  It has quite restrictive limits to promote high turnover&lt;br /&gt;
and availability: a user can use 2 nodes (16 cores) for up to 2 hours, or as many as&lt;br /&gt;
8 nodes (64 cores) for half an hour, and can have only one job in the debug queue at a time. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=1:ppn=8,walltime=1:00:00 -q debug -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== largemem =====&lt;br /&gt;
&lt;br /&gt;
The largemem queue is used for accessing one of two 16-core Intel Xeon (non-Nehalem) nodes with 128 GB of memory. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=1:ppn=16,walltime=1:00:00 -q largemem -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
The TCS currently only has one queue, or class, in use called &amp;quot;verylong&amp;quot; and its only&lt;br /&gt;
limitation is that jobs must be under 48 hours.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#@ class           = verylong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Job Info===&lt;br /&gt;
&lt;br /&gt;
To see all jobs queued on a system use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Three sections are shown: running, idle, and blocked.  Idle jobs are commonly referred to as queued jobs: &lt;br /&gt;
they meet all the requirements but are waiting for available resources.  Blocked jobs &lt;br /&gt;
are caused either by improper resource requests or, more commonly, by exceeding a user's or group's allowable&lt;br /&gt;
resources.   For example, if you are allowed to submit 10 jobs and you submit 20, the first 10&lt;br /&gt;
jobs will be submitted properly and either run right away or be queued, while the other 10 jobs&lt;br /&gt;
will be blocked and won't be submitted to the queue until one of the first 10 finishes.&lt;br /&gt;
&lt;br /&gt;
If showq is returning output slowly, you can query cached info using &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showq --noblock&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Available Resources ===&lt;br /&gt;
&lt;br /&gt;
Determining when your job will run can be tricky as it involves a combination of queue type, node type, system reservations, and job priority. The following commands are provided to help you figure out what resources are currently available, however they may not tell you exactly when your job will run for the aforementioned reasons.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
To show how many ethernet nodes are currently free, use the show back fill command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -f compute-eth &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To show how many infiniband nodes are free, use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -f ib&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
To show how many TCS nodes are free, use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -c verylong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, checking what is available for an ethernet job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -f compute-eth&lt;br /&gt;
Partition     Tasks  Nodes      Duration   StartOffset       StartDate&lt;br /&gt;
---------     -----  -----  ------------  ------------  --------------&lt;br /&gt;
ALL           14728   1839       7:36:23      00:00:00  00:23:37_09/24&lt;br /&gt;
ALL             256     30      INFINITY      00:00:00  00:23:37_09/24&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
shows that for jobs under 7:36:23 you can use 1839 nodes, but if you submit&lt;br /&gt;
a job over that time only 30 will be available.  In this case this is&lt;br /&gt;
due to a large reservation made by SciNet staff, but from a user's point&lt;br /&gt;
of view, showbf tells you very simply what is available and for how long.&lt;br /&gt;
In this case, a user may wish to set #PBS -l walltime=7:30:00 in their script, or add -l walltime=7:30:00 to their qsub command, to ensure that the job backfills onto the reserved nodes.&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' showbf shows currently available nodes, but that does not mean that&lt;br /&gt;
your job will start right away.  Job priority and system reservations, &lt;br /&gt;
along with dedicated nodes such as those for the debug queue, affect when jobs &lt;br /&gt;
run, so a job may still have to wait even if enough nodes appear &amp;quot;free&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=== Job Submission ===&lt;br /&gt;
&lt;br /&gt;
==== Interactive ====&lt;br /&gt;
&lt;br /&gt;
On the GPC an interactive queue session can be requested using the following &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=2:ppn=8,walltime=1:00:00 -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Non-interactive (Batch) ====&lt;br /&gt;
&lt;br /&gt;
For a non-interactive job submission you require a submission script formatted for the appropriate resource manager. Examples&lt;br /&gt;
are provided for the [[GPC_Quickstart#Submitting_A_Batch_Job | GPC]] and [[TCS_Quickstart#Submitting_A_Job | TCS]].&lt;br /&gt;
&lt;br /&gt;
=== Job Status ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ checkjob jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Cancel a Job ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ canceljob jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Accounting ===&lt;br /&gt;
&lt;br /&gt;
For any user with an NRAC/LRAC allocation, a special account with the Resource Allocation Project (RAP) identifier (RAPI) from Compute Canada Database (CCDB) is set up in order to access the allocated resources.  Please use the following instructions to run your job using your special allocation.  This is necessary both for accounting purposes as well as to assign the appropriate priority to your jobs.&lt;br /&gt;
&lt;br /&gt;
Each job run on the system will have a default RAP associated with it.  Most users already have their default RAP properly set.  However, if you have more than one allocation (different RAPs),  you may need/want to change your default RAP in order to charge your jobs to a particular RAP.&lt;br /&gt;
&lt;br /&gt;
==== Changing your default RAP ====&lt;br /&gt;
&lt;br /&gt;
# Go to the [https://portal.scinet.utoronto.ca portal] and log in with your SciNet username and password.&lt;br /&gt;
# Click on &amp;quot;Change SciNet default RAP&amp;quot; and change your default RAP.&lt;br /&gt;
&lt;br /&gt;
==== Specifying the RAP for GPC ====&lt;br /&gt;
&lt;br /&gt;
Alternatively, you may want to assign a RAP for each particular job you run.  There are two ways to specify an account for Moab/Torque: from the command line or inside the batch submission script.&lt;br /&gt;
&lt;br /&gt;
===== Command line =====&lt;br /&gt;
&lt;br /&gt;
Use the '-A RAPI' flag when you submit your job using qsub.  Note that the command line option will override the submission script if an account is specified on both the submission script and the command line.  &amp;quot;RAPI&amp;quot; is the RAP Identifier, e.g. abc-123-de.&lt;br /&gt;
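&lt;br /&gt;
For illustration, a submission from the command line might look like the following, where abc-123-de stands in for your own RAPI and jobscript.sh is a placeholder for your submission script:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -A abc-123-de jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;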
&lt;br /&gt;
===== Submission Script =====&lt;br /&gt;
&lt;br /&gt;
Add a line in your submit script as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -A RAPI&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please replace &amp;quot;RAPI&amp;quot; with your RAP Identifier.&lt;br /&gt;
&lt;br /&gt;
==== Specifying the RAP for TCS ====&lt;br /&gt;
&lt;br /&gt;
Add a line in your submit script as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# @ account_no = RAPI&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please replace &amp;quot;RAPI&amp;quot; with your RAP Identifier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== User Stats ===&lt;br /&gt;
&lt;br /&gt;
Show current usage statistics for a user:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showstats -u $USER&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Reservations ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showres&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Standard users can see only their own reservations, not those of other users or of the system.&lt;br /&gt;
To determine what is available, a user can use &amp;quot;showbf&amp;quot;, which shows what resources are&lt;br /&gt;
available and for how long, taking into account running jobs and all the reservations. Refer to the [[Moab#Available_Resources | Available Resources]] section of this page for more details.&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Sometimes you may want one job not to start until another job finishes, but&lt;br /&gt;
you would like to submit them both at the same time.  This can be done&lt;br /&gt;
using job dependencies on both the GPC and TCS; however, the commands &lt;br /&gt;
are different because the underlying resource managers are different.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
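Use the -W flag with the following syntax in your submission script to have this job not start&lt;br /&gt;
until the job with jobid or jobName (given with -N jobName) has finished. Note that Torque's plain &amp;lt;tt&amp;gt;after&amp;lt;/tt&amp;gt; dependency only waits for the other job to start, whereas &amp;lt;tt&amp;gt;afterany&amp;lt;/tt&amp;gt; waits for it to terminate:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterany:{jobid | jobName}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;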
&lt;br /&gt;
More detailed syntax and examples can be found &lt;br /&gt;
[http://www.clusterresources.com/products/mwm/docs/11.5jobdependencies.shtml#overview here] and&lt;br /&gt;
[http://www.clusterresources.com/products/torque/docs/commands/qsub.shtml#W here].&lt;br /&gt;
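&lt;br /&gt;
A minimal sketch of chaining two jobs from the command line. It assumes that qsub prints the new job's identifier on standard output and uses Torque's &amp;lt;tt&amp;gt;depend=afterok&amp;lt;/tt&amp;gt; form, which releases the second job only if the first one exits without error; first.sh and second.sh are hypothetical script names:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# submit the first job and capture its job identifier&lt;br /&gt;
FIRST=$(qsub first.sh)&lt;br /&gt;
# submit the second job, held until the first one completes successfully&lt;br /&gt;
qsub -W depend=afterok:$FIRST second.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;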
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
LoadLeveler handles job dependencies using what it calls steps.&lt;br /&gt;
See the [[TCS_Quickstart#Steps | TCS Quickstart]] guide for an example.&lt;br /&gt;
&lt;br /&gt;
=== Adjusting Job Priority ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The ability to adjust job priorities downwards can be used to adjust relative priorities of jobs between users who are running jobs under the same allocation (eg, a default, LRAC, or NRAC allocation of the same PI).   Priorities are determined by how much of the time of that allocation has currently been used, and all users using that account will have identical priorities.   This mechanism allows users to voluntarily reduce their priority to allow other users of the same allocation to run ahead of them.&lt;br /&gt;
&lt;br /&gt;
In principle, by adjusting a job's priority downwards, you could reduce your job's priority to the point that someone else's job entirely could go ahead of yours.  In practice, however, this is extremely unlikely.   Users with LRAC or NRAC allocations have priorities that are extremely large positive numbers that depend on their allocation and how much of it they have already used during the past fairshare window (2 weeks); it is very unlikely that two groups would have priorities that are within 10 or 100 or 1000 of each other.&lt;br /&gt;
&lt;br /&gt;
Note that at the moment, we do not allow priorities to go negative; they are integers that can go no lower than 1.  (This may change in the future.)  That means that users of accounts that have already used their full allocation during the current fairshare period (eg, over the past two weeks), and so whose priority would normally be negative but is capped at 1, cannot lower their priority any further.   Similarly, users with a `default' allocation have priority 1, and cannot lower their priorities any further.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
Moab allows users to adjust their jobs' priority moderately downwards, with the &amp;lt;tt&amp;gt;-p&amp;lt;/tt&amp;gt; flag; that is, on a qsub line&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub ... -p -10  jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or in a script&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
#PBS -p -10&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The number used (-10 in the examples above) can be any negative number down to -1024.   &lt;br /&gt;
&lt;br /&gt;
The ability to adjust job priorities downwards can be useful when you are running a number of jobs and want some to enter the queue at higher priorities than others.   Note that if you absolutely require some jobs to start before others, you could use [[#Job Dependencies | job dependencies]] instead.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For a job that is currently queued, one can adjust its priority with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qalter -p -10 JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
TCS users can adjust their priorities by putting the following line in their scripts&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#@ user_priority = 50 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the number can be between 0 (which is 50 below the default priority) and 50 (the default priority).&lt;br /&gt;
&lt;br /&gt;
=== Suspending a Running Job ===&lt;br /&gt;
&lt;br /&gt;
Separate from, and in addition to, the ability to place a hold on a queued job, you may want to suspend a running job. For example, you may want to test the timing of events in a weakly coupled parallel environment.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
To suspend a job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsig -s STOP &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and to start it again:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsig -s CONT &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Scripts are suspendable by default, so you don't need to add any signal handling for this to work.&lt;br /&gt;
As far as we can tell, the result is identical to using ctrl-Z and fg (or kill -STOP &amp;lt;PID&amp;gt; and kill -CONT &amp;lt;PID&amp;gt;) in an interactive run.&lt;br /&gt;
&lt;br /&gt;
More about using (and trapping) signals can be found on the [[Using Signals]] page.&lt;br /&gt;
&lt;br /&gt;
=== Multiple Job Submissions ===&lt;br /&gt;
&lt;br /&gt;
If you are doing batch processing of a number of similar jobs on the GPC, torque has a feature called job arrays that can be used to simplify this process. &lt;br /&gt;
By using the &amp;quot;-t 0-N&amp;quot; option on the command line during job submission, or putting it in the job script file as #PBS -t 0-N, torque will expand your &lt;br /&gt;
single job submission into N+1 jobs and set the environment variable PBS_ARRAYID to that job's specific number, ie 0 to N, for each job.  This &lt;br /&gt;
reduces the number of calls to qsub, and can allow the user to have far fewer submission scripts. Job arrays also have the benefit of batching &lt;br /&gt;
groups of jobs, allowing commands like qalter, qdel and qhold to work on all or a subset of the job array jobs with one command, instead of having &lt;br /&gt;
to run the command for each job.&lt;br /&gt;
&lt;br /&gt;
In the following example, 11 jobs (array indices 0 to 10) are submitted using a single command &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -t 0-10 jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and the submission script then modifies the job based on the PBS_ARRAYID.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=10:00:00&lt;br /&gt;
#PBS -N array_jobs&lt;br /&gt;
&lt;br /&gt;
cd ${PBS_O_WORKDIR}&lt;br /&gt;
mkdir job.${PBS_ARRAYID}&lt;br /&gt;
cd job.${PBS_ARRAYID}&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Running job ${PBS_ARRAYID}&amp;quot; &amp;gt; array_job.${PBS_ARRAYID}.out&lt;br /&gt;
mpirun -np 8 ./mycode -f mycode.${PBS_ARRAYID} &amp;gt;&amp;gt; array_job.${PBS_ARRAYID}.out&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The JOBID and the job name both get the ARRAYID appended after a hyphen, ie JOBID-ARRAYID.  If, for example, you wanted&lt;br /&gt;
to cancel all the jobs in a job array you would use &amp;quot;qdel JOBID&amp;quot;, whereas if you wanted to cancel just one of the jobs you would use  &lt;br /&gt;
&amp;quot;qdel JOBID-ARRAYID&amp;quot;.  &lt;br /&gt;
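&lt;br /&gt;
As a further sketch, PBS_ARRAYID can also select a per-job input from a list. The file params.txt (one set of arguments per line) is a hypothetical name, and ./mycode is the placeholder program from the example above. Since array indices start at 0 and sed counts lines from 1, the index is shifted by one:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd ${PBS_O_WORKDIR}&lt;br /&gt;
# pick line PBS_ARRAYID+1 of the parameter file&lt;br /&gt;
PARAMS=$(sed -n $((PBS_ARRAYID + 1))p params.txt)&lt;br /&gt;
# PARAMS is left unquoted on purpose so that it splits into separate arguments&lt;br /&gt;
mpirun -np 8 ./mycode ${PARAMS}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;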
&lt;br /&gt;
See [http://www.clusterresources.com/products/torque/docs/2.1jobsubmission.shtml#jobarrays here] and [http://www.clusterresources.com/products/torque/docs/commands/qsub.shtml#t here]&lt;br /&gt;
for full details.&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SciNet_Users_Group_(SNUG)&amp;diff=2088</id>
		<title>SciNet Users Group (SNUG)</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SciNet_Users_Group_(SNUG)&amp;diff=2088"/>
		<updated>2010-10-13T15:03:53Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Desired topics for the half-hour prepared talk */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Meetings==&lt;br /&gt;
The SciNet Users Group (SNUG) currently meets every month on the second Wednesday from 12:00 to 1:30. Meetings involve pizza, user discussion, feedback, and a half-hour talk on topics or technologies of interest to the SciNet community. &lt;br /&gt;
&lt;br /&gt;
For more information, and to sign up, please visit https://support.scinet.utoronto.ca/courses/&lt;br /&gt;
&lt;br /&gt;
====Upcoming topics for the half-hour prepared talk====&lt;br /&gt;
* October's SNUG will be on the 13th, and the TechTalk will be: &amp;quot;Version control on SciNet - svn, git, mercurial&amp;quot;.&lt;br /&gt;
* November's SNUG will be on the 10th, and the TechTalk will be: &amp;quot;Debuggers &amp;amp; parallel debugging on SciNet - gdb, ddd, padb&amp;quot;&lt;br /&gt;
* December's SNUG will be on the 8th, and the TechTalk will be: &amp;quot;Performance and profiling on SciNet - gprof, scalasca, peekperf&amp;quot;&lt;br /&gt;
&lt;br /&gt;
====Desired topics for the half-hour prepared talk====&lt;br /&gt;
* Walk-through of Compute Canada resources (nation-wide)&lt;br /&gt;
* How to write a successful application for computational resources&lt;br /&gt;
* Add yours here!&lt;br /&gt;
&lt;br /&gt;
====Previous topics for the half-hour prepared talk====&lt;br /&gt;
* September's SNUG was on the 8th, and the TechTalk was: &amp;quot;The SciNet GPFS file systems and you&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==Suggestions==&lt;br /&gt;
* Turning this suggestion list into something to which users can not only add ideas, but also vote on them.&lt;br /&gt;
* Some type of online communication device to foster communication between users that is more like a mailing list than this wiki&lt;br /&gt;
* Each SNUG begins with a very informal go-round in which each user mentions something they learned recently, something they did recently, some problem they had recently, etc.&lt;br /&gt;
* Create an audio recording of the meetings, run it through speech recognition software (e.g. Dragon NaturallySpeaking), and post the output to the wiki as minutes (no formatting or human editing, just whatever comes out of the software, so that it doesn't end up being a huge chore).&lt;br /&gt;
* Make this SNUG page require a login (or make some other change so that Google won't index it)&lt;br /&gt;
* Add yours here!&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SciNet_Users_Group_(SNUG)&amp;diff=2087</id>
		<title>SciNet Users Group (SNUG)</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SciNet_Users_Group_(SNUG)&amp;diff=2087"/>
		<updated>2010-10-13T15:02:05Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Desired topics for the half-hour prepared talk */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Meetings==&lt;br /&gt;
The SciNet Users Group (SNUG) currently meets every month on the second Wednesday from 12:00 to 1:30. Meetings involve pizza, user discussion, feedback, and a half-hour talk on topics or technologies of interest to the SciNet community. &lt;br /&gt;
&lt;br /&gt;
For more information, and to sign up, please visit https://support.scinet.utoronto.ca/courses/&lt;br /&gt;
&lt;br /&gt;
====Upcoming topics for the half-hour prepared talk====&lt;br /&gt;
* October's SNUG will be on the 13th, and the TechTalk will be: &amp;quot;Version control on SciNet - svn, git, mercurial&amp;quot;.&lt;br /&gt;
* November's SNUG will be on the 10th, and the TechTalk will be: &amp;quot;Debuggers &amp;amp; parallel debugging on SciNet - gdb, ddd, padb&amp;quot;&lt;br /&gt;
* December's SNUG will be on the 8th, and the TechTalk will be: &amp;quot;Performance and profiling on SciNet - gprof, scalasca, peekperf&amp;quot;&lt;br /&gt;
&lt;br /&gt;
====Desired topics for the half-hour prepared talk====&lt;br /&gt;
* Compute Canada resources (nation-wide)&lt;br /&gt;
* Add yours here!&lt;br /&gt;
&lt;br /&gt;
====Previous topics for the half-hour prepared talk====&lt;br /&gt;
* September's SNUG was on the 8th, and the TechTalk was: &amp;quot;The SciNet GPFS file systems and you&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==Suggestions==&lt;br /&gt;
* Turning this suggestion list into something to which users can not only add ideas, but also vote on them.&lt;br /&gt;
* Some type of online communication device to foster communication between users that is more like a mailing list than this wiki&lt;br /&gt;
* Each SNUG begins with a very informal go-round in which each user mentions something they learned recently, something they did recently, some problem they had recently, etc.&lt;br /&gt;
* Create an audiotape of the meetings and run it through voice detection software (e.g. Dragon Naturally Speaking) and post the output to the wiki as minutes (no formatting or human editing, just whatever comes out of the software so that it doesn't end up being a huge chore).&lt;br /&gt;
* make this SNUG page require a login (or make some other change so that google won't index it)&lt;br /&gt;
* Add yours here!&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SciNet_Users_Group_(SNUG)&amp;diff=2086</id>
		<title>SciNet Users Group (SNUG)</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SciNet_Users_Group_(SNUG)&amp;diff=2086"/>
		<updated>2010-10-13T15:01:12Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Meetings */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Meetings==&lt;br /&gt;
The SciNet Users Group (SNUG) currently meets every month on the second Wednesday from 12:00 to 1:30. Meetings involve pizza, user discussion, feedback, and a half-hour talk on topics or technologies of interest to the SciNet community. &lt;br /&gt;
&lt;br /&gt;
For more information, and to sign up, please visit https://support.scinet.utoronto.ca/courses/&lt;br /&gt;
&lt;br /&gt;
====Upcoming topics for the half-hour prepared talk====&lt;br /&gt;
* October's SNUG will be on the 13th, and the TechTalk will be: &amp;quot;Version control on SciNet - svn, git, mercurial&amp;quot;.&lt;br /&gt;
* November's SNUG will be on the 10th, and the TechTalk will be: &amp;quot;Debuggers &amp;amp; parallel debugging on SciNet - gdb, ddd, padb&amp;quot;&lt;br /&gt;
* December's SNUG will be on the 8th, and the TechTalk will be: &amp;quot;Performance and profiling on SciNet - gprof, scalasca, peekperf&amp;quot;&lt;br /&gt;
&lt;br /&gt;
====Desired topics for the half-hour prepared talk====&lt;br /&gt;
* Add yours here!&lt;br /&gt;
&lt;br /&gt;
====Previous topics for the half-hour prepared talk====&lt;br /&gt;
* September's SNUG was on the 8th, and the TechTalk was: &amp;quot;The SciNet GPFS file systems and you&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==Suggestions==&lt;br /&gt;
* Turning this suggestion list into something to which users can not only add ideas, but also vote on them.&lt;br /&gt;
* Some type of online communication device to foster communication between users that is more like a mailing list than this wiki&lt;br /&gt;
* Each SNUG begins with a very informal go-round in which each user mentions something they learned recently, something they did recently, some problem they had recently, etc.&lt;br /&gt;
* Create an audiotape of the meetings and run it through voice detection software (e.g. Dragon Naturally Speaking) and post the output to the wiki as minutes (no formatting or human editing, just whatever comes out of the software so that it doesn't end up being a huge chore).&lt;br /&gt;
* make this SNUG page require a login (or make some other change so that google won't index it)&lt;br /&gt;
* Add yours here!&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Data_Management&amp;diff=2083</id>
		<title>Data Management</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Data_Management&amp;diff=2083"/>
		<updated>2010-10-08T16:29:43Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Overview of the different file systems */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Storage Space==&lt;br /&gt;
SciNet's storage system is based on IBM's [http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] (General Parallel File System).   There are two main systems for user data: &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt;, a small, backed-up space where user home directories are located, and &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;, a large system for input or output data for jobs. Data on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; is not backed up, and it will be deleted if it has not been accessed in 3 months. A third storage system, &amp;lt;tt&amp;gt;/project&amp;lt;/tt&amp;gt;, exists only for groups with LRAC/NRAC allocations.  SciNet does not provide long-term storage for large data sets.  &lt;br /&gt;
&lt;br /&gt;
===Overview of the different file systems===&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! {{Hl2}} | file system &lt;br /&gt;
! {{Hl2}} | purpose &lt;br /&gt;
! {{Hl2}} | quota &lt;br /&gt;
! {{Hl2}} | block size &lt;br /&gt;
! {{Hl2}} | backed up&lt;br /&gt;
! {{Hl2}} | purged&lt;br /&gt;
! {{Hl2}} | access &lt;br /&gt;
|- &lt;br /&gt;
| /home&lt;br /&gt;
| development&lt;br /&gt;
| 10 GB&lt;br /&gt;
| 256 KB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
| read-only on compute nodes (r/w on login, devel and datamover1) &lt;br /&gt;
|- &lt;br /&gt;
| /scratch&lt;br /&gt;
| computation&lt;br /&gt;
| 20 TB&lt;br /&gt;
| 4 MB&lt;br /&gt;
| no&lt;br /&gt;
| files not accessed in 3 months&lt;br /&gt;
| read/write on all nodes&lt;br /&gt;
|- &lt;br /&gt;
| /project&lt;br /&gt;
| computation&lt;br /&gt;
| by allocation&lt;br /&gt;
| 256 KB&lt;br /&gt;
| no&lt;br /&gt;
| never&lt;br /&gt;
| read/write on all nodes&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Home Disk Space===&lt;br /&gt;
&lt;br /&gt;
Every SciNet user gets a 10GB directory on &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; which is regularly backed up.   Home is visible from &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; nodes, and from the development nodes on [[GPC_Quickstart | GPC]] and the [[TCS_Quickstart | TCS]].  However, on the compute nodes of the GPC clusters -- as when jobs are running -- &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is mounted '''''read-only'''''; thus GPC jobs can read files in /home but cannot write to files there.   &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is a good place to put code, input files for runs, and anything else that needs to be kept to reproduce runs.  On the other hand, &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is not a good place to put many small files:&lt;br /&gt;
the block size for the file system is 256KB, so you would quickly run out of disk quota and make the backup system very slow.&lt;br /&gt;
&lt;br /&gt;
If your application absolutely insists on writing material to your home account and you can't find a way to instruct it to write somewhere else, an alternative is to create a link pointing from your account under /home to a location under /scratch.&lt;br /&gt;
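&lt;br /&gt;
As a minimal sketch of that workaround, the shell function below moves an existing directory's contents to scratch and leaves a symbolic link in its place. The function name and both example paths are hypothetical; substitute the directory your application actually writes to.&lt;br /&gt;

```shell
# link_to_scratch SRC TARGET
# Hypothetical helper: make SRC (a directory under /home) a symbolic link
# to TARGET (a directory under /scratch), preserving any existing content.
link_to_scratch() {
    src="$1"      # e.g. "$HOME/.myapp" (hypothetical)
    target="$2"   # e.g. "/scratch/$USER/myapp" (hypothetical)
    mkdir -p "$target" || return 1
    # If SRC is a real directory (not already a link), copy its content
    # to scratch first, then remove the original.
    if [ -d "$src" ]; then
        if [ ! -L "$src" ]; then
            cp -a "$src"/. "$target"/ || return 1
            rm -rf "$src"
        fi
    fi
    # Create (or refresh) the link so that writes to SRC land in TARGET.
    ln -sfn "$target" "$src"
}
```

Run it on a login or development node, where /home is writable. Keep in mind that anything reached through the link lives on /scratch, so it is neither backed up nor exempt from purging.&lt;br /&gt;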
&lt;br /&gt;
===Scratch Disk Space===&lt;br /&gt;
&lt;br /&gt;
Every SciNet user also gets a directory in &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.   Scratch is visible from the &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; nodes,  the development nodes on [[GPC_Quickstart | GPC]] and the [[TCS_Quickstart | TCS]], and on the compute nodes of the clusters, mounted as read-write.   Thus jobs would normally write their output somewhere in &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.  There are '''NO''' backups of anything on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
There is a large amount of space available on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; but it is purged routinely so that all users running jobs and generating large outputs will have room to store their data temporarily.  Computational results which you want to keep beyond the purge window must be copied (using &amp;lt;tt&amp;gt;scp&amp;lt;/tt&amp;gt;) off SciNet entirely, to your local system.   SciNet does not routinely provide long-term storage for large data sets.&lt;br /&gt;
&lt;br /&gt;
===Scratch Disk Purging Policy===&lt;br /&gt;
&lt;br /&gt;
In order to ensure that there is always significant space available for running jobs, '''we automatically delete files in /scratch that have not been accessed in 3 months'''. This policy is subject to revision depending on its effectiveness. More details about the purging process, and how users can check whether their files will be deleted, follow. If you have files scheduled for deletion, you should move them to a more permanent location such as your departmental server or your /project space (for PIs who have either been allocated disk space by the LRAC or have bought disk space).&lt;br /&gt;
&lt;br /&gt;
On the '''first''' of each month, a list of files scheduled for deletion is produced, and an email notification is sent to each user on that list. On or about the '''12th''' of each month, a second scan produces a more current assessment and another email notification is sent. This way users can double-check that they have indeed taken care of all the files they needed to relocate before the purging deadline. Those files will be automatically deleted on the '''15th''' of the same month unless they have been accessed or relocated in the interim. If you have files scheduled for deletion, they will be listed in a file in /scratch/todelete/current, which has your userid and groupid in the filename. For example, if user xxyz wants to check whether they have files scheduled for deletion, they can issue the following command on a system which mounts /scratch (e.g. a SciNet login node): '''ls -l1 /scratch/todelete/current |grep xxyz'''. In the example below, the name of this file indicates that user xxyz is part of group abc and has 9,560 files scheduled for deletion, which take up 1.0TB of space:&lt;br /&gt;
&lt;br /&gt;
 [xxyz@scinet04 ~]$ ls -l1 /scratch/todelete/current |grep xxyz&lt;br /&gt;
 -rw-r----- 1 xxyz     root       1733059 Jan 12 11:46 10001___xxyz_______abc_________1.00T_____9560files&lt;br /&gt;
&lt;br /&gt;
The file itself contains a list of all files scheduled for deletion (in the last column) and can be viewed with standard commands like more/less/cat - e.g. '''more /scratch/todelete/current/10001___xxyz_______abc_________1.00T_____9560files'''&lt;br /&gt;
&lt;br /&gt;
Similarly, you can check on the other users in your group by using the ls command with grep on your group. For example: '''ls -l1 /scratch/todelete/current |grep abc'''. This lists all users in the same group as xxyz who have files to be purged on the 15th. Members of the same group have access to each other's contents.&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' Preparing these assessments takes several hours. If you change the access/modification time of a file in the interim, that will not be detected until the next cycle. A way for you to get immediate feedback is to use the ''''ls -lu'''' command on the file, which shows its last access time (atime). If the file's atime has been updated, it will no longer be deleted on the purging date of the 15th.&lt;br /&gt;
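&lt;br /&gt;
For a rough local check between scans, the sketch below lists files whose access time is older than a cutoff. It is only an approximation of the policy: the function name is hypothetical, &amp;quot;3 months&amp;quot; is assumed to mean 90 days, and the authoritative list remains the one in /scratch/todelete/current.&lt;br /&gt;

```shell
# stale_files DIR [DAYS]
# List regular files under DIR whose last access (atime) is more than
# DAYS full days in the past.
# Assumption: "3 months" is approximated as 90 days by default.
stale_files() {
    dir="$1"
    days="${2:-90}"
    find "$dir" -type f -atime +"$days" -print
}
```

For example, &amp;lt;tt&amp;gt;stale_files /scratch/$USER&amp;lt;/tt&amp;gt; would print the candidates in your own scratch directory, assuming that is where your directory is located.&lt;br /&gt;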
&lt;br /&gt;
===Project Disk Space===&lt;br /&gt;
&lt;br /&gt;
Investigators who have been granted allocations through the [http://www.scinet.utoronto.ca/resources/Account_Allocations.htm LRAC/NRAC Application Process] may have been allocated disk space in addition to compute time.   For the period of time that the allocation is granted, they will have disk space on the &amp;lt;tt&amp;gt;/project&amp;lt;/tt&amp;gt; disk system.  Space on the project system is not purged, but neither is it backed up.   All members of the investigator's group will have access to this system, which will be mounted read/write everywhere.&lt;br /&gt;
&lt;br /&gt;
===How much Disk Space Do I have left?===&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''/scinet/gpc/bin/diskUsage [-a] [-de]'''&amp;lt;/tt&amp;gt; command, available on the login nodes, datamovers and the GPC devel nodes, reports how much disk space is being used by yourself and your group (with the -a option) on the home, scratch, and project file systems, and how much remains available. It also shows your quotas on the various filesystems. In addition you may get information on how much your usage has changed (&amp;quot;delta information&amp;quot;) over a certain period (with the -de option). &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-a] [-de]&lt;br /&gt;
       -a: list usages of all members on the group&lt;br /&gt;
       -de: include delta information&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that information on usage and quota is only updated hourly!&lt;br /&gt;
&lt;br /&gt;
===Performance===&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] is a high-performance filesystem which provides rapid reads and writes to large datasets in parallel from many nodes.  As a consequence of this design, however, '''the file system performs quite ''poorly'' at accessing data sets which consist of many, small files.'''  For instance, you will find that reading data in from one 4MB file is enormously faster than from 100 40KB files.   Such small files are also quite wasteful of space, as the blocksize for the filesystem is 4MB.   This is something you should keep in mind when planning your input/output strategy for runs on SciNet.&lt;br /&gt;
&lt;br /&gt;
For instance, if you run multi-process jobs, having each process write to a file of its own is not a scalable I/O solution. A directory gets locked by the first process accessing it, so all other processes have to wait for it. Not only has the code just become considerably less parallel, but chances are the file system will time out while waiting for your other processes, leading your program to crash mysteriously.&lt;br /&gt;
Consider using MPI-IO (part of the MPI-2 standard), which allows files to be opened simultaneously&lt;br /&gt;
by different processes, or using a dedicated process for I/O to which all other processes send their&lt;br /&gt;
data, and which subsequently writes this data to a single file.&lt;br /&gt;
&lt;br /&gt;
===Local Disk===&lt;br /&gt;
&lt;br /&gt;
The compute nodes on the GPC '''do not contain hard drives''' so there is no local disk available to use during your computation.  You can however use part of a compute node's RAM like a local disk ('ramdisk') but this will reduce how much memory is available for your&lt;br /&gt;
program.  This can be accessed using &amp;lt;tt&amp;gt;/dev/shm/&amp;lt;/tt&amp;gt; and is currently set to 8GB.  Anything written&lt;br /&gt;
to this location that you want to keep must be copied back to the &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; filesystem as &amp;lt;tt&amp;gt;/dev/shm&amp;lt;/tt&amp;gt; is wiped after each job and since it is in memory will not survive through a reboot of the node. More on ramdisk usage can be found [[User_Ramdisk | here]].&lt;br /&gt;
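&lt;br /&gt;
One way to make sure results are copied back before the ramdisk is wiped is to wrap the computation, as in the sketch below. The function name, the work-directory naming, and the example destination are assumptions for illustration, not an existing SciNet tool.&lt;br /&gt;

```shell
# run_in_ramdisk RAMDISK DEST COMMAND [ARGS...]
# Hypothetical helper: run COMMAND in a private directory under RAMDISK
# (e.g. /dev/shm), then copy everything it wrote to DEST (e.g. on /scratch).
run_in_ramdisk() {
    ramdisk="$1"
    dest="$2"
    shift 2
    work=$(mktemp -d "$ramdisk/job.XXXXXX") || return 1
    # Run the command in a subshell so the caller's directory is unchanged.
    ( cd "$work" || exit 1; "$@" )
    status=$?
    # Copy results back before cleaning up; on failure the work directory
    # is left in place so nothing is lost.
    mkdir -p "$dest" || return 1
    cp -a "$work"/. "$dest"/ || return 1
    rm -rf "$work"
    # Report the exit status of the wrapped command.
    return "$status"
}
```

Inside a job script this might be called as &amp;lt;tt&amp;gt;run_in_ramdisk /dev/shm /scratch/$USER/run1 ./my_program&amp;lt;/tt&amp;gt;, where the scratch path and program name are placeholders. Everything written under the ramdisk counts against the node's memory.&lt;br /&gt;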
&lt;br /&gt;
Note that the absence of hard drives also means that the nodes cannot swap memory, so be sure that your computation fits within memory.&lt;br /&gt;
&lt;br /&gt;
==Data Transfer==&lt;br /&gt;
{{:Data_Transfer}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==File/Ownership Management (ACL)==&lt;br /&gt;
* By default, at SciNet, users within the same group have read permission to each other's files (not write)&lt;br /&gt;
* You may use access control list ('''ACL''') to allow your supervisor (or another user within your group) to manage files for you (i.e., create, move, rename, delete), while still retaining your access as the original owner of the files/directories.&lt;br /&gt;
* For example, to allow [supervisor] to manage files in /project/group/[owner], issue the following commands as the [owner] account from a shell:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ getfacl /project/group/[owner]&lt;br /&gt;
(to determine the current ACL attributes)&lt;br /&gt;
&lt;br /&gt;
$ setfacl -d -m user:[supervisor]:rwx /project/group/[owner]&lt;br /&gt;
(every *new* file/directory created inside [owner] will grant [supervisor] rwx access by default from now on)&lt;br /&gt;
&lt;br /&gt;
$ setfacl -d -m user:[owner]:rwx /project/group/[owner]&lt;br /&gt;
(new files/directories will also grant [owner] rwx access by default, so both accounts keep access)&lt;br /&gt;
&lt;br /&gt;
$ setfacl -Rm user:[supervisor]:rwx /project/group/[owner]&lt;br /&gt;
(recursively modify all *existing* files/directories inside [owner] to also be rwx by [supervisor])&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For more information on using &amp;lt;tt&amp;gt;getfacl&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;setfacl&amp;lt;/tt&amp;gt;, see their man pages.&lt;br /&gt;
&lt;br /&gt;
If you need to set up permissions across groups, contact us (and the other group's supervisor!).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Hierarchical Storage Management (HSM)==&lt;br /&gt;
'''(a pilot project is starting in July/2010 with a select group of users)'''&lt;br /&gt;
&lt;br /&gt;
===Basic Concepts===&lt;br /&gt;
'''Hierarchical Storage Management (HSM)''' is a data storage technique which automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as hard disk drive arrays, are more expensive (per byte stored) than slower devices, such as optical discs and magnetic tape drives. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise's data on slower devices, and then copy data to faster disk drives when needed. In effect, HSM turns the fast disk drives into caches for the slower mass storage devices. The HSM system monitors the way data is used and makes best guesses as to which data can safely be moved to slower devices and which data should stay on the fast devices.&lt;br /&gt;
&lt;br /&gt;
In a typical HSM scenario, data files which are frequently used are stored on disk drives, but are eventually migrated to tape if they are not used for a certain period of time, typically a few months. If a user does reuse a file which is on tape, it is automatically moved back to disk storage. The advantage is that the total amount of stored data can be much larger than the capacity of the disk storage available, but since only rarely-used files are on tape, most users will not notice any slowdown.&lt;br /&gt;
&lt;br /&gt;
The HSM client provides both ''automatic'' and ''selective migration''. Once file ''migration'' begins, the HSM client sends a copy of your file to storage volumes on disk devices or devices that support removable media, such as tape, and replaces the original file with a ''stub file'' on the HSM-managed file system (known as the ''repository'' at SciNet).&lt;br /&gt;
&lt;br /&gt;
'''Repository''' commonly refers to a location for long-term storage, often for safety or preservation. &lt;br /&gt;
&lt;br /&gt;
'''Migration''', in the context of HSM, refers to the set of actions that move files from the front-end disk-based repository to a back-end tape library system (often invisible or inaccessible to users).&lt;br /&gt;
&lt;br /&gt;
'''Relocation''', in the context of SciNet, refers to the use of unix commands such as copy, move, tar or rsync to get data into the repository.&lt;br /&gt;
&lt;br /&gt;
'''The stub file''' is a small replacement file that makes it appear as though the original file is on the repository. It contains required metadata information to locate and recall a migrated file and to respond to specific UNIX commands without recalling the file.&lt;br /&gt;
&lt;br /&gt;
'''Automatic migration''' periodically monitors space usage and automatically migrates eligible files according to the options and settings that have been selected. The HSM client provides two types of automatic migration: ''threshold migration'' and ''demand migration''.&lt;br /&gt;
&lt;br /&gt;
'''Threshold migration''' maintains a specific level of free space on the repository file system. When disk usage reaches the high threshold percentage, eligible files are migrated to tapes automatically. When space usage drops to the low threshold set for the file system, file migration stops.&lt;br /&gt;
&lt;br /&gt;
'''Demand migration''' responds to an out-of-space condition on the repository file system. Demand migration starts automatically if the file system runs out of space (usually triggered at 90%). For HSM, as files are migrated (oldest/largest first), space becomes available on the file system, and the process or event that caused the out-of-space condition can be resumed.&lt;br /&gt;
&lt;br /&gt;
'''Selective migration''' is triggered by a user-issued HSM command that migrates specific files from the repository at will, in anticipation of the automatic migration, or independently of the system-wide eligibility criteria. For example, if you know that you will not be using a particular group of files for an extended time, you can migrate them, so as to free additional space on the repository.&lt;br /&gt;
&lt;br /&gt;
'''Reclamation''' is the process of reclaiming unused space on a tape (applies to Virtual Tapes as well). Over time, as files/directories get deleted or updated on the repository, a process will expire old data, creating gaps of unused storage on the tapes. Since tapes are sequential media, typical tape handling software can only write data to the end of the tape, so these gaps of “Empty Space” cannot be used. Reclamation periodically, and in a rolling fashion, copies the active data from these &amp;quot;Swiss cheese&amp;quot;-like tapes to unused tapes in a compacted form.&lt;br /&gt;
&lt;br /&gt;
'''Optimal environment:''' HSM should be used in an environment where the old and large files which need to be preserved are not used regularly. Files that are needed frequently should not be migrated at all; otherwise HSM would act as a cache, migrating files only to recall them shortly afterwards. This is not advisable. The repository file system needs to be large enough to hold all regularly used files. If the file system is too small and cannot hold all regularly needed files (**), HSM will be perpetually recalling requested files, going beyond the high-threshold limit, migrating other files to get below the low-threshold limit, and so on.&lt;br /&gt;
&lt;br /&gt;
===Deployment at SciNet===&lt;br /&gt;
HSM is performed by dedicated IBM software made up of a number of HSM daemons running on '''datamover2'''. These daemons constantly monitor the usage of the '''/repository''' GPFS file system and, depending on a predefined set of policies, data may be automatically or manually migrated to the Tivoli Storage Manager (TSM) server, and kept on our library of LTO-4 tapes.&lt;br /&gt;
&lt;br /&gt;
'''/repository is a 15TB &amp;quot;transient&amp;quot; location''' accessible only from datamover2. Users may relocate data as required from /scratch or /project to /repository in a number of ways, such as copy, move, tar or rsync. &amp;quot;Transient&amp;quot; refers to the fact that /repository works like a &amp;quot;Black Hole&amp;quot;: in the background '''it is constantly being emptied''', even while you relocate data in from other file systems. What is left behind is the directory tree with the stub files (0 bytes in size at SciNet) and the metadata associated with it, which takes up about 1-2%. But, even if /repository is at 1% to start with, we ask that you please do not initiate a relocation of more than a 10TB chunk at once, so that the system has time to process your data and still allows other users to migrate/recall some material before reaching 100% full. &lt;br /&gt;
&lt;br /&gt;
Inside /repository, data is segregated on a per group basis, just as in /project. Within groups, users and group supervisors can structure materials any way they prefer. But the recommendation is that those involved spend some time designing that structure ahead of time, since you may merge data from project and/or scratch (or even home). In tests we performed, we've been able to reorganize the FS structure after migration, change the name and ownership of directories and stubs, and still recall files under the new path and ownership. HSM does seem to keep a very symbiotic relation between the metadata and the inode attributes at the file system level, without necessarily having to replicate these changes with tape recall &amp;amp; migration operations. But please don't abuse this flexibility. If possible, keep your initial layout structure somewhat fixed over time.&lt;br /&gt;
&lt;br /&gt;
We also recommend that users bundle files in tar-balls of at least 10GB before relocation, and keep a listing of those files somewhere; in fact you may use the 'tar' command to create the tar-ball directly in /repository on-the-fly. See examples below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -czvf /repository/[group]/[user]/myproject1.tar.gz /project/[group]/[user]/project1/ &amp;gt; /project/[group]/[user]/myproject1-repository-listing.txt&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
tar -czvf /repository/[group]/[user]/myscratch1.tar.gz /scratch/[user]/scratchdata1/ &amp;gt; /home/[user]/myscratch1-repository-listing.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The important thing is to avoid the relocation of many thousands (or millions) of small files. It's very demanding on the system to constantly scan/reconcile all these files on the file system, tapes, metadata and database. A good reference is '''average file size &amp;gt; 100MB''' in /repository. Deep directory nesting in general also increases the time required to traverse a file system and thus should be avoided where possible.&lt;br /&gt;
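&lt;br /&gt;
To see how a directory compares with that guideline before relocating it, the average file size can be estimated as sketched below. The function name is hypothetical, and the sketch assumes file names without newlines.&lt;br /&gt;

```shell
# average_file_size DIR
# Print the mean size in bytes of the regular files under DIR (0 if none).
# Assumes file names contain no newlines.
average_file_size() {
    # ls -ln prints the size in bytes as the fifth field of each line.
    find "$1" -type f -exec ls -ln {} + |
        awk '{ n++; s += $5 } END { if (n) printf "%d\n", s / n; else print 0 }'
}
```

If the result is far below 100MB (roughly 100000000 bytes), bundling the directory into tar-balls first is likely the better choice.&lt;br /&gt;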
&lt;br /&gt;
===Performance===&lt;br /&gt;
Unlike /project or /scratch, /repository is only a 2-tier disk RAID, so don't expect transfer rates much higher than 60MB/sec on an rsync session, for example. In other words, a 10TB offload operation will typically take 2 days to complete, if made up of large files. On the other hand, we have conducted experiments where we migrated only 1TB, but with 1 million files expanded, and that took nearly a day! This situation should be avoided, especially when many terabytes are involved. Performance is as much a function of the number of files as of the amount of data.&lt;br /&gt;
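&lt;br /&gt;
The two-day figure follows from simple arithmetic, which the sketch below reproduces so you can estimate your own transfers. It assumes decimal units (1 TB = 1000000 MB) and a sustained rate, and it ignores per-file overhead; the function name is hypothetical.&lt;br /&gt;

```shell
# transfer_hours SIZE_TB RATE_MB_PER_S
# Estimate the whole hours needed to move SIZE_TB terabytes at a sustained
# RATE_MB_PER_S megabytes per second (integer arithmetic, rounded down).
# Assumption: decimal units, 1 TB = 1000000 MB.
transfer_hours() {
    size_tb="$1"
    rate="$2"
    echo $(( size_tb * 1000000 / rate / 3600 ))
}

transfer_hours 10 60   # prints 46, i.e. about 2 days
```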
&lt;br /&gt;
As for the &amp;quot;ideal tar-ball size&amp;quot;, experiments have shown that an isolated 10GB tar-ball typically takes 10-15 minutes to be pulled back, considering all tape operations involved. That seems like a reasonable amount of time to wait for a group of files kept off-line for an extended period of time. Also consider that pulling back an individual tiny file could still take as long as 5-8 minutes. So, it's pretty clear that you get the best bang for the buck by tar'ing your material, and you won't tie up the tape system for too long. As for the upper limit, you can probably bundle files in 100-500GB tar-balls, provided that you're OK with waiting a couple of hours for them to be recalled at a later date; at least from SciNet's perspective, it would be a very efficient migration.&lt;br /&gt;
&lt;br /&gt;
'''Please be sure to contact us to schedule your transfers IN or OUT of the system, to avoid conflict with other users or with the system settings.''' For instance, if you recall large amounts of data at once, let's say 7.5TB (about half of /repository), we would have to adjust the high threshold accordingly for that period (to 50%), so we don't induce the never-ending migrate/recall issues (**) described under ''Optimal environment''.&lt;br /&gt;
&lt;br /&gt;
===How to migrate/recall data===&lt;br /&gt;
&lt;br /&gt;
====Automatic====&lt;br /&gt;
We currently have /repository set up with '''High and Low thresholds of 2% and 1% respectively'''. That means that at regular intervals the file system is monitored to determine if the 2% usage mark has been reached or surpassed. In that case, data is automatically migrated to tapes, oldest (or largest) first, until the file system is down to 1%, if possible (metadata is not migrated). Since data may be copied/moved/rsync'ed/tar'ed in faster than /repository can be emptied, you may observe 80-90% disk usages sporadically (hence the 10TB chunk of data limit). For now at SciNet we migrate every file in /repository to tapes.&lt;br /&gt;
&lt;br /&gt;
To recall a file automatically all you have to do is '''access''' it. There are many ways you can do this. For example, you may view a file with 'cat', 'more', 'vi/vim', etc. You may also copy the file (or directory) from /repository to another location. '''Please be patient:''' the file will have to be pulled back from tape, and this will take some time, longer if it happens to be at the end of a tape.&lt;br /&gt;
&lt;br /&gt;
====Selective====&lt;br /&gt;
&lt;br /&gt;
Selective migration is used to override the internal priority of HSM (oldest/largest) or to migrate files/directories &amp;quot;immediately&amp;quot;. The recommendation is to '''not wait''' for the automatic migration cycle to kick in, since this could take some 6 to 12 hours at SciNet. If you already know that you relocated material to repository with the intention of having it migrated to tapes, you can just use dsmmigrate as soon as the rsync to repository has finished, for instance.&lt;br /&gt;
&lt;br /&gt;
(files won't be migrated until they have &amp;quot;aged&amp;quot; for at least 5 minutes, that is, after their last access/modification time)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
dsmmigrate [path to FILE]&lt;br /&gt;
dsmmigrate -R -D /repository/[group]/[user]/[directory]&lt;br /&gt;
dsmmigrate /repository/scinet/pinto/blahblahblah.tar.Z&lt;br /&gt;
or&lt;br /&gt;
cd /repository/scinet/pinto/&lt;br /&gt;
dsmmigrate blahblahblah.tar.Z&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To selectively recall data, just type:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
dsmrecall [path to FILE]&lt;br /&gt;
dsmrecall -R -D /repository/[group]/[user]/[directory]&lt;br /&gt;
dsmrecall /repository/scinet/pinto/blahblahblah.tar.Z&lt;br /&gt;
or&lt;br /&gt;
cd /repository/scinet/pinto/&lt;br /&gt;
dsmrecall blahblahblah.tar.Z&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' We've been finding that the search for new candidates for automatic migration takes much longer once repository is already full of files/stubs. That is to be expected, hence the recommendation to not wait and proceed with the selective migration of your own files/directories asap.&lt;br /&gt;
&lt;br /&gt;
===Disaster Recovery===&lt;br /&gt;
&lt;br /&gt;
As with any disk-based storage, although it's a RAID 5 file system, repository is not immune to failures. We do not do regular backups, but it's possible to do a full recovery in case of catastrophic loss of repository. '''For that it's important that all files have been completely migrated to tapes''' beforehand. That puts the onus on users to ensure this migration is indeed finished (with selective migration) for the relocated material before they delete the originals from /project or /scratch.&lt;br /&gt;
&lt;br /&gt;
===Common HSM commands===&lt;br /&gt;
Some traditional unix/linux commands, such as 'ls' or 'rm' for instance, will work with the stub files as if they were the real files. For others, such as 'du' or 'df', you are better off using an HSM equivalent, which will give you more meaningful information in the context of HSM. The HSM commands only work inside /repository. Some of them, such as 'dsmrm', are executable only by root, in which case you'll be notified.&lt;br /&gt;
&lt;br /&gt;
===='''dsmls'''====&lt;br /&gt;
to check status of files; used in the directory where you expect to have migrated files&lt;br /&gt;
&lt;br /&gt;
'''r''': ''resident''    (the file is on repository only)&lt;br /&gt;
&lt;br /&gt;
'''m''': ''migrated''    (only the stub of the file is on repository)&lt;br /&gt;
&lt;br /&gt;
'''p''': ''premigrated'' (the file is on repository and on tape)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmls [-Noheader] [-Recursive] [-Help] [file specs|-FIlelist=file]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-logindm02-$ dsmls -R a3&lt;br /&gt;
IBM Tivoli Storage Manager&lt;br /&gt;
Command Line Space Management Client Interface&lt;br /&gt;
  Client Version 6, Release 1, Level 0.0  &lt;br /&gt;
  Client date/time: 07/27/2010 12:06:36&lt;br /&gt;
(c) Copyright by IBM Corporation and other(s) 1990, 2009. All Rights Reserved.&lt;br /&gt;
&lt;br /&gt;
      Actual     Resident     Resident  File   File&lt;br /&gt;
        Size         Size     Blk (KB)  State  Name&lt;br /&gt;
       &amp;lt;dir&amp;gt;         8192            8   -      a3/&lt;br /&gt;
&lt;br /&gt;
/repository/scinet/pinto/a3:&lt;br /&gt;
 34008432640            0            0   m      32G-1&lt;br /&gt;
 34008432640  34008432640            0   r      32G-2&lt;br /&gt;
 34008432640  34008432640            0   p      32G-3&lt;br /&gt;
           0            0            0   r      dsmerror.log&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
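&lt;br /&gt;
Before deleting originals from /project or /scratch, it can help to filter the state column of such a listing in a script. The following is only a minimal sketch: the function name &amp;lt;tt&amp;gt;resident_only&amp;lt;/tt&amp;gt; is hypothetical, and it assumes the five-column layout shown in the example above and file names without spaces.&lt;br /&gt;
&lt;br /&gt;
```shell
# Read a dsmls listing on stdin and print the names of files that are
# still resident only (state "r"), i.e. not yet copied to tape.
# Assumption: five columns as in the example listing, with the state
# in column 4 and the file name in column 5 (no spaces in names).
resident_only() {
    awk '$4 == "r" { if ($1 ~ /^[0-9]+$/) print $5 }'
}
```
&lt;br /&gt;
It would be used as &amp;lt;tt&amp;gt;dsmls -R a3 | resident_only&amp;lt;/tt&amp;gt;; an empty result suggests that every listed file is either migrated or premigrated.&lt;br /&gt;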
&lt;br /&gt;
===='''dsmdu'''==== &lt;br /&gt;
disk usage on the original files/directory&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmdu [-Allfiles] [-Summary] [-Help] [directory names]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===='''dsmdf'''====&lt;br /&gt;
disk free on the HSM file system.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmdf [-Help] [-Detail] [file systems]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===='''dsmmigrate'''====&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmmigrate [-Recursive] [-Premigrate] [-Detail] [-Help] filespecs|-FIlelist=file &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===='''dsmrecall'''====&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: dsmrecall [-Recursive] [-Detail] [-Help] file specs|-FIlelist=file&lt;br /&gt;
   or  dsmrecall [-Detail] -offset=XXXX[kmgKMG] -size=XXXX[kmgKMG] file specs &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To have an idea of what HSM is doing on datamover2 at a given time:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[pinto@gpc-logindm02 ~]$ ps -def | grep dsm | grep -v mmfs&lt;br /&gt;
&lt;br /&gt;
root      2455 15190  0 16:26 ?        00:00:00 dsmmonitord&lt;br /&gt;
root      2456  2455  2 16:26 ?        00:05:38 dsmautomig -2 system::/repository&lt;br /&gt;
pinto    10997 10637 30 16:40 pts/3    01:14:20 dsmmigrate -R -D pinto&lt;br /&gt;
root     12857     1  0 16:15 ?        00:00:00 dsmrecalld&lt;br /&gt;
root     13013 12857  0 16:15 ?        00:00:01 dsmrecalld&lt;br /&gt;
root     13015 12857  0 16:15 ?        00:00:00 dsmrecalld&lt;br /&gt;
root     15190     1  0 16:15 ?        00:00:00 dsmmonitord&lt;br /&gt;
root     16936     1  3 16:15 ?        00:10:44 dsmscoutd&lt;br /&gt;
root     17217     1 13 16:16 ?        00:36:49 dsmrootd&lt;br /&gt;
root     18732  2456  4 17:51 ?        00:07:19 dsmautomig -2 system::/repository&lt;br /&gt;
root     18737  2456  0 17:51 ?        00:00:26 dsmautomig -2 system::/repository&lt;br /&gt;
pinto    24533 10363  0 20:48 pts/2    00:00:00 grep dsm&lt;br /&gt;
root     25090     1  0 06:42 ?        00:00:08 dsmwatchd nodetach&lt;br /&gt;
root     30840 13013  0 17:15 ?        00:00:02 dsmrecalld&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the above example, dsmmonitord, dsmrecalld, dsmscoutd, dsmrootd and dsmwatchd are the 5 typical HSM daemons, and they are always running. In addition, there are 3 streams of dsmautomig (triggered by threshold migration) and 1 stream of dsmmigrate (selective migration initiated by user pinto).&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Software_and_Libraries&amp;diff=2046</id>
		<title>Software and Libraries</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Software_and_Libraries&amp;diff=2046"/>
		<updated>2010-09-30T14:40:13Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* GPC Software */  added link to HDF5 page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Software Module System ==&lt;br /&gt;
All the software listed on this page is accessed using a modules system.  This means that much of the software is not accessible by default but has to be loaded using the module command. The reason is that&lt;br /&gt;
* it allows us to easily keep multiple versions of software for different users on the system;&lt;br /&gt;
* it allows users to easily switch between versions.&lt;br /&gt;
The module system works similarly on the GPC and the TCS, although different modules are installed on these two systems.&lt;br /&gt;
&lt;br /&gt;
Note that, generally, if you compile a program with a module loaded, you will have to run it with that same module loaded, to make dynamically linked libraries accessible.&lt;br /&gt;
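&lt;br /&gt;
One way to see whether a module is missing at run time is to look for unresolved libraries in the output of the standard Linux &amp;lt;tt&amp;gt;ldd&amp;lt;/tt&amp;gt; tool. The helper below is a sketch, not a SciNet-provided command; the name &amp;lt;tt&amp;gt;missing_libs&amp;lt;/tt&amp;gt; is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
# Read `ldd` output on stdin and print the shared libraries that the
# dynamic loader could not resolve (lines containing "not found").
missing_libs() {
    awk '/not found/ { print $1 }'
}
```
&lt;br /&gt;
For a hypothetical binary &amp;lt;tt&amp;gt;myprog&amp;lt;/tt&amp;gt;, running &amp;lt;tt&amp;gt;ldd ./myprog | missing_libs&amp;lt;/tt&amp;gt; prints nothing when all libraries are found; any name it prints hints at a module that still has to be loaded.&lt;br /&gt;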
&lt;br /&gt;
{|&lt;br /&gt;
!{{Hl2}}|Function&lt;br /&gt;
!{{Hl2}}|Command&lt;br /&gt;
!{{Hl2}}|Comments&lt;br /&gt;
|-&lt;br /&gt;
|List available software packages:&lt;br /&gt;
|&amp;lt;pre&amp;gt;$ module avail&amp;lt;/pre&amp;gt;&lt;br /&gt;
|&lt;br /&gt;
*If a module is not listed here, it is not supported.&lt;br /&gt;
*The flag &amp;quot;(default)&amp;quot; is never part of the name.&lt;br /&gt;
|-&lt;br /&gt;
|Use particular software:&lt;br /&gt;
|&amp;lt;pre&amp;gt; $ module load [module-name] &amp;lt;/pre&amp;gt;&lt;br /&gt;
|&lt;br /&gt;
*If possible, specify only the short name (the part before the &amp;quot;/&amp;quot;). &lt;br /&gt;
*When ambiguous, this loads the default one. &lt;br /&gt;
|-&lt;br /&gt;
|List available versions of a specific software package:&lt;br /&gt;
|&amp;lt;pre&amp;gt;$ module avail [short-module-name]&amp;lt;/pre&amp;gt;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|List currently loaded modules:&lt;br /&gt;
|&amp;lt;pre&amp;gt;$ module list&amp;lt;/pre&amp;gt;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Get description of a particular module:&lt;br /&gt;
|&amp;lt;pre&amp;gt;$ module help [module-name]&amp;lt;/pre&amp;gt;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Remove a module from your shell:&lt;br /&gt;
|&amp;lt;pre&amp;gt;$ module unload [module-name]&amp;lt;/pre&amp;gt;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Remove all modules:&lt;br /&gt;
|&amp;lt;pre&amp;gt;$ module purge&amp;lt;/pre&amp;gt;&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Replace one loaded module with another:&lt;br /&gt;
|&amp;lt;pre&amp;gt;$ module switch [old-module-name] [new-module-name]&amp;lt;/pre&amp;gt;&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
Modules that load libraries define environment variables pointing to the location of library files and include files, for use in Makefiles. These environment variables follow the naming convention&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
SCINET_[short-module-name]_BASE&lt;br /&gt;
SCINET_[short-module-name]_LIB&lt;br /&gt;
SCINET_[short-module-name]_INC&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
for the base location of the module's files, the location of the library binaries and the location of the header files, respectively.&lt;br /&gt;
&lt;br /&gt;
So, to compile and link against the library, you will have to add &amp;lt;tt&amp;gt;-I${SCINET_[short-module-name]_INC}&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;-L${SCINET_[short-module-name]_LIB}&amp;lt;/tt&amp;gt;, respectively, in addition to the usual &amp;lt;tt&amp;gt;-l[libname]&amp;lt;/tt&amp;gt;.&lt;br /&gt;
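&lt;br /&gt;
As an illustration of this naming convention, the sketch below assembles the two flags for a given short module name. It assumes that the variable name uses the upper-cased short name (as in &amp;lt;tt&amp;gt;SCINET_FFTW_LIB&amp;lt;/tt&amp;gt;) and that the module exports its variables; the function name &amp;lt;tt&amp;gt;scinet_flags&amp;lt;/tt&amp;gt; is hypothetical. Module names containing characters such as "-" may map to variable names differently and are not handled.&lt;br /&gt;
&lt;br /&gt;
```shell
# Print "-I... -L..." for a loaded module, following the
# SCINET_[short-module-name]_INC / _LIB convention.
# Assumption: the short name is upper-cased in the variable name.
scinet_flags() {
    name=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    inc=$(printenv "SCINET_${name}_INC")
    lib=$(printenv "SCINET_${name}_LIB")
    # Fail if either variable is unset (module probably not loaded).
    if [ -z "$inc" ] || [ -z "$lib" ]; then
        return 1
    fi
    printf '%s\n' "-I${inc} -L${lib}"
}
```
&lt;br /&gt;
With the fftw module loaded, a compile line could then look like &amp;lt;tt&amp;gt;icc -o prog prog.c $(scinet_flags fftw) -lfftw3&amp;lt;/tt&amp;gt;, where &amp;lt;tt&amp;gt;prog.c&amp;lt;/tt&amp;gt; is a placeholder source file.&lt;br /&gt;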
&lt;br /&gt;
Errors in loaded modules can arise for a few reasons, for instance:&lt;br /&gt;
* A module by that name may not exist.&lt;br /&gt;
* Some modules require other modules to have been loaded; if this requirement is not met when you try to load that module, an error message will be printed explaining what module is needed.&lt;br /&gt;
* Some modules cannot be loaded together: an error message will be printed explaining which modules conflict.&lt;br /&gt;
&lt;br /&gt;
It is recommended to load frequently used modules in the file [[Important_.bashrc_guidelines|.bashrc]] in your home directory.&lt;br /&gt;
&lt;br /&gt;
== Default and non-default modules ==&lt;br /&gt;
&lt;br /&gt;
When you load a module with its 'short' name, you will get the ''default'' version, which is the most recent (usually), recommended version of that library or piece of software.  In general, using the short module name is the way to go. However, you may have code that depends on the intricacies of a non-default version.  For that reason, the most common older versions are also available as modules.  You can find all available modules using the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command.&lt;br /&gt;
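&lt;br /&gt;
The relation between a full module name and its short name can be expressed in one line of shell. This is only a sketch with a hypothetical function name; it takes the short name to be everything before the first "/", as described above.&lt;br /&gt;
&lt;br /&gt;
```shell
# Print the short name of a module: the part before the first "/".
# A name without "/" is returned unchanged.
module_short_name() {
    printf '%s\n' "${1%%/*}"
}
```
&lt;br /&gt;
For instance, &amp;lt;tt&amp;gt;module_short_name meep/1.1.1-openmpi&amp;lt;/tt&amp;gt; yields the name that &amp;lt;tt&amp;gt;module load&amp;lt;/tt&amp;gt; would resolve to the default version.&lt;br /&gt;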
&lt;br /&gt;
== Deprecated modules ==&lt;br /&gt;
&lt;br /&gt;
Older software for which newer versions exist may become deprecated. In contrast to the non-default modules, deprecated modules are not listed by the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command.  If you have a piece of legacy code that really depends on a deprecated version of a library (and we urge you to check that it does not work with newer versions!), then you can load a deprecated version with &amp;lt;pre&amp;gt;module load use.deprecated [deprecated-module-name]&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Currently, the following modules are deprecated: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gcc/gcc-4.3.2          hdf5/184-v16-serial     intel/intel-v11.1.046               openmpi/1.3.3-intel-v11.0-ofed&lt;br /&gt;
hdf5/183-v16-openmpi   hdf5/184-v18-intelmpi   intelmpi/impi-3.2.1.009             openmpi/1.3.2-intel-v11.0-ofed.orig&lt;br /&gt;
hdf5/183-v18-openmpi   hdf5/184-v18-openmpi    intelmpi/impi-3.2.2.006             pgplot/5.2.2-gcc.old            &lt;br /&gt;
hdf5/184-v16-intelmpi  hdf5/184-v18-serial     intelmpi/impi-4.0.0.013             pgplot/5.2.2-intel.old&lt;br /&gt;
hdf5/184-v16-openmpi   intel/intel-v11.0.081   intelmpi/impi-4.0.0.025               &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
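&lt;br /&gt;
A script that has to work with both current and deprecated modules could choose the load command from this list. The sketch below hard-codes the list as quoted above, so it is a snapshot that may be out of date; both function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
# Succeed if the full module name is on the deprecated list quoted
# above (a hard-coded snapshot; the live list may differ).
is_deprecated_module() {
    case "$1" in
        gcc/gcc-4.3.2|\
        hdf5/183-v16-openmpi|hdf5/183-v18-openmpi|\
        hdf5/184-v16-intelmpi|hdf5/184-v16-openmpi|hdf5/184-v16-serial|\
        hdf5/184-v18-intelmpi|hdf5/184-v18-openmpi|hdf5/184-v18-serial|\
        intel/intel-v11.0.081|intel/intel-v11.1.046|\
        intelmpi/impi-3.2.1.009|intelmpi/impi-3.2.2.006|\
        intelmpi/impi-4.0.0.013|intelmpi/impi-4.0.0.025|\
        openmpi/1.3.3-intel-v11.0-ofed|openmpi/1.3.2-intel-v11.0-ofed.orig|\
        pgplot/5.2.2-gcc.old|pgplot/5.2.2-intel.old)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Print (without running it) the module command needed for a name.
module_load_command() {
    if is_deprecated_module "$1"; then
        printf '%s\n' "module load use.deprecated $1"
    else
        printf '%s\n' "module load $1"
    fi
}
```
&lt;br /&gt;
Printing the command instead of running it keeps the sketch usable on machines where the &amp;lt;tt&amp;gt;module&amp;lt;/tt&amp;gt; command is not defined.&lt;br /&gt;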
&lt;br /&gt;
== Commercial software ==&lt;br /&gt;
&lt;br /&gt;
Apart from the compilers on our systems, we generally do not provide licensed application software, e.g., no Gaussian, IDL, Matlab, etc. &lt;br /&gt;
See the [[FAQ#How_can_I_run_Matlab_.2F_IDL_.2F_Gaussian_.2F_my_favourite_commercial_software_on_SciNet.3F | FAQ]].&lt;br /&gt;
&lt;br /&gt;
== Other software and libraries ==&lt;br /&gt;
&lt;br /&gt;
If you want to use a piece of software or a library that is not on the list, you can in principle install it yourself in your &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; directory.&lt;br /&gt;
Note, however, that building libraries and software from source often uses a lot of files. To avoid running out of disk space, building software is therefore best done in &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;, from which you can copy/install only the libraries, header files and binaries to your &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; directory.&lt;br /&gt;
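&lt;br /&gt;
The copy step could look like the sketch below, which assumes that the build tree keeps its results in &amp;lt;tt&amp;gt;bin&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;lib&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;include&amp;lt;/tt&amp;gt; subdirectories; the function name and the example paths are hypothetical. Packages that provide their own install step (for example a configure script with a &amp;lt;tt&amp;gt;--prefix&amp;lt;/tt&amp;gt; option) are usually better served by that mechanism.&lt;br /&gt;
&lt;br /&gt;
```shell
# Copy only binaries, libraries and headers from a build tree to an
# install prefix, leaving sources and object files behind.
# Assumption: the build tree uses bin/, lib/ and include/ subdirectories.
install_built_tree() {
    build_dir=$1
    prefix=$2
    for sub in bin lib include; do
        if [ -d "${build_dir}/${sub}" ]; then
            mkdir -p "${prefix}/${sub}"
            cp -R "${build_dir}/${sub}/." "${prefix}/${sub}/"
        fi
    done
}
```
&lt;br /&gt;
A call such as &amp;lt;tt&amp;gt;install_built_tree /scratch/[user]/mylib-build $HOME/mylib&amp;lt;/tt&amp;gt; would then leave the bulky intermediate files on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.&lt;br /&gt;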
&lt;br /&gt;
If you suspect that a particular piece of software or a library would be of use to other users of SciNet as well, contact us, and we will consider adding it to the system.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== GPC Software ==&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
!{{Hl2}}| Software  &lt;br /&gt;
!{{Hl2}}| Version&lt;br /&gt;
!{{Hl2}}| Comments&lt;br /&gt;
!{{Hl2}}| Command/Library&lt;br /&gt;
!{{Hl2}}| Module Name&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Compilers'''''&lt;br /&gt;
|-  &lt;br /&gt;
|Intel Compiler&lt;br /&gt;
|11.1,update&amp;amp;nbsp;6*&lt;br /&gt;
| includes MKL library&lt;br /&gt;
| &amp;lt;tt&amp;gt;icpc,icc,ifort&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;intel&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| GCC Compiler&lt;br /&gt;
| 4.4.0&lt;br /&gt;
|&lt;br /&gt;
| &amp;lt;tt&amp;gt;gcc,g++,gfortran&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| IntelMPI&lt;br /&gt;
| 4.0.0&lt;br /&gt;
|&lt;br /&gt;
| &amp;lt;tt&amp;gt;mpicc,mpiCC,mpif77,mpif90&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;intelmpi&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| OpenMPI&lt;br /&gt;
| 1.4.1*&lt;br /&gt;
|&lt;br /&gt;
| &amp;lt;tt&amp;gt;mpicc,mpiCC,mpif77,mpif90&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openmpi&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Editors'''''&lt;br /&gt;
|- &lt;br /&gt;
| Nano&lt;br /&gt;
| 2.2.4&lt;br /&gt;
| Nano's another editor&lt;br /&gt;
| &amp;lt;tt&amp;gt;nano&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;editors/nano/2.2.4&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| Emacs&lt;br /&gt;
| 23.1&lt;br /&gt;
| New version of popular text editor&lt;br /&gt;
| &amp;lt;tt&amp;gt;emacs&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;emacs&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| XEmacs&lt;br /&gt;
| 21.4.22&lt;br /&gt;
| XEmacs editor&lt;br /&gt;
| &amp;lt;tt&amp;gt;xemacs&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;xemacs&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Development tools'''''&lt;br /&gt;
|- &lt;br /&gt;
| Autoconf&lt;br /&gt;
| 2.64&lt;br /&gt;
| system to automatically configure software source code package&lt;br /&gt;
| &amp;lt;tt&amp;gt;autoconf, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;autoconf&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| CMake&lt;br /&gt;
| 2.8.0&lt;br /&gt;
| cross-platform, open-source build system&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cmake&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| Git&lt;br /&gt;
| 1.6.3&lt;br /&gt;
| Revision control system&lt;br /&gt;
| &amp;lt;tt&amp;gt;git,gitk&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;git&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Scons&lt;br /&gt;
| 1.3.0&lt;br /&gt;
| Software construction tool&lt;br /&gt;
| &amp;lt;tt&amp;gt;scons&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;scons&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Subversion&lt;br /&gt;
| 2.6.5&lt;br /&gt;
| Version control system&lt;br /&gt;
| &amp;lt;tt&amp;gt;svn&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;svn&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Debug and performance tools'''''&lt;br /&gt;
|- &lt;br /&gt;
| DDD&lt;br /&gt;
| 3.3.12&lt;br /&gt;
| Data Display Debugger&lt;br /&gt;
| &amp;lt;tt&amp;gt;ddd&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;debuggers/ddd-3.3.12&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| GDB&lt;br /&gt;
| 7.1&lt;br /&gt;
| GNU debugger (the intel idbc debugger is available by default)&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdb&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;debuggers/gdb-7.1&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| MPE2&lt;br /&gt;
| 2.4.5&lt;br /&gt;
| Multi-Processing Environment with intel + OpenMPI&lt;br /&gt;
| &amp;lt;tt&amp;gt;mpecc, mpefc, jumpshot&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;mpe&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Introduction_To_Performance#OpenSpeedShop_.28profiling.2C_MPI_tracing:_GPC.29 | OpenSpeedShop]]&lt;br /&gt;
| 1.9.3.4*&lt;br /&gt;
| sampling and MPI tracing&lt;br /&gt;
| &amp;lt;tt&amp;gt;openss, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openspeedshop&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[Introduction_To_Performance#Scalasca_.28profiling.2C_tracing:_TCS.2C_GPC.29 | Scalasca]]&lt;br /&gt;
| 1.2&lt;br /&gt;
| SCalable performance Analysis of LArge SCale Applications (Compiled with OpenMPI)&lt;br /&gt;
| &amp;lt;tt&amp;gt;scalasca&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;scalasca&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[Performance_And_Debugging_Tools:_GPC#Valgrind | Valgrind]]&lt;br /&gt;
| 3.5.0*&lt;br /&gt;
| Memory checking utility&lt;br /&gt;
| &amp;lt;tt&amp;gt;valgrind,cachegrind&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;valgrind&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Visualization tools'''''&lt;br /&gt;
|- &lt;br /&gt;
| Grace&lt;br /&gt;
| 5.22.1&lt;br /&gt;
| Plotting utility&lt;br /&gt;
| &amp;lt;tt&amp;gt;xmgrace&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;graphics&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| Gnuplot&lt;br /&gt;
| 4.2.6&lt;br /&gt;
| Plotting utility&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;graphics&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| VMD&lt;br /&gt;
| 1.8.6&lt;br /&gt;
| Visualization and analysis utility&lt;br /&gt;
| &amp;lt;tt&amp;gt;vmd&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;vmd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| Ferret&lt;br /&gt;
| 6.4&lt;br /&gt;
| Plotting utility&lt;br /&gt;
| &amp;lt;tt&amp;gt;ferret&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ferret&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| NCL/NCARG&lt;br /&gt;
| 5.1.1&lt;br /&gt;
| NCARG graphics and ncl utilities&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncl&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| ROOT&lt;br /&gt;
| 5.26.00&lt;br /&gt;
| ROOT Analysis Framework from CERN&lt;br /&gt;
| &amp;lt;tt&amp;gt;root&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ROOT&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[ Using_Paraview | ParaView ]]&lt;br /&gt;
| 3.8.0&lt;br /&gt;
| Scientific visualization, server only&lt;br /&gt;
| &amp;lt;tt&amp;gt;pvserver,pvbatch,pvpython&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;visualization&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| PGPLOT&lt;br /&gt;
| 5.2.2*&lt;br /&gt;
| Graphics subroutine library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libcpgplot,libpgplot,libtkpgplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;pgplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Storage tools and libraries'''''&lt;br /&gt;
|- &lt;br /&gt;
| NetCDF&lt;br /&gt;
| 4.0.1*&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncdump,ncgen,libnetcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;netcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| Parallel netCDF&lt;br /&gt;
| 1.1.1&lt;br /&gt;
| Scientific data storage and retrieval using MPI-IO&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpnetcdf.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parallel-netcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| Ncview&lt;br /&gt;
| 1.93g&lt;br /&gt;
| Visualization for NetCDF files&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncview&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;graphics/ncview&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| NCO&lt;br /&gt;
| 3.9.9&lt;br /&gt;
| NCO utilities to manipulate netCDF files&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncap, ncap2, ncatted, etc.&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;nco&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| UDUNITS&lt;br /&gt;
| 2.1.11&lt;br /&gt;
| unit conversion utilities&lt;br /&gt;
| &amp;lt;tt&amp;gt;libudunits2&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;udunits&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| HDF4&lt;br /&gt;
| 4.2r4*&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h4fc,hdiff,...,libdf,libsz&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf4&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[Hdf5 | HDF5]]&lt;br /&gt;
| 1.8.4-v18*&lt;br /&gt;
| Scientific data storage and retrieval, parallel I/O&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Applications'''''&lt;br /&gt;
|- &lt;br /&gt;
| [[gamess|GAMESS (US)]]&lt;br /&gt;
| January 12, 2009 R3&lt;br /&gt;
| General Atomic and Molecular Electronic Structure System&lt;br /&gt;
| &amp;lt;tt&amp;gt;rungms&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gamess&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[nwchem|NWChem]]&lt;br /&gt;
| 5.1.1&lt;br /&gt;
| NWChem Quantum Chemistry&lt;br /&gt;
| &amp;lt;tt&amp;gt;nwchem&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;nwchem&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[gromacs|GROMACS]]&lt;br /&gt;
| 4.5.1&lt;br /&gt;
| GROMACS molecular mechanics, single precision, MPI&lt;br /&gt;
| &amp;lt;tt&amp;gt;grompp, mdrun&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gromacs&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [[cpmd|CPMD]]&lt;br /&gt;
| 3.13.2&lt;br /&gt;
| Car-Parrinello molecular dynamics, MPI&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd.x&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cpmd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| [http://blast.ncbi.nlm.nih.gov BLAST]&lt;br /&gt;
| 2.2.23+&lt;br /&gt;
| Basic Local Alignment Search Tool&lt;br /&gt;
| &amp;lt;tt&amp;gt;blastn,blastp,blastx,psiblast,tblastn...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;blast&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [[amber|AMBER 10]]&lt;br /&gt;
| Amber 10 + Amber Tools 1.3&lt;br /&gt;
| Amber Molecular Dynamics Package&lt;br /&gt;
| &amp;lt;tt&amp;gt;sander, sander.MPI&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;amber10&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [http://www.gdal.org/ GDAL]&lt;br /&gt;
| 1.7.1&lt;br /&gt;
| Geospatial Data Abstraction Library&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdal_contour,gdal_rasterize,gdal_grid, libgdal&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdal&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| [http://ab-initio.mit.edu/wiki/index.php/Meep MEEP ]&lt;br /&gt;
| 1.1.1*&lt;br /&gt;
| MIT Electromagnetic Equation Propagation&lt;br /&gt;
| &amp;lt;tt&amp;gt;meep, meep-mpi&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;meep/1.1.1-serial&amp;lt;br&amp;gt;meep/1.1.1-intelmpi&amp;lt;br&amp;gt; meep/1.1.1-openmpi&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| MPB&lt;br /&gt;
| 1.4.2&lt;br /&gt;
| MIT Photonic Bands &lt;br /&gt;
| &amp;lt;tt&amp;gt;mpb, mpb-data, mpb-split&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;mpb&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Libraries'''''&lt;br /&gt;
|- &lt;br /&gt;
| [http://www.mcs.anl.gov/petsc/petsc-as/  PETSc ]&lt;br /&gt;
| 3.0.0*&lt;br /&gt;
| Portable, Extensible Toolkit for Scientific Computation (PETSc)&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpetsc, etc.. &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;petsc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| BOOST&lt;br /&gt;
| 1.40&lt;br /&gt;
| C++ Boost libraries&lt;br /&gt;
| &amp;lt;tt&amp;gt;libboost...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;cxxlibraries/boost&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| GSL&lt;br /&gt;
| 1.13&lt;br /&gt;
| GNU Scientific Library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libgsl, libgslcblas&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gsl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| FFTW&lt;br /&gt;
| 3.2.2*&lt;br /&gt;
| fast Fourier transform library&lt;br /&gt;
''Be careful in combining fftw3 and MKL: you need to link fftw3 first, with'' &amp;lt;tt&amp;gt;-L${SCINET_FFTW_LIB} -lfftw3&amp;lt;/tt&amp;gt;, then link MKL&lt;br /&gt;
| &amp;lt;tt&amp;gt;libfftw3&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;fftw&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| LAPACK&lt;br /&gt;
| &lt;br /&gt;
| Provided by the Intel MKL library&lt;br /&gt;
| See http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/&lt;br /&gt;
| &amp;lt;tt&amp;gt;intel&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| extras&lt;br /&gt;
|  &lt;br /&gt;
| Full set of X11 libraries and others not installed on compute nodes&lt;br /&gt;
| &amp;lt;tt&amp;gt;bc, dmidecode, gv, iostat, lsof, tkdiff, zip, libXaw,...,libjpeg&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;extras&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Programming/scripting languages'''''&lt;br /&gt;
|- &lt;br /&gt;
| Guile + ctl&lt;br /&gt;
| 1.8.7 + 3.1&lt;br /&gt;
| guile + libctl scheme interpreter&lt;br /&gt;
| &amp;lt;tt&amp;gt;libguile, libctl&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;guile&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Java&lt;br /&gt;
| 1.6.0&lt;br /&gt;
| IBM's Java JRE and SDK&lt;br /&gt;
| &amp;lt;tt&amp;gt;java&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;javac&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;java&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| Python&lt;br /&gt;
| 2.6.2&lt;br /&gt;
| Python programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;python&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- &lt;br /&gt;
| Ruby&lt;br /&gt;
| 1.9.1&lt;br /&gt;
| Ruby programming language&lt;br /&gt;
| &amp;lt;tt&amp;gt;ruby&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ruby&amp;lt;/tt&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
''* Several versions of this module are installed; listed is the default version.''&lt;br /&gt;
&lt;br /&gt;
== TCS Software ==&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
!{{Hl2}} |Software  &lt;br /&gt;
!{{Hl2}}| Version&lt;br /&gt;
!{{Hl2}}| Comments&lt;br /&gt;
!{{Hl2}}| Command/Library&lt;br /&gt;
!{{Hl2}}| Module Name&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Compilers'''''&lt;br /&gt;
|-&lt;br /&gt;
|IBM compilers&lt;br /&gt;
|10.1(c/c++)&amp;lt;br&amp;gt;12.1(fortran)&lt;br /&gt;
| See [[TCS Quickstart]]&lt;br /&gt;
| &amp;lt;tt&amp;gt;xlc,xlC,xlf,xlc_r,xlC_r,xlf_r&amp;lt;/tt&amp;gt;&lt;br /&gt;
| ''standard available''&lt;br /&gt;
|-&lt;br /&gt;
|IBM MPI library&lt;br /&gt;
|&lt;br /&gt;
| See [[TCS Quickstart]]&lt;br /&gt;
| &amp;lt;tt&amp;gt;mpcc,mpCC,mpxlf,mpcc_r,mpCC_r,mpxlf_r&amp;lt;/tt&amp;gt;&lt;br /&gt;
| ''standard available''&lt;br /&gt;
|-&lt;br /&gt;
| UPC&lt;br /&gt;
| 1.2&lt;br /&gt;
| Unified Parallel C&lt;br /&gt;
| &amp;lt;tt&amp;gt;xlupc&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;upc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Debug/performance tools'''''&lt;br /&gt;
|-&lt;br /&gt;
| MPE2&lt;br /&gt;
| 1.0.6&lt;br /&gt;
| Performance Visualization for Parallel Programs   &lt;br /&gt;
| &amp;lt;tt&amp;gt;libmpe&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;mpe&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Scalasca&lt;br /&gt;
| 1.2&lt;br /&gt;
| SCalable performance Analysis of LArge SCale Applications&lt;br /&gt;
| &amp;lt;tt&amp;gt;scalasca, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;scalasca&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Storage tools and libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| HDF4&lt;br /&gt;
| 4.2.5&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h4fc, hdiff, ..., libdf, libsz&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf4&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| HDF5&lt;br /&gt;
|&lt;br /&gt;
| Scientific data storage and retrieval, parallel I/O&amp;lt;br&amp;gt;Part of the extras module on the tcs:&amp;lt;br&amp;gt;compile with &amp;lt;tt&amp;gt;-I${SCINET_EXTRAS_INC}&amp;lt;/tt&amp;gt;&amp;lt;br&amp;gt; link with &amp;lt;tt&amp;gt;-L${SCINET_EXTRAS_LIB}&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;extras&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NetCDF + ncview&lt;br /&gt;
| 4.0.1*&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncdump, ncgen, libnetcdf, ncview&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;netcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| parallel netCDF&lt;br /&gt;
| 1.1.1*&lt;br /&gt;
| Scientific data storage and retrieval using MPI-IO&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpnetcdf.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parallel-netcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NCO&lt;br /&gt;
| 3.9.6*&lt;br /&gt;
| NCO utilities to manipulate netCDF files&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncap, ncap2, ncatted, &amp;lt;/tt&amp;gt; etc.&lt;br /&gt;
| &amp;lt;tt&amp;gt;nco&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Libraries'''''&lt;br /&gt;
|-&lt;br /&gt;
| FFTW&lt;br /&gt;
| &lt;br /&gt;
| Fast Fourier transform library&amp;lt;br&amp;gt;Part of the extras module on the tcs:&amp;lt;br&amp;gt;compile with &amp;lt;tt&amp;gt;-I${SCINET_EXTRAS_INC}&amp;lt;/tt&amp;gt;&amp;lt;br&amp;gt; link with &amp;lt;tt&amp;gt;-L${SCINET_EXTRAS_LIB}&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;libfftw, libfftw_mpi,libfftw3&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;extras&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| GSL&lt;br /&gt;
| 1.13&lt;br /&gt;
| GNU Scientific Library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libgsl, libgslcblas&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gsl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| extras&lt;br /&gt;
|&lt;br /&gt;
| Adds paths to a fuller set of libraries to your user environment&amp;lt;br&amp;gt; compile with &amp;lt;tt&amp;gt;-I${SCINET_EXTRAS_INC}&amp;lt;/tt&amp;gt;&amp;lt;br&amp;gt; link with &amp;lt;tt&amp;gt;-L${SCINET_EXTRAS_LIB}&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;libfftw, libfftw_mpi, libfftw3, libhdf5, liblapack, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;extras&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|colspan=5 style='background: #E0E0E0'|'''''Other'''''&lt;br /&gt;
|-&lt;br /&gt;
| antlr&lt;br /&gt;
| 2.7.7&lt;br /&gt;
| ANother Tool for Language Recognition&lt;br /&gt;
| &amp;lt;tt&amp;gt;antlr, antlr-config&amp;lt;br&amp;gt;libantlr, antlr.jar, antlr.py&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;antlr&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NCL&lt;br /&gt;
| 5.1.1&lt;br /&gt;
| NCAR Command Language&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncl, libncl, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
''* Several versions of this module are installed; listed is the default version.''&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Hdf5&amp;diff=2045</id>
		<title>Hdf5</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Hdf5&amp;diff=2045"/>
		<updated>2010-09-30T14:37:02Z</updated>

		<summary type="html">&lt;p&gt;Cneale: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;To compile a serial program that uses HDF5, use module to set your paths correctly, then link the libraries at compile time. In this example, the imaginary test.c uses both the base HDF5 libraries and the newer high-level routines (i.e. it has #include &amp;quot;hdf5.h&amp;quot; and #include &amp;quot;hdf5_hl.h&amp;quot;), so needs libhdf5 and libhdf5_hl:&lt;br /&gt;
&lt;br /&gt;
  module purge&lt;br /&gt;
  module load intel hdf5/184-p1-v18-serial&lt;br /&gt;
&lt;br /&gt;
  icc -o test test.c -lhdf5_hl -lhdf5 -limf&lt;br /&gt;
  #or, if you prefer to be explicit,&lt;br /&gt;
  icc -o test test.c -L${SCINET_HDF5_LIB} -lhdf5_hl -lhdf5 -limf&lt;br /&gt;
&lt;br /&gt;
And remember when you run the program that you must have loaded module hdf5/184-p1-v18-serial&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Hdf5&amp;diff=2044</id>
		<title>Hdf5</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Hdf5&amp;diff=2044"/>
		<updated>2010-09-30T14:36:23Z</updated>

		<summary type="html">&lt;p&gt;Cneale: beginning of HDF5 information page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;To compile a serial program that uses HDF5, use the module command to set your paths correctly, then link the libraries at compile time. In this example, the hypothetical test.c uses both the base HDF5 library and the newer high-level routines (i.e. it has #include &amp;quot;hdf5.h&amp;quot; and #include &amp;quot;hdf5_hl.h&amp;quot;), so it needs libhdf5 and libhdf5_hl:&lt;br /&gt;
&lt;br /&gt;
  module purge&lt;br /&gt;
  module load intel hdf5/184-p1-v18-serial&lt;br /&gt;
&lt;br /&gt;
  icc -o test test.c -lhdf5_hl -lhdf5 -limf&lt;br /&gt;
&lt;br /&gt;
  #or, if you prefer to be explicit,&lt;br /&gt;
&lt;br /&gt;
  icc -o test test.c -L${SCINET_HDF5_LIB} -lhdf5_hl -lhdf5 -limf&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SciNet_Users_Group_(SNUG)&amp;diff=1945</id>
		<title>SciNet Users Group (SNUG)</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SciNet_Users_Group_(SNUG)&amp;diff=1945"/>
		<updated>2010-09-10T16:26:11Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Suggestions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Meetings==&lt;br /&gt;
The SciNet Users Group (SNUG) currently meets every month on the second Wednesday; meetings involve pizza, user discussion, feedback, and a half-hour talk on topics or technologies of interest to the SciNet community.&lt;br /&gt;
&lt;br /&gt;
For more information, and to sign up, please visit https://support.scinet.utoronto.ca/courses/&lt;br /&gt;
&lt;br /&gt;
====Upcoming topics for the half-hour prepared talk====&lt;br /&gt;
* October's SNUG will be on the 13th, and the TechTalk will be: &amp;quot;Version control on SciNet - svn, git, mercurial&amp;quot;.&lt;br /&gt;
* November's SNUG will be on the 10th, and the TechTalk will be: &amp;quot;Debuggers &amp;amp; parallel debugging on SciNet - gdb, ddd, padb&amp;quot;&lt;br /&gt;
* December's SNUG will be on the 8th, and the TechTalk will be: &amp;quot;Performance and profiling on SciNet - gprof, scalasca, peekperf&amp;quot;&lt;br /&gt;
&lt;br /&gt;
====Desired topics for the half-hour prepared talk====&lt;br /&gt;
* Add yours here!&lt;br /&gt;
&lt;br /&gt;
====Previous topics for the half-hour prepared talk====&lt;br /&gt;
* September's SNUG was on the 8th, and the TechTalk was: &amp;quot;The SciNet GPFS file systems and you&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==Suggestions==&lt;br /&gt;
* Turning this suggestion list into something to which users can not only add ideas, but also vote on them.&lt;br /&gt;
* Some type of online communication device to foster communication between users that is more like a mailing list than this wiki&lt;br /&gt;
* Each SNUG begins with a very informal go-round in which each user mentions something they learned recently, something they did recently, some problem they had recently, etc.&lt;br /&gt;
* Create an audio recording of each meeting, run it through speech recognition software (e.g. Dragon NaturallySpeaking), and post the output to the wiki as minutes (no formatting or human editing, just whatever comes out of the software, so that it doesn't end up being a huge chore).&lt;br /&gt;
* make this SNUG page require a login (or make some other change so that google won't index it)&lt;br /&gt;
* Add yours here!&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SciNet_Users_Group_(SNUG)&amp;diff=1944</id>
		<title>SciNet Users Group (SNUG)</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SciNet_Users_Group_(SNUG)&amp;diff=1944"/>
		<updated>2010-09-10T16:23:34Z</updated>

		<summary type="html">&lt;p&gt;Cneale: Added a SNUG page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Meetings==&lt;br /&gt;
The SciNet Users Group (SNUG) currently meets every month on the second Wednesday; meetings involve pizza, user discussion, feedback, and a half-hour talk on topics or technologies of interest to the SciNet community.&lt;br /&gt;
&lt;br /&gt;
For more information, and to sign up, please visit https://support.scinet.utoronto.ca/courses/&lt;br /&gt;
&lt;br /&gt;
====Upcoming topics for the half-hour prepared talk====&lt;br /&gt;
* October's SNUG will be on the 13th, and the TechTalk will be: &amp;quot;Version control on SciNet - svn, git, mercurial&amp;quot;.&lt;br /&gt;
* November's SNUG will be on the 10th, and the TechTalk will be: &amp;quot;Debuggers &amp;amp; parallel debugging on SciNet - gdb, ddd, padb&amp;quot;&lt;br /&gt;
* December's SNUG will be on the 8th, and the TechTalk will be: &amp;quot;Performance and profiling on SciNet - gprof, scalasca, peekperf&amp;quot;&lt;br /&gt;
&lt;br /&gt;
====Desired topics for the half-hour prepared talk====&lt;br /&gt;
* Add yours here!&lt;br /&gt;
&lt;br /&gt;
====Previous topics for the half-hour prepared talk====&lt;br /&gt;
* September's SNUG was on the 8th, and the TechTalk was: &amp;quot;The SciNet GPFS file systems and you&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==Suggestions==&lt;br /&gt;
* Turning this suggestion list into something to which users can not only add ideas, but also vote on them.&lt;br /&gt;
* Some type of online communication device to foster communication between users that is more like a mailing list than this wiki&lt;br /&gt;
* Each SNUG begins with a very informal go-round in which each user mentions something they learned recently, something they did recently, some problem they had recently, etc.&lt;br /&gt;
* Create an audio recording of each meeting, run it through speech recognition software (e.g. Dragon NaturallySpeaking), and post the output to the wiki as minutes (no formatting or human editing, just whatever comes out of the software, so that it doesn't end up being a huge chore).&lt;br /&gt;
* Add yours here!&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=GPC_Quickstart&amp;diff=1807</id>
		<title>GPC Quickstart</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=GPC_Quickstart&amp;diff=1807"/>
		<updated>2010-08-09T16:13:33Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Examples */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:University_of_Tor_79284gm-a.jpg|center|300px|thumb]]&lt;br /&gt;
|name=General Purpose Cluster (GPC)&lt;br /&gt;
|installed=June 2009&lt;br /&gt;
|operatingsystem= Linux&lt;br /&gt;
|loginnode= gpc01..gpc04 (from &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt;)&lt;br /&gt;
|numberofnodes=3780&lt;br /&gt;
|rampernode=16 GB &lt;br /&gt;
|corespernode=8&lt;br /&gt;
|interconnect=1/4 on Infiniband, rest on GigE&lt;br /&gt;
|vendorcompilers=icc (C) ifort (fortran) icpc (C++)&lt;br /&gt;
|queuetype=[[Moab | Moab/Torque]]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
The General Purpose Cluster is an extremely large cluster (ranked [http://www.top500.org/list/2009/06/100 16th] in the world at its inception, and fastest in Canada) and is where most simulations are to be done at SciNet.  It is an IBM iDataPlex cluster based on Intel's Nehalem architecture (one of the [http://www.hpcwire.com/features/HPC-Vendors-Jump-On-Nehalem-42360237.html first in the world] to make use of the new chips). The GPC consists of 3,780 nodes with a total of 30,240 2.53GHz cores, with 16GB RAM per node (2GB per core). Approximately one quarter of the cluster is interconnected with non-blocking 4x-DDR InfiniBand, while the rest of the nodes are connected with gigabit ethernet.  The compute nodes are accessed through a queuing system that allows jobs with a maximum wall time of 48 hours.&lt;br /&gt;
&lt;br /&gt;
===Login===&lt;br /&gt;
&lt;br /&gt;
First log in via ssh with your SciNet account at &amp;lt;tt&amp;gt;login.scinet.utoronto.ca&amp;lt;/tt&amp;gt;; from there you can proceed to the development nodes to compile and test your code.&lt;br /&gt;
&lt;br /&gt;
===Compile/Devel Nodes===&lt;br /&gt;
&lt;br /&gt;
From a scinet login node you can ssh to &amp;lt;tt&amp;gt;gpc01&amp;lt;/tt&amp;gt;..&amp;lt;tt&amp;gt;gpc04&amp;lt;/tt&amp;gt;.  These nodes have the same hardware configuration as most of the compute nodes -- 8 Nehalem processing cores with 16GB RAM and Gigabit ethernet.  You can compile and test your codes on these nodes. To interactively test on more than 8 processors, or to test your code over an InfiniBand connection, you can submit an [[GPC_Quickstart#Submitting_an_Interactive_Job | interactive job request]].&lt;br /&gt;
&lt;br /&gt;
Your [[Storage_Quickstart | home directory]] is in &amp;lt;tt&amp;gt;/home/USER&amp;lt;/tt&amp;gt;; you have 10GB there that is backed up. This directory cannot be written to by the compute nodes! Thus, to run jobs, you'll use the &amp;lt;tt&amp;gt;/scratch/USER&amp;lt;/tt&amp;gt; directory. Here, there is a large amount of disk space, but it is not backed up. Thus it makes sense to keep your codes in /home, compile there, and then run them in the /scratch directory.&lt;br /&gt;
&lt;br /&gt;
===Modules and Environment Variables===&lt;br /&gt;
&lt;br /&gt;
To use most packages on the SciNet machines, including any of the compilers, you will have to use the &amp;lt;tt&amp;gt;module&amp;lt;/tt&amp;gt; command.  The command &amp;lt;tt&amp;gt;module load some-package&amp;lt;/tt&amp;gt; will set your environment variables (&amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, etc.) to include the default version of that package, while &amp;lt;tt&amp;gt;module load some-package/specific-version&amp;lt;/tt&amp;gt; will load a specific version of that package.  This makes it very easy for different users to use different versions of compilers, MPI libraries, other libraries, etc.&lt;br /&gt;
&lt;br /&gt;
Note that to use even the gcc compilers you will have to do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load gcc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
but in fact you should probably use the Intel compilers installed on this system, as they usually produce faster code (sometimes much faster).&lt;br /&gt;
&lt;br /&gt;
A list of the installed software is available in [[Software_and_Libraries | Software &amp;amp; Libraries]] and can &lt;br /&gt;
be seen on the system by typing &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module avail&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load a module (for example, the default version of the intel compilers)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load intel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload a module&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module unload intel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload all modules&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module purge&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These commands should go in your .bashrc files and/or in your submission scripts to make sure you&lt;br /&gt;
are using the correct packages.&lt;br /&gt;
&lt;br /&gt;
===Compilers===&lt;br /&gt;
&lt;br /&gt;
The intel compilers are icc/icpc/ifort for C/C++/Fortran, and are available with the default module &amp;quot;intel&amp;quot;.  The intel compilers are recommended over the GNU compilers.  Documentation about icpc is available at &lt;br /&gt;
http://software.intel.com/en-us/articles/intel-software-technical-documentation/.  The Intel compilers accept many of the options that the GNU compilers accept, but tend to produce faster programs on our system.  If, for some reason, you really need the GNU compilers, the latest version of the GNU compiler collection (currently 4.4.0) is available by loading the &amp;quot;gcc&amp;quot; module, with gcc/g++/gfortran for C/C++/Fortran.   Note that f77/g77 is not supported. &lt;br /&gt;
&lt;br /&gt;
To ensure that the intel compilers are in your &amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt; and their libraries are in your &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, use the command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load intel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This should likely go in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; file so that it will automatically be loaded.&lt;br /&gt;
&lt;br /&gt;
Optimize your code for the GPC machine using at least the following compiler flags: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   -O3 -xHost&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(or &amp;lt;tt&amp;gt;-O3 -march=native&amp;lt;/tt&amp;gt; for the GNU compilers). &lt;br /&gt;
&lt;br /&gt;
*If your program uses openmp, add &amp;lt;tt&amp;gt;-openmp&amp;lt;/tt&amp;gt; (&amp;lt;tt&amp;gt;-fopenmp&amp;lt;/tt&amp;gt; for GNU compilers).&lt;br /&gt;
*If you get the warning &amp;lt;tt&amp;gt;feupdateenv is not implemented&amp;lt;/tt&amp;gt;, add &amp;lt;tt&amp;gt;-limf&amp;lt;/tt&amp;gt; to the link line.&lt;br /&gt;
*If you need to link in the MKL libraries, you are well advised to use the Intel(R) Math Kernel Library Link Line Advisor: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ for help in devising the list of libraries to link with your code.&lt;br /&gt;
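&lt;br /&gt;
As an illustration of the flags above, the compile lines might look like the following sketch. The file names &amp;lt;tt&amp;gt;mycode.c&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;mycode&amp;lt;/tt&amp;gt; are placeholders, and it is assumed that the corresponding compiler module has already been loaded:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Intel compiler: optimization, OpenMP, and the Intel math library on the link line&lt;br /&gt;
$ icc -O3 -xHost -openmp -o mycode mycode.c -limf&lt;br /&gt;
&lt;br /&gt;
# GNU equivalent of the optimization and OpenMP flags&lt;br /&gt;
$ gcc -O3 -march=native -fopenmp -o mycode mycode.c&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;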
&lt;br /&gt;
===[[ GPC_MPI_Versions | MPI]]===&lt;br /&gt;
&lt;br /&gt;
SciNet currently provides multiple MPI libraries for the GPC: [http://www.open-mpi.org/ OpenMPI] and [http://software.intel.com/en-us/intel-mpi-library/ IntelMPI].  We currently recommend OpenMPI as the default, as it quite reliably demonstrates good performance on both the InfiniBand and ethernet networks.  For full details and options see the complete [[ GPC_MPI_Versions | '''MPI''']] section.&lt;br /&gt;
&lt;br /&gt;
The MPI libraries are compiled with both the gnu compiler suite and the intel compiler suite.   To use (for instance) the intel-compiled OpenMPI libraries, which we recommend as the default (and use for most of our examples here), use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi intel&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt;.   Other combinations behave similarly.&lt;br /&gt;
&lt;br /&gt;
The MPI libraries provide the wrappers mpicc/mpicxx/mpif90/mpif77 around the appropriate compilers, which ensure that the appropriate include and library directories are used in the compilation and linking steps.&lt;br /&gt;
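&lt;br /&gt;
A minimal sketch of compiling through a wrapper, assuming the recommended Intel + OpenMPI modules; the source file name &amp;lt;tt&amp;gt;mpi_code.c&amp;lt;/tt&amp;gt; is a placeholder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi intel&lt;br /&gt;
# mpicc calls the underlying C compiler with the MPI include and library paths added&lt;br /&gt;
$ mpicc -O3 -xHost -o a.out mpi_code.c&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;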
&lt;br /&gt;
We currently recommend the Intel + OpenMPI combination.  However, if you require the GNU compilers as well as MPI, you would want to find the most recent openmpi module available with `gcc' in the version name.  This will enable development and runtime with gcc/g++/gfortran  and OpenMPI.  You can make this your default by putting the module load line in your ~/.bashrc file.&lt;br /&gt;
&lt;br /&gt;
For mixed OpenMP/MPI code using Intel MPI, add the compilation flag -mt_mpi for full thread-safety.&lt;br /&gt;
&lt;br /&gt;
===Submitting A Batch Job===&lt;br /&gt;
&lt;br /&gt;
The SciNet machines are shared systems, and jobs that are to run on them are submitted to a queue; the&lt;br /&gt;
[[Moab | scheduler]] then orders the jobs to make the best use of the machine, and has them launched &lt;br /&gt;
when resources become available.   The intervention of the scheduler can mean that the jobs aren't&lt;br /&gt;
run in strict first-in, first-out order.&lt;br /&gt;
&lt;br /&gt;
The maximum [[wallclock time]] for a job in the queue is 48 hours; computations that will take longer than&lt;br /&gt;
this must be broken into 48-hour chunks and run as several jobs.  The usual way to do this is with [[checkpoints]],&lt;br /&gt;
writing out the complete state of the computation every so often in such a way that a job can be restarted from&lt;br /&gt;
this state information and continue on from where it left off.  Generating [[checkpoints]] is a good idea anyway,&lt;br /&gt;
as in the unlikely event of a hardware failure during your run, it allows you to restart without having lost much work.&lt;br /&gt;
&lt;br /&gt;
There are limits to how many jobs you can submit.  If your group has a default account, up to 32 nodes at a time for 48 hours per job on the GPC cluster are allowed to be queued. This is a total limit, e.g., you could request 64 nodes for 24 hours.  Jobs of users with an LRAC or NRAC allocation will run at a higher priority than others while their resources last. Because of the group-based allocation, it is conceivable that your jobs won't run if your colleagues have already exhausted your group's limits.&lt;br /&gt;
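&lt;br /&gt;
The 64-node example can be checked with a little shell arithmetic. This sketch reads the limit as a budget of node-hours, which is an interpretation of the example above rather than a statement of how the scheduler enforces it:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Default limit from the text: 32 nodes for 48 hours (assumed to mean node-hours)&lt;br /&gt;
limit=$(( 32 * 48 ))&lt;br /&gt;
# Example request from the text: 64 nodes for 24 hours&lt;br /&gt;
request=$(( 64 * 24 ))&lt;br /&gt;
# Both come to 1536 node-hours&lt;br /&gt;
echo &amp;quot;limit=$limit request=$request node-hours&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;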
&lt;br /&gt;
Note that scheduling big jobs greatly affects the queuer and other users, so you have to talk to us first to run massively parallel jobs (&amp;gt; 2048 cores). We will help make sure that your jobs start and run efficiently.&lt;br /&gt;
&lt;br /&gt;
If your job will take less than 48 hours, specify that in your script -- your job &lt;br /&gt;
will start sooner.   (It's easier for the [[Moab | scheduler]] to fit in a short job than a long job).  On the downside, the&lt;br /&gt;
job will be killed automatically by the queue manager software at the end of the specified [[wallclock time]], so if you&lt;br /&gt;
guess wrong you might lose some work.  So the standard procedure is to estimate how long your job will take and&lt;br /&gt;
add 10% or so. &lt;br /&gt;
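&lt;br /&gt;
The padding rule can be sketched as follows. The estimate of 300 minutes and the node request are made-up values for illustration only:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Hypothetical estimate of the run time: 5 hours&lt;br /&gt;
est_minutes=300&lt;br /&gt;
# Add 10% as a safety margin (integer arithmetic)&lt;br /&gt;
padded=$(( est_minutes + est_minutes / 10 ))&lt;br /&gt;
# Print the resource line to paste into the job script; here walltime=5:30:00&lt;br /&gt;
printf '#PBS -l nodes=2:ppn=8,walltime=%d:%02d:00\n' $(( padded / 60 )) $(( padded % 60 ))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;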
&lt;br /&gt;
You interact with the queuing system through the queue/resource manager, [[Moab | Moab]] and [[Moab | Torque]].  To see all the jobs in the queue use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To submit your own job, you must write a script which describes the job and how it is to be run (a sample script [[GPC_Quickstart#Submission_Script | follows]]) and submit it to the queue, using the command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub [SCRIPT-FILE-NAME]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where you will replace &amp;lt;tt&amp;gt;[SCRIPT-FILE-NAME]&amp;lt;/tt&amp;gt; with the file containing the submission script.   This will return a job ID, for example 31415, which is used to identify the jobs.  Information about a queued job can be found using&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ checkjob [JOB-ID]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and jobs can be canceled with the command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ canceljob [JOB-ID]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These commands have many options, which are described in their man pages.&lt;br /&gt;
&lt;br /&gt;
Much more information on the queueing system is available on our [[Moab | queue]] page.&lt;br /&gt;
&lt;br /&gt;
====Batch Submission Script: MPI====&lt;br /&gt;
&lt;br /&gt;
A sample submission script is shown below for an mpi job using ethernet with the &amp;lt;tt&amp;gt; #PBS &amp;lt;/tt&amp;gt; directives at the top and the rest being &lt;br /&gt;
what will be executed on the compute node.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (ethernet)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=2:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; -np = nodes*ppn&lt;br /&gt;
mpirun -np 16 -hostfile $PBS_NODEFILE ./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The lines that begin &amp;lt;tt&amp;gt;#PBS&amp;lt;/tt&amp;gt; are commands that are parsed and interpreted by qsub at submission time, and control administrative things about your job.   In this example, the script above requests two nodes, using 8 processors per node, for a [[wallclock time]] of one hour.  (The resources required by the job are listed on the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line.)   Other options can be given in other &amp;lt;tt&amp;gt;#PBS&amp;lt;/tt&amp;gt; lines, such as &amp;lt;tt&amp;gt;#PBS -N&amp;lt;/tt&amp;gt;, which sets the name of the job.   &lt;br /&gt;
&lt;br /&gt;
The rest of the script is run as a bash script at run time.   A bash shell on the first node of the two nodes that are requested executes these commands as a normal bash script, just as if you had run this as a shell script from the terminal.   The only difference is that PBS sets certain environment variables that you can use in the script.  &amp;lt;tt&amp;gt;$PBS_O_WORKDIR&amp;lt;/tt&amp;gt; is set to be the directory that the command was 'submitted' from - eg,  &amp;lt;tt&amp;gt;/scratch/USER/SOMEDIRECTORY&amp;lt;/tt&amp;gt; - and &amp;lt;tt&amp;gt;$PBS_NODEFILE&amp;lt;/tt&amp;gt; is the name of a file which contains all the nodes on which programs should execute.   Using these environment variables, the script then uses the &amp;lt;tt&amp;gt;mpirun&amp;lt;/tt&amp;gt; command to launch the job.   Assumed here is that the user has a line like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi intel&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
in their &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt;.&lt;br /&gt;
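&lt;br /&gt;
As a variation on the script above, the process count can be derived from the node file instead of being hard-coded. This sketch assumes that &amp;lt;tt&amp;gt;$PBS_NODEFILE&amp;lt;/tt&amp;gt; contains one line per requested processor (nodes*ppn lines in total); the variable name &amp;lt;tt&amp;gt;NP&amp;lt;/tt&amp;gt; is arbitrary:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l nodes=2:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Count the entries in the node file (expected to be 2*8 = 16 for this request)&lt;br /&gt;
NP=$(wc -l &amp;lt; $PBS_NODEFILE)&lt;br /&gt;
&lt;br /&gt;
mpirun -np $NP -hostfile $PBS_NODEFILE ./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With this form, changing the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line does not require a matching edit to the &amp;lt;tt&amp;gt;mpirun&amp;lt;/tt&amp;gt; line.&lt;br /&gt;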
&lt;br /&gt;
* Note: The different versions of MPI require different commands to launch the run, and thus different scripts. The above script is specific to the openmpi module.  For the intelmpi module, the last line of the script should read&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -r ssh -np 16 -env I_MPI_DEVICE ssm ./a.out&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Submitting Collections of Serial Jobs====&lt;br /&gt;
&lt;br /&gt;
SciNet-approved methods for running collections of serial jobs can be found on the [[User_Serial|serial run wiki page]].&lt;br /&gt;
&lt;br /&gt;
====Batch Submission Script: OpenMP====&lt;br /&gt;
&lt;br /&gt;
For running OpenMP jobs, the procedure is similar to that for MPI jobs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (OpenMP)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=8&lt;br /&gt;
./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that [[Introduction_To_Performance#Throughput | in some circumstances]] it can be more efficient to run (say) two jobs each running on four threads than one job running on eight threads.   In that case you can use the same `ampersand-and-wait' technique outlined for serial jobs (see [[User_Serial|serial run wiki page]]) for less-than-eight-core OpenMP jobs.&lt;br /&gt;
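&lt;br /&gt;
A minimal sketch of that approach for two four-thread runs on one node. The subdirectories &amp;lt;tt&amp;gt;run1&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;run2&amp;lt;/tt&amp;gt; are hypothetical and are assumed to hold each run's own input files; see the serial run page for the recommended details:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# Four threads per run, so that two runs fill the 8 cores&lt;br /&gt;
export OMP_NUM_THREADS=4&lt;br /&gt;
&lt;br /&gt;
# Start each run in the background, in its own subshell and directory&lt;br /&gt;
(cd run1 &amp;amp;&amp;amp; ../a.out) &amp;amp;&lt;br /&gt;
(cd run2 &amp;amp;&amp;amp; ../a.out) &amp;amp;&lt;br /&gt;
&lt;br /&gt;
# Keep the job alive until both background runs have finished&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;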
&lt;br /&gt;
====Hybrid MPI/OpenMP jobs====&lt;br /&gt;
&lt;br /&gt;
=====Using Intel MPI=====&lt;br /&gt;
Here is how to run hybrid codes using Intel MPI:&lt;br /&gt;
&lt;br /&gt;
http://software.intel.com/en-us/articles/hybrid-applications-intelmpi-openmp/&lt;br /&gt;
&lt;br /&gt;
Make sure you compile with the -mt_mpi option to the compilers to use the thread safe libraries. &lt;br /&gt;
Set the environment variable I_MPI_PIN_DOMAIN:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export I_MPI_PIN_DOMAIN=omp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This will set the process pinning domain size to be equal to OMP_NUM_THREADS (which you should set to the desired number of threads per mpi process). Therefore, each MPI process can create $OMP_NUM_THREADS number of children threads for running within the corresponding domain. If OMP_NUM_THREADS is not set, each node is treated as a separate domain (which will allow as many threads per MPI processes as there are cores).&lt;br /&gt;
&lt;br /&gt;
In addition, when invoking mpirun, you should add the argument &amp;quot;-ppn X&amp;quot;, where X is the number of MPI processes per node.&lt;br /&gt;
For example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -r ssh -ppn 2 -np 8 [executable]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
would start 2 mpi processes of &amp;lt;tt&amp;gt;[executable]&amp;lt;/tt&amp;gt; per node for a total of 8 processes, so mpirun will try to run mpi processes on 4 nodes&lt;br /&gt;
(OMP_NUM_THREADS is then probably best set at 4).&lt;br /&gt;
Your job script should still ask for these 4 nodes with the line&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
     #PBS -l nodes=4:ppn=8,walltime=....&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
(&amp;lt;tt&amp;gt;ppn=8&amp;lt;/tt&amp;gt; is not a mistake here; the ppn parameter has a different meaning for PBS and for mpirun)&lt;br /&gt;
&lt;br /&gt;
''The ppn parameter to ''mpirun'' is very important! Without it, eight mpi jobs would get bunched on the first node in this example, leaving 3 nodes unused.''&lt;br /&gt;
&lt;br /&gt;
NOTE: In order to pin OpenMP threads inside the domain, use the corresponding OpenMP feature by setting the KMP_AFFINITY environment variable; see [http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/lin/compiler_f/optaps/common/optaps_openmp_thread_affinity.htm#KMP_AFFINITY_Environment_Variable Intel's Compiler User and Reference Guide].&lt;br /&gt;
&lt;br /&gt;
The IntelMPI manual is referenced on the front page of our wiki:&lt;br /&gt;
&lt;br /&gt;
http://software.intel.com/sites/products/documentation/hpc/mpi/linux/reference_manual.pdf&lt;br /&gt;
&lt;br /&gt;
For the above example of a total of 8 processes on 4 nodes, you could use the following script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (hybrid job)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=4:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# SET THE NUMBER OF THREADS PER PROCESS:&lt;br /&gt;
export OMP_NUM_THREADS=4&lt;br /&gt;
&lt;br /&gt;
# PIN THE MPI DOMAINS ACCORDING TO OMP&lt;br /&gt;
export I_MPI_PIN_DOMAIN=omp&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; -np = nodes * (MPI processes per node) = 4*2&lt;br /&gt;
mpirun -r ssh -ppn 2 -np 8 ./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=====Using Open MPI=====&lt;br /&gt;
&lt;br /&gt;
For mixed MPI/OpenMP jobs using OpenMPI, which is the default for many users, the procedure is similar, but details differ.&lt;br /&gt;
&lt;br /&gt;
* Request the number of nodes in the PBS script.&lt;br /&gt;
* Set OMP_NUM_THREADS to the number of threads per MPI process.&lt;br /&gt;
* In addition to the -np parameter for mpirun, add the argument &amp;lt;tt&amp;gt;--bynode&amp;lt;/tt&amp;gt;, so that the mpi processes are not bunched up.&lt;br /&gt;
&lt;br /&gt;
So for example, to start a total of 8 processes on 4 nodes, you could use the following script&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (hybrid job)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=4:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# SET THE NUMBER OF THREADS PER PROCESS:&lt;br /&gt;
export OMP_NUM_THREADS=4&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; -np = nodes*processes_per_node; --bynode forces a round robin of nodes.&lt;br /&gt;
mpirun -np 8 --bynode -hostfile $PBS_NODEFILE ./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Submitting an Interactive (Debug) Job===&lt;br /&gt;
&lt;br /&gt;
It is sometimes convenient to run a job interactively; this can be very handy for debugging purposes.  In this case, you type a &amp;lt;tt&amp;gt;qsub&amp;lt;/tt&amp;gt; command which submits an interactive job to the queue; when the scheduler selects this job to run, it starts a shell running on the first node of the job, which connects to your terminal.  You can then type any series of commands (for instance, the same commands as listed in the batch submission script above) to run a job interactively.&lt;br /&gt;
&lt;br /&gt;
For example, to start the same sort of job as in the batch submission script above, but interactively, one would type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -I -l nodes=2:ppn=8,walltime=1:00:00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is exactly the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line in the batch script above (which requests all 8 processors on each of 2 nodes for one hour), but preceded by &amp;lt;tt&amp;gt;-I&amp;lt;/tt&amp;gt; for `interactive'.   When this job begins, your terminal will show you as being logged in to one of the compute nodes, and you can type in any shell command, run &amp;lt;tt&amp;gt;mpirun&amp;lt;/tt&amp;gt;, etc.   When you exit the shell, the job will end.  Interactive jobs can be used with any of the [[ Moab#GPC | GPC queues ]]; however, there is a short,&lt;br /&gt;
high-turnover queue called [[ Moab#debug | debug ]] which can be especially useful when the system is busy.&lt;br /&gt;
&lt;br /&gt;
===Ethernet vs. Infiniband===&lt;br /&gt;
&lt;br /&gt;
About 1/4 of the GPC (862 nodes or 6896 cores) is connected with a high bandwidth low-latency fabric called&lt;br /&gt;
[http://en.wikipedia.org/wiki/InfiniBand InfiniBand].  Many jobs which require tight coupling to scale well greatly benefit from this interconnect;&lt;br /&gt;
other types of jobs, which have relatively modest communications, do not require this and run fine on Gigabit ethernet.&lt;br /&gt;
&lt;br /&gt;
Jobs which require the InfiniBand for good performance can request the nodes that have the `&amp;lt;tt&amp;gt;ib&amp;lt;/tt&amp;gt;' feature in the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line,&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#PBS -l nodes=2:ib:ppn=8,walltime=1:00:00&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Because there are a limited number of these nodes, your job will start sooner if you do not request them (e.g. if you use the scripts as shown above), as this increases the number of nodes available to run your job. In fact, the InfiniBand nodes are to be used only for jobs that are known to scale well and will benefit from this type of interconnect. As such, at least 2 nodes must be requested, as single-node jobs will not benefit from using an&lt;br /&gt;
InfiniBand node. The MPI libraries provided by SciNet automatically use either the InfiniBand or ethernet interconnect, depending on which nodes your job runs on.&lt;br /&gt;
&lt;br /&gt;
===HyperThreading===&lt;br /&gt;
&lt;br /&gt;
Each GPC compute node has 8 Nehalem cores (2 sockets each with a four-core Intel Xeon E5540 @ 2.53GHz).  Thus, to make full use of the computing power of a GPC node, you must be running at least 8 &amp;quot;tasks&amp;quot; -- MPI processes, or OpenMP threads.&lt;br /&gt;
&lt;br /&gt;
Under most circumstances, running exactly 8 tasks is the most efficient way to use these nodes.  However, sometimes software design (eg, having one thread for communication and one for computation) can usefully `oversubscribe' the number of physical cores, and running (say) twice as many tasks as cores can be a useful strategy.   If your code is highly memory-bandwidth bound, having one task ready to run while another waits for memory access can make more effective use of the processor.&lt;br /&gt;
&lt;br /&gt;
The Nehalem processors have hardware support for such two-way overloading of processors, through &amp;quot;HyperThreading&amp;quot;; there is an extra set of registers on each core to facilitate rapid switching between two tasks, making it look to the operating system as if there are in fact 16 cores per node.   Depending on the nature of your code, making use of these virtual extra cores may speed up or slow down your computation; you should run small test cases before running production jobs in this manner.  In most cases, the speed difference will be under 10%.  Some of our users have obtained an 8% speedup by running gromacs with 16 tasks instead of 8 on a single node (&amp;lt;tt&amp;gt;mpirun -np 16 ./gromacs/mdrun -npme 4&amp;lt;/tt&amp;gt; runs at 108% of the speed of &amp;lt;tt&amp;gt;mpirun -np 8 ./gromacs/mdrun&amp;lt;/tt&amp;gt; with &amp;lt;tt&amp;gt;-npme 2&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;-npme -1&amp;lt;/tt&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
====HyperThreading with OpenMP====&lt;br /&gt;
&lt;br /&gt;
To use hyperthreading with an OpenMP job, one just runs twice as many threads as one would have previously; eg, if you were running 8 threads before (&amp;lt;tt&amp;gt;export OMP_NUM_THREADS=8&amp;lt;/tt&amp;gt;) you would run with 16 (&amp;lt;tt&amp;gt;export OMP_NUM_THREADS=16&amp;lt;/tt&amp;gt;).  Everything else remains the same, including the job submission script; one still uses &amp;lt;tt&amp;gt;ppn=8&amp;lt;/tt&amp;gt; in the submission of the job, as Torque has no way of knowing (or reason for caring) that you will be running on 16 `virtual' cores rather than 8 physical cores.&lt;br /&gt;
&lt;br /&gt;
====HyperThreading with MPI====&lt;br /&gt;
&lt;br /&gt;
To use hyperthreading with an MPI job, one just runs twice as many MPI processes as one would have previously; eg, if you were running on three nodes using 8 MPI tasks per node and used &amp;lt;tt&amp;gt;mpirun ... -np 24&amp;lt;/tt&amp;gt;, you could run instead with &amp;lt;tt&amp;gt;-np 48&amp;lt;/tt&amp;gt;.  Everything else remains the same, including the job submission script; one still uses &amp;lt;tt&amp;gt;ppn=8&amp;lt;/tt&amp;gt; in the submission of the job, as Torque has no way of knowing (or reason for caring) that you will be running on 16 `virtual' cores rather than 8 physical cores.&lt;br /&gt;
&lt;br /&gt;
Note that if you are using OpenMPI (as is the default), there is another consideration: OpenMPI assumes that there is no oversubscription, and each task very aggressively makes full use of a core while it is waiting for a message (ie, the waits are &amp;quot;busywaits&amp;quot;).  If you find a significant slowdown when running multiple MPI tasks per core with OpenMPI, you may want to try adding the following option to mpirun: &amp;lt;tt&amp;gt;--mca mpi_yield_when_idle 1&amp;lt;/tt&amp;gt;.  This will increase the latency of individual messages, but free up the core to do additional work while waiting.&lt;br /&gt;
&lt;br /&gt;
With IntelMPI, the problem should be less pronounced, but you can still improve things by using &amp;lt;tt&amp;gt;mpirun -genv I_MPI_SPIN_COUNT 1 ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
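&lt;br /&gt;
The task counts described above can be sketched in a few lines of shell. This is only an illustration: &amp;lt;tt&amp;gt;mpi_task_count&amp;lt;/tt&amp;gt; is a hypothetical helper, not a SciNet-provided command, and it assumes that &amp;lt;tt&amp;gt;$PBS_NODEFILE&amp;lt;/tt&amp;gt; lists each node once per requested core, so that the number of nodes is the number of distinct entries.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper (not a SciNet tool): number of MPI tasks for a job,
# given a Torque node file and the desired number of tasks per node.
# Assumes the node file lists each node once per requested core (ppn=8).
mpi_task_count() {
    local nodefile=$1
    local tasks_per_node=${2:-16}   # 16 = two tasks per physical core
    local nodes
    nodes=$(sort -u "$nodefile" | wc -l)
    echo $((nodes * tasks_per_node))
}

# Inside a job script one might then write (shown as a comment only):
#   mpirun -np "$(mpi_task_count "$PBS_NODEFILE")" --mca mpi_yield_when_idle 1 -hostfile "$PBS_NODEFILE" ./a.out
```
&lt;br /&gt;
With a node file covering three nodes, &amp;lt;tt&amp;gt;mpi_task_count &amp;quot;$PBS_NODEFILE&amp;quot;&amp;lt;/tt&amp;gt; prints 48, matching the &amp;lt;tt&amp;gt;-np 48&amp;lt;/tt&amp;gt; example above; passing 8 as the second argument gives the non-hyperthreaded count of 24.&lt;br /&gt;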
&lt;br /&gt;
=====Examples of hyperthreading with MPI=====&lt;br /&gt;
Hyperthreading using gromacs: https://support.scinet.utoronto.ca/wiki/index.php/Gromacs#Hyperthreading_with_Gromacs&lt;br /&gt;
&lt;br /&gt;
====HyperThreading with Hybrid MPI/OpenMP codes====&lt;br /&gt;
&lt;br /&gt;
With a hybrid code, one has extra flexibility in how to assign the &amp;quot;extra&amp;quot; cores -- you could run extra MPI tasks or extra OpenMP threads.  As with all hybrid codes, the combination which results in the best performance depends very strongly on the nature of your code, and you should experiment with different combinations.   In addition, with hybrid codes processor and memory affinity issues become very important; if you're unsure as to how to tune your application for best performance, please make an appointment with the SciNet technical analysts for more help.&lt;br /&gt;
&lt;br /&gt;
===Memory Configuration===&lt;br /&gt;
&lt;br /&gt;
==== 16G ====&lt;br /&gt;
&lt;br /&gt;
There are 3756 nodes with 16G of memory; this is the primary configuration in the GPC. These nodes will be used by default.&lt;br /&gt;
&lt;br /&gt;
==== 18G ====&lt;br /&gt;
&lt;br /&gt;
There are 24 InfiniBand nodes which have 18G of memory. These nodes have a fully populated memory configuration that maximizes memory bandwidth. To&lt;br /&gt;
request these nodes use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=2:ib:m18g:ppn=8,walltime=1:00:00 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== 32G ====&lt;br /&gt;
&lt;br /&gt;
There are 84 InfiniBand nodes which have 32G of memory. To request these nodes use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=2:ib:m32g:ppn=8,walltime=1:00:00 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== 128G ====&lt;br /&gt;
There are two stand-alone large memory (128GB) nodes, &amp;lt;tt&amp;gt;gpc-lrgmem01&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;gpc-lrgmem02&amp;lt;/tt&amp;gt;, which are primarily to be used for data analysis of runs.  They have 16 cores and are Intel machines running Linux, but they are not the same architecture (Nehalem) as the GPC compute nodes, so codes may have to be compiled separately for these machines.  They can be accessed using a specific &amp;lt;tt&amp;gt;largemem&amp;lt;/tt&amp;gt; queue:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=2:ppn=8,walltime=1:00:00 -q largemem -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Ram Disk===&lt;br /&gt;
&lt;br /&gt;
On the GPC nodes, there is a `ram disk' available - up to half of the memory on the node may be used as a temporary file system.  This is particularly useful in the early stages of migrating desktop-computing codes to a High Performance Computing platform such as the GPC.    It is much faster than real disk and does not require network traffic; however, each node sees its own ramdisk and cannot see files on that of other nodes.   This is a very easy way to cache writes (by writing them to fast ram disk instead of slow `real' disk); one would then periodically copy the files to /scratch or /project so that they are available after the job has completed.&lt;br /&gt;
&lt;br /&gt;
To use the ramdisk, create files in /dev/shm/ and read from and write to them just as one would in (eg) /scratch/USER/.  Only the amount of RAM needed to store the files will be taken up by the temporary file system; thus if you have 8 serial jobs each requiring 1 GB of RAM, and 1GB is taken up by various OS services, you would still have approximately 7GB available to use as ramdisk on a 16GB node.   However, if you were to write 8 GB of data to the RAM disk, this would exceed available memory and your job would likely crash.&lt;br /&gt;
   &lt;br /&gt;
NOTE: it is very important to delete your files from ram disk at the end of your job.   If you do not do this, the next user to use that node will have less RAM available than they might expect, and this might kill their jobs.&lt;br /&gt;
&lt;br /&gt;
More details on how to set up your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].&lt;br /&gt;
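&lt;br /&gt;
As a minimal sketch of the copy-then-delete pattern described above (and not a replacement for the Ramdisk wiki page), the function below copies a ram disk directory to permanent storage and removes it only if the copy succeeded. The function name and both directory arguments are assumptions made for illustration, for example a directory under /dev/shm/ and one under /scratch/USER/.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: copy results off the ram disk, then delete them.
# The two arguments are illustrative directories, e.g. one under /dev/shm/
# and one under /scratch/USER/; neither is created by SciNet for you.
save_and_clear_ramdisk() {
    local ramdir=$1
    local dest=$2
    mkdir -p "$dest" || return 1
    # Copy first; only delete from the ram disk if the copy succeeded.
    if cp -r "$ramdir"/. "$dest"/; then
        rm -rf "$ramdir"
    else
        echo "copy failed; leaving $ramdir in place"
        return 1
    fi
}
```
&lt;br /&gt;
Calling this at the end of a job script (and periodically copying files during long runs) keeps results available after the job and leaves the node's memory free for the next user.&lt;br /&gt;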
&lt;br /&gt;
=== Managing jobs on the Queuing system ===&lt;br /&gt;
Information on checking available resources, starting, viewing, managing and canceling jobs on [[Moab | Moab/Torque]]&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=GPC_Quickstart&amp;diff=1806</id>
		<title>GPC Quickstart</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=GPC_Quickstart&amp;diff=1806"/>
		<updated>2010-08-09T16:12:46Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* HyperThreading with MPI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:University_of_Tor_79284gm-a.jpg|center|300px|thumb]]&lt;br /&gt;
|name=General Purpose Cluster (GPC)&lt;br /&gt;
|installed=June 2009&lt;br /&gt;
|operatingsystem= Linux&lt;br /&gt;
|loginnode= gpc01..gpc04 (from &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt;)&lt;br /&gt;
|numberofnodes=3780&lt;br /&gt;
|rampernode=16 GB &lt;br /&gt;
|corespernode=8&lt;br /&gt;
|interconnect=1/4 on Infiniband, rest on GigE&lt;br /&gt;
|vendorcompilers=icc (C) ifort (fortran) icpc (C++)&lt;br /&gt;
|queuetype=[[Moab | Moab/Torque]]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
The General Purpose Cluster is an extremely large cluster (ranked [http://www.top500.org/list/2009/06/100 16th] in the world at its inception, and fastest in Canada) and is where most simulations are to be done at SciNet.  It is an IBM iDataPlex cluster based on Intel's Nehalem architecture (one of the [http://www.hpcwire.com/features/HPC-Vendors-Jump-On-Nehalem-42360237.html first in the world] to make use of the new chips). The GPC consists of 3,780 nodes with a total of 30,240  2.5GHz cores, with 16GB RAM per node (2GB per core). Approximately one quarter of the cluster is interconnected with non-blocking 4x-DDR InfiniBand while the rest of the nodes are connected with gigabit ethernet.  The compute nodes are accessed through a queuing system that allows jobs with a maximum wall time of 48 hours.&lt;br /&gt;
&lt;br /&gt;
===Login===&lt;br /&gt;
&lt;br /&gt;
First log in via ssh with your SciNet account at &amp;lt;tt&amp;gt;login.scinet.utoronto.ca&amp;lt;/tt&amp;gt;, and from there you can proceed to the Development nodes to compile/test your code.&lt;br /&gt;
&lt;br /&gt;
===Compile/Devel Nodes===&lt;br /&gt;
&lt;br /&gt;
From a scinet login node you can ssh to &amp;lt;tt&amp;gt;gpc01&amp;lt;/tt&amp;gt;..&amp;lt;tt&amp;gt;gpc04&amp;lt;/tt&amp;gt;.  These nodes have the same hardware configuration as most of the compute nodes -- 8 Nehalem processing cores with 16GB RAM and Gigabit ethernet.  You can compile and test your codes on these nodes. To interactively test on more than 8 processors, or to test your code over an InfiniBand connection, you can submit an [[GPC_Quickstart#Submitting_an_Interactive_Job | interactive job request]].&lt;br /&gt;
&lt;br /&gt;
Your [[Storage_Quickstart | home directory]] is in &amp;lt;tt&amp;gt;/home/USER&amp;lt;/tt&amp;gt;; you have 10GB there that is backed up. This directory cannot be written to by the compute nodes! Thus, to run jobs, you'll use the &amp;lt;tt&amp;gt;/scratch/USER&amp;lt;/tt&amp;gt; directory. Here, there is a large amount of disk space, but it is not backed up. Thus it makes sense to keep your codes in /home, compile there, and then run them in the /scratch directory.&lt;br /&gt;
&lt;br /&gt;
===Modules and Environment Variables===&lt;br /&gt;
&lt;br /&gt;
To use most packages on the SciNet machines, including any of the compilers, you will have to use the &amp;lt;tt&amp;gt;module&amp;lt;/tt&amp;gt; command.  The command &amp;lt;tt&amp;gt;module load some-package&amp;lt;/tt&amp;gt; will set your environment variables (&amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, etc) to include the default version of that package.   &amp;lt;tt&amp;gt;module load some-package/specific-version&amp;lt;/tt&amp;gt; will load a specific version of that package.  This makes it very easy for different users to use different versions of compilers, MPI versions, libraries etc.&lt;br /&gt;
&lt;br /&gt;
Note that to use even the gcc compilers you will have to do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load gcc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
but in fact you probably should use the intel compilers installed on this system as they usually produce faster code (and sometimes, much faster.)&lt;br /&gt;
&lt;br /&gt;
A list of the installed software is available in [[Software_and_Libraries | Software &amp;amp; Libraries]] and can &lt;br /&gt;
be seen on the system by typing &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module avail&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load a module (for example, the default version of the intel compilers)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load intel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload a module&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module unload intel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload all modules&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module purge&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These commands should go in your .bashrc files and/or in your submission scripts to make sure you&lt;br /&gt;
are using the correct packages.&lt;br /&gt;
&lt;br /&gt;
===Compilers===&lt;br /&gt;
&lt;br /&gt;
The intel compilers are icc/icpc/ifort for C/C++/Fortran, and are available with the default module &amp;quot;intel&amp;quot;.  The intel compilers are recommended over the GNU compilers.  Documentation about icpc is available at &lt;br /&gt;
http://software.intel.com/en-us/articles/intel-software-technical-documentation/.  The Intel compilers accept many of the options that the GNU compilers accept, but tend to produce faster programs on our system.  If, for some reason, you really need the GNU compilers, the latest version of the GNU compiler collection (currently 4.4.0) is available by loading the &amp;quot;gcc&amp;quot; module, with gcc/g++/gfortran for C/C++/Fortran.   Note that f77/g77 is not supported. &lt;br /&gt;
&lt;br /&gt;
To ensure that the intel compilers are in your &amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt; and their libraries are in your &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, use the command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load intel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This should likely go in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; file so that it will automatically be loaded.&lt;br /&gt;
&lt;br /&gt;
Optimize your code for the GPC machine using at least the following compiler flags: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   -O3 -xHost&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(or &amp;lt;tt&amp;gt;-O3 -march=native&amp;lt;/tt&amp;gt; for the GNU compilers). &lt;br /&gt;
&lt;br /&gt;
*If your program uses openmp, add &amp;lt;tt&amp;gt;-openmp&amp;lt;/tt&amp;gt; (&amp;lt;tt&amp;gt;-fopenmp&amp;lt;/tt&amp;gt; for GNU compilers).&lt;br /&gt;
*If you get the warning &amp;lt;tt&amp;gt;feupdateenv is not implemented&amp;lt;/tt&amp;gt;, add &amp;lt;tt&amp;gt;-limf&amp;lt;/tt&amp;gt; to the link line.&lt;br /&gt;
*If you need to link in the MKL libraries, you are well advised to use the Intel(R) Math Kernel Library Link Line Advisor: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ for help in devising the list of libraries to link with your code.&lt;br /&gt;
&lt;br /&gt;
===[[ GPC_MPI_Versions | MPI]]===&lt;br /&gt;
&lt;br /&gt;
SciNet currently provides multiple MPI libraries for the GPC: [http://www.open-mpi.org/ OpenMPI] and [http://software.intel.com/en-us/intel-mpi-library/ IntelMPI].  We currently recommend OpenMPI as the default, as it quite reliably demonstrates good performance on both the InfiniBand and ethernet networks.  For full details and options see the complete [[ GPC_MPI_Versions | '''MPI''']] section.&lt;br /&gt;
&lt;br /&gt;
The MPI libraries are compiled with both the gnu compiler suite and the intel compiler suite.   To use (for instance) the intel-compiled OpenMPI libraries, which we recommend as the default (and use for most of our examples here), use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi intel&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt;.   Other combinations behave similarly.&lt;br /&gt;
&lt;br /&gt;
The MPI libraries provide mpicc/mpicxx/mpif90/mpif77 as wrappers around the appropriate compilers; these ensure that the appropriate include and library directories are used in the compilation and linking steps.&lt;br /&gt;
&lt;br /&gt;
We currently recommend the Intel + OpenMPI combination.  However, if you require the GNU compilers as well as MPI, you would want to find the most recent openmpi module available with `gcc' in the version name.  This will enable development and runtime with gcc/g++/gfortran  and OpenMPI.  You can make this your default by putting the module load line in your ~/.bashrc file.&lt;br /&gt;
&lt;br /&gt;
For mixed OpenMP/MPI code using Intel MPI, add the compilation flag -mt_mpi for full thread-safety.&lt;br /&gt;
&lt;br /&gt;
===Submitting A Batch Job===&lt;br /&gt;
&lt;br /&gt;
The SciNet machines are shared systems, and jobs that are to run on them are submitted to a queue; the&lt;br /&gt;
[[Moab | scheduler]] then orders the jobs in order to make the best use of the machine, and has them launched &lt;br /&gt;
when resources become available.   The intervention of the scheduler can mean that the jobs aren't&lt;br /&gt;
quite run in a first-in first-out order.&lt;br /&gt;
&lt;br /&gt;
The maximum [[wallclock time]] for a job in the queue is 48 hours; computations that will take longer than&lt;br /&gt;
this must be broken into 48-hour chunks and run as several jobs.  The usual way to do this is with [[checkpoints]],&lt;br /&gt;
writing out the complete state of the computation every so often in such a way that a job can be restarted from&lt;br /&gt;
this state information and continue on from where it left off.  Generating [[checkpoints]] is a good idea anyway,&lt;br /&gt;
as in the unlikely event of a hardware failure during your run, it allows you to restart without having lost much work.&lt;br /&gt;
&lt;br /&gt;
There are limits to how many jobs you can submit.  If your group has a default account, up to 32 nodes at a time for 48 hours per job on the GPC cluster are allowed to be queued. This is a total limit, e.g., you could request 64 nodes for 24 hours.  Jobs of users with an LRAC or NRAC allocation will run at a higher priority than others while their resources last. Because of the group-based allocation, it is conceivable that your jobs won't run if your colleagues have already exhausted your group's limits.&lt;br /&gt;
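&lt;br /&gt;
The arithmetic behind this total limit can be sketched as follows. This is an illustration only: it reads the limit as 32 nodes times 48 hours, i.e. 1536 node-hours, and &amp;lt;tt&amp;gt;fits_default_limit&amp;lt;/tt&amp;gt; is a hypothetical helper rather than a scheduler command. The 48-hour maximum wall time per job still applies separately.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical check: does a request fit the default-account limit described
# above, read here as 32 nodes * 48 hours = 1536 node-hours in total?
fits_default_limit() {
    local nodes=$1
    local hours=$2
    local limit=$((32 * 48))
    [ $((nodes * hours)) -le "$limit" ]
}

if fits_default_limit 64 24; then echo "64 nodes for 24 hours fits"; fi
if ! fits_default_limit 64 48; then echo "64 nodes for 48 hours does not fit"; fi
```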
&lt;br /&gt;
Note that scheduling big jobs greatly affects the queuer and other users, so you have to talk to us first to run massively parallel jobs (&amp;gt; 2048 cores). We will help make sure that your jobs start and run efficiently.&lt;br /&gt;
&lt;br /&gt;
If your job should run in fewer than  48 hours, specify that in your script -- your job &lt;br /&gt;
will start sooner.   (It's easier for the [[Moab | scheduler]] to fit in a short job than a long job).  On the downside, the&lt;br /&gt;
job will be killed automatically by the queue manager software at the end of the specified [[wallclock time]], so if you&lt;br /&gt;
guess wrong you might lose some work.  So the standard procedure is to estimate how long your job will take and&lt;br /&gt;
add 10% or so. &lt;br /&gt;
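&lt;br /&gt;
The "estimate plus 10%" rule can be sketched in shell as below. The helper name &amp;lt;tt&amp;gt;padded_walltime&amp;lt;/tt&amp;gt; is an assumption made for illustration; it takes an estimate in seconds, adds a percentage, caps the result at the 48-hour maximum and prints it in the hours:minutes:seconds form used on the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: pad an estimated run time (in seconds) by a percentage
# and print it in the H:MM:SS form used by the walltime resource.
padded_walltime() {
    local estimate=$1
    local pad_percent=${2:-10}
    local total=$(( estimate * (100 + pad_percent) / 100 ))
    local max=$((48 * 3600))          # queue maximum of 48 hours
    if [ "$total" -gt "$max" ]; then total=$max; fi
    printf '%d:%02d:%02d\n' $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

padded_walltime 3600    # one-hour estimate plus 10%, prints 1:06:00
```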
&lt;br /&gt;
You interact with the queuing system through the queue/resource manager, [[Moab | Moab]] and [[Moab | Torque]].  To see all the jobs in the queue use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To submit your own job, you must write a script which describes the job and how it is to be run (a sample script [[GPC_Quickstart#Submission_Script | follows]]) and submit it to the queue, using the command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub [SCRIPT-FILE-NAME]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where you will replace &amp;lt;tt&amp;gt;[SCRIPT-FILE-NAME]&amp;lt;/tt&amp;gt; with the file containing the submission script.   This will return a job ID, for example 31415, which is used to identify the jobs.  Information about a queued job can be found using&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ checkjob [JOB-ID]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and jobs can be canceled with the command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ canceljob [JOB-ID]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Again, these commands have many options, which can be read about on their man pages.&lt;br /&gt;
&lt;br /&gt;
Much more information on the queueing system is available on our [[Moab | queue]] page.&lt;br /&gt;
&lt;br /&gt;
====Batch Submission Script: MPI====&lt;br /&gt;
&lt;br /&gt;
A sample submission script is shown below for an mpi job using ethernet with the &amp;lt;tt&amp;gt; #PBS &amp;lt;/tt&amp;gt; directives at the top and the rest being &lt;br /&gt;
what will be executed on the compute node.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (ethernet)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=2:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; -np = nodes*ppn&lt;br /&gt;
mpirun -np 16 -hostfile $PBS_NODEFILE ./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The lines that begin &amp;lt;tt&amp;gt;#PBS&amp;lt;/tt&amp;gt; are commands that are parsed and interpreted by qsub at submission time, and control administrative things about your job.   In this example, the script above requests two nodes, using 8 processors per node, for a [[wallclock time]] of one hour.  (The resources required by the job are listed on the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line.)   Other options can be given in other &amp;lt;tt&amp;gt;#PBS&amp;lt;/tt&amp;gt; lines, such as &amp;lt;tt&amp;gt;#PBS -N&amp;lt;/tt&amp;gt;, which sets the name of the job.   &lt;br /&gt;
&lt;br /&gt;
The rest of the script is run as a bash script at run time.   A bash shell on the first node of the two nodes that are requested executes these commands as a normal bash script, just as if you had run this as a shell script from the terminal.   The only difference is that PBS sets certain environment variables that you can use in the script.  &amp;lt;tt&amp;gt;$PBS_O_WORKDIR&amp;lt;/tt&amp;gt; is set to be the directory that the command was 'submitted' from - eg,  &amp;lt;tt&amp;gt;/scratch/USER/SOMEDIRECTORY&amp;lt;/tt&amp;gt; - and &amp;lt;tt&amp;gt;$PBS_NODEFILE&amp;lt;/tt&amp;gt; is the name of a file which contains all the nodes on which programs should execute.   Using these environment variables, the script then uses the &amp;lt;tt&amp;gt;mpirun&amp;lt;/tt&amp;gt; command to launch the job.   Assumed here is that the user has a line like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi intel&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
in their &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
* Note: The different versions of MPI require different commands to launch the run, and thus different scripts. The above script is specific to the openmpi module.  For the intelmpi module, the last line of the script should read&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -r ssh -np 16 -env I_MPI_DEVICE ssm ./a.out&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Submitting Collections of Serial Jobs====&lt;br /&gt;
&lt;br /&gt;
SciNet-approved methods for running collections of serial jobs can be found on the [[User_Serial|serial run wiki page]].&lt;br /&gt;
&lt;br /&gt;
====Batch Submission Script: OpenMP====&lt;br /&gt;
&lt;br /&gt;
For running OpenMP jobs, the procedure is similar to that for MPI jobs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (OpenMP)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=8&lt;br /&gt;
./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that [[Introduction_To_Performance#Throughput | in some circumstances]] it can be more efficient to run (say) two jobs each running on four threads than one job running on eight threads.   In that case you can use the same `ampersand-and-wait' technique outlined for serial jobs (see [[User_Serial|serial run wiki page]]) for less-than-eight-core OpenMP jobs.&lt;br /&gt;
&lt;br /&gt;
====Hybrid MPI/OpenMP jobs====&lt;br /&gt;
&lt;br /&gt;
=====Using Intel MPI=====&lt;br /&gt;
Here is how to run hybrid codes using intelmpi:&lt;br /&gt;
&lt;br /&gt;
http://software.intel.com/en-us/articles/hybrid-applications-intelmpi-openmp/&lt;br /&gt;
&lt;br /&gt;
Make sure you compile with the -mt_mpi option to the compilers to use the thread safe libraries. &lt;br /&gt;
Set the environment variable I_MPI_PIN_DOMAIN:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ export I_MPI_PIN_DOMAIN=omp&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This will set the process pinning domain size to be equal to OMP_NUM_THREADS (which you should set to the desired number of threads per MPI process). Therefore, each MPI process can create $OMP_NUM_THREADS child threads to run within the corresponding domain. If OMP_NUM_THREADS is not set, each node is treated as a separate domain (which will allow as many threads per MPI process as there are cores).&lt;br /&gt;
&lt;br /&gt;
In addition, when invoking mpirun, you should add the argument &amp;quot;-ppn X&amp;quot;, where X is the number of MPI processes per node.&lt;br /&gt;
For example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -r ssh -ppn 2 -np 8 [executable]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
would start 2 mpi processes of &amp;lt;tt&amp;gt;[executable]&amp;lt;/tt&amp;gt; per node for a total of 8 processes, so mpirun will try to run mpi processes on 4 nodes&lt;br /&gt;
(OMP_NUM_THREADS is then probably best set at 4).&lt;br /&gt;
Your job script should still ask for these 4 nodes with the line&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
     #PBS -l nodes=4:ppn=8,walltime=....&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
(&amp;lt;tt&amp;gt;ppn=8&amp;lt;/tt&amp;gt; is not a mistake here; the ppn parameter has a different meaning for PBS and for mpirun)&lt;br /&gt;
&lt;br /&gt;
''The ppn parameter to ''mpirun'' is very important! Without it, eight mpi jobs would get bunched on the first node in this example, leaving 3 nodes unused.''&lt;br /&gt;
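&lt;br /&gt;
The relationship between the total number of MPI processes, the &amp;lt;tt&amp;gt;-ppn&amp;lt;/tt&amp;gt; value given to mpirun, the number of nodes to request and a sensible OMP_NUM_THREADS can be sketched as below. &amp;lt;tt&amp;gt;hybrid_layout&amp;lt;/tt&amp;gt; is a hypothetical helper, and the default of 8 cores per node is taken from the GPC compute nodes without hyperthreading.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: given the total number of MPI processes and the number
# of MPI processes per node, print the node count and threads per process.
# Assumes 8 physical cores per node, as on the GPC compute nodes.
hybrid_layout() {
    local np=$1
    local mpi_per_node=$2
    local cores_per_node=${3:-8}
    local nodes=$(( (np + mpi_per_node - 1) / mpi_per_node ))   # round up
    local threads=$(( cores_per_node / mpi_per_node ))
    echo "nodes=$nodes OMP_NUM_THREADS=$threads"
}

hybrid_layout 8 2    # the example above: prints nodes=4 OMP_NUM_THREADS=4
```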
&lt;br /&gt;
NOTE: In order to pin OpenMP threads inside the domain, use the corresponding OpenMP feature by setting the KMP_AFFINITY environment variable, see [http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/lin/compiler_f/optaps/common/optaps_openmp_thread_affinity.htm#KMP_AFFINITY_Environment_Variable Intel's Compiler User and Reference Guide].&lt;br /&gt;
&lt;br /&gt;
The IntelMPI manual is referenced on the front page of our wiki:&lt;br /&gt;
&lt;br /&gt;
http://software.intel.com/sites/products/documentation/hpc/mpi/linux/reference_manual.pdf&lt;br /&gt;
&lt;br /&gt;
For the above example of a total of 8 processes on 4 nodes, you could use the following script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (hybrid job)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=4:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# SET THE NUMBER OF THREADS PER PROCESS:&lt;br /&gt;
export OMP_NUM_THREADS=4&lt;br /&gt;
&lt;br /&gt;
# PIN THE MPI DOMAINS ACCORDING TO OMP&lt;br /&gt;
export I_MPI_PIN_DOMAIN=omp&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; -np = nodes * (mpirun -ppn value) = 4*2&lt;br /&gt;
mpirun -r ssh -ppn 2 -np 8 ./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=====Using Open MPI=====&lt;br /&gt;
&lt;br /&gt;
For mixed MPI/OpenMP jobs using OpenMPI, which is the default for many users, the procedure is similar, but details differ.&lt;br /&gt;
&lt;br /&gt;
* Request the number of nodes in the PBS script.&lt;br /&gt;
* Set OMP_NUM_THREADS to the number of threads per MPI process.&lt;br /&gt;
* In addition to the -np parameter for mpirun, add the argument &amp;lt;tt&amp;gt;--bynode&amp;lt;/tt&amp;gt;, so that the mpi processes are not bunched up.&lt;br /&gt;
&lt;br /&gt;
So for example, to start a total of 8 processes on 4 nodes, you could use the following script&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (hybrid job)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=4:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# SET THE NUMBER OF THREADS PER PROCESS:&lt;br /&gt;
export OMP_NUM_THREADS=4&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; -np = nodes*processes_per_node; --bynode forces a round robin of nodes.&lt;br /&gt;
mpirun -np 8 --bynode -hostfile $PBS_NODEFILE ./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Submitting an Interactive (Debug) Job===&lt;br /&gt;
&lt;br /&gt;
It is sometimes convenient to run a job interactively; this can be very handy for debugging purposes.  In this case, you type a &amp;lt;tt&amp;gt;qsub&amp;lt;/tt&amp;gt; command which submits an interactive job to the queue; when the scheduler selects this job to run, then it starts a shell running on the first node of the job, which connects to your terminal.  You can then type any series of commands (for instance, the same commands listed as in the batch submission script above) to run a job interactively.&lt;br /&gt;
&lt;br /&gt;
For example, to start the same sort of job as in the batch submission script above, but interactively, one would type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -I -l nodes=2:ppn=8,walltime=1:00:00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is exactly the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line in the batch script above (which requests all 8 processors on each of 2 nodes for one hour), but prepended with a &amp;lt;tt&amp;gt;-I&amp;lt;/tt&amp;gt; for `interactive'.   When this job begins, your terminal will show you as being logged in to one of the compute nodes, and you can type in any shell command, run &amp;lt;tt&amp;gt;mpirun&amp;lt;/tt&amp;gt;, etc.   When you exit the shell, the job will end.  Interactive jobs can be used with any of the [[ Moab#GPC | GPC queues ]]; however, there is a short,&lt;br /&gt;
high-turnover queue called [[ Moab#debug | debug ]] which can be especially useful when the system is busy.&lt;br /&gt;
&lt;br /&gt;
===Ethernet vs. Infiniband===&lt;br /&gt;
&lt;br /&gt;
About 1/4 of the GPC (862 nodes or 6896 cores) is connected with a high bandwidth low-latency fabric called&lt;br /&gt;
[http://en.wikipedia.org/wiki/InfiniBand InfiniBand].  Many jobs which require tight coupling to scale well greatly benefit from this interconnect;&lt;br /&gt;
other types of jobs, which have relatively modest communications, do not require this and run fine on Gigabit ethernet.&lt;br /&gt;
&lt;br /&gt;
Jobs which require the InfiniBand for good performance can request the nodes that have the `&amp;lt;tt&amp;gt;ib&amp;lt;/tt&amp;gt;' feature in the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line,&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#PBS -l nodes=2:ib:ppn=8,walltime=1:00:00&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Because there are a limited number of these nodes, your job will start sooner if you do not request them (e.g. if you use the scripts as shown above), as this increases the number of nodes available to run your job. In fact, the InfiniBand nodes are to be used only for jobs that are known to scale well and will benefit from this type of interconnect. For this reason, a job requesting them must ask for at least 2 nodes, as single-node jobs will not benefit from using an&lt;br /&gt;
InfiniBand node. The MPI libraries provided by SciNet automatically use the correct interconnect, InfiniBand or ethernet, depending on which nodes your job runs on.&lt;br /&gt;
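&lt;br /&gt;
As a minimal sketch of the rules above, the following shell helper assembles the resource string for the &amp;lt;tt&amp;gt;-l&amp;lt;/tt&amp;gt; option and refuses an &amp;lt;tt&amp;gt;ib&amp;lt;/tt&amp;gt; request for fewer than 2 nodes. The function name &amp;lt;tt&amp;gt;build_resource_request&amp;lt;/tt&amp;gt; is hypothetical and is not part of Torque or Moab; it only prints a string and does not submit anything.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not a SciNet or Torque tool): print a "-l" resource string.
# Usage: build_resource_request NODES WALLTIME [FEATURE]
build_resource_request() {
    nodes="$1"
    walltime="$2"
    feature="${3:-}"
    # The text above states that InfiniBand requests need at least 2 nodes.
    if [ "$feature" = "ib" ]; then
        if [ "$nodes" -lt 2 ]; then
            echo "error: ib requests need at least 2 nodes"
            return 1
        fi
    fi
    if [ -n "$feature" ]; then
        echo "nodes=${nodes}:${feature}:ppn=8,walltime=${walltime}"
    else
        echo "nodes=${nodes}:ppn=8,walltime=${walltime}"
    fi
}

# Prints: nodes=2:ib:ppn=8,walltime=1:00:00
build_resource_request 2 1:00:00 ib
```
&lt;br /&gt;
The printed string could then be pasted after &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; in a job script; leaving out the third argument gives the plain request that can run on any node.&lt;br /&gt;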
&lt;br /&gt;
===HyperThreading===&lt;br /&gt;
&lt;br /&gt;
Each GPC compute node has 8 Nehalem cores (2 sockets each with a four-core Intel Xeon E5540 @ 2.53GHz).  Thus, to make full use of the computing power of a GPC node, you must be running at least 8 &amp;quot;tasks&amp;quot; -- MPI processes, or OpenMP threads.&lt;br /&gt;
&lt;br /&gt;
Under most circumstances, running exactly 8 tasks is the most efficient way to use these nodes.  However, sometimes software design (eg, having one thread for communication and one for computation) can usefully `oversubscribe' the number of physical cores, and running (say) twice as many tasks as cores can be a useful strategy.   If your code is highly memory-bandwidth bound, having one task ready to run while another waits for memory access can make more effective use of the processor.&lt;br /&gt;
&lt;br /&gt;
The Nehalem processors have hardware support for such two-way overloading of processors, through &amp;quot;HyperThreading&amp;quot;; there is an extra set of registers on each core to facilitate rapid switching between two tasks, making it look to the operating system as if there are in fact 16 cores per node.   Depending on the nature of your code, making use of these virtual extra cores may speed up or slow down your computation; you should run small test cases before running production jobs in this manner.  In most cases, the speed difference will be under 10%.  Some of our users have obtained an 8% speedup by running gromacs with 16 tasks instead of 8 on a single node (mpirun -np 16 ./gromacs/mdrun -npme 4 runs at 108% of the speed of mpirun -np 8 ./gromacs/mdrun with -npme 2 or -1).&lt;br /&gt;
&lt;br /&gt;
====HyperThreading with OpenMP====&lt;br /&gt;
&lt;br /&gt;
To use hyperthreading with an OpenMP job, one just runs twice as many threads as one would have previously; eg, if you were running 8 threads before (&amp;lt;tt&amp;gt;export OMP_NUM_THREADS=8&amp;lt;/tt&amp;gt;) you would run with 16 (&amp;lt;tt&amp;gt;export OMP_NUM_THREADS=16&amp;lt;/tt&amp;gt;).  Everything else remains the same, including the job submission script; one still uses &amp;lt;tt&amp;gt;ppn=8&amp;lt;/tt&amp;gt; in the submission of the job, as Torque has no way of knowing (or reason for caring) that you will be running on 16 `virtual' cores rather than 8 physical cores.&lt;br /&gt;
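&lt;br /&gt;
The relevant lines of a job script might look like the sketch below. It assumes a single-node job; the variable names &amp;lt;tt&amp;gt;PHYSICAL_CORES&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;THREADS_PER_CORE&amp;lt;/tt&amp;gt; and the program name &amp;lt;tt&amp;gt;./my_openmp_app&amp;lt;/tt&amp;gt; are placeholders chosen for illustration.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# The resource request is unchanged: still 8 processors per node.
#PBS -l nodes=1:ppn=8,walltime=1:00:00

PHYSICAL_CORES=8      # cores per GPC node, as stated above
THREADS_PER_CORE=2    # 2 to use HyperThreading, 1 to run one thread per core

# 8 physical cores x 2 threads per core = 16 OpenMP threads
export OMP_NUM_THREADS=$((PHYSICAL_CORES * THREADS_PER_CORE))
echo "Running with ${OMP_NUM_THREADS} OpenMP threads"

# Placeholder: uncomment and replace with your own program.
# ./my_openmp_app
```
&lt;br /&gt;
Setting &amp;lt;tt&amp;gt;THREADS_PER_CORE=1&amp;lt;/tt&amp;gt; gives the 8-thread run to compare against when testing whether hyperthreading helps your code.&lt;br /&gt;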
&lt;br /&gt;
====HyperThreading with MPI====&lt;br /&gt;
&lt;br /&gt;
To use hyperthreading with an MPI job, one just runs twice as many MPI processes as one would have previously; eg, if you were running on three nodes using 8 MPI tasks per node and used &amp;lt;tt&amp;gt;mpirun ... -np 24&amp;lt;/tt&amp;gt;, you could run instead with &amp;lt;tt&amp;gt;-np 48&amp;lt;/tt&amp;gt;.  Everything else remains the same, including the job submission script; one still uses &amp;lt;tt&amp;gt;ppn=8&amp;lt;/tt&amp;gt; in the submission of the job, as Torque has no way of knowing (or reason for caring) that you will be running on 16 `virtual' cores rather than 8 physical cores.&lt;br /&gt;
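&lt;br /&gt;
The arithmetic behind the &amp;lt;tt&amp;gt;-np&amp;lt;/tt&amp;gt; value can be sketched as follows. The function name &amp;lt;tt&amp;gt;mpi_task_count&amp;lt;/tt&amp;gt; and the program name &amp;lt;tt&amp;gt;./my_mpi_app&amp;lt;/tt&amp;gt; are hypothetical; the sketch only prints the command line rather than running it.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: total MPI tasks for NODES nodes with 8 cores each.
# Usage: mpi_task_count NODES [TASKS_PER_CORE]   (TASKS_PER_CORE defaults to 1)
mpi_task_count() {
    nodes="$1"
    tasks_per_core="${2:-1}"
    echo $((nodes * 8 * tasks_per_core))
}

# Three nodes, two tasks per physical core (HyperThreading): 3 x 8 x 2 = 48
NP=$(mpi_task_count 3 2)

# Print the command instead of running it; ./my_mpi_app is a placeholder.
echo "mpirun -np ${NP} ./my_mpi_app"
```
&lt;br /&gt;
With the second argument omitted the helper returns 24, the non-hyperthreaded task count for the same three nodes.&lt;br /&gt;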
&lt;br /&gt;
Note that if you are using OpenMPI (as is the default), there is another consideration: OpenMPI assumes that there is no oversubscription, and each task very aggressively makes full use of a core when it is waiting for a message (i.e., the waits are &amp;quot;busywaits&amp;quot;).  If you find a significant slowdown when running multiple MPI tasks per core with OpenMPI, you may want to try adding the following option to mpirun: &amp;lt;tt&amp;gt;--mca mpi_yield_when_idle 1&amp;lt;/tt&amp;gt;.  This will increase the latency of individual messages, but free up the core to do additional work while waiting.&lt;br /&gt;
&lt;br /&gt;
With IntelMPI, the problem should be less pronounced, but you can still improve things by using &amp;lt;tt&amp;gt;mpirun -genv I_MPI_SPIN_COUNT 1 ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=====Examples=====&lt;br /&gt;
Hyperthreading using gromacs: https://support.scinet.utoronto.ca/wiki/index.php/Gromacs#Hyperthreading_with_Gromacs&lt;br /&gt;
&lt;br /&gt;
====HyperThreading with Hybrid MPI/OpenMP codes====&lt;br /&gt;
&lt;br /&gt;
With a hybrid code, one has extra flexibility in how to assign the &amp;quot;extra&amp;quot; cores -- you could run extra MPI tasks or extra OpenMP threads.  As with all hybrid codes, the combination which results in the best performance depends very strongly on the nature of your code, and you should experiment with different combinations.   In addition, with hybrid codes processor and memory affinity issues become very important; if you're unsure as to how to tune your application for best performance, please make an appointment with the SciNet technical analysts for more help.&lt;br /&gt;
&lt;br /&gt;
===Memory Configuration===&lt;br /&gt;
&lt;br /&gt;
==== 16G ====&lt;br /&gt;
&lt;br /&gt;
There are 3756 nodes with 16G of memory; this is the primary configuration in the GPC. These nodes will be used by default.&lt;br /&gt;
&lt;br /&gt;
==== 18G ====&lt;br /&gt;
&lt;br /&gt;
There are 24 Infiniband nodes which have 18G of memory. These nodes have a fully populated memory configuration that maximizes memory bandwidth. To&lt;br /&gt;
request these nodes use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=2:ib:m18g:ppn=8,walltime=1:00:00 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== 32G ====&lt;br /&gt;
&lt;br /&gt;
There are 84 Infiniband nodes which have 32G of memory. To request these nodes use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=2:ib:m32g:ppn=8,walltime=1:00:00 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== 128G ====&lt;br /&gt;
There are two stand-alone large memory (128GB) nodes, &amp;lt;tt&amp;gt;gpc-lrgmem01&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;gpc-lrgmem02&amp;lt;/tt&amp;gt;, which are primarily to be used for data analysis of runs.  They have 16 cores and are Intel machines running Linux, but they are not of the same architecture (Nehalem) as the GPC compute nodes, so codes may have to be compiled separately for these machines.  They can be accessed using a specific &amp;lt;tt&amp;gt;largemem&amp;lt;/tt&amp;gt; queue:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=1:ppn=16,walltime=1:00:00 -q largemem -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Ram Disk===&lt;br /&gt;
&lt;br /&gt;
On the GPC nodes, there is a `ram disk' available - up to half of the memory on the node may be used as a temporary file system.  This is particularly useful in the early stages of migrating desktop-computing codes to a High Performance Computing platform such as the GPC.    It is much faster than real disk and does not require network traffic; however, each node sees its own ramdisk and cannot see files on that of other nodes.   This is a very easy way to cache writes (by writing them to fast ram disk instead of slow `real' disk); one would then periodically copy the files to /scratch or /project so that they are available after the job has completed.&lt;br /&gt;
&lt;br /&gt;
To use the ramdisk, create files in /dev/shm/ and read from or write to them just as one would in (eg) /scratch/USER/.  Only the amount of RAM needed to store the files will be taken up by the temporary file system; thus if you have 8 serial jobs each requiring 1 GB of RAM, and 1GB is taken up by various OS services, you would still have approximately 7GB available to use as ramdisk on a 16GB node.   However, if you were to write 8 GB of data to the RAM disk, this would exceed available memory and your job would likely crash.&lt;br /&gt;
   &lt;br /&gt;
NOTE: it is very important to delete your files from ram disk at the end of your job.   If you do not do this, the next user to use that node will have less RAM available than they might expect, and this might kill their jobs.&lt;br /&gt;
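&lt;br /&gt;
A minimal sketch of this pattern is shown below: work in a private directory on the ram disk, copy the results to permanent storage, and delete the ram disk files when the script exits. The directory names and the &amp;lt;tt&amp;gt;echo&amp;lt;/tt&amp;gt; line standing in for a real program are assumptions made for illustration, and the fallback to a temporary directory is only there so that the sketch can be tried on a machine without /dev/shm.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch only: the directory names below are illustrative, not SciNet defaults.
RAMDISK_ROOT="/dev/shm"
# Fallback so the sketch can be tried on a machine without /dev/shm
if [ ! -d "$RAMDISK_ROOT" ]; then RAMDISK_ROOT="${TMPDIR:-/tmp}"; fi
# Stand-in for a directory under /scratch or /project
RESULTS_DIR="${RESULTS_DIR:-$PWD/ramdisk_results}"

# Private working directory on the ram disk for this job
WORKDIR=$(mktemp -d "${RAMDISK_ROOT}/job.XXXXXX")
# Delete the ram disk files when the script exits, even after an error
trap 'rm -rf "$WORKDIR"' EXIT

# Placeholder for the real program: write a result file to the ram disk
echo "intermediate data" | tee "$WORKDIR/output.dat"

# Copy the results to permanent storage before the job ends
mkdir -p "$RESULTS_DIR"
cp "$WORKDIR/output.dat" "$RESULTS_DIR/"
```
&lt;br /&gt;
Using &amp;lt;tt&amp;gt;trap ... EXIT&amp;lt;/tt&amp;gt; means the cleanup also runs if a command in the script fails part-way through, although it cannot help if the job is killed outright.&lt;br /&gt;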
&lt;br /&gt;
More details on how to set up your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].&lt;br /&gt;
&lt;br /&gt;
=== Managing jobs on the Queuing system ===&lt;br /&gt;
Information on checking available resources, starting, viewing, managing and canceling jobs on [[Moab | Moab/Torque]]&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Gromacs&amp;diff=1805</id>
		<title>Gromacs</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Gromacs&amp;diff=1805"/>
		<updated>2010-08-09T16:11:29Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Hyperthreading with Gromacs */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Download and general information: http://www.gromacs.org&lt;br /&gt;
&lt;br /&gt;
Search the mailing list archives: http://www.gromacs.org/Support/Mailing_Lists/Search&lt;br /&gt;
&lt;br /&gt;
=Peculiarities of running single node GROMACS jobs on SCINET=&lt;br /&gt;
This is '''VERY IMPORTANT !!!'''&lt;br /&gt;
Please read the [https://support.scinet.utoronto.ca/wiki/index.php/User_Tips#Running_single_node_MPI_jobs relevant user tips section] for information that is essential for your single node (up to 8 core) MPI GROMACS jobs.&lt;br /&gt;
&lt;br /&gt;
-- [[User:Cneale|cneale]] 14 September 2009&lt;br /&gt;
&lt;br /&gt;
=Compiling GROMACS on SciNet=&lt;br /&gt;
Please refer to the [[Compiling_Gromacs|GROMACS compilation page]]&lt;br /&gt;
&lt;br /&gt;
=Submitting GROMACS jobs on SciNet=&lt;br /&gt;
Please refer to the [[Running_Gromacs|GROMACS submission page]]&lt;br /&gt;
&lt;br /&gt;
-- [[User:Cneale|cneale]] 18 August 2009&lt;br /&gt;
=GROMACS benchmarks on Scinet=&lt;br /&gt;
&lt;br /&gt;
This is a rudimentary list of scaling information.&lt;br /&gt;
 &lt;br /&gt;
I have a 50K atom system running on GPC right now. On 56&lt;br /&gt;
cores connected with IB I am getting 55 ns/day. I set up 50 such&lt;br /&gt;
simulations, each with 2 proteins in a bilayer and I'm getting a total&lt;br /&gt;
of 5.5 us per day. I am using gromacs 4.0.5 and a 5&lt;br /&gt;
fs timestep by fixing the bond lengths and all angles involving&lt;br /&gt;
hydrogen.&lt;br /&gt;
&lt;br /&gt;
I can get about 12 ns/day on 8 cores of the non-IB part of GPC -- also&lt;br /&gt;
excellent.&lt;br /&gt;
&lt;br /&gt;
As for larger systems, my speedup over saw.sharcnet.ca for a 1e6 atom&lt;br /&gt;
system is only 1.2x running on 128 cores in single precision. Although saw.sharcnet.ca&lt;br /&gt;
is composed of Xeons, they are running at 2.83 GHz (https://www.sharcnet.ca/my/systems/show/41), which is a&lt;br /&gt;
faster clock speed than the Scinet 2.5 GHz for Intel's next-generation X86-CPU architecture.&lt;br /&gt;
While GROMACS is generally not excellent for scaling up to or beyond 128 cores (even for large systems), &lt;br /&gt;
our benchmarking of this system on saw.sharcnet.ca indicated that it was running at about 65% efficiency.&lt;br /&gt;
Benchmarking was also done on Scinet for this system, but was not recorded as we were mostly tinkering with the&lt;br /&gt;
-npme option to mdrun in an attempt to optimize it. My recollection, though, is that the scaling was similar on scinet.&lt;br /&gt;
&lt;br /&gt;
-- [[User:Cneale|cneale]] 19 August 2009&lt;br /&gt;
=Strong scaling for GROMACS on GPC=&lt;br /&gt;
&lt;br /&gt;
Requested, and on our list to complete, but not yet available in a complete chart form.&lt;br /&gt;
&lt;br /&gt;
-- [[User:Cneale|cneale]] 19 August 2009&lt;br /&gt;
=Scientific studies being carried out using GROMACS on GPC=&lt;br /&gt;
&lt;br /&gt;
Requested, but not yet available&lt;br /&gt;
&lt;br /&gt;
-- [[User:Cneale|cneale]] 19 August 2009&lt;br /&gt;
&lt;br /&gt;
=Hyperthreading with Gromacs=&lt;br /&gt;
Using -np 16 on an 8 core box and optimizing -npme, I get an 8% to 18% performance increase&lt;br /&gt;
compared to -np 8 with optimized -npme (using gromacs 4.0.7).&lt;br /&gt;
I now regularly overload the number of processes.&lt;br /&gt;
&lt;br /&gt;
selected examples:&lt;br /&gt;
System A with 250,000 atoms:&lt;br /&gt;
  mdrun -np 8  -npme -1    1.15 ns/day&lt;br /&gt;
  mdrun -np 8  -npme  2    1.02 ns/day&lt;br /&gt;
  mdrun -np 16 -npme  2    0.99 ns/day&lt;br /&gt;
  mdrun -np 16 -npme  4    1.36 ns/day &amp;lt;-- 118 % performance vs 1.15 ns/day&lt;br /&gt;
  mdrun -np 15 -npme  3    1.32 ns/day&lt;br /&gt;
&lt;br /&gt;
System B with 35,000 atoms (4 fs timestep):&lt;br /&gt;
  mdrun -np 8  -npme -1    22.66 ns/day&lt;br /&gt;
  mdrun -np 8  -npme  2    23.06 ns/day&lt;br /&gt;
  mdrun -np 16 -npme -1    22.69 ns/day&lt;br /&gt;
  mdrun -np 16 -npme  4    24.90 ns/day &amp;lt;-- 108 % performance vs 23.06 ns/day&lt;br /&gt;
  mdrun -np 56 -npme 16    14.15 ns/day&lt;br /&gt;
&lt;br /&gt;
Cutoffs and timesteps differ between these runs, but both use PME and &lt;br /&gt;
explicit water.&lt;br /&gt;
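&lt;br /&gt;
The percentages quoted in the tables above can be reproduced with a short calculation. The function name relative_speed is made up for this sketch; it simply divides the faster ns/day figure by the baseline and rounds to a whole percentage.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: print FAST as a rounded percentage of BASE.
# Usage: relative_speed FAST BASE   (both in ns/day)
relative_speed() {
    awk -v fast="$1" -v base="$2" 'BEGIN { printf "%.0f\n", 100 * fast / base }'
}

relative_speed 1.36 1.15      # System A: -np 16 -npme 4 vs -np 8 -npme -1
relative_speed 24.90 23.06    # System B: -np 16 -npme 4 vs -np 8 -npme 2
```
&lt;br /&gt;
In each case the comparison is against the best -np 8 result for that system, not against an arbitrary -npme setting.&lt;br /&gt;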
&lt;br /&gt;
And according to gromacs developer Berk Hess ( http://lists.gromacs.org/pipermail/gmx-users/2010-August/053033.html )&lt;br /&gt;
&lt;br /&gt;
&amp;quot;In Gromacs 4.5 there is no difference [between -np and -nt based hyperthreading], since it does not use real thread parallelization.&lt;br /&gt;
Gromacs 4.5 has a built-in threaded MPI library, but openmpi also has an efficient&lt;br /&gt;
MPI implementation for shared memory machines. But even with proper thread&lt;br /&gt;
parallelization I expect the same 15 to 20% performance improvement.&amp;quot;&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Gromacs&amp;diff=1804</id>
		<title>Gromacs</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Gromacs&amp;diff=1804"/>
		<updated>2010-08-09T16:10:57Z</updated>

		<summary type="html">&lt;p&gt;Cneale: added gromace hyperthreading example&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Download and general information: http://www.gromacs.org&lt;br /&gt;
&lt;br /&gt;
Search the mailing list archives: http://www.gromacs.org/Support/Mailing_Lists/Search&lt;br /&gt;
&lt;br /&gt;
=Peculiarities of running single node GROMACS jobs on SCINET=&lt;br /&gt;
This is '''VERY IMPORTANT !!!'''&lt;br /&gt;
Please read the [https://support.scinet.utoronto.ca/wiki/index.php/User_Tips#Running_single_node_MPI_jobs relevant user tips section] for information that is essential for your single node (up to 8 core) MPI GROMACS jobs.&lt;br /&gt;
&lt;br /&gt;
-- [[User:Cneale|cneale]] 14 September 2009&lt;br /&gt;
&lt;br /&gt;
=Compiling GROMACS on SciNet=&lt;br /&gt;
Please refer to the [[Compiling_Gromacs|GROMACS compilation page]]&lt;br /&gt;
&lt;br /&gt;
=Submitting GROMACS jobs on SciNet=&lt;br /&gt;
Please refer to the [[Running_Gromacs|GROMACS submission page]]&lt;br /&gt;
&lt;br /&gt;
-- [[User:Cneale|cneale]] 18 August 2009&lt;br /&gt;
=GROMACS benchmarks on Scinet=&lt;br /&gt;
&lt;br /&gt;
This is a rudimentary list of scaling information.&lt;br /&gt;
 &lt;br /&gt;
I have a 50K atom system running on GPC right now. On 56&lt;br /&gt;
cores connected with IB I am getting 55 ns/day. I set up 50 such&lt;br /&gt;
simulations, each with 2 proteins in a bilayer and I'm getting a total&lt;br /&gt;
of 5.5 us per day. I am using gromacs 4.0.5 and a 5&lt;br /&gt;
fs timestep by fixing the bond lengths and all angles involving&lt;br /&gt;
hydrogen.&lt;br /&gt;
&lt;br /&gt;
I can get about 12 ns/day on 8 cores of the non-IB part of GPC -- also&lt;br /&gt;
excellent.&lt;br /&gt;
&lt;br /&gt;
As for larger systems, my speedup over saw.sharcnet.ca for a 1e6 atom&lt;br /&gt;
system is only 1.2x running on 128 cores in single precision. Although saw.sharcnet.ca&lt;br /&gt;
is composed of Xeons, they are running at 2.83 GHz (https://www.sharcnet.ca/my/systems/show/41), which is a&lt;br /&gt;
faster clock speed than the Scinet 2.5 GHz for Intel's next-generation X86-CPU architecture.&lt;br /&gt;
While GROMACS is generally not excellent for scaling up to or beyond 128 cores (even for large systems), &lt;br /&gt;
our benchmarking of this system on saw.sharcnet.ca indicated that it was running at about 65% efficiency.&lt;br /&gt;
Benchmarking was also done on Scinet for this system, but was not recorded as we were mostly tinkering with the&lt;br /&gt;
-npme option to mdrun in an attempt to optimize it. My recollection, though, is that the scaling was similar on scinet.&lt;br /&gt;
&lt;br /&gt;
-- [[User:Cneale|cneale]] 19 August 2009&lt;br /&gt;
=Strong scaling for GROMACS on GPC=&lt;br /&gt;
&lt;br /&gt;
Requested, and on our list to complete, but not yet available in a complete chart form.&lt;br /&gt;
&lt;br /&gt;
-- [[User:Cneale|cneale]] 19 August 2009&lt;br /&gt;
=Scientific studies being carried out using GROMACS on GPC=&lt;br /&gt;
&lt;br /&gt;
Requested, but not yet available&lt;br /&gt;
&lt;br /&gt;
-- [[User:Cneale|cneale]] 19 August 2009&lt;br /&gt;
&lt;br /&gt;
=Hyperthreading with Gromacs=&lt;br /&gt;
Using -np 16 on an 8 core box and optimizing -npme, I get an 8% to 18% performance increase&lt;br /&gt;
compared to -np 8 with optimized -npme (using gromacs 4.0.7).&lt;br /&gt;
I now regularly overload the number of processes.&lt;br /&gt;
&lt;br /&gt;
selected examples:&lt;br /&gt;
System A with 250,000 atoms:&lt;br /&gt;
mdrun -np 8  -npme -1    1.15 ns/day&lt;br /&gt;
mdrun -np 8  -npme  2    1.02 ns/day&lt;br /&gt;
mdrun -np 16 -npme  2    0.99 ns/day&lt;br /&gt;
mdrun -np 16 -npme  4    1.36 ns/day &amp;lt;-- 118 % performance vs 1.15 ns/day&lt;br /&gt;
mdrun -np 15 -npme  3    1.32 ns/day&lt;br /&gt;
&lt;br /&gt;
System B with 35,000 atoms (4 fs timestep):&lt;br /&gt;
mdrun -np 8  -npme -1    22.66 ns/day&lt;br /&gt;
mdrun -np 8  -npme  2    23.06 ns/day&lt;br /&gt;
mdrun -np 16 -npme -1    22.69 ns/day&lt;br /&gt;
mdrun -np 16 -npme  4    24.90 ns/day &amp;lt;-- 108 % performance vs 23.06 ns/day&lt;br /&gt;
mdrun -np 56 -npme 16    14.15 ns/day&lt;br /&gt;
&lt;br /&gt;
Cutoffs and timesteps differ between these runs, but both use PME and &lt;br /&gt;
explicit water.&lt;br /&gt;
&lt;br /&gt;
And according to gromacs developer Berk Hess ( http://lists.gromacs.org/pipermail/gmx-users/2010-August/053033.html )&lt;br /&gt;
&lt;br /&gt;
&amp;quot;In Gromacs 4.5 there is no difference [between -np and -nt based hyperthreading], since it does not use real thread parallelization.&lt;br /&gt;
Gromacs 4.5 has a built-in threaded MPI library, but openmpi also has an efficient&lt;br /&gt;
MPI implementation for shared memory machines. But even with proper thread&lt;br /&gt;
parallelization I expect the same 15 to 20% performance improvement.&amp;quot;&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=GPU_Devel_Nodes&amp;diff=1762</id>
		<title>GPU Devel Nodes</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=GPU_Devel_Nodes&amp;diff=1762"/>
		<updated>2010-08-02T20:03:07Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Login */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:GeForce_9800_GT_3qtr_low.png|center|300px|thumb]]&lt;br /&gt;
|name=GPU Development Cluster&lt;br /&gt;
|installed=June 2010&lt;br /&gt;
|operatingsystem= Linux&lt;br /&gt;
|loginnode= cell-srv01 (from &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt;)&lt;br /&gt;
|numberofnodes=8&lt;br /&gt;
|rampernode=48 GB&lt;br /&gt;
|corespernode=8&lt;br /&gt;
|interconnect=Infiniband,GigE&lt;br /&gt;
|vendorcompilers=gcc,nvcc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
The Intel nodes each have two 2.53GHz 4-core Xeon X5550 CPUs and 48GB of RAM; 3 of the nodes contain NVIDIA 9800GT GPUs.&lt;br /&gt;
&lt;br /&gt;
===Login===&lt;br /&gt;
&lt;br /&gt;
First log in via ssh with your SciNet account at &amp;lt;tt&amp;gt;login.scinet.utoronto.ca&amp;lt;/tt&amp;gt;; from there you can proceed to &amp;lt;tt&amp;gt;cell-srv01&amp;lt;/tt&amp;gt;, which&lt;br /&gt;
is currently the gateway machine.&lt;br /&gt;
&lt;br /&gt;
Access to these machines is currently controlled. Please email support@scinet.utoronto.ca for access.&lt;br /&gt;
&lt;br /&gt;
==Compile/Devel/Compute Nodes==&lt;br /&gt;
&lt;br /&gt;
=== Nehalem (x86_64) ===&lt;br /&gt;
You can log into any of the 8 nodes '''&amp;lt;tt&amp;gt;cell-srv[01-08]&amp;lt;/tt&amp;gt;''' directly; however, the nodes have differing configurations, as follows:&lt;br /&gt;
&lt;br /&gt;
* '''&amp;lt;tt&amp;gt;cell-srv01&amp;lt;/tt&amp;gt;''' - login node &amp;amp; nfs server, GigE connected&lt;br /&gt;
* '''&amp;lt;tt&amp;gt;cell-srv[02-05]&amp;lt;/tt&amp;gt;''' - no GPU, GigE connected&lt;br /&gt;
* '''&amp;lt;tt&amp;gt;cell-srv[06-07]&amp;lt;/tt&amp;gt;''' - 1x NVIDIA 9800GT GPU, Infiniband connected&lt;br /&gt;
* '''&amp;lt;tt&amp;gt;cell-srv08&amp;lt;/tt&amp;gt;''' - 2x NVIDIA 9800GT GPU, GigE connected&lt;br /&gt;
&lt;br /&gt;
=== Software ===&lt;br /&gt;
&lt;br /&gt;
The same software installed on the GPC is available on ARC using the same modules framework. &lt;br /&gt;
See [[GPC_Quickstart#Modules_and_Environment_Variables | here]] for full details.&lt;br /&gt;
&lt;br /&gt;
==Programming Frameworks==&lt;br /&gt;
&lt;br /&gt;
Currently there are two programming frameworks to use, NVIDIA's CUDA framework or OpenCL.&lt;br /&gt;
&lt;br /&gt;
=== CUDA ===&lt;br /&gt;
&lt;br /&gt;
The current CUDA Toolkit in use is 3.0. To use it, just load the following module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The CUDA driver is installed locally; however, the CUDA SDK is installed in:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/project/scinet/arc/cuda/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== OpenCL ===&lt;br /&gt;
 &lt;br /&gt;
As of 3.0, OpenCL is included in the CUDA Toolkit, so loading the CUDA module is all that is required.&lt;br /&gt;
&lt;br /&gt;
===Compilers===&lt;br /&gt;
&lt;br /&gt;
* '''nvcc''' -- Nvidia compiler&lt;br /&gt;
&lt;br /&gt;
===MPI===&lt;br /&gt;
&lt;br /&gt;
The GPC MPI packages can be used on this system. See the GPC section on [[ GPC_Quickstart#MPI |MPI ]] for more details.&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
* CUDA&lt;br /&gt;
** google &amp;quot;CUDA&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* OpenCL&lt;br /&gt;
** see above&lt;br /&gt;
&lt;br /&gt;
== Further Info ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== User Codes ==&lt;br /&gt;
&lt;br /&gt;
Please discuss or post any relevant information/problems/best practices you have encountered when using/developing for CUDA and/or OpenCL&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Scheduler&amp;diff=1747</id>
		<title>Scheduler</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Scheduler&amp;diff=1747"/>
		<updated>2010-07-30T15:35:07Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* GPC */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The queueing system used at SciNet is based around Cluster Resources [http://www.clusterresources.com/products/moab-cluster-suite/workload-manager.php Moab Workload Manager].&lt;br /&gt;
Moab is used on both the GPC and TCS; however, [http://www.clusterresources.com/products/torque/docs/index.shtml Torque] is used as the backend resource manager on the GPC and IBM's [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp LoadLeveler] is used on the TCS.&lt;br /&gt;
&lt;br /&gt;
This page outlines some of the most common Moab commands with full documentation available from Moab [http://www.clusterresources.com/products/mwm/docs/a.gcommandoverview.shtml here].&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
===== batch =====&lt;br /&gt;
&lt;br /&gt;
The batch queue is the default queue on the GPC, allowing the user access to all the&lt;br /&gt;
resources for jobs of up to 48 hours.  If no queue is specified with the &amp;lt;tt&amp;gt;-q&amp;lt;/tt&amp;gt; flag,&lt;br /&gt;
the job is submitted to the batch queue.&lt;br /&gt;
&lt;br /&gt;
===== debug =====&lt;br /&gt;
&lt;br /&gt;
A debug queue has been set up primarily for code developers to quickly test&lt;br /&gt;
and evaluate their codes and configurations without having to wait in the batch queue.  There are 10 nodes&lt;br /&gt;
currently reserved for the debug queue.  It has quite restrictive limits to promote high turnover&lt;br /&gt;
and availability: a user can use from 2 nodes (16 cores) for 2 hours up to a maximum&lt;br /&gt;
of 8 nodes (64 cores) for half an hour, and can have only one job in the debug queue at a time.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=1:ppn=8,walltime=1:00:00 -q debug -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== largemem =====&lt;br /&gt;
&lt;br /&gt;
The largemem queue is used for accessing one of two 16-core Intel Xeon (non-Nehalem) nodes with 128 GB of memory.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=1:ppn=16,walltime=1:00:00 -q largemem -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
The TCS currently only has one queue, or class, in use called &amp;quot;verylong&amp;quot; and its only&lt;br /&gt;
limitation is that jobs must be under 48 hours.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#@ class           = verylong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Job Info===&lt;br /&gt;
&lt;br /&gt;
To see all jobs queued on a system use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Three sections are shown: running, idle, and blocked.  Idle jobs are commonly referred to as queued jobs;&lt;br /&gt;
they meet all the requirements but are waiting for available resources.  Blocked jobs&lt;br /&gt;
are caused either by improper resource requests or, more commonly, by exceeding a user's or group's allowable&lt;br /&gt;
resources.   For example, if you are allowed to submit 10 jobs and you submit 20, the first 10&lt;br /&gt;
jobs will be submitted properly and either run right away or be queued; the other 10 jobs&lt;br /&gt;
will be blocked and won't be submitted to the queue until one of the first 10 finishes.&lt;br /&gt;
&lt;br /&gt;
If showq is returning output slowly, you can query cached info using &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showq --noblock&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Available Resources ===&lt;br /&gt;
&lt;br /&gt;
Determining when your job will run can be tricky as it involves a combination of queue type, node type, system reservations, and job priority. The following commands are provided to help you figure out what resources are currently available, however they may not tell you exactly when your job will run for the aforementioned reasons.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
To show how many ethernet nodes are currently free, use the show back fill command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -f compute-eth &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To show how many infiniband nodes are free, use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -f ib&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
To show how many TCS nodes are free, use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -c verylong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, checking for an ethernet job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -f compute-eth&lt;br /&gt;
Partition     Tasks  Nodes      Duration   StartOffset       StartDate&lt;br /&gt;
---------     -----  -----  ------------  ------------  --------------&lt;br /&gt;
ALL           14728   1839       7:36:23      00:00:00  00:23:37_09/24&lt;br /&gt;
ALL             256     30      INFINITY      00:00:00  00:23:37_09/24&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
shows that for jobs under 7:36:23 you can use 1839 nodes, but if you submit&lt;br /&gt;
a job over that time only 30 will be available.  In this case this is&lt;br /&gt;
due to a large reservation made by SciNet staff, but from a user's point&lt;br /&gt;
of view, showbf tells you very simply what is available and for how long.&lt;br /&gt;
In this case, a user may wish to set #PBS -l walltime=7:30:00 in their script, or add -l walltime=7:30:00 to their qsub command, so that the job can backfill onto the reserved nodes.&lt;br /&gt;
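&lt;br /&gt;
Reading the table can be automated with a small filter, sketched here under the assumption that the output has the column layout shown above and that durations are either INFINITY or in [DD:]HH:MM:SS form. The function name &amp;lt;tt&amp;gt;nodes_available_for&amp;lt;/tt&amp;gt; is hypothetical and is not a Moab command.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: read showbf output on stdin and print the largest
# node count whose Duration is at least the requested walltime.
# Usage: showbf -f compute-eth | nodes_available_for 7:30:00
nodes_available_for() {
    awk -v want="$1" '
        # Convert SS, MM:SS, HH:MM:SS or DD:HH:MM:SS to seconds
        function secs(t,    n, p) {
            n = split(t, p, ":")
            if (n == 4) return ((p[1] * 24 + p[2]) * 60 + p[3]) * 60 + p[4]
            if (n == 3) return (p[1] * 60 + p[2]) * 60 + p[3]
            if (n == 2) return p[1] * 60 + p[2]
            return p[1] + 0
        }
        # Data rows start with the partition name ALL; column 3 is Nodes,
        # column 4 is Duration
        $1 == "ALL" {
            if ($4 == "INFINITY" || secs($4) >= secs(want)) {
                if ($3 > best) best = $3
            }
        }
        END { print best + 0 }
    '
}

# Demonstration using the sample output above instead of a live showbf call.
# Prints: 1839
printf '%s\n' \
    'Partition     Tasks  Nodes      Duration   StartOffset       StartDate' \
    'ALL           14728   1839       7:36:23      00:00:00  00:23:37_09/24' \
    'ALL             256     30      INFINITY      00:00:00  00:23:37_09/24' |
    nodes_available_for 7:30:00
```
&lt;br /&gt;
Asking the same filter about a 12:00:00 job would print 30, matching the second row of the table.&lt;br /&gt;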
&lt;br /&gt;
'''NOTE:''' showbf shows currently available nodes; however, just because nodes are available&lt;br /&gt;
doesn't mean that your job will start right away.  Job priority and system reservations,&lt;br /&gt;
along with dedicated nodes such as those for the debug queue, will alter when jobs&lt;br /&gt;
run, so even if enough nodes appear &amp;quot;free&amp;quot;, your job may not actually run right&lt;br /&gt;
away.&lt;br /&gt;
&lt;br /&gt;
=== Job Submission ===&lt;br /&gt;
&lt;br /&gt;
==== Interactive ====&lt;br /&gt;
&lt;br /&gt;
On the GPC an interactive queue session can be requested using the following &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=2:ppn=8,walltime=1:00:00 -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Non-interactive (Batch) ====&lt;br /&gt;
&lt;br /&gt;
For a non-interactive job submission you require a submission script formatted for the appropriate resource manager. Examples&lt;br /&gt;
are provided for the [[GPC_Quickstart#Submitting_A_Batch_Job | GPC]] and [[TCS_Quickstart#Submitting_A_Job | TCS]].&lt;br /&gt;
&lt;br /&gt;
=== Job Status ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ checkjob jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Cancel a Job ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ canceljob jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Accounting ===&lt;br /&gt;
&lt;br /&gt;
For any user with an NRAC/LRAC allocation, a special account with the Resource Allocation Project (RAP) identifier (RAPI) from Compute Canada Database (CCDB) is set up in order to access the allocated resources.  Please use the following instructions to run your job using your special allocation.  This is necessary both for accounting purposes as well as to assign the appropriate priority to your jobs.&lt;br /&gt;
&lt;br /&gt;
Each job run on the system will have a default RAP associated with it.  Most users already have their default RAP properly set.  However, if you have more than one allocation (different RAPs),  you may need/want to change your default RAP in order to charge your jobs to a particular RAP.&lt;br /&gt;
&lt;br /&gt;
==== Changing your default RAP ====&lt;br /&gt;
&lt;br /&gt;
# Go to the [https://portal.scinet.utoronto.ca portal] and log in with your SciNet username and password.&lt;br /&gt;
# Click on &amp;quot;Change SciNet default RAP&amp;quot; and change your default RAP.&lt;br /&gt;
&lt;br /&gt;
==== Specifying the RAP for GPC ====&lt;br /&gt;
&lt;br /&gt;
Alternatively, you may want to assign a RAP for each particular job you run.  There are two ways to specify an account for Moab/Torque: From the command line or inside the batch submission script.&lt;br /&gt;
&lt;br /&gt;
===== Command line =====&lt;br /&gt;
&lt;br /&gt;
Use the '-A RAPI' flag when you submit your job using qsub.  Note that the command line option will override the submission script if an account is specified on both the submission script and the command line.  &amp;quot;RAPI&amp;quot; is the RAP Identifier, e.g. abc-123-de.&lt;br /&gt;
&lt;br /&gt;
===== Submission Script =====&lt;br /&gt;
&lt;br /&gt;
Add a line in your submit script as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -A RAPI&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please replace &amp;quot;RAPI&amp;quot; with your RAP Identifier.&lt;br /&gt;
&lt;br /&gt;
==== Specifying the RAP for TCS ====&lt;br /&gt;
&lt;br /&gt;
Add a line in your submit script as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# @ account_no = RAPI&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please replace &amp;quot;RAPI&amp;quot; with your RAP Identifier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== User Stats ===&lt;br /&gt;
&lt;br /&gt;
Show the current usage statistics for a user (here, yourself):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showstats -u $USER&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Reservations ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showres&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Standard users can see only their own reservations, not those of other users or of the system.&lt;br /&gt;
To determine what is available, use &amp;quot;showbf&amp;quot;; it shows which resources are&lt;br /&gt;
available and for how long, taking into account running jobs and all reservations. Refer to the [[Moab#Available_Resources | Available Resources]] section of this page for more details.&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Sometimes you may want one job not to start until another job finishes, but&lt;br /&gt;
you would still like to submit both at the same time.  This can be done&lt;br /&gt;
using job dependencies on both the GPC and TCS, although the commands &lt;br /&gt;
differ because the underlying resource managers are different.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
Use the -W flag with the following syntax in your submission script to keep this job from starting&lt;br /&gt;
until the job with the given jobid or jobName (set with -N jobName) has finished:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend:after:{jobid | jobName}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More detailed syntax and examples can be found &lt;br /&gt;
[http://www.clusterresources.com/products/mwm/docs/11.5jobdependencies.shtml#overview here] and&lt;br /&gt;
[http://www.clusterresources.com/products/torque/docs/commands/qsub.shtml#W here].&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
Loadleveler does job dependencies using what they call steps.&lt;br /&gt;
See the [[TCS_Quickstart#Steps | TCS Quickstart]] guide for an example.&lt;br /&gt;
&lt;br /&gt;
=== Adjusting Job Priority ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The ability to adjust job priorities downwards can be used to set the relative priorities of jobs between users who are running jobs under the same allocation (e.g., a default, LRAC, or NRAC allocation of the same PI).   Priorities are determined by how much of that allocation's time has currently been used, and all users of that account will have identical priorities.   This mechanism allows users to voluntarily reduce their priority so that other users of the same allocation can run ahead of them.&lt;br /&gt;
&lt;br /&gt;
In principle, by adjusting a job's priority downwards, you could reduce it to the point that someone else's job could go ahead of yours.  In practice, however, this is extremely unlikely.   Users with LRAC or NRAC allocations have priorities that are extremely large positive numbers that depend on their allocation and how much of it they have already used during the past fairshare window (2 weeks); it is very unlikely that two groups would have priorities that are within 10 or 100 or 1000 of each other.&lt;br /&gt;
&lt;br /&gt;
Note that at the moment, we do not allow priorities to go negative; they are integers that can go no lower than 1.  (This may change in the future.)  That means that users of accounts that have already used their full allocation during the current fairshare period (e.g., over the past two weeks), and whose priority would therefore normally be negative but is capped at 1, cannot lower their priority any further.   Similarly, users with a `default' allocation have priority 1 and cannot lower their priorities any further.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
Moab allows users to adjust their jobs' priority moderately downwards, with the &amp;lt;tt&amp;gt;-p&amp;lt;/tt&amp;gt; flag; that is, on a qsub line&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub ... -p -10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or in a script&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
#PBS -p -10&lt;br /&gt;
..&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The number used (-10 in the examples above) can be any negative number down to -1024.   &lt;br /&gt;
&lt;br /&gt;
The ability to adjust job priorities downwards can be useful when you are running a number of jobs and want some to enter the queue at higher priorities than others.   Note that if you absolutely require some jobs to start before others, you could use [[#Job Dependencies | job dependencies]] instead.&lt;br /&gt;
&lt;br /&gt;
For a job that is currently queued, one can adjust its priority with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mjobctl -p -10 jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
TCS users can adjust their priorities by putting the following line in their scripts&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#@ user_priority = 50 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the number can be between 0 (which is 50 below the default priority) and 50 (the default priority).&lt;br /&gt;
&lt;br /&gt;
=== Suspending a Running Job ===&lt;br /&gt;
&lt;br /&gt;
Separate from, and in addition to, the ability to place a hold on a queued job, you may want to suspend a running job. For example, you may want to test the timing of events in a weakly coupled parallel environment.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
To suspend a job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsig -s STOP &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and to start it again:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsig -s CONT &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Scripts are suspendable by default, so you don't need to add any signal handling for this to work.&lt;br /&gt;
As far as we can tell, the result is identical to using fg and ctrl-Z (or kill -STOP &amp;lt;PID&amp;gt;) in an interactive run.&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Scheduler&amp;diff=1746</id>
		<title>Scheduler</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Scheduler&amp;diff=1746"/>
		<updated>2010-07-30T15:31:32Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* GPC */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The queueing system used at SciNet is based around Cluster Resources' [http://www.clusterresources.com/products/moab-cluster-suite/workload-manager.php Moab Workload Manager].&lt;br /&gt;
Moab is used on both the GPC and TCS; however, [http://www.clusterresources.com/products/torque/docs/index.shtml Torque] is used as the backend resource manager on the GPC, and IBM's [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp LoadLeveler] is used on the TCS.&lt;br /&gt;
&lt;br /&gt;
This page outlines some of the most common Moab commands with full documentation available from Moab [http://www.clusterresources.com/products/mwm/docs/a.gcommandoverview.shtml here].&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
===== batch =====&lt;br /&gt;
&lt;br /&gt;
The batch queue is the default queue on the GPC, giving the user access to all the &lt;br /&gt;
resources for jobs of up to 48 hours.  If no queue is specified with the &amp;lt;tt&amp;gt;-q&amp;lt;/tt&amp;gt; flag,&lt;br /&gt;
then the job is submitted to the batch queue.&lt;br /&gt;
&lt;br /&gt;
===== debug =====&lt;br /&gt;
&lt;br /&gt;
A debug queue has been set up primarily for code developers to quickly test&lt;br /&gt;
and evaluate their codes and configurations without having to wait in the batch queue.  There are 10 nodes&lt;br /&gt;
currently reserved for the debug queue.  It has quite restrictive limits to promote high turnover&lt;br /&gt;
and availability: a user can use 2 nodes (16 cores) for up to 2 hours, or up to a maximum&lt;br /&gt;
of 8 nodes (64 cores) for half an hour, and can have only one job in the debug queue at a time. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=1:ppn=8,walltime=1:00:00 -q debug -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== largemem =====&lt;br /&gt;
&lt;br /&gt;
The largemem queue is used for accessing one of two 16-core Intel Xeon (non-Nehalem) nodes with 128 GB of memory. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=1:ppn=16,walltime=1:00:00 -q largemem -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
The TCS currently only has one queue, or class, in use called &amp;quot;verylong&amp;quot; and its only&lt;br /&gt;
limitation is that jobs must be under 48 hours.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#@ class           = verylong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Job Info ===&lt;br /&gt;
&lt;br /&gt;
To see all jobs queued on a system use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Three sections are shown: running, idle, and blocked.  Idle jobs are commonly referred to as queued jobs; &lt;br /&gt;
they meet all the requirements but are waiting for available resources.  Blocked jobs &lt;br /&gt;
are caused either by improper resource requests or, more commonly, by exceeding a user's or group's allowable&lt;br /&gt;
resources.   For example, if you are allowed to submit 10 jobs and you submit 20, the first 10&lt;br /&gt;
jobs will be submitted properly and either run right away or be queued, while the other 10 jobs&lt;br /&gt;
will be blocked and won't be submitted to the queue until one of the first 10 finishes.&lt;br /&gt;
&lt;br /&gt;
If showq is returning output slowly, you can query cached info using &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showq --noblock&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Available Resources ===&lt;br /&gt;
&lt;br /&gt;
Determining when your job will run can be tricky as it involves a combination of queue type, node type, system reservations, and job priority. The following commands are provided to help you figure out what resources are currently available, however they may not tell you exactly when your job will run for the aforementioned reasons.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
To show how many ethernet nodes are currently free, use the showbf (&amp;quot;show backfill&amp;quot;) command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -f compute-eth &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To show how many infiniband nodes are free, use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -f ib&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
To show how many TCS nodes are free, use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -c verylong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, checking for an ethernet job on the GPC with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -f compute-eth&lt;br /&gt;
Partition     Tasks  Nodes      Duration   StartOffset       StartDate&lt;br /&gt;
---------     -----  -----  ------------  ------------  --------------&lt;br /&gt;
ALL           14728   1839       7:36:23      00:00:00  00:23:37_09/24&lt;br /&gt;
ALL             256     30      INFINITY      00:00:00  00:23:37_09/24&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
shows that for jobs under 7:36:23 you can use 1839 nodes, but if you submit&lt;br /&gt;
a longer job only 30 will be available.  In this case this is&lt;br /&gt;
due to a large reservation made by SciNet staff, but from a user's point&lt;br /&gt;
of view, showbf tells you very simply what is available and for how long.&lt;br /&gt;
In this case, a user may wish to set #PBS -l walltime=7:30:00 in their script, or add -l walltime=7:30:00 to their qsub command, to ensure that the job can backfill onto the reserved nodes before the reservation starts.&lt;br /&gt;
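&lt;br /&gt;
The same choice can be sketched as a small shell helper. This is an illustration only: the function name backfill_walltime and its safety margin (in minutes) are assumptions rather than SciNet tools, and it assumes the showbf column layout shown above, with the Duration in the fourth column written as H:MM:SS.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper, for illustration only: read showbf output on stdin and
# print a walltime slightly shorter than the first finite backfill window.
backfill_walltime() {
    margin_min=${1:-5}   # safety margin in minutes (an assumed default)
    awk -v margin="$margin_min" '
        # Only rows whose Duration column looks like H:MM:SS are considered;
        # the header, the separator and INFINITY rows are skipped.
        $4 ~ /^[0-9]+:[0-9]+:[0-9]+$/ {
            split($4, t, ":")
            secs = t[1] * 3600 + t[2] * 60 + t[3] - margin * 60
            if (0 > secs) secs = 0
            printf "%d:%02d:%02d\n", int(secs / 3600), int(secs % 3600 / 60), secs % 60
            exit
        }'
}

# The two data rows from the example output above, with a 6 minute margin.
printf '%s\n' \
    'ALL           14728   1839       7:36:23      00:00:00  00:23:37_09/24' \
    'ALL             256     30      INFINITY      00:00:00  00:23:37_09/24' |
    backfill_walltime 6    # prints 7:30:23
```
&lt;br /&gt;
On the GPC this could be used as showbf -f compute-eth | backfill_walltime, with the printed value then passed to -l walltime.&lt;br /&gt;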
&lt;br /&gt;
'''NOTE:''' showbf shows currently available nodes, however just because nodes are available&lt;br /&gt;
doesn't mean that your job will start right away.  Job priority, system reservations &lt;br /&gt;
along with dedicated nodes, such as those for the debug queue, will alter when jobs &lt;br /&gt;
run so even if enough nodes appear &amp;quot;free&amp;quot;, it doesn't mean your job will actually run right &lt;br /&gt;
away.&lt;br /&gt;
&lt;br /&gt;
=== Job Submission ===&lt;br /&gt;
&lt;br /&gt;
==== Interactive ====&lt;br /&gt;
&lt;br /&gt;
On the GPC an interactive queue session can be requested using the following &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=2:ppn=8,walltime=1:00:00 -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Non-interactive (Batch) ====&lt;br /&gt;
&lt;br /&gt;
For a non-interactive job submission you require a submission script formatted for the appropriate resource manager. Examples&lt;br /&gt;
are provided for the [[GPC_Quickstart#Submitting_A_Batch_Job | GPC]] and [[TCS_Quickstart#Submitting_A_Job | TCS]].&lt;br /&gt;
&lt;br /&gt;
=== Job Status ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ checkjob jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Cancel a Job ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ canceljob jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Accounting ===&lt;br /&gt;
&lt;br /&gt;
For any user with an NRAC/LRAC allocation, a special account with the Resource Allocation Project (RAP) identifier (RAPI) from the Compute Canada Database (CCDB) is set up to give access to the allocated resources.  Please use the following instructions to run your jobs under this allocation.  This is necessary both for accounting purposes and to assign the appropriate priority to your jobs.&lt;br /&gt;
&lt;br /&gt;
Each job run on the system will have a default RAP associated with it.  Most users already have their default RAP properly set.  However, if you have more than one allocation (different RAPs),  you may need/want to change your default RAP in order to charge your jobs to a particular RAP.&lt;br /&gt;
&lt;br /&gt;
==== Changing your default RAP ====&lt;br /&gt;
&lt;br /&gt;
# Go to the [https://portal.scinet.utoronto.ca portal] and log in with your SciNet username and password.&lt;br /&gt;
# Click on &amp;quot;Change SciNet default RAP&amp;quot; and change your default RAP.&lt;br /&gt;
&lt;br /&gt;
==== Specifying the RAP for GPC ====&lt;br /&gt;
&lt;br /&gt;
Alternatively, you may want to assign a RAP for each particular job you run.  There are two ways to specify an account for Moab/Torque: From the command line or inside the batch submission script.&lt;br /&gt;
&lt;br /&gt;
===== Command line =====&lt;br /&gt;
&lt;br /&gt;
Use the '-A RAPI' flag when you submit your job using qsub.  Note that the command line option will override the submission script if an account is specified on both the submission script and the command line.  &amp;quot;RAPI&amp;quot; is the RAP Identifier, e.g. abc-123-de.&lt;br /&gt;
&lt;br /&gt;
===== Submission Script =====&lt;br /&gt;
&lt;br /&gt;
Add a line in your submit script as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -A RAPI&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please replace &amp;quot;RAPI&amp;quot; with your RAP Identifier.&lt;br /&gt;
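&lt;br /&gt;
The precedence rule from the Command line section can be pictured with a short sketch. The helper effective_rapi is hypothetical (it is not a Torque or SciNet command); it only mimics the rule that a RAPI given with -A on the command line wins over a #PBS -A line in the script.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper, for illustration only: print the RAPI a job would be
# charged to, given the value passed to "qsub -A" (may be empty) and the
# path of the submission script.
effective_rapi() {
    cli_rapi=$1
    script=$2
    if [ -n "$cli_rapi" ]; then
        # The command line option overrides the submission script.
        echo "$cli_rapi"
        return 0
    fi
    # Otherwise use the first "#PBS -A RAPI" line in the script, if any.
    awk '$1 == "#PBS" {
             for (i = 2; NF > i; i++) {
                 if ($i == "-A") { print $(i + 1); exit }
             }
         }' "$script"
}
```
&lt;br /&gt;
If neither the command line nor the script names a RAPI, the function prints nothing, which corresponds to the job being charged to your default RAP.&lt;br /&gt;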
&lt;br /&gt;
==== Specifying the RAP for TCS ====&lt;br /&gt;
&lt;br /&gt;
Add a line in your submit script as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# @ account_no = RAPI&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please replace &amp;quot;RAPI&amp;quot; with your RAP Identifier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== User Stats ===&lt;br /&gt;
&lt;br /&gt;
Show the current usage statistics for a user (here, yourself):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showstats -u $USER&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Reservations ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showres&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Standard users can see only their own reservations, not those of other users or of the system.&lt;br /&gt;
To determine what is available, use &amp;quot;showbf&amp;quot;; it shows which resources are&lt;br /&gt;
available and for how long, taking into account running jobs and all reservations. Refer to the [[Moab#Available_Resources | Available Resources]] section of this page for more details.&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Sometimes you may want one job not to start until another job finishes, but&lt;br /&gt;
you would still like to submit both at the same time.  This can be done&lt;br /&gt;
using job dependencies on both the GPC and TCS, although the commands &lt;br /&gt;
differ because the underlying resource managers are different.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
Use the -W flag with the following syntax in your submission script to keep this job from starting&lt;br /&gt;
until the job with the given jobid or jobName (set with -N jobName) has finished:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend:after:{jobid | jobName}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More detailed syntax and examples can be found &lt;br /&gt;
[http://www.clusterresources.com/products/mwm/docs/11.5jobdependencies.shtml#overview here] and&lt;br /&gt;
[http://www.clusterresources.com/products/torque/docs/commands/qsub.shtml#W here].&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
Loadleveler does job dependencies using what they call steps.&lt;br /&gt;
See the [[TCS_Quickstart#Steps | TCS Quickstart]] guide for an example.&lt;br /&gt;
&lt;br /&gt;
=== Adjusting Job Priority ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The ability to adjust job priorities downwards can be used to set the relative priorities of jobs between users who are running jobs under the same allocation (e.g., a default, LRAC, or NRAC allocation of the same PI).   Priorities are determined by how much of that allocation's time has currently been used, and all users of that account will have identical priorities.   This mechanism allows users to voluntarily reduce their priority so that other users of the same allocation can run ahead of them.&lt;br /&gt;
&lt;br /&gt;
In principle, by adjusting a job's priority downwards, you could reduce it to the point that someone else's job could go ahead of yours.  In practice, however, this is extremely unlikely.   Users with LRAC or NRAC allocations have priorities that are extremely large positive numbers that depend on their allocation and how much of it they have already used during the past fairshare window (2 weeks); it is very unlikely that two groups would have priorities that are within 10 or 100 or 1000 of each other.&lt;br /&gt;
&lt;br /&gt;
Note that at the moment, we do not allow priorities to go negative; they are integers that can go no lower than 1.  (This may change in the future.)  That means that users of accounts that have already used their full allocation during the current fairshare period (e.g., over the past two weeks), and whose priority would therefore normally be negative but is capped at 1, cannot lower their priority any further.   Similarly, users with a `default' allocation have priority 1 and cannot lower their priorities any further.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
Moab allows users to adjust their jobs' priority moderately downwards, with the &amp;lt;tt&amp;gt;-p&amp;lt;/tt&amp;gt; flag; that is, on a qsub line&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub ... -p -10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or in a script&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
#PBS -p -10&lt;br /&gt;
..&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The number used (-10 in the examples above) can be any negative number down to -1024.   &lt;br /&gt;
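&lt;br /&gt;
As a rough model of how such an adjustment combines with the floor of 1 described above, consider the following sketch. The function effective_priority and the simple addition it performs are assumptions made for illustration; Moab's actual priority calculation is not reproduced here.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical model, for illustration only (not Moab's real formula).
# $1: priority the allocation would give the job before adjustment
# $2: value passed to -p, between -1024 and 0
effective_priority() {
    base=$1
    adjust=$2
    if [ "$adjust" -gt 0 ] || [ "$adjust" -lt -1024 ]; then
        echo "adjustment must be between -1024 and 0"
        return 1
    fi
    total=$((base + adjust))
    # Priorities are integers that can go no lower than 1.
    if [ "$total" -lt 1 ]; then
        total=1
    fi
    echo "$total"
}

effective_priority 500000 -10   # prints 499990
effective_priority 1 -10        # prints 1
```
&lt;br /&gt;
The second call shows why a job that already sits at priority 1 gains nothing from a further downward adjustment.&lt;br /&gt;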
&lt;br /&gt;
The ability to adjust job priorities downwards can be useful when you are running a number of jobs and want some to enter the queue at higher priorities than others.   Note that if you absolutely require some jobs to start before others, you could use [[#Job Dependencies | job dependencies]] instead.&lt;br /&gt;
&lt;br /&gt;
For a job that is currently queued, one can adjust its priority with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mjobctl -p -10 jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
TCS users can adjust their priorities by putting the following line in their scripts&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#@ user_priority = 50 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the number can be between 0 (which is 50 below the default priority) and 50 (the default priority).&lt;br /&gt;
&lt;br /&gt;
=== Suspending a Running Job ===&lt;br /&gt;
&lt;br /&gt;
Separate from, and in addition to, the ability to place a hold on a queued job, you may want to suspend a running job. For example, you may want to test the timing of events in a weakly coupled parallel environment.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
To save you the trouble of finding out which node a job is running on and then ssh-ing into the node, you can use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsig -s STOP &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and to start it again&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsig -s CONT &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Scheduler&amp;diff=1744</id>
		<title>Scheduler</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Scheduler&amp;diff=1744"/>
		<updated>2010-07-30T12:05:38Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* GPC */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The queueing system used at SciNet is based around Cluster Resources' [http://www.clusterresources.com/products/moab-cluster-suite/workload-manager.php Moab Workload Manager].&lt;br /&gt;
Moab is used on both the GPC and TCS; however, [http://www.clusterresources.com/products/torque/docs/index.shtml Torque] is used as the backend resource manager on the GPC, and IBM's [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp LoadLeveler] is used on the TCS.&lt;br /&gt;
&lt;br /&gt;
This page outlines some of the most common Moab commands with full documentation available from Moab [http://www.clusterresources.com/products/mwm/docs/a.gcommandoverview.shtml here].&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
===== batch =====&lt;br /&gt;
&lt;br /&gt;
The batch queue is the default queue on the GPC, giving the user access to all the &lt;br /&gt;
resources for jobs of up to 48 hours.  If no queue is specified with the &amp;lt;tt&amp;gt;-q&amp;lt;/tt&amp;gt; flag,&lt;br /&gt;
then the job is submitted to the batch queue.&lt;br /&gt;
&lt;br /&gt;
===== debug =====&lt;br /&gt;
&lt;br /&gt;
A debug queue has been set up primarily for code developers to quickly test&lt;br /&gt;
and evaluate their codes and configurations without having to wait in the batch queue.  There are 10 nodes&lt;br /&gt;
currently reserved for the debug queue.  It has quite restrictive limits to promote high turnover&lt;br /&gt;
and availability: a user can use 2 nodes (16 cores) for up to 2 hours, or up to a maximum&lt;br /&gt;
of 8 nodes (64 cores) for half an hour, and can have only one job in the debug queue at a time. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=1:ppn=8,walltime=1:00:00 -q debug -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== largemem =====&lt;br /&gt;
&lt;br /&gt;
The largemem queue is used for accessing one of two 16-core Intel Xeon (non-Nehalem) nodes with 128 GB of memory. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=1:ppn=16,walltime=1:00:00 -q largemem -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
The TCS currently only has one queue, or class, in use called &amp;quot;verylong&amp;quot; and its only&lt;br /&gt;
limitation is that jobs must be under 48 hours.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#@ class           = verylong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Job Info ===&lt;br /&gt;
&lt;br /&gt;
To see all jobs queued on a system use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Three sections are shown: running, idle, and blocked.  Idle jobs are commonly referred to as queued jobs; &lt;br /&gt;
they meet all the requirements but are waiting for available resources.  Blocked jobs &lt;br /&gt;
are caused either by improper resource requests or, more commonly, by exceeding a user's or group's allowable&lt;br /&gt;
resources.   For example, if you are allowed to submit 10 jobs and you submit 20, the first 10&lt;br /&gt;
jobs will be submitted properly and either run right away or be queued, while the other 10 jobs&lt;br /&gt;
will be blocked and won't be submitted to the queue until one of the first 10 finishes.&lt;br /&gt;
&lt;br /&gt;
If showq is returning output slowly, you can query cached info using &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showq --noblock&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Available Resources ===&lt;br /&gt;
&lt;br /&gt;
Determining when your job will run can be tricky as it involves a combination of queue type, node type, system reservations, and job priority. The following commands are provided to help you figure out what resources are currently available, however they may not tell you exactly when your job will run for the aforementioned reasons.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
To show how many ethernet nodes are currently free, use the showbf (&amp;quot;show backfill&amp;quot;) command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -f compute-eth &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To show how many infiniband nodes are free, use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -f ib&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
To show how many TCS nodes are free, use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -c verylong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, checking for an ethernet job on the GPC with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -f compute-eth&lt;br /&gt;
Partition     Tasks  Nodes      Duration   StartOffset       StartDate&lt;br /&gt;
---------     -----  -----  ------------  ------------  --------------&lt;br /&gt;
ALL           14728   1839       7:36:23      00:00:00  00:23:37_09/24&lt;br /&gt;
ALL             256     30      INFINITY      00:00:00  00:23:37_09/24&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
shows that for jobs under 7:36:23 you can use 1839 nodes, but if you submit&lt;br /&gt;
a longer job only 30 will be available.  In this case this is&lt;br /&gt;
due to a large reservation made by SciNet staff, but from a user's point&lt;br /&gt;
of view, showbf tells you very simply what is available and for how long.&lt;br /&gt;
In this case, a user may wish to set #PBS -l walltime=7:30:00 in their script, or add -l walltime=7:30:00 to their qsub command, to ensure that the job can backfill onto the reserved nodes before the reservation starts.&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' showbf shows currently available nodes, however just because nodes are available&lt;br /&gt;
doesn't mean that your job will start right away.  Job priority, system reservations &lt;br /&gt;
along with dedicated nodes, such as those for the debug queue, will alter when jobs &lt;br /&gt;
run so even if enough nodes appear &amp;quot;free&amp;quot;, it doesn't mean your job will actually run right &lt;br /&gt;
away.&lt;br /&gt;
&lt;br /&gt;
=== Job Submission ===&lt;br /&gt;
&lt;br /&gt;
==== Interactive ====&lt;br /&gt;
&lt;br /&gt;
On the GPC an interactive queue session can be requested using the following &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=2:ppn=8,walltime=1:00:00 -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Non-interactive (Batch) ====&lt;br /&gt;
&lt;br /&gt;
For a non-interactive job submission you require a submission script formatted for the appropriate resource manager. Examples&lt;br /&gt;
are provided for the [[GPC_Quickstart#Submitting_A_Batch_Job | GPC]] and [[TCS_Quickstart#Submitting_A_Job | TCS]].&lt;br /&gt;
&lt;br /&gt;
=== Job Status ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ checkjob jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Cancel a Job ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ canceljob jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Accounting ===&lt;br /&gt;
&lt;br /&gt;
For any user with an NRAC/LRAC allocation, a special account with the Resource Allocation Project (RAP) identifier (RAPI) from the Compute Canada Database (CCDB) is set up to give access to the allocated resources.  Please use the following instructions to run your jobs under this allocation.  This is necessary both for accounting purposes and to assign the appropriate priority to your jobs.&lt;br /&gt;
&lt;br /&gt;
Each job run on the system will have a default RAP associated with it.  Most users already have their default RAP properly set.  However, if you have more than one allocation (different RAPs),  you may need/want to change your default RAP in order to charge your jobs to a particular RAP.&lt;br /&gt;
&lt;br /&gt;
==== Changing your default RAP ====&lt;br /&gt;
&lt;br /&gt;
# Go to the [https://portal.scinet.utoronto.ca portal] and log in with your SciNet username and password.&lt;br /&gt;
# Click on &amp;quot;Change SciNet default RAP&amp;quot; and change your default RAP.&lt;br /&gt;
&lt;br /&gt;
==== Specifying the RAP for GPC ====&lt;br /&gt;
&lt;br /&gt;
Alternatively, you may want to assign a RAP for each particular job you run.  There are two ways to specify an account for Moab/Torque: From the command line or inside the batch submission script.&lt;br /&gt;
&lt;br /&gt;
===== Command line =====&lt;br /&gt;
&lt;br /&gt;
Use the '-A RAPI' flag when you submit your job using qsub.  Note that the command line option will override the submission script if an account is specified on both the submission script and the command line.  &amp;quot;RAPI&amp;quot; is the RAP Identifier, e.g. abc-123-de.&lt;br /&gt;
&lt;br /&gt;
===== Submission Script =====&lt;br /&gt;
&lt;br /&gt;
Add a line in your submit script as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -A RAPI&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please replace &amp;quot;RAPI&amp;quot; with your RAP Identifier.&lt;br /&gt;
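&lt;br /&gt;
As a minimal sketch of how the two forms interact, assume a hypothetical script myjob.sh and two placeholder identifiers, abc-123-de and xyz-456-fg (neither is a real RAPI).  The script names one account:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -A abc-123-de&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Submitting it with a different account on the command line should then charge the job to xyz-456-fg, since the command line option overrides the script:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -A xyz-456-fg myjob.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;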
&lt;br /&gt;
==== Specifying the RAP for TCS ====&lt;br /&gt;
&lt;br /&gt;
Add a line in your submit script as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# @ account_no = RAPI&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please replace &amp;quot;RAPI&amp;quot; with your RAP Identifier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== User Stats ===&lt;br /&gt;
&lt;br /&gt;
Show current usage statistics for a user (here, yourself):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showstats -u $USER&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Reservations ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showres&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Standard users can see only their own reservations, not those of other users or of the system.&lt;br /&gt;
To determine what is available, use &amp;quot;showbf&amp;quot;: it shows which resources are&lt;br /&gt;
available and for how long, taking into account running jobs and all reservations. Refer to the [[Moab#Available_Resources | Available Resources]] section of this page for more details.&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Sometimes you may want one job not to start until another job finishes, but&lt;br /&gt;
you would still like to submit both at the same time.  This can be done&lt;br /&gt;
using job dependencies on both the GPC and TCS, although the commands &lt;br /&gt;
differ because the underlying resource managers are different.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
Use the -W flag with the following syntax in your submission script to keep this job from starting&lt;br /&gt;
until the job with the given jobid or jobName (set with -N jobName) has finished:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend:after:{jobid | jobName}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More detailed syntax and examples can be found &lt;br /&gt;
[http://www.clusterresources.com/products/mwm/docs/11.5jobdependencies.shtml#overview here] and&lt;br /&gt;
[http://www.clusterresources.com/products/torque/docs/commands/qsub.shtml#W here].&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
LoadLeveler handles job dependencies through what it calls steps.&lt;br /&gt;
See the [[TCS_Quickstart#Steps | TCS Quickstart]] guide for an example.&lt;br /&gt;
&lt;br /&gt;
=== Adjusting Job Priority ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lowering a job's priority can be useful for adjusting the relative priorities of jobs between users who share the same allocation (e.g., a default, LRAC, or NRAC allocation of the same PI).   Priorities are determined by how much of that allocation's time has already been used, so all users of that account have identical priorities.   This mechanism lets users voluntarily reduce their priority so that other users of the same allocation can run ahead of them.&lt;br /&gt;
&lt;br /&gt;
In principle, by adjusting a job's priority downwards, you could reduce it to the point that a job from an entirely different group could go ahead of yours.  In practice, however, this is extremely unlikely.   Users with LRAC or NRAC allocations have priorities that are extremely large positive numbers that depend on their allocation and how much of it they have already used during the past fairshare window (2 weeks); it is very unlikely that two groups would have priorities that are within 10 or 100 or 1000 of each other.&lt;br /&gt;
&lt;br /&gt;
Note that at the moment, we do not allow priorities to go negative; they are integers that can go no lower than 1.  (This may change in the future.)  That means that users of accounts that have already used their full allocation during the current fairshare period (e.g., over the past two weeks), and whose priority would normally be negative but is capped at 1, cannot lower their priority any further.   Similarly, users with a `default' allocation have priority 1 and cannot lower their priorities any further.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
Moab allows users to adjust their jobs' priority moderately downwards, with the &amp;lt;tt&amp;gt;-p&amp;lt;/tt&amp;gt; flag; that is, on a qsub line&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub ... -p -10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or in a script&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
#PBS -p -10&lt;br /&gt;
..&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The number used (-10 in the examples above) can be any negative number down to -1024.   &lt;br /&gt;
&lt;br /&gt;
The ability to adjust job priorities downwards can be useful when you are running a number of jobs and want some to enter the queue at higher priorities than others.   Note that if you absolutely require some jobs to start before others, you could use [[#Job Dependencies | job dependencies]] instead.&lt;br /&gt;
&lt;br /&gt;
For a job that is currently queued, one can adjust its priority with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mjobctl -p -10 jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
TCS users can adjust their priorities by putting the following line in their scripts&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#@ user_priority = 50 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the number can range from 0 (which is 50 below the default priority) to 50 (the default priority).&lt;br /&gt;
&lt;br /&gt;
=== Suspending a Running Job ===&lt;br /&gt;
&lt;br /&gt;
Separate from, and in addition to, the ability to place a hold on a queued job, you may want to suspend a running job. For example, you may want to test the timing of events in a weakly coupled parallel environment.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mjobctl -s &amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This, however, does not work as one might expect, and you will get the notice:&lt;br /&gt;
&lt;br /&gt;
job &amp;lt;JOBID&amp;gt; not suspendable, preempt policy temporarily changed to requeue&lt;br /&gt;
job successfully preempted&lt;br /&gt;
&lt;br /&gt;
You could also send a signal with the -N option of mjobctl if you want more customized behaviour.  &lt;br /&gt;
Please see the mjobctl information page for more details here:&lt;br /&gt;
http://www.clusterresources.com/products/mwm/docs/commands/mjobctl.shtml&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Scheduler&amp;diff=1743</id>
		<title>Scheduler</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Scheduler&amp;diff=1743"/>
		<updated>2010-07-30T11:54:24Z</updated>

		<summary type="html">&lt;p&gt;Cneale: Added job suspension information from Jason Chong&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The queueing system used at SciNet is based around Cluster Resources [http://www.clusterresources.com/products/moab-cluster-suite/workload-manager.php Moab Workload Manager].&lt;br /&gt;
Moab is used on both the GPC and TCS; however, [http://www.clusterresources.com/products/torque/docs/index.shtml Torque] is used as the backend resource manager on the GPC, and IBM's [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp LoadLeveler] is used on the TCS.&lt;br /&gt;
&lt;br /&gt;
This page outlines some of the most common Moab commands with full documentation available from Moab [http://www.clusterresources.com/products/mwm/docs/a.gcommandoverview.shtml here].&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
===== batch =====&lt;br /&gt;
&lt;br /&gt;
The batch queue is the default queue on the GPC, giving the user access to all the &lt;br /&gt;
resources for jobs of up to 48 hours.  If no queue is specified with the &amp;lt;tt&amp;gt;-q&amp;lt;/tt&amp;gt; flag,&lt;br /&gt;
the job is submitted to the batch queue.&lt;br /&gt;
&lt;br /&gt;
===== debug =====&lt;br /&gt;
&lt;br /&gt;
A debug queue has been set up primarily for code developers to quickly test&lt;br /&gt;
and evaluate their codes and configurations without having to wait in the batch queue.  There are 10 nodes&lt;br /&gt;
currently reserved for the debug queue.  It has quite restrictive limits to promote high turnover&lt;br /&gt;
and availability: a user can use from 2 nodes (16 cores) for 2 hours up to a maximum&lt;br /&gt;
of 8 nodes (64 cores) for half an hour, and can have only one job in the debug queue at a time. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=1:ppn=8,walltime=1:00:00 -q debug -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== largemem =====&lt;br /&gt;
&lt;br /&gt;
The largemem queue is used for accessing one of two 16-core Intel Xeon (non-Nehalem) nodes with 128 GB of memory. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=1:ppn=16,walltime=1:00:00 -q largemem -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
The TCS currently only has one queue, or class, in use called &amp;quot;verylong&amp;quot; and its only&lt;br /&gt;
limitation is that jobs must be under 48 hours.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#@ class           = verylong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Job Info ===&lt;br /&gt;
&lt;br /&gt;
To see all jobs queued on a system use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Three sections are shown: running, idle, and blocked.  Idle jobs are commonly referred to as queued jobs: &lt;br /&gt;
they meet all the requirements but are waiting for available resources.  Blocked jobs &lt;br /&gt;
are caused either by improper resource requests or, more commonly, by exceeding a user's or group's allowable&lt;br /&gt;
resources.   For example, if you are allowed to submit 10 jobs and you submit 20, the first 10&lt;br /&gt;
jobs will be submitted properly and either run right away or be queued, while the other 10 jobs&lt;br /&gt;
will be blocked and won't enter the queue until one of the first 10 finishes.&lt;br /&gt;
&lt;br /&gt;
If showq is returning output slowly, you can query cached info using &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showq --noblock&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Available Resources ===&lt;br /&gt;
&lt;br /&gt;
Determining when your job will run can be tricky as it involves a combination of queue type, node type, system reservations, and job priority. The following commands are provided to help you figure out what resources are currently available, however they may not tell you exactly when your job will run for the aforementioned reasons.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
To show how many ethernet nodes are currently free, use the showbf (&amp;quot;show backfill&amp;quot;) command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -f compute-eth &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To show how many infiniband nodes are free, use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -f ib&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
To show how many TCS nodes are free, use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -c verylong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, checking for an ethernet job on the GPC&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showbf -f compute-eth&lt;br /&gt;
Partition     Tasks  Nodes      Duration   StartOffset       StartDate&lt;br /&gt;
---------     -----  -----  ------------  ------------  --------------&lt;br /&gt;
ALL           14728   1839       7:36:23      00:00:00  00:23:37_09/24&lt;br /&gt;
ALL             256     30      INFINITY      00:00:00  00:23:37_09/24&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
shows that for jobs under 7:36:23 you can use 1839 nodes, but if you submit&lt;br /&gt;
a job longer than that, only 30 will be available.  In this case this is&lt;br /&gt;
due to a large reservation made by SciNet staff, but from a user's point&lt;br /&gt;
of view, showbf tells you very simply what is available and for how long.&lt;br /&gt;
In this case, a user may wish to set #PBS -l walltime=7:30:00 in their script, or add -l walltime=7:30:00 to their qsub command in order to ensure that the jobs backfill the reserved nodes.&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' showbf shows currently available nodes, but the fact that nodes are available&lt;br /&gt;
doesn't mean that your job will start right away.  Job priority and system reservations, &lt;br /&gt;
along with dedicated nodes such as those for the debug queue, affect when jobs &lt;br /&gt;
run, so even if enough nodes appear &amp;quot;free&amp;quot;, your job may not actually run right &lt;br /&gt;
away.&lt;br /&gt;
&lt;br /&gt;
=== Job Submission ===&lt;br /&gt;
&lt;br /&gt;
==== Interactive ====&lt;br /&gt;
&lt;br /&gt;
On the GPC an interactive queue session can be requested using the following &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l nodes=2:ppn=8,walltime=1:00:00 -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Non-interactive (Batch) ====&lt;br /&gt;
&lt;br /&gt;
For a non-interactive job submission you require a submission script formatted for the appropriate resource manager. Examples&lt;br /&gt;
are provided for the [[GPC_Quickstart#Submitting_A_Batch_Job | GPC]] and [[TCS_Quickstart#Submitting_A_Job | TCS]].&lt;br /&gt;
&lt;br /&gt;
=== Job Status ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ checkjob jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Cancel a Job ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ canceljob jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Accounting ===&lt;br /&gt;
&lt;br /&gt;
For any user with an NRAC/LRAC allocation, a special account with the Resource Allocation Project (RAP) identifier (RAPI) from Compute Canada Database (CCDB) is set up in order to access the allocated resources.  Please use the following instructions to run your job using your special allocation.  This is necessary both for accounting purposes as well as to assign the appropriate priority to your jobs.&lt;br /&gt;
&lt;br /&gt;
Each job run on the system will have a default RAP associated with it.  Most users already have their default RAP properly set.  However, if you have more than one allocation (different RAPs),  you may need/want to change your default RAP in order to charge your jobs to a particular RAP.&lt;br /&gt;
&lt;br /&gt;
==== Changing your default RAP ====&lt;br /&gt;
&lt;br /&gt;
# Go to the [https://portal.scinet.utoronto.ca portal] and log in with your SciNet username and password.&lt;br /&gt;
# Click on &amp;quot;Change SciNet default RAP&amp;quot; and change your default RAP.&lt;br /&gt;
&lt;br /&gt;
==== Specifying the RAP for GPC ====&lt;br /&gt;
&lt;br /&gt;
Alternatively, you may want to assign a RAP for each particular job you run.  There are two ways to specify an account for Moab/Torque: From the command line or inside the batch submission script.&lt;br /&gt;
&lt;br /&gt;
===== Command line =====&lt;br /&gt;
&lt;br /&gt;
Use the '-A RAPI' flag when you submit your job using qsub.  Note that the command line option will override the submission script if an account is specified on both the submission script and the command line.  &amp;quot;RAPI&amp;quot; is the RAP Identifier, e.g. abc-123-de.&lt;br /&gt;
&lt;br /&gt;
===== Submission Script =====&lt;br /&gt;
&lt;br /&gt;
Add a line in your submit script as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -A RAPI&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please replace &amp;quot;RAPI&amp;quot; with your RAP Identifier.&lt;br /&gt;
&lt;br /&gt;
==== Specifying the RAP for TCS ====&lt;br /&gt;
&lt;br /&gt;
Add a line in your submit script as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# @ account_no = RAPI&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please replace &amp;quot;RAPI&amp;quot; with your RAP Identifier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== User Stats ===&lt;br /&gt;
&lt;br /&gt;
Show current usage statistics for a user (here, yourself):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showstats -u $USER&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Reservations ===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ showres&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Standard users can see only their own reservations, not those of other users or of the system.&lt;br /&gt;
To determine what is available, use &amp;quot;showbf&amp;quot;: it shows which resources are&lt;br /&gt;
available and for how long, taking into account running jobs and all reservations. Refer to the [[Moab#Available_Resources | Available Resources]] section of this page for more details.&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Sometimes you may want one job not to start until another job finishes, but&lt;br /&gt;
you would still like to submit both at the same time.  This can be done&lt;br /&gt;
using job dependencies on both the GPC and TCS, although the commands &lt;br /&gt;
differ because the underlying resource managers are different.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
Use the -W flag with the following syntax in your submission script to keep this job from starting&lt;br /&gt;
until the job with the given jobid or jobName (set with -N jobName) has finished:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend:after:{jobid | jobName}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More detailed syntax and examples can be found &lt;br /&gt;
[http://www.clusterresources.com/products/mwm/docs/11.5jobdependencies.shtml#overview here] and&lt;br /&gt;
[http://www.clusterresources.com/products/torque/docs/commands/qsub.shtml#W here].&lt;br /&gt;
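&lt;br /&gt;
As a sketch of submitting two dependent jobs at the same time, the commands below assume the &amp;lt;tt&amp;gt;depend=afterok:jobid&amp;lt;/tt&amp;gt; form from the Torque qsub manual linked above, which holds the second job until the first has finished without error.  The script names first.sh and second.sh are hypothetical, and the exact dependency spelling accepted on the GPC should be checked against that manual:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# qsub prints the id of the new job, which is captured here&lt;br /&gt;
$ FIRST=$(qsub first.sh)&lt;br /&gt;
# second.sh is queued now but held until $FIRST completes successfully&lt;br /&gt;
$ qsub -W depend=afterok:$FIRST second.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;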
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
LoadLeveler handles job dependencies through what it calls steps.&lt;br /&gt;
See the [[TCS_Quickstart#Steps | TCS Quickstart]] guide for an example.&lt;br /&gt;
&lt;br /&gt;
=== Adjusting Job Priority ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lowering a job's priority can be useful for adjusting the relative priorities of jobs between users who share the same allocation (e.g., a default, LRAC, or NRAC allocation of the same PI).   Priorities are determined by how much of that allocation's time has already been used, so all users of that account have identical priorities.   This mechanism lets users voluntarily reduce their priority so that other users of the same allocation can run ahead of them.&lt;br /&gt;
&lt;br /&gt;
In principle, by adjusting a job's priority downwards, you could reduce it to the point that a job from an entirely different group could go ahead of yours.  In practice, however, this is extremely unlikely.   Users with LRAC or NRAC allocations have priorities that are extremely large positive numbers that depend on their allocation and how much of it they have already used during the past fairshare window (2 weeks); it is very unlikely that two groups would have priorities that are within 10 or 100 or 1000 of each other.&lt;br /&gt;
&lt;br /&gt;
Note that at the moment, we do not allow priorities to go negative; they are integers that can go no lower than 1.  (This may change in the future.)  That means that users of accounts that have already used their full allocation during the current fairshare period (e.g., over the past two weeks), and whose priority would normally be negative but is capped at 1, cannot lower their priority any further.   Similarly, users with a `default' allocation have priority 1 and cannot lower their priorities any further.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
Moab allows users to adjust their jobs' priority moderately downwards, with the &amp;lt;tt&amp;gt;-p&amp;lt;/tt&amp;gt; flag; that is, on a qsub line&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub ... -p -10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or in a script&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
#PBS -p -10&lt;br /&gt;
..&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The number used (-10 in the examples above) can be any negative number down to -1024.   &lt;br /&gt;
&lt;br /&gt;
The ability to adjust job priorities downwards can be useful when you are running a number of jobs and want some to enter the queue at higher priorities than others.   Note that if you absolutely require some jobs to start before others, you could use [[#Job Dependencies | job dependencies]] instead.&lt;br /&gt;
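&lt;br /&gt;
For instance, a sketch with two hypothetical scripts, where urgent.sh should be considered ahead of background.sh (the values -10 and -100 are arbitrary choices within the allowed range):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -p -10 urgent.sh&lt;br /&gt;
$ qsub -p -100 background.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Both jobs are lowered relative to your unmodified priority; what matters here is their order relative to each other.&lt;br /&gt;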
&lt;br /&gt;
For a job that is currently queued, one can adjust its priority with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mjobctl -p -10 jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== TCS ====&lt;br /&gt;
&lt;br /&gt;
TCS users can adjust their priorities by putting the following line in their scripts&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#@ user_priority = 50 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the number can range from 0 (which is 50 below the default priority) to 50 (the default priority).&lt;br /&gt;
&lt;br /&gt;
=== Suspending a Running Job ===&lt;br /&gt;
&lt;br /&gt;
Separate from, and in addition to, the ability to place a hold on a queued job, you may want to suspend a running job. For example, you may want to test the timing of events in a weakly coupled parallel environment.&lt;br /&gt;
&lt;br /&gt;
==== GPC ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mjobctl -s &amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You could also send a signal with the -N option of mjobctl if you want more customized behaviour.  &lt;br /&gt;
Please see the mjobctl information page for more details here:&lt;br /&gt;
http://www.clusterresources.com/products/mwm/docs/commands/mjobctl.shtml&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Gromacs&amp;diff=1742</id>
		<title>Gromacs</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Gromacs&amp;diff=1742"/>
		<updated>2010-07-30T11:49:44Z</updated>

		<summary type="html">&lt;p&gt;Cneale: Updated gromacs search link&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Download and general information: http://www.gromacs.org&lt;br /&gt;
&lt;br /&gt;
Search the mailing list archives: http://www.gromacs.org/Support/Mailing_Lists/Search&lt;br /&gt;
&lt;br /&gt;
=Peculiarities of running single node GROMACS jobs on SCINET=&lt;br /&gt;
This is '''VERY IMPORTANT !!!'''&lt;br /&gt;
Please read the [https://support.scinet.utoronto.ca/wiki/index.php/User_Tips#Running_single_node_MPI_jobs relevant user tips section] for information that is essential for your single node (up to 8 core) MPI GROMACS jobs.&lt;br /&gt;
&lt;br /&gt;
-- [[User:Cneale|cneale]] 14 September 2009&lt;br /&gt;
&lt;br /&gt;
=Compiling GROMACS on SciNet=&lt;br /&gt;
Please refer to the [[Compiling_Gromacs|GROMACS compilation page]]&lt;br /&gt;
&lt;br /&gt;
=Submitting GROMACS jobs on SciNet=&lt;br /&gt;
Please refer to the [[Running_Gromacs|GROMACS submission page]]&lt;br /&gt;
&lt;br /&gt;
-- [[User:Cneale|cneale]] 18 August 2009&lt;br /&gt;
=GROMACS benchmarks on Scinet=&lt;br /&gt;
&lt;br /&gt;
This is a rudimentary list of scaling information.&lt;br /&gt;
 &lt;br /&gt;
I have a 50K atom system running on GPC right now. On 56&lt;br /&gt;
cores connected with IB I am getting 55 ns/day. I set up 50 such&lt;br /&gt;
simulations, each with 2 proteins in a bilayer and I'm getting a total&lt;br /&gt;
of 5.5 us per day. I am using gromacs 4.0.5 and a 5&lt;br /&gt;
fs timestep by fixing the bond lengths and all angles involving&lt;br /&gt;
hydrogen.&lt;br /&gt;
&lt;br /&gt;
I can get about 12 ns/day on 8 cores of the non-IB part of GPC -- also&lt;br /&gt;
excellent.&lt;br /&gt;
&lt;br /&gt;
As for larger systems, my speedup over saw.sharcnet.ca for a 1e6 atom&lt;br /&gt;
system is only 1.2x running on 128 cores in single precision. Although saw.sharcnet.ca &lt;br /&gt;
is composed of Xeons, they are running at 2.83 GHz (https://www.sharcnet.ca/my/systems/show/41), which is a&lt;br /&gt;
faster clock speed than the 2.5 GHz of the Scinet nodes, which use Intel's next-generation x86 CPU architecture.&lt;br /&gt;
While GROMACS is generally not excellent for scaling up to or beyond 128 cores (even for large systems), &lt;br /&gt;
our benchmarking of this system on saw.sharcnet.ca indicated that it was running at about 65% efficiency.&lt;br /&gt;
Benchmarking was also done on Scinet for this system, but was not recorded as we were mostly tinkering with the&lt;br /&gt;
-npme option to mdrun in an attempt to optimize it. My recollection, though, is that the scaling was similar on scinet.&lt;br /&gt;
&lt;br /&gt;
-- [[User:Cneale|cneale]] 19 August 2009&lt;br /&gt;
=Strong scaling for GROMACS on GPC=&lt;br /&gt;
&lt;br /&gt;
Requested, and on our list to complete, but not yet available in a complete chart form.&lt;br /&gt;
&lt;br /&gt;
-- [[User:Cneale|cneale]] 19 August 2009&lt;br /&gt;
=Scientific studies being carried out using GROMACS on GPC=&lt;br /&gt;
&lt;br /&gt;
Requested, but not yet available&lt;br /&gt;
&lt;br /&gt;
-- [[User:Cneale|cneale]] 19 August 2009&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=GPC_Quickstart&amp;diff=1319</id>
		<title>GPC Quickstart</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=GPC_Quickstart&amp;diff=1319"/>
		<updated>2010-07-04T17:14:30Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* HyperThreading */  -- added multithreading for MPI jobs after testing&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:University_of_Tor_79284gm-a.jpg|center|300px|thumb]]&lt;br /&gt;
|name=General Purpose Cluster (GPC)&lt;br /&gt;
|installed=June 2009&lt;br /&gt;
|operatingsystem= Linux&lt;br /&gt;
|loginnode= gpc01..gpc04 (from &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt;)&lt;br /&gt;
|numberofnodes=3780&lt;br /&gt;
|rampernode=16 GB &lt;br /&gt;
|corespernode=8&lt;br /&gt;
|interconnect=1/4 on Infiniband, rest on GigE&lt;br /&gt;
|vendorcompilers=icc (C) ifort (fortran) icpc (C++)&lt;br /&gt;
|queuetype=[[Moab | Moab/Torque]]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
The General Purpose Cluster is an extremely large cluster (ranked [http://www.top500.org/list/2009/06/100 16th] in the world at its inception, and fastest in Canada) and is where most simulations are to be done at SciNet.  It is an IBM iDataPlex cluster based on Intel's Nehalem architecture (one of the [http://www.hpcwire.com/features/HPC-Vendors-Jump-On-Nehalem-42360237.html first in the world] to make use of the new chips). The GPC consists of 3,780 nodes with a total of 30,240  2.5GHz cores, with 16GB RAM per node (2GB per core). Approximately one quarter of the cluster is interconnected with non-blocking 4x-DDR InfiniBand while the rest of the nodes are connected with gigabit ethernet. &lt;br /&gt;
&lt;br /&gt;
===Login===&lt;br /&gt;
&lt;br /&gt;
First login via ssh with your scinet account at &amp;lt;tt&amp;gt;login.scinet.utoronto.ca&amp;lt;/tt&amp;gt;, and from there you can proceed to the Development nodes to compile/test your code.&lt;br /&gt;
&lt;br /&gt;
===Compile/Devel Nodes===&lt;br /&gt;
&lt;br /&gt;
From a scinet login node you can ssh to &amp;lt;tt&amp;gt;gpc01&amp;lt;/tt&amp;gt;..&amp;lt;tt&amp;gt;gpc04&amp;lt;/tt&amp;gt;.  These nodes have the same hardware configuration as most of the compute nodes -- 8 Nehalem processing cores with 16GB RAM and Gigabit ethernet.  You can compile and test your codes on these nodes. To interactively test on more than 8 processors, or to test your code over an InfiniBand connection, you can submit an [[GPC_Quickstart#Submitting_an_Interactive_Job | interactive job request]].&lt;br /&gt;
&lt;br /&gt;
Your [[Storage_Quickstart | home directory]] is in &amp;lt;tt&amp;gt;/home/USER&amp;lt;/tt&amp;gt;; you have 10GB there that is backed up. This directory cannot be written to by the compute nodes! Thus, to run jobs, you'll use the &amp;lt;tt&amp;gt;/scratch/USER&amp;lt;/tt&amp;gt; directory. Here, there is a large amount of disk space, but it is not backed up. Thus it makes sense to keep your codes in /home, compile there, and then run them in the /scratch directory.&lt;br /&gt;
&lt;br /&gt;
===Modules and Environment Variables===&lt;br /&gt;
&lt;br /&gt;
To use most packages on the SciNet machines, including any of the compilers, you will have to use the `module' command.  The command &amp;lt;tt&amp;gt;module load some-package&amp;lt;/tt&amp;gt; will set your environment variables (&amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, etc.) to include the default version of that package.   &amp;lt;tt&amp;gt;module load some-package/specific-version&amp;lt;/tt&amp;gt; will load a specific version of that package.  This makes it very easy for different users to use different versions of compilers, MPI libraries, other libraries, etc.&lt;br /&gt;
&lt;br /&gt;
Note that to use even the gcc compilers you will have to do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
but in fact you probably should use the intel compilers installed on this system as they usually produce faster code (and sometimes, much faster.)&lt;br /&gt;
&lt;br /&gt;
A list of the installed software is available in [[Software_and_Libraries | Software &amp;amp; Libraries]] and can &lt;br /&gt;
be seen on the system by typing &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module avail&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load a module (for example, the default version of the intel compilers)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load intel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload a module&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module unload intel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload all modules&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module purge&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These commands should go in your .bashrc files and/or in your submission scripts to make sure you&lt;br /&gt;
are using the correct packages.&lt;br /&gt;
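&lt;br /&gt;
As a sketch of the submission-script case, the fragment below loads its modules explicitly before running.  The resource request reuses the qsub syntax shown elsewhere on this wiki, and the executable name mycode is hypothetical:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
# start from a clean environment, then load exactly what the job needs&lt;br /&gt;
module purge&lt;br /&gt;
module load intel&lt;br /&gt;
# Torque sets PBS_O_WORKDIR to the directory the job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
./mycode&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Loading modules in the script itself keeps the job independent of whatever happens to be in your .bashrc at the time it runs.&lt;br /&gt;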
&lt;br /&gt;
===Compilers===&lt;br /&gt;
&lt;br /&gt;
The intel compilers are icc/icpc/ifort for C/C++/Fortran, and are available with the default module &amp;quot;intel&amp;quot;.  The intel compilers are recommended over the GNU compilers.  Documentation about icpc is available at &lt;br /&gt;
http://software.intel.com/en-us/articles/intel-software-technical-documentation/.  The Intel compilers accept many of the options that the GNU compilers accept, but tend to produce faster programs on our system.  If, for some reason, you really need the GNU compilers, the latest version of the GNU compiler collection (currently 4.4.0) is available by loading the &amp;quot;gcc&amp;quot; module, with gcc/g++/gfortran for C/C++/Fortran.   Note that f77/g77 is not supported. &lt;br /&gt;
&lt;br /&gt;
To ensure that the intel compilers are in your &amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt; and their libraries are in your &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, use the command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load intel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This should likely go in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; file so that it will automatically be loaded.&lt;br /&gt;
&lt;br /&gt;
Optimize your code for the GPC machine using at least the following compiler flags: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   -O3 -xHost&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(or &amp;lt;tt&amp;gt;-O3 -march=native&amp;lt;/tt&amp;gt; for the GNU compilers). &lt;br /&gt;
&lt;br /&gt;
*If your program uses openmp, add &amp;lt;tt&amp;gt;-openmp&amp;lt;/tt&amp;gt; (&amp;lt;tt&amp;gt;-fopenmp&amp;lt;/tt&amp;gt; for GNU compilers).&lt;br /&gt;
*If you get the warning &amp;lt;tt&amp;gt;feupdateenv is not implemented&amp;lt;/tt&amp;gt;, add &amp;lt;tt&amp;gt;-limf&amp;lt;/tt&amp;gt; to the link line.&lt;br /&gt;
*If you need to link in the MKL libraries, you are well advised to use the Intel(R) Math Kernel Library Link Line Advisor: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ for help in devising the list of libraries to link with your code.&lt;br /&gt;
&lt;br /&gt;
===[[ GPC_MPI_Versions | MPI]]===&lt;br /&gt;
&lt;br /&gt;
SciNet currently provides multiple MPI libraries for the GPC; [http://www.open-mpi.org/ OpenMPI], and [http://software.intel.com/en-us/intel-mpi-library/ IntelMPI].  We currently recommend OpenMPI as the default, as it quite reliably demonstrates good performance on both the infiniband and ethernet networks.  For full details and options see the complete [[ GPC_MPI_Versions | '''MPI''']] section.&lt;br /&gt;
&lt;br /&gt;
The MPI libraries are compiled with both the gnu compiler suite and the intel compiler suite.   To use (for instance) the intel-compiled OpenMPI libraries, which we recommend as the default (and use for most of our examples here), use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load openmpi intel&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt;.   Other combinations behave similarly.&lt;br /&gt;
&lt;br /&gt;
The MPI libraries provide the wrappers mpicc/mpicxx/mpif90/mpif77 around the appropriate compilers, which ensure that the appropriate include and library directories are used in the compilation and linking steps.&lt;br /&gt;
&lt;br /&gt;
We currently recommend the Intel + OpenMPI combination.  However, if you require the GNU compilers as well as MPI, you would want to find the most recent openmpi module available with `gcc' in the version name.  This will enable development and runtime with gcc/g++/gfortran  and OpenMPI.  You can make this your default by putting the module load line in your ~/.bashrc file.&lt;br /&gt;
&lt;br /&gt;
For mixed OpenMP/MPI code using Intel MPI, add the compilation flag -mt_mpi for full thread-safety.&lt;br /&gt;
&lt;br /&gt;
===Submitting A Batch Job===&lt;br /&gt;
&lt;br /&gt;
The SciNet machines are shared systems, and jobs that are to run on them are submitted to a queue; the&lt;br /&gt;
[[Moab | scheduler]] then orders the jobs to make the best use of the machine, and has them launched &lt;br /&gt;
when resources become available.   The intervention of the scheduler can mean that the jobs aren't&lt;br /&gt;
quite run in a first-in first-out order.&lt;br /&gt;
&lt;br /&gt;
The maximum [[wallclock time]] for a job in the queue is 48 hours; computations that will take longer than&lt;br /&gt;
this must be broken into 48-hour chunks and run as several jobs.  The usual way to do this is with [[checkpoints]],&lt;br /&gt;
writing out the complete state of the computation every so often in such a way that a job can be restarted from&lt;br /&gt;
this state information and continue on from where it left off.  Generating [[checkpoints]] is a good idea anyway,&lt;br /&gt;
as in the unlikely event of a hardware failure during your run, it allows you to restart without having lost much work.&lt;br /&gt;
&lt;br /&gt;
There are limits to how many jobs you can submit.  If your group has a default account, up to 32 nodes at a time for 48 hours per job on the GPC cluster are allowed to be queued. This is a total limit, e.g., you could request 64 nodes for 24 hours.  Jobs of users with an LRAC or NRAC allocation will run at a higher priority than others while their resources last. Because of the group-based allocation, it is conceivable that your jobs won't run if your colleagues have already exhausted your group's limits.&lt;br /&gt;
&lt;br /&gt;
Note that scheduling big jobs greatly affects the queuer and other users, so you have to talk to us first to run massively parallel jobs (&amp;gt; 2048 cores). We will help make sure that your jobs start and run efficiently.&lt;br /&gt;
&lt;br /&gt;
If your job should run in less than 48 hours, specify that in your script -- your job &lt;br /&gt;
will start sooner.   (It's easier for the [[Moab | scheduler]] to fit in a short job than a long job).  On the downside, the&lt;br /&gt;
job will be killed automatically by the queue manager software at the end of the specified [[wallclock time]], so if you&lt;br /&gt;
guess wrong you might lose some work.  So the standard procedure is to estimate how long your job will take and&lt;br /&gt;
add 10% or so. &lt;br /&gt;
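&lt;br /&gt;
As a rough sketch of the `estimate plus 10%' rule, the bash function below turns an estimated run time in minutes into a walltime string.  The function name &amp;lt;tt&amp;gt;walltime_with_margin&amp;lt;/tt&amp;gt; is hypothetical (it is not a SciNet or Torque tool), and capping the result at the 48-hour limit is an assumption about what you would want:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Hypothetical helper (not a SciNet tool): pad an estimated run time by 10%&lt;br /&gt;
# and print it as H:MM:00, capped at the 48-hour queue limit.&lt;br /&gt;
walltime_with_margin() {&lt;br /&gt;
    local est_min=$1                              # estimated run time in minutes&lt;br /&gt;
    local padded=$(( (est_min * 11 + 9) / 10 ))   # add 10%, rounding up&lt;br /&gt;
    local cap=$(( 48 * 60 ))                      # 48 hours, in minutes&lt;br /&gt;
    if [ $padded -gt $cap ]; then&lt;br /&gt;
        padded=$cap&lt;br /&gt;
    fi&lt;br /&gt;
    printf '%d:%02d:00\n' $(( padded / 60 )) $(( padded % 60 ))&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
walltime_with_margin 600    # a 10-hour estimate prints 11:00:00&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The printed value is what would go after &amp;lt;tt&amp;gt;walltime=&amp;lt;/tt&amp;gt; on the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line of your script.&lt;br /&gt;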
&lt;br /&gt;
You interact with the queuing system through the queue/resource manager, [[Moab | Moab]] and [[Moab | Torque]].  To see all the jobs in the queue use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To submit your own job, you must write a script which describes the job and how it is to be run (a sample script [[GPC_Quickstart#Submission_Script | follows]]) and submit it to the queue, using the command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub SCRIPT-FILE-NAME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where you will replace &amp;lt;tt&amp;gt;SCRIPT-FILE-NAME&amp;lt;/tt&amp;gt; with the file containing the submission script.   This will return a job ID, for example 31415, which is used to identify the jobs.  Information about a queued job can be found using&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
checkjob JOB-ID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and jobs can be canceled with the command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
canceljob JOB-ID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Again, these commands have many options, which can be read about on their man pages.&lt;br /&gt;
&lt;br /&gt;
Much more information on the queueing system is available on our [[Moab | queue]] page.&lt;br /&gt;
&lt;br /&gt;
====Batch Submission Script: MPI====&lt;br /&gt;
&lt;br /&gt;
A sample submission script is shown below for an mpi job using ethernet with the &amp;lt;tt&amp;gt; #PBS &amp;lt;/tt&amp;gt; directives at the top and the rest being &lt;br /&gt;
what will be executed on the compute node.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (ethernet)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=2:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; -np = nodes*ppn&lt;br /&gt;
mpirun -np 16 -hostfile $PBS_NODEFILE ./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The lines that begin &amp;lt;tt&amp;gt;#PBS&amp;lt;/tt&amp;gt; are commands that are parsed and interpreted by qsub at submission time, and control administrative things about your job.   In this example, the script above requests two nodes, using 8 processors per node, for a [[wallclock time]] of one hour.  (The resources required by the job are listed on the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line.)   Other options can be given in other &amp;lt;tt&amp;gt;#PBS&amp;lt;/tt&amp;gt; lines, such as &amp;lt;tt&amp;gt;#PBS -N&amp;lt;/tt&amp;gt;, which sets the name of the job.   &lt;br /&gt;
&lt;br /&gt;
The rest of the script is run as a bash script at run time.   A bash shell on the first node of the two nodes that are requested executes these commands as a normal bash script, just as if you had run this as a shell script from the terminal.   The only difference is that PBS sets certain environment variables that you can use in the script.  &amp;lt;tt&amp;gt;$PBS_O_WORKDIR&amp;lt;/tt&amp;gt; is set to be the directory that the command was 'submitted' from - eg,  &amp;lt;tt&amp;gt;/scratch/USER/SOMEDIRECTORY&amp;lt;/tt&amp;gt; - and &amp;lt;tt&amp;gt;$PBS_NODEFILE&amp;lt;/tt&amp;gt; is the name of a file which contains all the nodes on which programs should execute.   Using these environment variables, the script then uses the &amp;lt;tt&amp;gt;mpirun&amp;lt;/tt&amp;gt; command to launch the job.   Assumed here is that the user has a line like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load openmpi intel&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
in their &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
* Note: The different versions of MPI require different commands to launch the run, and thus different scripts. The above script is  specific for the openmpi module.  For the intelmpi module, the last line of the script should read&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   mpirun -r ssh -np 16 -env I_MPI_DEVICE ssm ./a.out&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
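&lt;br /&gt;
Rather than hard-coding &amp;lt;tt&amp;gt;-np 16&amp;lt;/tt&amp;gt;, the process count can be derived from &amp;lt;tt&amp;gt;$PBS_NODEFILE&amp;lt;/tt&amp;gt;.  The sketch below assumes, as is usual for Torque, that this file lists each node once per requested processor; the function name &amp;lt;tt&amp;gt;count_slots&amp;lt;/tt&amp;gt; is hypothetical:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Sketch: count the slots granted to the job.&lt;br /&gt;
# Assumes the node file has one line per requested processor.&lt;br /&gt;
count_slots() {&lt;br /&gt;
    grep -c . &amp;quot;$1&amp;quot;      # number of non-empty lines in the node file&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Inside an OpenMPI job script one could then write:&lt;br /&gt;
#   np=$(count_slots $PBS_NODEFILE)&lt;br /&gt;
#   mpirun -np $np -hostfile $PBS_NODEFILE ./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This keeps &amp;lt;tt&amp;gt;-np&amp;lt;/tt&amp;gt; consistent with the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line if you later change the number of nodes.&lt;br /&gt;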
&lt;br /&gt;
====Submitting Collections of Serial Jobs====&lt;br /&gt;
&lt;br /&gt;
SciNet-approved methods for running collections of serial jobs can be found on the [[User_Serial|serial run wiki page]].&lt;br /&gt;
&lt;br /&gt;
====Batch Submission Script: OpenMP====&lt;br /&gt;
&lt;br /&gt;
For running OpenMP jobs, the procedure is similar to that for MPI jobs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (OpenMP)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=8&lt;br /&gt;
./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that [[Introduction_To_Performance#Throughput | in some circumstances]] it can be more efficient to run (say) two jobs each running on four threads than one job running on eight threads.   In that case you can use the same `ampersand-and-wait' technique outlined for serial jobs (see [[User_Serial|serial run wiki page]]) for less-than-eight-core OpenMP jobs.&lt;br /&gt;
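&lt;br /&gt;
A minimal sketch of that technique for two four-thread runs on one 8-core node is given here.  The helper name &amp;lt;tt&amp;gt;run_pair&amp;lt;/tt&amp;gt; is hypothetical, and the two commands passed to it stand for your own OpenMP executables and their arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Sketch: run two commands side by side, each with 4 OpenMP threads,&lt;br /&gt;
# and return only when both have finished.&lt;br /&gt;
run_pair() {&lt;br /&gt;
    export OMP_NUM_THREADS=4&lt;br /&gt;
    $1 &amp;amp;                 # first run, in the background&lt;br /&gt;
    local pid1=$!&lt;br /&gt;
    $2 &amp;amp;                 # second run, in the background&lt;br /&gt;
    local pid2=$!&lt;br /&gt;
    wait $pid1 $pid2     # without this, the job script would exit early&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# In a job script with nodes=1:ppn=8 (hypothetical executable and inputs):&lt;br /&gt;
#   run_pair &amp;quot;./a.out input1&amp;quot; &amp;quot;./a.out input2&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;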
&lt;br /&gt;
====Hybrid MPI/OpenMP jobs====&lt;br /&gt;
&lt;br /&gt;
=====Using Intel MPI=====&lt;br /&gt;
Here is how to run hybrid codes using intelmpi:&lt;br /&gt;
&lt;br /&gt;
http://software.intel.com/en-us/articles/hybrid-applications-intelmpi-openmp/&lt;br /&gt;
&lt;br /&gt;
Make sure you compile with the -mt_mpi option to the compilers to use the thread safe libraries. &lt;br /&gt;
Set the environment variable I_MPI_PIN_DOMAIN&lt;br /&gt;
&lt;br /&gt;
    $ export I_MPI_PIN_DOMAIN=omp&lt;br /&gt;
&lt;br /&gt;
This will set the process pinning domain size to be equal to OMP_NUM_THREADS (which you should set to the desired number of threads per mpi process). Therefore, each MPI process can create $OMP_NUM_THREADS number of children threads for running within the corresponding domain. If OMP_NUM_THREADS is not set, each node is treated as a separate domain (which will allow as many threads per MPI processes as there are cores).&lt;br /&gt;
&lt;br /&gt;
In addition, when invoking mpirun, you should add the argument &amp;quot;-ppn X&amp;quot;, where X is the number of MPI processes per node.&lt;br /&gt;
For example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     mpirun -r ssh -ppn 2 -np 8  ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
would start 2 mpi processes per node for a total of 8 processes, so mpirun will try to run mpi processes on 4 nodes&lt;br /&gt;
(OMP_NUM_THREADS is then probably best set at 4).&lt;br /&gt;
Your job script should still ask for these 4 nodes with the line&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     #PBS -l nodes=4:ppn=8,walltime=....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(ppn=8 is not a mistake here; the ppn parameter has a different meaning for PBS and for mpirun)&lt;br /&gt;
&lt;br /&gt;
''The ppn parameter to mpirun is very important! Without it, eight mpi jobs would get bunched on the first node in this example, leaving 3 nodes unused.''&lt;br /&gt;
&lt;br /&gt;
NOTE: In order to pin OpenMP threads inside the domain, use the corresponding OpenMP feature by setting the KMP_AFFINITY environment variable; see [http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/lin/compiler_f/optaps/common/optaps_openmp_thread_affinity.htm#KMP_AFFINITY_Environment_Variable Intel's Compiler User and Reference Guide].&lt;br /&gt;
&lt;br /&gt;
The IntelMPI manual is referenced on the front page of our wiki:&lt;br /&gt;
&lt;br /&gt;
http://software.intel.com/sites/products/documentation/hpc/mpi/linux/reference_manual.pdf&lt;br /&gt;
&lt;br /&gt;
For the above example of a total of 8 processes on 4 nodes, you could use the following script:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (hybrid job)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=4:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# SET THE NUMBER OF THREADS PER PROCESS:&lt;br /&gt;
export OMP_NUM_THREADS=4&lt;br /&gt;
&lt;br /&gt;
# PIN THE MPI DOMAINS ACCORDING TO OMP&lt;br /&gt;
export I_MPI_PIN_DOMAIN=omp&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; -np = nodes * (MPI processes per node), here 4*2&lt;br /&gt;
mpirun -r ssh -ppn 2 -np 8 ./a.out&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
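&lt;br /&gt;
The relation between the numbers in this script can be made explicit.  In the sketch below (the function and variable names are illustrative assumptions), the number of MPI processes per node is the 8 cores of a GPC node divided by the number of threads per process, and &amp;lt;tt&amp;gt;-np&amp;lt;/tt&amp;gt; is that value times the number of nodes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Sketch: derive the mpirun arguments for a hybrid job on 8-core GPC nodes.&lt;br /&gt;
# Arguments: number of nodes, OpenMP threads per MPI process.&lt;br /&gt;
hybrid_layout() {&lt;br /&gt;
    local nodes=$1 threads=$2&lt;br /&gt;
    local cores_per_node=8&lt;br /&gt;
    local mpi_ppn=$(( cores_per_node / threads ))   # MPI processes per node&lt;br /&gt;
    local np=$(( nodes * mpi_ppn ))                 # total MPI processes&lt;br /&gt;
    printf '%s\n' &amp;quot;-ppn $mpi_ppn -np $np&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
hybrid_layout 4 4    # prints: -ppn 2 -np 8&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;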
&lt;br /&gt;
=====Using Open MPI=====&lt;br /&gt;
&lt;br /&gt;
For mixed MPI/OpenMP jobs using OpenMPI, which is the default for many users, the procedure is similar, but details differ.&lt;br /&gt;
&lt;br /&gt;
* Request the number of nodes in the PBS script.&lt;br /&gt;
* Set OMP_NUM_THREADS to the number of threads per MPI process.&lt;br /&gt;
* In addition to the -np parameter for mpirun, add the argument &amp;lt;tt&amp;gt;--bynode&amp;lt;/tt&amp;gt;, so that the mpi processes are not bunched up.&lt;br /&gt;
&lt;br /&gt;
So for example, to start a total of 8 processes on 4 nodes, you could use the following script&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (hybrid job)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=4:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# SET THE NUMBER OF THREADS PER PROCESS:&lt;br /&gt;
export OMP_NUM_THREADS=4&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; -np = nodes*processes_per_node; --bynode forces a round robin of nodes.&lt;br /&gt;
mpirun -np 8 --bynode -hostfile $PBS_NODEFILE ./a.out&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Submitting an Interactive Job===&lt;br /&gt;
&lt;br /&gt;
It is sometimes convenient to run a job interactively; this can be very handy for debugging purposes.  In this case, you type a &amp;lt;tt&amp;gt;qsub&amp;lt;/tt&amp;gt; command which submits an interactive job to the queue; when the scheduler selects this job to run, then it starts a shell running on the first node of the job, which connects to your terminal.  You can then type any series of commands (for instance, the same commands listed as in the batch submission script above) to run a job interactively.&lt;br /&gt;
&lt;br /&gt;
For example, to start the same sort of job as in the batch submission script above, but interactively, one would type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -I -l nodes=2:ppn=8,walltime=1:00:00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is exactly the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line in the batch script above (which requests all 8 processors on each of 2 nodes for one hour), but prepended with a &amp;lt;tt&amp;gt;-I&amp;lt;/tt&amp;gt; for `interactive'.   When this job begins, your terminal will show you as being logged in to one of the compute nodes, and you can type in any shell command, run &amp;lt;tt&amp;gt;mpirun&amp;lt;/tt&amp;gt;, etc.   When you exit the shell, the job will end.  Interactive jobs can be used with any of the [[ Moab#GPC | GPC queues ]]; however, there is a short,&lt;br /&gt;
high-turnover queue called [[ Moab#debug | debug ]] which can be especially useful when the system is busy.&lt;br /&gt;
&lt;br /&gt;
===Ethernet vs. Infiniband===&lt;br /&gt;
&lt;br /&gt;
About 1/4 of the GPC (862 nodes or 6896 cores) is connected with a high bandwidth low-latency fabric called&lt;br /&gt;
[http://en.wikipedia.org/wiki/InfiniBand InfiniBand].  Many jobs which require tight coupling to scale well greatly benefit from this interconnect;&lt;br /&gt;
other types of jobs, which have relatively modest communications, do not require this and run fine on Gigabit ethernet.&lt;br /&gt;
&lt;br /&gt;
Jobs which require the InfiniBand for good performance can request the nodes that have the `&amp;lt;tt&amp;gt;ib&amp;lt;/tt&amp;gt;' feature in the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line,&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -l nodes=2:ib:ppn=8,walltime=1:00:00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Because there are a limited number of these nodes, your job will start sooner if you do not request them (e.g. if you use the scripts as shown above), as this increases the number of nodes available to run your job. In fact, the InfiniBand nodes are to be used only for jobs that are known to scale well and will benefit from this type of interconnect. As such, at least 2 nodes must be requested, as single-node jobs will not benefit from using an&lt;br /&gt;
InfiniBand node. The MPI libraries provided by SciNet automatically use either the InfiniBand or the ethernet interconnect, depending on which nodes your job runs on.&lt;br /&gt;
&lt;br /&gt;
===HyperThreading===&lt;br /&gt;
&lt;br /&gt;
Each GPC compute node has 8 Nehalem cores (Intel Xeon E5540 @ 2.53&lt;br /&gt;
GHz). Thus, fully utilizing the node requires at least 8&lt;br /&gt;
tasks. We say `at least' because the Nehalem cores support&lt;br /&gt;
Hyper-Threading, which means it is as if there are twice as many cores, although the number of execution units is unchanged. Because most applications spend a lot of time waiting for data&lt;br /&gt;
from memory, this allows one task to use the processor&lt;br /&gt;
while another is stuck waiting for memory, and vice versa.&lt;br /&gt;
This requires no changes to the code, only running 16 rather than&lt;br /&gt;
8 tasks on the node (e.g. &amp;lt;tt&amp;gt;export OMP_NUM_THREADS=16&amp;lt;/tt&amp;gt;).  The resulting performance gain depends highly on the application.  &lt;br /&gt;
&lt;br /&gt;
Note: the &amp;lt;tt&amp;gt;ppn=8&amp;lt;/tt&amp;gt; part of the job requirement specification should '''not''' be modified because of HyperThreading.&lt;br /&gt;
&lt;br /&gt;
OMP_NUM_THREADS will only affect OpenMP codes, or hybrid codes that do multithreading locally and MPI across nodes. Nevertheless, it is possible to run MPI jobs with more than 8 tasks on a single node. In this case, the processor is able to take advantage of interleaving the threads. Some of our users have obtained an 8% speedup by running gromacs with 16 tasks instead of 8 on a single node (mpirun -np 16 ./gromacs/mdrun -npme 4 is 108% the speed of mpirun -np 8 ./gromacs/mdrun with -npme 2 or -1) -- and 8% adds up over the kinds of simulation times that are accessible on SciNet. Beware that mpi runs might still be fastest when limited to 8 mpi processes per node.&lt;br /&gt;
&lt;br /&gt;
You must test the performance of your codes in all cases, BEFORE submitting multiple jobs, and assess whether there is a benefit from hyperthreading&lt;br /&gt;
for your particular jobs.&lt;br /&gt;
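&lt;br /&gt;
One possible way to run such a test for an OpenMP code is to time the same short run with 8 and with 16 threads inside a single test job.  This is only a sketch: the function name &amp;lt;tt&amp;gt;time_threads&amp;lt;/tt&amp;gt; is hypothetical, and the command you pass should be a short, representative run of your own code:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Sketch: print the elapsed wall-clock seconds of a command for each thread count.&lt;br /&gt;
# Arguments: the command to run, followed by the thread counts to try.&lt;br /&gt;
time_threads() {&lt;br /&gt;
    local cmd=$1 n start end&lt;br /&gt;
    shift&lt;br /&gt;
    for n in &amp;quot;$@&amp;quot;; do&lt;br /&gt;
        export OMP_NUM_THREADS=$n&lt;br /&gt;
        start=$(date +%s)&lt;br /&gt;
        $cmd&lt;br /&gt;
        end=$(date +%s)&lt;br /&gt;
        echo &amp;quot;$n threads: $(( end - start )) s&amp;quot;&lt;br /&gt;
    done&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# For example (./a.out stands for your own executable):&lt;br /&gt;
#   time_threads ./a.out 8 16&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Timings with one-second resolution are only meaningful if the test run lasts at least a few minutes.&lt;br /&gt;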
&lt;br /&gt;
===Memory Configuration===&lt;br /&gt;
&lt;br /&gt;
==== 16G ====&lt;br /&gt;
&lt;br /&gt;
There are 3756 nodes which have 16G of memory; this is the primary configuration in the GPC. These nodes will be used by default.&lt;br /&gt;
&lt;br /&gt;
==== 18G ====&lt;br /&gt;
&lt;br /&gt;
There are 24 Infiniband nodes which have 18G of memory. These nodes have a fully populated memory configuration that maximizes memory bandwidth. To&lt;br /&gt;
request these nodes use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=2:ib:m18g:ppn=8,walltime=1:00:00 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== 32G ====&lt;br /&gt;
&lt;br /&gt;
There are 84 Infiniband nodes which have 32G of memory. To request these nodes use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=2:ib:m32g:ppn=8,walltime=1:00:00 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== 128G ====&lt;br /&gt;
There are two stand-alone large memory (128GB) nodes, &amp;lt;tt&amp;gt;gpc-lrgmem01&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;gpc-lrgmem02&amp;lt;/tt&amp;gt; which are primarily to be used for data analysis of runs.  They have 16 cores and are intel machines running linux, but they are not the same architecture (Nehalem) as the GPC compute nodes, so codes may have to be compiled separately for these machines.  They can be accessed using a specific &amp;lt;tt&amp;gt;largemem&amp;lt;/tt&amp;gt; queue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=2:ppn=8,walltime=1:00:00 -q largemem -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Ram Disk===&lt;br /&gt;
&lt;br /&gt;
On the GPC nodes, there is a `ram disk' available - up to half of the memory on the node may be used as a temporary file system.  This is particularly useful in the early stages of migrating desktop-computing codes to a High Performance Computing platform such as the GPC.    It is much faster than real disk and does not require network traffic; however, each node sees its own ramdisk and cannot see files on that of other nodes.   This is a very easy way to cache writes (by writing them to fast ram disk instead of slow `real' disk); one would then periodically copy the files to /scratch or /project so that they are available after the job has completed.&lt;br /&gt;
&lt;br /&gt;
To use the ramdisk, create files in /dev/shm/.. and read from / write to them just as one would in (eg) /scratch/USER/.  Only the amount of RAM needed to store the files will be taken up by the temporary file system; thus if you have 8 serial jobs each requiring 1 GB of RAM, and 1GB is taken up by various OS services, you would still have approximately 7GB available to use as ramdisk on a 16GB node.   However, if you were to write 8 GB of data to the RAM disk, this would exceed available memory and your job would likely crash.&lt;br /&gt;
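&lt;br /&gt;
Before writing large files, it can help to check how much room the ramdisk still has.  The sketch below uses the standard &amp;lt;tt&amp;gt;df&amp;lt;/tt&amp;gt; command; the function name &amp;lt;tt&amp;gt;shm_free_kb&amp;lt;/tt&amp;gt; is hypothetical.  Note that &amp;lt;tt&amp;gt;df&amp;lt;/tt&amp;gt; reports the space left within the ramdisk's size limit, which may be more than the RAM that is actually free once your programs' own memory use is counted:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Sketch: print the free space, in kilobytes, of the file system holding a directory.&lt;br /&gt;
# Defaults to /dev/shm; -P makes df print each file system on a single line.&lt;br /&gt;
shm_free_kb() {&lt;br /&gt;
    df -Pk &amp;quot;${1:-/dev/shm}&amp;quot; | awk 'NR==2 {print $4}'&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# For example, warn if less than 2 GB (2097152 kB) is free:&lt;br /&gt;
#   [ $(shm_free_kb) -gt 2097152 ] || echo &amp;quot;not enough room on ramdisk&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;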
   &lt;br /&gt;
NOTE: it is very important to delete your files from ram disk at the end of your job.   If you do not do this, the next user to use that node will have less RAM available than they might expect, and this might kill their jobs.&lt;br /&gt;
&lt;br /&gt;
More details on how to setup your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].&lt;br /&gt;
&lt;br /&gt;
=== Managing jobs on the Queuing system ===&lt;br /&gt;
Information on checking available resources, starting, viewing, managing and canceling jobs on [[Moab | Moab/Torque]]&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=User_Ramdisk&amp;diff=1318</id>
		<title>User Ramdisk</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=User_Ramdisk&amp;diff=1318"/>
		<updated>2010-07-04T17:00:04Z</updated>

		<summary type="html">&lt;p&gt;Cneale: Undo revision 1316 by Cneale (Talk)&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Using Ramdisk==&lt;br /&gt;
&lt;br /&gt;
On the GPC nodes, a `ramdisk' is available. Up to half of the memory&lt;br /&gt;
on the node may be used as a temporary file system.  This is&lt;br /&gt;
particularly useful for use in the early stages of migrating&lt;br /&gt;
desktop-computing codes to a High Performance Computing platform such&lt;br /&gt;
as the GPC, especially those that use a lot of I/O, such as Blast.&lt;br /&gt;
Using a lot of I/O becomes a bottleneck in large scale computing. One&lt;br /&gt;
especially suffers a performance penalty on parallel file systems&lt;br /&gt;
(such as the GPFS used on SciNet), since the files are synchronized&lt;br /&gt;
across the whole network.&lt;br /&gt;
&lt;br /&gt;
Ramdisk is much faster than real disk, and is especially beneficial&lt;br /&gt;
for codes which perform a lot of small I/O work, since the ramdisk&lt;br /&gt;
does not require network traffic.  However, each node sees its own&lt;br /&gt;
ramdisk and cannot see files on that of other nodes.&lt;br /&gt;
&lt;br /&gt;
To use the ramdisk, create and read from / write to files in&lt;br /&gt;
/dev/shm/.. just as one would to (eg) /scratch/USER/.  Only the amount&lt;br /&gt;
of RAM needed to store the files will be taken up by the temporary&lt;br /&gt;
file system. Thus if you have 8 serial jobs each requiring 1 GB of&lt;br /&gt;
RAM, and 1GB is taken up by various OS services, you would still have&lt;br /&gt;
approximately 7GB available to use as ramdisk on a 16GB node.&lt;br /&gt;
However, if you were to write 8 GB of data to the RAM disk, this would&lt;br /&gt;
exceed available memory and your job would likely crash.&lt;br /&gt;
&lt;br /&gt;
Note that when using the ramdisk:&lt;br /&gt;
&lt;br /&gt;
* At the start of your job, you can copy frequently accessed files to ramdisk (''stage in''). If there are many such files, it is beneficial to put them in a tar file.&lt;br /&gt;
* One would periodically copy the output files from ramdisk to /scratch or /project, as well as at the end of the job, of course (''stage out'').&lt;br /&gt;
* It is very important to delete your files from ram disk at the end of your job.  If you do not do this, the next user to use that node will have less RAM available than they might expect, and this might kill their jobs. &lt;br /&gt;
&lt;br /&gt;
A script using the ramdisk in a 1 day openMP job might look like this:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#MOAB/Torque submission script for SciNet GPC &lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=24:00:00&lt;br /&gt;
#PBS -N ramdisk-test&lt;br /&gt;
&lt;br /&gt;
#Job parameters:&lt;br /&gt;
execname=job          # name of the executable&lt;br /&gt;
input_tar=input.tar   # tar file with input files and executables&lt;br /&gt;
output_tar=out.tar    # file in which to store output&lt;br /&gt;
input_subdir=indir    # sub-directory (within input_tar) with input files&lt;br /&gt;
output_subdir=outdir  # sub-directory to contain output files&lt;br /&gt;
poll_period=60        # how often to check for job completion (in seconds)&lt;br /&gt;
save_period=120       # how often to save output (in minutes)&lt;br /&gt;
&lt;br /&gt;
#Track how long everything takes.&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
#Copy to ramdisk&lt;br /&gt;
echo &amp;quot;Stage-in: copying files to ramdisk directory /dev/shm/$USER&amp;quot;&lt;br /&gt;
mkdir -p /dev/shm/$USER&lt;br /&gt;
mkdir -p /dev/shm/$USER/$output_subdir&lt;br /&gt;
cd /dev/shm/$USER&lt;br /&gt;
cp $PBS_O_WORKDIR/$input_tar .&lt;br /&gt;
tar xf $input_tar&lt;br /&gt;
rm -rf $input_tar&lt;br /&gt;
&lt;br /&gt;
#Track how long everything takes.&lt;br /&gt;
echo -n &amp;quot;Stage-in completed on &amp;quot;&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
#Run on ramdisk&lt;br /&gt;
echo &amp;quot;Starting job&amp;quot;&lt;br /&gt;
./$execname $input_subdir $output_subdir &amp;amp;&lt;br /&gt;
# Store the process id in $pid so we may check if it's still running:&lt;br /&gt;
pid=$!&lt;br /&gt;
&lt;br /&gt;
#Note:&lt;br /&gt;
# 1. The above launching command is appropriate for multi-threaded (OpenMP) applications.&lt;br /&gt;
# 2. Ramdisk MPI jobs are limited to 1 node as /dev/shm is not shared across nodes.&lt;br /&gt;
# 3. For serial jobs, you'd want to start 8 jobs at the same time instead, e.g.&lt;br /&gt;
#     mkdir -p $output_subdir/1&lt;br /&gt;
#     ./$execname ${input_subdir}/1 ${output_subdir}/1 &amp;amp;&lt;br /&gt;
#     pid=$!&lt;br /&gt;
#     mkdir -p $output_subdir/2&lt;br /&gt;
#     ./$execname ${input_subdir}/2 ${output_subdir}/2 &amp;amp;&lt;br /&gt;
#     pid=$pid,$!&lt;br /&gt;
#     mkdir -p $output_subdir/3&lt;br /&gt;
#     ./$execname ${input_subdir}/3 ${output_subdir}/3 &amp;amp;&lt;br /&gt;
#     pid=$pid,$!&lt;br /&gt;
#     mkdir -p $output_subdir/4&lt;br /&gt;
#     ./$execname ${input_subdir}/4 ${output_subdir}/4 &amp;amp;&lt;br /&gt;
#     pid=$pid,$!&lt;br /&gt;
#     mkdir -p $output_subdir/5&lt;br /&gt;
#     ./$execname ${input_subdir}/5 ${output_subdir}/5 &amp;amp;&lt;br /&gt;
#     pid=$pid,$!&lt;br /&gt;
#     mkdir -p $output_subdir/6&lt;br /&gt;
#     ./$execname ${input_subdir}/6 ${output_subdir}/6 &amp;amp;&lt;br /&gt;
#     pid=$pid,$!&lt;br /&gt;
#     mkdir -p $output_subdir/7&lt;br /&gt;
#     ./$execname ${input_subdir}/7 ${output_subdir}/7 &amp;amp;&lt;br /&gt;
#     pid=$pid,$!&lt;br /&gt;
#     mkdir -p $output_subdir/8&lt;br /&gt;
#     ./$execname ${input_subdir}/8 ${output_subdir}/8 &amp;amp;&lt;br /&gt;
#     pid=$pid,$!&lt;br /&gt;
&lt;br /&gt;
#Track how long everything takes.&lt;br /&gt;
echo -n &amp;quot;Job started on &amp;quot;&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
function save_results {    &lt;br /&gt;
    echo -n &amp;quot;Copying from directory $output_subdir to file $PBS_O_WORKDIR/$output_tar on &amp;quot;&lt;br /&gt;
    date&lt;br /&gt;
    tar cf $output_tar $output_subdir/*&lt;br /&gt;
    cp $output_tar $PBS_O_WORKDIR&lt;br /&gt;
    echo -n &amp;quot;Copying of output complete on &amp;quot;&lt;br /&gt;
    date&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
function cleanup_ramdisk {&lt;br /&gt;
    echo -n &amp;quot;Cleaning up ramdisk directory /dev/shm/$USER on &amp;quot;&lt;br /&gt;
    date&lt;br /&gt;
    rm -rf /dev/shm/$USER&lt;br /&gt;
    echo -n &amp;quot;done at &amp;quot;&lt;br /&gt;
    date&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
function trap_term {&lt;br /&gt;
    echo -n &amp;quot;Trapped term (soft kill) signal on &amp;quot;&lt;br /&gt;
    date&lt;br /&gt;
    save_results&lt;br /&gt;
    cleanup_ramdisk&lt;br /&gt;
    exit&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
function interruptible_sleep {&lt;br /&gt;
    # waits for a number of seconds&lt;br /&gt;
    # argument 1 = number of seconds&lt;br /&gt;
    # note: just doing `sleep $1' would not be interruptible!&lt;br /&gt;
    for m in `seq $1`; do  &lt;br /&gt;
        sleep 1&lt;br /&gt;
    done&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
function is_running {&lt;br /&gt;
    # check if one or more process is running &lt;br /&gt;
    # argument 1 = a comma separated list of PIDs (no spaces)&lt;br /&gt;
    ps -p $1 -o pid= | wc -l&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
#trap the termination signal, and call the function 'trap_term' when &lt;br /&gt;
# that happens, so results may be saved.&lt;br /&gt;
trap &amp;quot;trap_term&amp;quot; TERM&lt;br /&gt;
&lt;br /&gt;
#number of pollings per save period (rounded down):&lt;br /&gt;
npoll=$(($save_period*60/$poll_period))&lt;br /&gt;
&lt;br /&gt;
#polling and saving loop&lt;br /&gt;
running=$(is_running $pid)&lt;br /&gt;
while [ $running -gt 0 ]&lt;br /&gt;
do&lt;br /&gt;
    for n in `seq $npoll`&lt;br /&gt;
    do&lt;br /&gt;
        interruptible_sleep $poll_period&lt;br /&gt;
        running=$(is_running $pid)&lt;br /&gt;
        if [ $running -eq 0 ]; then&lt;br /&gt;
            break&lt;br /&gt;
        fi&lt;br /&gt;
    done&lt;br /&gt;
    save_results&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
#Done&lt;br /&gt;
cleanup_ramdisk&lt;br /&gt;
&lt;br /&gt;
echo -n &amp;quot;Job finished cleanly on &amp;quot;&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notes on this script:&lt;br /&gt;
* The script assumes that the tar file &amp;lt;tt&amp;gt;input.tar&amp;lt;/tt&amp;gt; contains the executable &amp;lt;tt&amp;gt;job&amp;lt;/tt&amp;gt; and the input files in a subdirectory called &amp;lt;tt&amp;gt;indir&amp;lt;/tt&amp;gt; (with further subdirectories for the case of 8 serial jobs).&lt;br /&gt;
* The executable is supposed to take the locations of the input and output directory as arguments.&lt;br /&gt;
* The trap command makes sure that the results get saved and the ramdisk gets flushed even when the job gets killed before the end of the script is reached.  &amp;lt;tt&amp;gt;trap&amp;lt;/tt&amp;gt; is a bash construct that executes the given command when the script receives a signal, in this case TERM.  The scheduler sends the TERM signal 30 seconds before your time is up.&lt;br /&gt;
* You could also [[Using_Signals|trap signals in your C, C++ or FORTRAN codes]].&lt;br /&gt;
* All files are kept in a subdirectory of &amp;lt;tt&amp;gt;/dev/shm&amp;lt;/tt&amp;gt;. This makes the clean up simpler, and keeps things tidy when doing small test jobs on the development nodes.&lt;br /&gt;
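&lt;br /&gt;
The trap mechanism described in these notes can be tried in isolation. The following is a minimal sketch, not part of the script above: the handler name &amp;lt;tt&amp;gt;on_term&amp;lt;/tt&amp;gt; and the variable &amp;lt;tt&amp;gt;saved&amp;lt;/tt&amp;gt; are hypothetical, and a child shell stands in for the scheduler by sending the TERM signal.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical flag that the handler flips; a real job would save its results here.
saved=no

on_term() {
    saved=yes
}

# Run on_term whenever this script receives a TERM signal.
trap on_term TERM

# Stand-in for the scheduler: a child shell sends TERM to its parent (this script).
sh -c 'kill -TERM $PPID'

# The shell runs the handler once the command above has completed.
echo "results saved: $saved"
```
&lt;br /&gt;
In the script above, the handler additionally calls &amp;lt;tt&amp;gt;save_results&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;cleanup_ramdisk&amp;lt;/tt&amp;gt; and then exits.&lt;br /&gt;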
&lt;br /&gt;
Further notes:&lt;br /&gt;
* Often collections of serial jobs are run on the ramdisk, see the [[User_Serial|serial run wiki page]] for more details on that.&lt;br /&gt;
* If your application needs just a bit more ramdisk, there are 24 nodes with 18GB and 84 nodes with 32GB of RAM.  These nodes can be requested by &amp;lt;tt&amp;gt;qsub -l nodes=2:ib:m18g:ppn=8,walltime=1:00:00&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;qsub -l nodes=2:ib:m32g:ppn=8,walltime=1:00:00&amp;lt;/tt&amp;gt;. They are infiniband nodes, which are in short supply, so only use these nodes if you have to. Finally, there are 2 stand-alone large memory (128GB) nodes. They have 16 cores and are intel machines running linux, but they are not the same architecture (Nehalem) as the GPC compute nodes, so codes may have to be compiled separately for these machines. They can be accessed using a specific largemem queue. See [[GPC Quickstart]].&lt;br /&gt;
&lt;br /&gt;
--[[User:Rzon|Rzon]] 18 June 2010 (UTC)&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=GPC_Quickstart&amp;diff=1317</id>
		<title>GPC Quickstart</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=GPC_Quickstart&amp;diff=1317"/>
		<updated>2010-07-04T16:59:41Z</updated>

		<summary type="html">&lt;p&gt;Cneale: Undo revision 1315 by Cneale (Talk)&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:University_of_Tor_79284gm-a.jpg|center|300px|thumb]]&lt;br /&gt;
|name=General Purpose Cluster (GPC)&lt;br /&gt;
|installed=June 2009&lt;br /&gt;
|operatingsystem= Linux&lt;br /&gt;
|loginnode= gpc01..gpc04 (from &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt;)&lt;br /&gt;
|numberofnodes=3780&lt;br /&gt;
|rampernode=16 GB &lt;br /&gt;
|corespernode=8&lt;br /&gt;
|interconnect=1/4 on Infiniband, rest on GigE&lt;br /&gt;
|vendorcompilers=icc (C) ifort (fortran) icpc (C++)&lt;br /&gt;
|queuetype=[[Moab | Moab/Torque]]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
The General Purpose Cluster is an extremely large cluster (ranked [http://www.top500.org/list/2009/06/100 16th] in the world at its inception, and fastest in Canada) and is where most simulations are to be done at SciNet.  It is an IBM iDataPlex cluster based on Intel's Nehalem architecture (one of the [http://www.hpcwire.com/features/HPC-Vendors-Jump-On-Nehalem-42360237.html first in the world] to make use of the new chips). The GPC consists of 3,780 nodes with a total of 30,240  2.5GHz cores, with 16GB RAM per node (2GB per core). Approximately one quarter of the cluster is interconnected with non-blocking 4x-DDR InfiniBand while the rest of the nodes are connected with gigabit ethernet. &lt;br /&gt;
&lt;br /&gt;
===Login===&lt;br /&gt;
&lt;br /&gt;
First log in via ssh with your SciNet account at &amp;lt;tt&amp;gt;login.scinet.utoronto.ca&amp;lt;/tt&amp;gt;; from there you can proceed to the development nodes to compile and test your code.&lt;br /&gt;
&lt;br /&gt;
===Compile/Devel Nodes===&lt;br /&gt;
&lt;br /&gt;
From a scinet login node you can ssh to &amp;lt;tt&amp;gt;gpc01&amp;lt;/tt&amp;gt;..&amp;lt;tt&amp;gt;gpc04&amp;lt;/tt&amp;gt;.  These nodes have the same hardware configuration as most of the compute nodes -- 8 Nehalem processing cores with 16GB RAM and Gigabit ethernet.  You can compile and test your codes on these nodes. To interactively test on more than 8 processors, or to test your code over an InfiniBand connection, you can submit an [[GPC_Quickstart#Submitting_an_Interactive_Job | interactive job request]].&lt;br /&gt;
&lt;br /&gt;
Your [[Storage_Quickstart | home directory]] is in &amp;lt;tt&amp;gt;/home/USER&amp;lt;/tt&amp;gt;; you have 10GB there that is backed up. This directory cannot be written to by the compute nodes! Thus, to run jobs, you'll use the &amp;lt;tt&amp;gt;/scratch/USER&amp;lt;/tt&amp;gt; directory. Here, there is a large amount of disk space, but it is not backed up. Thus it makes sense to keep your codes in /home, compile there, and then run them in the /scratch directory.&lt;br /&gt;
&lt;br /&gt;
===Modules and Environment Variables===&lt;br /&gt;
&lt;br /&gt;
To use most packages on the SciNet machines, including any of the compilers, you will have to use the &amp;lt;tt&amp;gt;module&amp;lt;/tt&amp;gt; command.  The command &amp;lt;tt&amp;gt;module load some-package&amp;lt;/tt&amp;gt; will set your environment variables (&amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, etc) to include the default version of that package.   &amp;lt;tt&amp;gt;module load some-package/specific-version&amp;lt;/tt&amp;gt; will load a specific version of that package.  This makes it very easy for different users to use different versions of compilers, MPI libraries, and other libraries.&lt;br /&gt;
&lt;br /&gt;
Note that to use even the gcc compilers you will have to do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
but in fact you probably should use the intel compilers installed on this system as they usually produce faster code (and sometimes, much faster.)&lt;br /&gt;
&lt;br /&gt;
A list of the installed software is available in [[Software_and_Libraries | Software &amp;amp; Libraries]] and can &lt;br /&gt;
be seen on the system by typing &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module avail&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load a module (for example, the default version of the intel compilers)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load intel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload a module&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module unload intel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload all modules&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module purge&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These commands should go in your .bashrc files and/or in your submission scripts to make sure you&lt;br /&gt;
are using the correct packages.&lt;br /&gt;
&lt;br /&gt;
===Compilers===&lt;br /&gt;
&lt;br /&gt;
The intel compilers are icc/icpc/ifort for C/C++/Fortran, and are available with the default module &amp;quot;intel&amp;quot;.  The intel compilers are recommended over the GNU compilers.  Documentation about icpc is available at &lt;br /&gt;
http://software.intel.com/en-us/articles/intel-software-technical-documentation/.  The Intel compilers accept many of the options that the GNU compilers accept, but tend to produce faster programs on our system.  If, for some reason, you really need the GNU compilers, the latest version of the GNU compiler collection (currently 4.4.0) is available by loading the &amp;quot;gcc&amp;quot; module, with gcc/g++/gfortran for C/C++/Fortran.   Note that f77/g77 is not supported. &lt;br /&gt;
&lt;br /&gt;
To ensure that the intel compilers are in your &amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt; and their libraries are in your &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, use the command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load intel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This should likely go in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; file so that it will automatically be loaded.&lt;br /&gt;
&lt;br /&gt;
Optimize your code for the GPC machine using at least the following compiler flags: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   -O3 -xHost&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(or &amp;lt;tt&amp;gt;-O3 -march=native&amp;lt;/tt&amp;gt; for the GNU compilers). &lt;br /&gt;
&lt;br /&gt;
*If your program uses openmp, add &amp;lt;tt&amp;gt;-openmp&amp;lt;/tt&amp;gt; (&amp;lt;tt&amp;gt;-fopenmp&amp;lt;/tt&amp;gt; for GNU compilers).&lt;br /&gt;
*If you get the warning &amp;lt;tt&amp;gt;feupdateenv is not implemented&amp;lt;/tt&amp;gt;, add &amp;lt;tt&amp;gt;-limf&amp;lt;/tt&amp;gt; to the link line.&lt;br /&gt;
*If you need to link in the MKL libraries, you are well advised to use the Intel(R) Math Kernel Library Link Line Advisor: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ for help in devising the list of libraries to link with your code.&lt;br /&gt;
&lt;br /&gt;
===[[ GPC_MPI_Versions | MPI]]===&lt;br /&gt;
&lt;br /&gt;
SciNet currently provides multiple MPI libraries for the GPC: [http://www.open-mpi.org/ OpenMPI] and [http://software.intel.com/en-us/intel-mpi-library/ IntelMPI].  We currently recommend OpenMPI as the default, as it reliably demonstrates good performance on both the InfiniBand and ethernet networks.  For full details and options see the complete [[ GPC_MPI_Versions | '''MPI''']] section.&lt;br /&gt;
&lt;br /&gt;
The MPI libraries are compiled with both the gnu compiler suite and the intel compiler suite.   To use (for instance) the intel-compiled OpenMPI libraries, which we recommend as the default (and use for most of our examples here), use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load openmpi intel&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt;.   Other combinations behave similarly.&lt;br /&gt;
&lt;br /&gt;
The MPI libraries provide the wrappers mpicc/mpicxx/mpif90/mpif77 around the appropriate compilers, which ensure that the appropriate include and library directories are used in the compilation and linking steps.&lt;br /&gt;
&lt;br /&gt;
We currently recommend the Intel + OpenMPI combination.  However, if you require the GNU compilers as well as MPI, you would want to find the most recent openmpi module available with `gcc' in the version name.  This will enable development and runtime with gcc/g++/gfortran  and OpenMPI.  You can make this your default by putting the module load line in your ~/.bashrc file.&lt;br /&gt;
&lt;br /&gt;
For mixed OpenMP/MPI code using Intel MPI, add the compilation flag -mt_mpi for full thread-safety.&lt;br /&gt;
&lt;br /&gt;
===Submitting A Batch Job===&lt;br /&gt;
&lt;br /&gt;
The SciNet machines are shared systems, and jobs that are to run on them are submitted to a queue; the&lt;br /&gt;
[[Moab | scheduler]] then orders the jobs in order to make the best use of the machine, and has them launched &lt;br /&gt;
when resources become available.   The intervention of the scheduler can mean that the jobs aren't&lt;br /&gt;
quite run in a  first-in first-out order.&lt;br /&gt;
&lt;br /&gt;
The maximum [[wallclock time]] for a job in the queue is 48 hours; computations that will take longer than&lt;br /&gt;
this must be broken into 48-hour chunks and run as several jobs.  The usual way to do this is with [[checkpoints]],&lt;br /&gt;
writing out the complete state of the computation every so often in such a way that a job can be restarted from&lt;br /&gt;
this state information and continue on from where it left off.  Generating [[checkpoints]] is a good idea anyway,&lt;br /&gt;
as in the unlikely event of a hardware failure during your run, it allows you to restart without having lost much work.&lt;br /&gt;
&lt;br /&gt;
There are limits to how many jobs you can submit.  If your group has a default account, up to 32 nodes at a time for 48 hours per job on the GPC cluster are allowed to be queued. This is a total limit, e.g., you could request 64 nodes for 24 hours.  Jobs of users with an LRAC or NRAC allocation will run at a higher priority than others while their resources last. Because of the group-based allocation, it is conceivable that your jobs won't run if your colleagues have already exhausted your group's limits.&lt;br /&gt;
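&lt;br /&gt;
Reading the default limit above as a budget of node-hours (an interpretation of the example given, not an official formula), a request can be checked with a few lines of shell arithmetic. The variable names are illustrative only.&lt;br /&gt;
&lt;br /&gt;
```shell
# Budget implied by the default limit: 32 nodes for 48 hours.
budget=$(( 32 * 48 ))

# Hypothetical request to check: 64 nodes for 24 hours.
nodes=64
hours=24
request=$(( nodes * hours ))

if [ "$request" -le "$budget" ]; then
    verdict="within the limit"
else
    verdict="over the limit"
fi
echo "$request of $budget node-hours: $verdict"
```
&lt;br /&gt;
The 48-hour maximum wallclock time per job still applies, whatever the number of nodes.&lt;br /&gt;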
&lt;br /&gt;
Note that scheduling big jobs greatly affects the queue and other users, so you have to talk to us first to run massively parallel jobs (&amp;gt; 2048 cores). We will help make sure that your jobs start and run efficiently.&lt;br /&gt;
&lt;br /&gt;
If your job should run in fewer than  48 hours, specify that in your script -- your job &lt;br /&gt;
will start sooner.   (It's easier for the [[Moab | scheduler]] to fit in a short job than a long job).  On the downside, the&lt;br /&gt;
job will be killed automatically by the queue manager software at the end of the specified [[wallclock time]], so if you&lt;br /&gt;
guess wrong you might lose some work.  So the standard procedure is to estimate how long your job will take and&lt;br /&gt;
add 10% or so. &lt;br /&gt;
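&lt;br /&gt;
The `estimate plus 10%' rule can be turned into a walltime request as follows. This is only a sketch: the 200-minute estimate and the variable names are made up for illustration, and the node request is copied from the examples on this page.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical estimate of the run time, in minutes.
estimate_minutes=200

# Add a 10% safety margin (integer arithmetic, rounded down).
padded=$(( estimate_minutes * 110 / 100 ))

# Never ask for more than the 48-hour maximum (2880 minutes).
if [ "$padded" -gt 2880 ]; then
    padded=2880
fi

# Format as hours:minutes:seconds for the PBS resource line.
walltime=$(printf '%d:%02d:00' $(( padded / 60 )) $(( padded % 60 )))
echo "#PBS -l nodes=2:ppn=8,walltime=$walltime"
```
&lt;br /&gt;
The printed line can be pasted into a submission script in place of its existing &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line.&lt;br /&gt;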
&lt;br /&gt;
You interact with the queuing system through the queue/resource manager, [[Moab | Moab]] and [[Moab | Torque]].  To see all the jobs in the queue use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To submit your own job, you must write a script which describes the job and how it is to be run (a sample script [[GPC_Quickstart#Submission_Script | follows]]) and submit it to the queue, using the command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub SCRIPT-FILE-NAME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where you will replace &amp;lt;tt&amp;gt;SCRIPT-FILE-NAME&amp;lt;/tt&amp;gt; with the file containing the submission script.   This will return a job ID, for example 31415, which is used to identify the jobs.  Information about a queued job can be found using&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
checkjob JOB-ID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and jobs can be canceled with the command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
canceljob JOB-ID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Again, these commands have many options, which can be read about on their man pages.&lt;br /&gt;
&lt;br /&gt;
Much more information on the queueing system is available on our [[Moab | queue]] page.&lt;br /&gt;
&lt;br /&gt;
====Batch Submission Script: MPI====&lt;br /&gt;
&lt;br /&gt;
A sample submission script is shown below for an MPI job using ethernet, with the &amp;lt;tt&amp;gt; #PBS &amp;lt;/tt&amp;gt; directives at the top and the rest being &lt;br /&gt;
what will be executed on the compute node.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (ethernet)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=2:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; -np = nodes*ppn&lt;br /&gt;
mpirun -np 16 -hostfile $PBS_NODEFILE ./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The lines that begin &amp;lt;tt&amp;gt;#PBS&amp;lt;/tt&amp;gt; are commands that are parsed and interpreted by qsub at submission time, and control administrative things about your job.   In this example, the script above requests two nodes, using 8 processors per node, for a [[wallclock time]] of one hour.  (The resources required by the job are listed on the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line.)   Other options can be given in other &amp;lt;tt&amp;gt;#PBS&amp;lt;/tt&amp;gt; lines, such as &amp;lt;tt&amp;gt;#PBS -N&amp;lt;/tt&amp;gt;, which sets the name of the job.   &lt;br /&gt;
&lt;br /&gt;
The rest of the script is run as a bash script at run time.   A bash shell on the first node of the two nodes that are requested executes these commands as a normal bash script, just as if you had run this as a shell script from the terminal.   The only difference is that PBS sets certain environment variables that you can use in the script.  &amp;lt;tt&amp;gt;$PBS_O_WORKDIR&amp;lt;/tt&amp;gt; is set to be the directory that the command was 'submitted' from - eg,  &amp;lt;tt&amp;gt;/scratch/USER/SOMEDIRECTORY&amp;lt;/tt&amp;gt; - and &amp;lt;tt&amp;gt;$PBS_NODEFILE&amp;lt;/tt&amp;gt; is the name of a file which contains all the nodes on which programs should execute.   Using these environment variables, the script then uses the &amp;lt;tt&amp;gt;mpirun&amp;lt;/tt&amp;gt; command to launch the job.   Assumed here is that the user has a line like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load openmpi intel&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
in their &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
* Note: The different versions of MPI require different commands to launch the run, and thus different scripts. The above script is  specific for the openmpi module.  For the intelmpi module, the last line of the script should read&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   mpirun -r ssh -np 16 -env I_MPI_DEVICE ssm ./a.out&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
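&lt;br /&gt;
Instead of hard-coding &amp;lt;tt&amp;gt;-np 16&amp;lt;/tt&amp;gt;, the process count can be derived from &amp;lt;tt&amp;gt;$PBS_NODEFILE&amp;lt;/tt&amp;gt;. The sketch below assumes that the file lists each node once per requested core, so that its line count equals nodes*ppn; this should be checked on the system before relying on it. The host names in the stand-in file are hypothetical and are only used when the script is tried outside a job.&lt;br /&gt;
&lt;br /&gt;
```shell
# Outside a job PBS_NODEFILE is unset, so build a stand-in file with
# hypothetical host names: 2 nodes, each listed once per requested core.
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    for host in node-a node-b; do
        for core in 1 2 3 4 5 6 7 8; do
            echo "$host"
        done
    done > "$PBS_NODEFILE"
fi

# One line per core, so the line count is nodes*ppn.
np=$(awk 'END { print NR }' "$PBS_NODEFILE")
# Distinct host names give the number of nodes.
nnodes=$(sort -u "$PBS_NODEFILE" | awk 'END { print NR }')

echo "mpirun -np $np -hostfile $PBS_NODEFILE ./a.out  ($nnodes nodes)"
```
&lt;br /&gt;
Inside a real job the first branch is skipped, and the counts follow whatever the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line requested.&lt;br /&gt;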
&lt;br /&gt;
====Submitting Collections of Serial Jobs====&lt;br /&gt;
&lt;br /&gt;
SciNet-approved methods for running collections of serial jobs can be found on the [[User_Serial|serial run wiki page]].&lt;br /&gt;
&lt;br /&gt;
====Batch Submission Script: OpenMP====&lt;br /&gt;
&lt;br /&gt;
For running OpenMP jobs, the procedure is similar to that for MPI jobs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (OpenMP)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=8&lt;br /&gt;
./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that [[Introduction_To_Performance#Throughput | in some circumstances]] it can be more efficient to run (say) two jobs each running on four threads than one job running on eight threads.   In that case you can use the same `ampersand-and-wait' technique outlined for serial jobs (see [[User_Serial|serial run wiki page]]) for less-than-eight-core OpenMP jobs.&lt;br /&gt;
&lt;br /&gt;
====Hybrid MPI/OpenMP jobs====&lt;br /&gt;
&lt;br /&gt;
=====Using Intel MPI=====&lt;br /&gt;
Here is how to run hybrid codes using intelmpi:&lt;br /&gt;
&lt;br /&gt;
http://software.intel.com/en-us/articles/hybrid-applications-intelmpi-openmp/&lt;br /&gt;
&lt;br /&gt;
Make sure you compile with the -mt_mpi option to the compilers to use the thread safe libraries. &lt;br /&gt;
Set the environment variable I_MPI_PIN_DOMAIN&lt;br /&gt;
&lt;br /&gt;
    $ export I_MPI_PIN_DOMAIN=omp&lt;br /&gt;
&lt;br /&gt;
This will set the process pinning domain size to be equal to OMP_NUM_THREADS (which you should set to the desired number of threads per MPI process). Therefore, each MPI process can create $OMP_NUM_THREADS child threads that run within the corresponding domain. If OMP_NUM_THREADS is not set, each node is treated as a separate domain (which will allow as many threads per MPI process as there are cores).&lt;br /&gt;
&lt;br /&gt;
In addition, when invoking mpirun, you should add the argument &amp;quot;-ppn X&amp;quot;, where X is the number of MPI processes per node.&lt;br /&gt;
For example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     mpirun -r ssh -ppn 2 -np 8  ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
would start 2 mpi processes per node for a total of 8 processes, so mpirun will try to run mpi processes on 4 nodes&lt;br /&gt;
(OMP_NUM_THREADS is then probably best set at 4).&lt;br /&gt;
Your job script should still ask for these 4 nodes with the line&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     #PBS -l nodes=4:ppn=8,walltime=....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(ppn=8 is not a mistake here; the ppn parameter has a different meaning for PBS and for mpirun)&lt;br /&gt;
&lt;br /&gt;
''The ppn parameter to mpirun is very important! Without it, all eight MPI processes would get bunched on the first node in this example, leaving 3 nodes unused.''&lt;br /&gt;
&lt;br /&gt;
NOTE: In order to pin OpenMP threads inside the domain, use the corresponding OpenMP feature by setting the KMP_AFFINITY environment variable; see [http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/lin/compiler_f/optaps/common/optaps_openmp_thread_affinity.htm#KMP_AFFINITY_Environment_Variable Intel's Compiler User and Reference Guide].&lt;br /&gt;
&lt;br /&gt;
The IntelMPI manual is referenced on the front page of our wiki:&lt;br /&gt;
&lt;br /&gt;
http://software.intel.com/sites/products/documentation/hpc/mpi/linux/reference_manual.pdf&lt;br /&gt;
&lt;br /&gt;
For the above example of a total of 8 processes on 4 nodes, you could use the following script:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (hybrid job)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=4:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# SET THE NUMBER OF THREADS PER PROCESS:&lt;br /&gt;
export OMP_NUM_THREADS=4&lt;br /&gt;
&lt;br /&gt;
# PIN THE MPI DOMAINS ACCORDING TO OMP&lt;br /&gt;
export I_MPI_PIN_DOMAIN=omp&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; -np = nodes * (MPI processes per node, the -ppn argument of mpirun)&lt;br /&gt;
mpirun -r ssh -ppn 2 -np 8 ./a.out&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
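&lt;br /&gt;
The numbers in the script above are related by simple arithmetic, sketched here with illustrative variable names. It assumes that every core should run exactly one thread, which is a common choice but not a requirement.&lt;br /&gt;
&lt;br /&gt;
```shell
# Values from the example: 4 nodes with 8 cores each, 2 MPI processes per node.
nodes=4
cores_per_node=8
procs_per_node=2

# Total number of MPI processes, passed to mpirun as -np.
np=$(( nodes * procs_per_node ))

# Threads per MPI process so that every core gets exactly one thread.
OMP_NUM_THREADS=$(( cores_per_node / procs_per_node ))
export OMP_NUM_THREADS

echo "mpirun -r ssh -ppn $procs_per_node -np $np ./a.out with $OMP_NUM_THREADS threads each"
```
&lt;br /&gt;
Changing &amp;lt;tt&amp;gt;procs_per_node&amp;lt;/tt&amp;gt; to 1 or 4 gives the other balanced layouts for an 8-core node.&lt;br /&gt;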
&lt;br /&gt;
=====Using Open MPI=====&lt;br /&gt;
&lt;br /&gt;
For mixed MPI/OpenMP jobs using OpenMPI, which is the default for many users, the procedure is similar, but details differ.&lt;br /&gt;
&lt;br /&gt;
* Request the number of nodes in the PBS script.&lt;br /&gt;
* Set OMP_NUM_THREADS to the number of threads per MPI process.&lt;br /&gt;
* In addition to the -np parameter for mpirun, add the argument &amp;lt;tt&amp;gt;--bynode&amp;lt;/tt&amp;gt;, so that the mpi processes are not bunched up.&lt;br /&gt;
&lt;br /&gt;
So for example, to start a total of 8 processes on 4 nodes, you could use the following script&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (hybrid job)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=4:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# SET THE NUMBER OF THREADS PER PROCESS:&lt;br /&gt;
export OMP_NUM_THREADS=4&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; -np = nodes*processes_per_node; --bynode forces a round robin of nodes.&lt;br /&gt;
mpirun -np 8 --bynode -hostfile $PBS_NODEFILE ./a.out&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Submitting an Interactive Job===&lt;br /&gt;
&lt;br /&gt;
It is sometimes convenient to run a job interactively; this can be very handy for debugging purposes.  In this case, you type a &amp;lt;tt&amp;gt;qsub&amp;lt;/tt&amp;gt; command which submits an interactive job to the queue; when the scheduler selects this job to run, then it starts a shell running on the first node of the job, which connects to your terminal.  You can then type any series of commands (for instance, the same commands listed as in the batch submission script above) to run a job interactively.&lt;br /&gt;
&lt;br /&gt;
For example, to start the same sort of job as in the batch submission script above, but interactively, one would type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -I -l nodes=2:ppn=8,walltime=1:00:00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is exactly the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line in the batch script above (which requests all 8 processors on each of 2 nodes for one hour), but prepended with a &amp;lt;tt&amp;gt;-I&amp;lt;/tt&amp;gt; for `interactive'.   When this job begins, your terminal will show you as being logged in to one of the compute nodes, and you can type in any shell command, run &amp;lt;tt&amp;gt;mpirun&amp;lt;/tt&amp;gt;, etc.   When you exit the shell, the job will end.  Interactive jobs can be used with any of the [[ Moab#GPC | GPC queues ]]; however, there is a short,&lt;br /&gt;
high-turnover queue called [[ Moab#debug | debug ]] which can be especially useful when the system is busy.&lt;br /&gt;
&lt;br /&gt;
===Ethernet vs. Infiniband===&lt;br /&gt;
&lt;br /&gt;
About 1/4 of the GPC (862 nodes or 6896 cores) is connected with a high bandwidth low-latency fabric called&lt;br /&gt;
[http://en.wikipedia.org/wiki/InfiniBand InfiniBand].  Many jobs which require tight coupling to scale well greatly benefit from this interconnect;&lt;br /&gt;
other types of jobs, which have relatively modest communications, do not require this and run fine on Gigabit ethernet.&lt;br /&gt;
&lt;br /&gt;
Jobs which require the InfiniBand for good performance can request the nodes that have the `&amp;lt;tt&amp;gt;ib&amp;lt;/tt&amp;gt;' feature in the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line,&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -l nodes=2:ib:ppn=8,walltime=1:00:00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Because there are a limited number of these nodes, your job will start sooner if you do not request them (e.g. if you use the scripts as shown above), as this increases the number of nodes available to run your job. In fact, the InfiniBand nodes are to be used only for jobs that are known to scale well and will benefit from this type of interconnect. As such, at least 2 nodes have to be requested, as single-node jobs will not benefit from using an&lt;br /&gt;
InfiniBand node. The MPI libraries provided by SciNet automatically use either the InfiniBand or ethernet interconnect, depending on which nodes your job runs on.&lt;br /&gt;
&lt;br /&gt;
===HyperThreading===&lt;br /&gt;
&lt;br /&gt;
Each GPC compute node has 8 Nehalem cores (Intel Xeon E5540 @ 2.53&lt;br /&gt;
GHz). Thus, fully utilizing the node requires at least 8&lt;br /&gt;
tasks. We say `at least' because the Nehalem cores support&lt;br /&gt;
Hyper-Threading, which means it is as if there are twice as many cores, although the number of execution units is unchanged. Because most applications spend a lot of time waiting for data&lt;br /&gt;
from memory, this allows one task to use the processor&lt;br /&gt;
while another is stuck waiting for memory, and vice versa.&lt;br /&gt;
This requires no changes to the code, only running 16 rather than&lt;br /&gt;
8 tasks on the node (e.g. &amp;lt;tt&amp;gt;export OMP_NUM_THREADS=16&amp;lt;/tt&amp;gt;).  The resulting performance gain depends highly on the application.  Beware that MPI runs should still be limited to 8 MPI processes per node.&lt;br /&gt;
&lt;br /&gt;
Note: the &amp;lt;tt&amp;gt;ppn=8&amp;lt;/tt&amp;gt; part of the job requirement specification should '''not''' be modified because of HyperThreading.&lt;br /&gt;
&lt;br /&gt;
===Memory Configuration===&lt;br /&gt;
&lt;br /&gt;
==== 16G ====&lt;br /&gt;
&lt;br /&gt;
There are 3756 nodes which have 16G of memory; this is the primary configuration in the GPC. These nodes will be used by default.&lt;br /&gt;
&lt;br /&gt;
==== 18G ====&lt;br /&gt;
&lt;br /&gt;
There are 24 Infiniband nodes which have 18G of memory. These nodes have a fully populated memory configuration that maximizes memory bandwidth. To&lt;br /&gt;
request these nodes use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=2:ib:m18g:ppn=8,walltime=1:00:00 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== 32G ====&lt;br /&gt;
&lt;br /&gt;
There are 84 Infiniband nodes which have 32G of memory. To request these nodes use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=2:ib:m32g:ppn=8,walltime=1:00:00 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== 128G ====&lt;br /&gt;
There are two stand-alone large memory (128GB) nodes, &amp;lt;tt&amp;gt;gpc-lrgmem01&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;gpc-lrgmem02&amp;lt;/tt&amp;gt; which are primarily to be used for data analysis of runs.  They have 16 cores and are intel machines running linux, but they are not the same architecture (Nehalem) as the GPC compute nodes, so codes may have to be compiled separately for these machines.  They can be accessed using a specific &amp;lt;tt&amp;gt;largemem&amp;lt;/tt&amp;gt; queue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=2:ppn=8,walltime=1:00:00 -q largemem -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Ram Disk===&lt;br /&gt;
&lt;br /&gt;
On the GPC nodes, there is a `ram disk' available: up to half of the memory on the node may be used as a temporary file system.  This is particularly useful in the early stages of migrating desktop-computing codes to a High Performance Computing platform such as the GPC.    It is much faster than real disk and does not require network traffic; however, each node sees its own ramdisk and cannot see files on that of other nodes.   This is a very easy way to cache writes (by writing them to fast ram disk instead of slow `real' disk); one would then periodically copy the files to /scratch or /project so that they are available after the job has completed.&lt;br /&gt;
&lt;br /&gt;
To use the ramdisk, create, read and write files in /dev/shm/.. just as one would in (eg) /scratch/USER/.  Only the amount of RAM needed to store the files will be taken up by the temporary file system; thus if you have 8 serial jobs each requiring 1 GB of RAM, and 1GB is taken up by various OS services, you would still have approximately 7GB available to use as ramdisk on a 16GB node.   However, if you were to write 8 GB of data to the RAM disk, this would exceed available memory and your job would likely crash.&lt;br /&gt;
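&lt;br /&gt;
The memory budget in this example can be written out as a short calculation. This is only a sketch with illustrative variable names; it combines the numbers above with the rule that at most half of the node memory may be used as ramdisk.&lt;br /&gt;
&lt;br /&gt;
```shell
# Numbers from the example above, in GB.
node_ram=16
jobs=8
ram_per_job=1
os_ram=1

# Memory left once the jobs and the OS have taken their share.
free_ram=$(( node_ram - jobs * ram_per_job - os_ram ))

# The ramdisk itself may hold at most half of the node memory.
ramdisk_cap=$(( node_ram / 2 ))

# The usable ramdisk size is the smaller of the two.
if [ "$free_ram" -lt "$ramdisk_cap" ]; then
    usable=$free_ram
else
    usable=$ramdisk_cap
fi
echo "usable ramdisk: about $usable GB"
```
&lt;br /&gt;
Setting &amp;lt;tt&amp;gt;node_ram&amp;lt;/tt&amp;gt; to 32 shows how much more room the larger-memory nodes would leave for the same jobs.&lt;br /&gt;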
   &lt;br /&gt;
NOTE: it is very important to delete your files from ram disk at the end of your job.   If you do not do this, the next user to use that node will have less RAM available than they might expect, and this might kill their jobs.&lt;br /&gt;
&lt;br /&gt;
More details on how to setup your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].&lt;br /&gt;
&lt;br /&gt;
=== Managing jobs on the Queuing system ===&lt;br /&gt;
Information on checking available resources, starting, viewing, managing and canceling jobs on [[Moab | Moab/Torque]]&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=User_Ramdisk&amp;diff=1316</id>
		<title>User Ramdisk</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=User_Ramdisk&amp;diff=1316"/>
		<updated>2010-07-04T16:49:11Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Using Ramdisk */  -- replaced message about purging ramdisk orelse hurting other users as it is no longer correct&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Using Ramdisk==&lt;br /&gt;
&lt;br /&gt;
On the GPC nodes, a `ramdisk' is available. Up to half of the memory&lt;br /&gt;
on the node may be used as a temporary file system.  This is&lt;br /&gt;
particularly useful for use in the early stages of migrating&lt;br /&gt;
desktop-computing codes to a High Performance Computing platform such&lt;br /&gt;
as the GPC, especially those that use a lot of I/O, such as Blast.&lt;br /&gt;
Using a lot of I/O becomes a bottleneck in large scale computing. One&lt;br /&gt;
especially suffers a performance penalty on parallel file systems&lt;br /&gt;
(such as the GPFS used on SciNet), since the files are synchronized&lt;br /&gt;
across the whole network.&lt;br /&gt;
&lt;br /&gt;
Ramdisk is much faster than real disk, and is especially beneficial&lt;br /&gt;
for codes which perform a lot of small I/O work, since the ramdisk&lt;br /&gt;
does not require network traffic.  However, each node sees its own&lt;br /&gt;
ramdisk and cannot see files on that of other nodes.&lt;br /&gt;
&lt;br /&gt;
To use the ramdisk, create and read from / write to files in&lt;br /&gt;
/dev/shm/.. just as one would in (e.g.) /scratch/USER/.  Only the amount&lt;br /&gt;
of RAM needed to store the files will be taken up by the temporary&lt;br /&gt;
file system. Thus if you have 8 serial jobs each requiring 1 GB of&lt;br /&gt;
RAM, and 1GB is taken up by various OS services, you would still have&lt;br /&gt;
approximately 7GB available to use as ramdisk on a 16GB node.&lt;br /&gt;
However, if you were to write 8 GB of data to the RAM disk, this would&lt;br /&gt;
exceed available memory and your job would likely crash.&lt;br /&gt;
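&lt;br /&gt;
To see how much room is left before writing to the ramdisk, one could check from within a job. The following is only a sketch using standard Linux commands; it assumes the ramdisk is mounted at /dev/shm as described above, and that your files are kept in a /dev/shm/$USER directory:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# size, used and available space of the ramdisk file system&lt;br /&gt;
df -h /dev/shm&lt;br /&gt;
# overall memory usage of the node, in megabytes&lt;br /&gt;
free -m&lt;br /&gt;
# space currently taken by your own files (assumed /dev/shm/$USER layout)&lt;br /&gt;
du -sh /dev/shm/$USER&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;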
&lt;br /&gt;
Note that when using the ramdisk:&lt;br /&gt;
&lt;br /&gt;
* At the start of your job, you can copy frequently accessed files to ramdisk (''stage in''). If there are many such files, it is beneficial to put them in a tar file.&lt;br /&gt;
* One would periodically copy the output files from ramdisk to /scratch or /project, as well as at the end of the job, of course (''stage out'').&lt;br /&gt;
* For good form, you should delete your files from ramdisk at the end of your job. However, the ramdisk is purged between jobs, so the next user of that node will always have the full amount of RAM available. &lt;br /&gt;
&lt;br /&gt;
A script using the ramdisk in a one-day OpenMP job might look like this:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#MOAB/Torque submission script for SciNet GPC &lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=24:00:00&lt;br /&gt;
#PBS -N ramdisk-test&lt;br /&gt;
&lt;br /&gt;
#Job parameters:&lt;br /&gt;
execname=job          # name of the executable&lt;br /&gt;
input_tar=input.tar   # tar file with input files and executables&lt;br /&gt;
output_tar=out.tar    # file in which to store output&lt;br /&gt;
input_subdir=indir    # sub-directory (within input_tar) with input files&lt;br /&gt;
output_subdir=outdir  # sub-directory to contain the output files&lt;br /&gt;
poll_period=60        # how often to check for job completion (in seconds)&lt;br /&gt;
save_period=120       # how often to save output (in minutes)&lt;br /&gt;
&lt;br /&gt;
#Track how long everything takes.&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
#Copy to ramdisk&lt;br /&gt;
echo &amp;quot;Stage-in: copying files to ramdisk directory /dev/shm/$USER&amp;quot;&lt;br /&gt;
mkdir -p /dev/shm/$USER&lt;br /&gt;
mkdir -p /dev/shm/$USER/$output_subdir&lt;br /&gt;
cd /dev/shm/$USER&lt;br /&gt;
cp $PBS_O_WORKDIR/$input_tar .&lt;br /&gt;
tar xf $input_tar&lt;br /&gt;
rm -rf $input_tar&lt;br /&gt;
&lt;br /&gt;
#Track how long everything takes.&lt;br /&gt;
echo -n &amp;quot;Stage-in completed on &amp;quot;&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
#Run on ramdisk&lt;br /&gt;
echo &amp;quot;Starting job&amp;quot;&lt;br /&gt;
./$execname $input_subdir $output_subdir &amp;amp;&lt;br /&gt;
# Store the process id in $pid so we may check if it's still running:&lt;br /&gt;
pid=$!&lt;br /&gt;
&lt;br /&gt;
#Note:&lt;br /&gt;
# 1. The above launching command is appropriate for multi-threaded (OpenMP) applications.&lt;br /&gt;
# 2. Ramdisk MPI jobs are limited to 1 node as /dev/shm is not shared across nodes.&lt;br /&gt;
# 3. For serial jobs, you'd want to start 8 jobs at the same time instead, e.g.&lt;br /&gt;
#     mkdir -p $output_subdir/1&lt;br /&gt;
#     ./$execname ${input_subdir}/1 ${output_subdir}/1 &amp;amp;&lt;br /&gt;
#     pid=$!&lt;br /&gt;
#     mkdir -p $output_subdir/2&lt;br /&gt;
#     ./$execname ${input_subdir}/2 ${output_subdir}/2 &amp;amp;&lt;br /&gt;
#     pid=$pid,$!&lt;br /&gt;
#     mkdir -p $output_subdir/3&lt;br /&gt;
#     ./$execname ${input_subdir}/3 ${output_subdir}/3 &amp;amp;&lt;br /&gt;
#     pid=$pid,$!&lt;br /&gt;
#     mkdir -p $output_subdir/4&lt;br /&gt;
#     ./$execname ${input_subdir}/4 ${output_subdir}/4 &amp;amp;&lt;br /&gt;
#     pid=$pid,$!&lt;br /&gt;
#     mkdir -p $output_subdir/5&lt;br /&gt;
#     ./$execname ${input_subdir}/5 ${output_subdir}/5 &amp;amp;&lt;br /&gt;
#     pid=$pid,$!&lt;br /&gt;
#     mkdir -p $output_subdir/6&lt;br /&gt;
#     ./$execname ${input_subdir}/6 ${output_subdir}/6 &amp;amp;&lt;br /&gt;
#     pid=$pid,$!&lt;br /&gt;
#     mkdir -p $output_subdir/7&lt;br /&gt;
#     ./$execname ${input_subdir}/7 ${output_subdir}/7 &amp;amp;&lt;br /&gt;
#     pid=$pid,$!&lt;br /&gt;
#     mkdir -p $output_subdir/8&lt;br /&gt;
#     ./$execname ${input_subdir}/8 ${output_subdir}/8 &amp;amp;&lt;br /&gt;
#     pid=$pid,$!&lt;br /&gt;
&lt;br /&gt;
#Track how long everything takes.&lt;br /&gt;
echo -n &amp;quot;Job started on &amp;quot;&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
function save_results {    &lt;br /&gt;
    echo -n &amp;quot;Copying from directory $output_subdir to file $PBS_O_WORKDIR/$output_tar on &amp;quot;&lt;br /&gt;
    date&lt;br /&gt;
    tar cf $output_tar $output_subdir/*&lt;br /&gt;
    cp $output_tar $PBS_O_WORKDIR&lt;br /&gt;
    echo -n &amp;quot;Copying of output complete on &amp;quot;&lt;br /&gt;
    date&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
function cleanup_ramdisk {&lt;br /&gt;
    echo -n &amp;quot;Cleaning up ramdisk directory /dev/shm/$USER on &amp;quot;&lt;br /&gt;
    date&lt;br /&gt;
    rm -rf /dev/shm/$USER&lt;br /&gt;
    echo -n &amp;quot;done at &amp;quot;&lt;br /&gt;
    date&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
function trap_term {&lt;br /&gt;
    echo -n &amp;quot;Trapped term (soft kill) signal on &amp;quot;&lt;br /&gt;
    date&lt;br /&gt;
    save_results&lt;br /&gt;
    cleanup_ramdisk&lt;br /&gt;
    exit&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
function interruptible_sleep {&lt;br /&gt;
    # waits for a number of seconds&lt;br /&gt;
    # argument 1 = number of seconds&lt;br /&gt;
    # note: just doing `sleep $1' would not be interruptible!&lt;br /&gt;
    for m in `seq $1`; do  &lt;br /&gt;
        sleep 1&lt;br /&gt;
    done&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
function is_running {&lt;br /&gt;
    # check if one or more processes are running (prints the number still running) &lt;br /&gt;
    # argument 1 = a comma separated list of PIDs (no spaces)&lt;br /&gt;
    ps -p $1 -o pid= | wc -l&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
#trap the termination signal, and call the function 'trap_term' when &lt;br /&gt;
# that happens, so results may be saved.&lt;br /&gt;
trap &amp;quot;trap_term&amp;quot; TERM&lt;br /&gt;
&lt;br /&gt;
#number of pollings per save period (rounded down):&lt;br /&gt;
npoll=$(($save_period*60/$poll_period))&lt;br /&gt;
&lt;br /&gt;
#polling and saving loop&lt;br /&gt;
running=$(is_running $pid)&lt;br /&gt;
while [ $running -gt 0 ]&lt;br /&gt;
do&lt;br /&gt;
    for n in `seq $npoll`&lt;br /&gt;
    do&lt;br /&gt;
        interruptible_sleep $poll_period&lt;br /&gt;
        running=$(is_running $pid)&lt;br /&gt;
        if [ $running -eq 0 ]; then&lt;br /&gt;
            break&lt;br /&gt;
        fi&lt;br /&gt;
    done&lt;br /&gt;
    save_results&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
#Done&lt;br /&gt;
cleanup_ramdisk&lt;br /&gt;
&lt;br /&gt;
echo -n &amp;quot;Job finished cleanly on &amp;quot;&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notes on this script:&lt;br /&gt;
* The script assumes that the tar file &amp;lt;tt&amp;gt;input.tar&amp;lt;/tt&amp;gt; contains the executable &amp;lt;tt&amp;gt;job&amp;lt;/tt&amp;gt; and the input files in a subdirectory called &amp;lt;tt&amp;gt;indir&amp;lt;/tt&amp;gt; (with further subdirectories for the case of 8 serial jobs).&lt;br /&gt;
* The executable is supposed to take the locations of the input and output directory as arguments.&lt;br /&gt;
* The trap command makes sure that the results get saved and the ramdisk gets flushed even when the job gets killed before the end of the script is reached.  &amp;lt;tt&amp;gt;trap&amp;lt;/tt&amp;gt; is a bash construct that executes the given command when the script receives a signal, in this case TERM.  The TERM signal is sent by the scheduler 30 seconds before your time is up.&lt;br /&gt;
* You could also [[Using_Signals|trap signals in your C, C++ or FORTRAN codes]].&lt;br /&gt;
* All files are kept in a subdirectory of &amp;lt;tt&amp;gt;/dev/shm&amp;lt;/tt&amp;gt;. This makes the clean up simpler, and keeps things tidy when doing small test jobs on the development nodes.&lt;br /&gt;
&lt;br /&gt;
Further notes:&lt;br /&gt;
* Often collections of serial jobs are run on the ramdisk, see the [[User_Serial|serial run wiki page]] for more details on that.&lt;br /&gt;
* If your application needs just a bit more ramdisk, there are 24 nodes with 18GB and 84 nodes with 32GB of RAM.  These nodes can be requested by &amp;lt;tt&amp;gt;qsub -l nodes=2:ib:m18g:ppn=8,walltime=1:00:00&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;qsub -l nodes=2:ib:m32g:ppn=8,walltime=1:00:00&amp;lt;/tt&amp;gt;. They are InfiniBand nodes, which are in short supply, so only use these nodes if you have to. Finally, there are 2 stand-alone large memory (128GB) nodes. They have 16 cores and are Intel machines running Linux, but their architecture differs from the Nehalem architecture of the GPC compute nodes, so codes may have to be compiled separately for these machines. They can be accessed using a specific largemem queue. See [[GPC Quickstart]].&lt;br /&gt;
&lt;br /&gt;
--[[User:Rzon|Rzon]] 18 June 2010 (UTC)&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=GPC_Quickstart&amp;diff=1315</id>
		<title>GPC Quickstart</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=GPC_Quickstart&amp;diff=1315"/>
		<updated>2010-07-04T16:48:28Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Ram Disk */  -- old message about needing to delete files from ramdisk removed as it is no longer valid&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:University_of_Tor_79284gm-a.jpg|center|300px|thumb]]&lt;br /&gt;
|name=General Purpose Cluster (GPC)&lt;br /&gt;
|installed=June 2009&lt;br /&gt;
|operatingsystem= Linux&lt;br /&gt;
|loginnode= gpc01..gpc04 (from &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt;)&lt;br /&gt;
|numberofnodes=3780&lt;br /&gt;
|rampernode=16 GB &lt;br /&gt;
|corespernode=8&lt;br /&gt;
|interconnect=1/4 on Infiniband, rest on GigE&lt;br /&gt;
|vendorcompilers=icc (C) ifort (fortran) icpc (C++)&lt;br /&gt;
|queuetype=[[Moab | Moab/Torque]]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
The General Purpose Cluster is an extremely large cluster (ranked [http://www.top500.org/list/2009/06/100 16th] in the world at its inception, and fastest in Canada) and is where most simulations are to be done at SciNet.  It is an IBM iDataPlex cluster based on Intel's Nehalem architecture (one of the [http://www.hpcwire.com/features/HPC-Vendors-Jump-On-Nehalem-42360237.html first in the world] to make use of the new chips). The GPC consists of 3,780 nodes with a total of 30,240  2.5GHz cores, with 16GB RAM per node (2GB per core). Approximately one quarter of the cluster is interconnected with non-blocking 4x-DDR InfiniBand while the rest of the nodes are connected with gigabit ethernet. &lt;br /&gt;
&lt;br /&gt;
===Login===&lt;br /&gt;
&lt;br /&gt;
First log in via ssh with your SciNet account at &amp;lt;tt&amp;gt;login.scinet.utoronto.ca&amp;lt;/tt&amp;gt;; from there you can proceed to the development nodes to compile and test your code.&lt;br /&gt;
&lt;br /&gt;
===Compile/Devel Nodes===&lt;br /&gt;
&lt;br /&gt;
From a scinet login node you can ssh to &amp;lt;tt&amp;gt;gpc01&amp;lt;/tt&amp;gt;..&amp;lt;tt&amp;gt;gpc04&amp;lt;/tt&amp;gt;.  These nodes have the same hardware configuration as most of the compute nodes -- 8 Nehalem processing cores with 16GB RAM and Gigabit ethernet.  You can compile and test your codes on these nodes. To interactively test on more than 8 processors, or to test your code over an InfiniBand connection, you can submit an [[GPC_Quickstart#Submitting_an_Interactive_Job | interactive job request]].&lt;br /&gt;
&lt;br /&gt;
Your [[Storage_Quickstart | home directory]] is in &amp;lt;tt&amp;gt;/home/USER&amp;lt;/tt&amp;gt;; you have 10GB there that is backed up. This directory cannot be written to by the compute nodes! Thus, to run jobs, you'll use the &amp;lt;tt&amp;gt;/scratch/USER&amp;lt;/tt&amp;gt; directory. Here, there is a large amount of disk space, but it is not backed up. Thus it makes sense to keep your codes in /home, compile there, and then run them in the /scratch directory.&lt;br /&gt;
&lt;br /&gt;
===Modules and Environment Variables===&lt;br /&gt;
&lt;br /&gt;
To use most packages on the SciNet machines, including any of the compilers, you will have to use the `module' command.  The command &amp;lt;tt&amp;gt;module load some-package&amp;lt;/tt&amp;gt; will set your environment variables (&amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, etc) to include the default version of that package.   &amp;lt;tt&amp;gt;module load some-package/specific-version&amp;lt;/tt&amp;gt; will load a specific version of that package.  This makes it very easy for different users to use different versions of compilers, MPI libraries, and other libraries.&lt;br /&gt;
&lt;br /&gt;
Note that to use even the gcc compilers you will have to do&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
but in fact you should probably use the Intel compilers installed on this system, as they usually produce faster code (sometimes much faster).&lt;br /&gt;
&lt;br /&gt;
A list of the installed software is available in [[Software_and_Libraries | Software &amp;amp; Libraries]] and can &lt;br /&gt;
be seen on the system by typing &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module avail&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To load a module (for example, the default version of the intel compilers)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load intel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload a module&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module unload intel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To unload all modules&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module purge&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These commands should go in your .bashrc files and/or in your submission scripts to make sure you&lt;br /&gt;
are using the correct packages.&lt;br /&gt;
&lt;br /&gt;
===Compilers===&lt;br /&gt;
&lt;br /&gt;
The intel compilers are icc/icpc/ifort for C/C++/Fortran, and are available with the default module &amp;quot;intel&amp;quot;.  The intel compilers are recommended over the GNU compilers.  Documentation about icpc is available at &lt;br /&gt;
http://software.intel.com/en-us/articles/intel-software-technical-documentation/.  The Intel compilers accept many of the options that the GNU compilers accept, but tend to produce faster programs on our system.  If, for some reason, you really need the GNU compilers, the latest version of the GNU compiler collection (currently 4.4.0) is available by loading the &amp;quot;gcc&amp;quot; module, with gcc/g++/gfortran for C/C++/Fortran.   Note that f77/g77 is not supported. &lt;br /&gt;
&lt;br /&gt;
To ensure that the intel compilers are in your &amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt; and their libraries are in your &amp;lt;tt&amp;gt;LD_LIBRARY_PATH&amp;lt;/tt&amp;gt;, use the command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load intel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This should likely go in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt; file so that it will automatically be loaded.&lt;br /&gt;
&lt;br /&gt;
Optimize your code for the GPC machine using at least the following compiler flags: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   -O3 -xHost&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(or &amp;lt;tt&amp;gt;-O3 -march=native&amp;lt;/tt&amp;gt; for the GNU compilers). &lt;br /&gt;
&lt;br /&gt;
*If your program uses OpenMP, add &amp;lt;tt&amp;gt;-openmp&amp;lt;/tt&amp;gt; (&amp;lt;tt&amp;gt;-fopenmp&amp;lt;/tt&amp;gt; for GNU compilers).&lt;br /&gt;
*If you get the warning &amp;lt;tt&amp;gt;feupdateenv is not implemented&amp;lt;/tt&amp;gt;, add &amp;lt;tt&amp;gt;-limf&amp;lt;/tt&amp;gt; to the link line.&lt;br /&gt;
*If you need to link in the MKL libraries, you are well advised to use the Intel(R) Math Kernel Library Link Line Advisor: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ for help in devising the list of libraries to link with your code.&lt;br /&gt;
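&lt;br /&gt;
Put together, compile lines might look like the following sketch. The source file name &amp;lt;tt&amp;gt;mycode.c&amp;lt;/tt&amp;gt; is hypothetical, and the OpenMP flag is only needed if the code uses OpenMP:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Intel compiler (after module load intel)&lt;br /&gt;
icc -O3 -xHost -openmp -o a.out mycode.c&lt;br /&gt;
# GNU compiler (after module load gcc)&lt;br /&gt;
gcc -O3 -march=native -fopenmp -o a.out mycode.c&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;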
&lt;br /&gt;
===[[ GPC_MPI_Versions | MPI]]===&lt;br /&gt;
&lt;br /&gt;
SciNet currently provides multiple MPI libraries for the GPC: [http://www.open-mpi.org/ OpenMPI] and [http://software.intel.com/en-us/intel-mpi-library/ IntelMPI].  We currently recommend OpenMPI as the default, as it quite reliably demonstrates good performance on both the InfiniBand and ethernet networks.  For full details and options see the complete [[ GPC_MPI_Versions | '''MPI''']] section.&lt;br /&gt;
&lt;br /&gt;
The MPI libraries are compiled with both the gnu compiler suite and the intel compiler suite.   To use (for instance) the intel-compiled OpenMPI libraries, which we recommend as the default (and use for most of our examples here), use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load openmpi intel&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
in your &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt;.   Other combinations behave similarly.&lt;br /&gt;
&lt;br /&gt;
The MPI libraries provide mpicc/mpicxx/mpif90/mpif77 as wrappers around the appropriate compilers, which ensure that the appropriate include and library directories are used in the compilation and linking steps.&lt;br /&gt;
&lt;br /&gt;
We currently recommend the Intel + OpenMPI combination.  However, if you require the GNU compilers as well as MPI, you would want to find the most recent openmpi module available with `gcc' in the version name.  This will enable development and runtime with gcc/g++/gfortran  and OpenMPI.  You can make this your default by putting the module load line in your ~/.bashrc file.&lt;br /&gt;
&lt;br /&gt;
For mixed OpenMP/MPI code using Intel MPI, add the compilation flag -mt_mpi for full thread-safety.&lt;br /&gt;
&lt;br /&gt;
===Submitting A Batch Job===&lt;br /&gt;
&lt;br /&gt;
The SciNet machines are shared systems, and jobs that are to run on them are submitted to a queue; the&lt;br /&gt;
[[Moab | scheduler]] then orders the jobs in order to make the best use of the machine, and has them launched &lt;br /&gt;
when resources become available.   The intervention of the scheduler can mean that the jobs aren't&lt;br /&gt;
run in strict first-in first-out order.&lt;br /&gt;
&lt;br /&gt;
The maximum [[wallclock time]] for a job in the queue is 48 hours; computations that will take longer than&lt;br /&gt;
this must be broken into 48-hour chunks and run as several jobs.  The usual way to do this is with [[checkpoints]],&lt;br /&gt;
writing out the complete state of the computation every so often in such a way that a job can be restarted from&lt;br /&gt;
this state information and continue on from where it left off.  Generating [[checkpoints]] is a good idea anyway,&lt;br /&gt;
as in the unlikely event of a hardware failure during your run, it allows you to restart without having lost much work.&lt;br /&gt;
&lt;br /&gt;
There are limits to how many jobs you can submit.  If your group has a default account, up to 32 nodes at a time for 48 hours per job on the GPC cluster are allowed to be queued. This is a total limit, e.g., you could request 64 nodes for 24 hours.  Jobs of users with an LRAC or NRAC allocation will run at a higher priority than others while their resources last. Because of the group-based allocation, it is conceivable that your jobs won't run if your colleagues have already exhausted your group's limits.&lt;br /&gt;
&lt;br /&gt;
Note that scheduling big jobs greatly affects the queuer and other users, so you have to talk to us first to run massively parallel jobs (&amp;gt; 2048 cores). We will help make sure that your jobs start and run efficiently.&lt;br /&gt;
&lt;br /&gt;
If your job should run in fewer than  48 hours, specify that in your script -- your job &lt;br /&gt;
will start sooner.   (It's easier for the [[Moab | scheduler]] to fit in a short job than a long job).  On the downside, the&lt;br /&gt;
job will be killed automatically by the queue manager software at the end of the specified [[wallclock time]], so if you&lt;br /&gt;
guess wrong you might lose some work.  So the standard procedure is to estimate how long your job will take and&lt;br /&gt;
add 10% or so. &lt;br /&gt;
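&lt;br /&gt;
As a worked example of this rule of thumb (the 10-hour estimate is an assumed figure, not a measurement), the padded request can be computed with shell arithmetic:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
est_minutes=600                                    # assumed estimate: 10 hours&lt;br /&gt;
req_minutes=$(( est_minutes + est_minutes / 10 ))  # add 10 percent&lt;br /&gt;
# print the value in the h:mm:ss form used on the #PBS -l line&lt;br /&gt;
printf 'walltime=%d:%02d:00\n' $(( req_minutes / 60 )) $(( req_minutes % 60 ))&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
For this estimate the request would be &amp;lt;tt&amp;gt;walltime=11:00:00&amp;lt;/tt&amp;gt;.&lt;br /&gt;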
&lt;br /&gt;
You interact with the queuing system through the queue/resource manager, [[Moab | Moab]] and [[Moab | Torque]].  To see all the jobs in the queue use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To submit your own job, you must write a script which describes the job and how it is to be run (a sample script [[GPC_Quickstart#Submission_Script | follows]]) and submit it to the queue, using the command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub SCRIPT-FILE-NAME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where you will replace &amp;lt;tt&amp;gt;SCRIPT-FILE-NAME&amp;lt;/tt&amp;gt; with the file containing the submission script.   This will return a job ID, for example 31415, which is used to identify the jobs.  Information about a queued job can be found using&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
checkjob JOB-ID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and jobs can be canceled with the command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
canceljob JOB-ID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Again, these commands have many options, which can be read about on their man pages.&lt;br /&gt;
&lt;br /&gt;
Much more information on the queueing system is available on our [[Moab | queue]] page.&lt;br /&gt;
&lt;br /&gt;
====Batch Submission Script: MPI====&lt;br /&gt;
&lt;br /&gt;
A sample submission script is shown below for an mpi job using ethernet with the &amp;lt;tt&amp;gt; #PBS &amp;lt;/tt&amp;gt; directives at the top and the rest being &lt;br /&gt;
what will be executed on the compute node.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (ethernet)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=2:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; -np = nodes*ppn&lt;br /&gt;
mpirun -np 16 -hostfile $PBS_NODEFILE ./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The lines that begin &amp;lt;tt&amp;gt;#PBS&amp;lt;/tt&amp;gt; are commands that are parsed and interpreted by qsub at submission time, and control administrative things about your job.   In this example, the script above requests two nodes, using 8 processors per node, for a [[wallclock time]] of one hour.  (The resources required by the job are listed on the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line.)   Other options can be given in other &amp;lt;tt&amp;gt;#PBS&amp;lt;/tt&amp;gt; lines, such as &amp;lt;tt&amp;gt;#PBS -N&amp;lt;/tt&amp;gt;, which sets the name of the job.   &lt;br /&gt;
&lt;br /&gt;
The rest of the script is run as a bash script at run time.   A bash shell on the first node of the two nodes that are requested executes these commands as a normal bash script, just as if you had run this as a shell script from the terminal.   The only difference is that PBS sets certain environment variables that you can use in the script.  &amp;lt;tt&amp;gt;$PBS_O_WORKDIR&amp;lt;/tt&amp;gt; is set to be the directory that the command was 'submitted' from - eg,  &amp;lt;tt&amp;gt;/scratch/USER/SOMEDIRECTORY&amp;lt;/tt&amp;gt; - and &amp;lt;tt&amp;gt;$PBS_NODEFILE&amp;lt;/tt&amp;gt; is the name of a file which contains all the nodes on which programs should execute.   Using these environment variables, the script then uses the &amp;lt;tt&amp;gt;mpirun&amp;lt;/tt&amp;gt; command to launch the job.   Assumed here is that the user has a line like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load openmpi intel&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
in their &amp;lt;tt&amp;gt;.bashrc&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
* Note: The different versions of MPI require different commands to launch the run, and thus different scripts. The above script is specific to the openmpi module.  For the intelmpi module, the last line of the script should read&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   mpirun -r ssh -np 16 -env I_MPI_DEVICE ssm ./a.out&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
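&lt;br /&gt;
Rather than hard-coding &amp;lt;tt&amp;gt;-np 16&amp;lt;/tt&amp;gt;, the process count for the OpenMPI script could be derived from the node file, so that the script stays consistent if the resource request changes. This is a sketch that assumes Torque's usual behaviour of listing each node once per requested processor in &amp;lt;tt&amp;gt;$PBS_NODEFILE&amp;lt;/tt&amp;gt;:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# one line per processor slot, so nodes=2:ppn=8 should give 16&lt;br /&gt;
np=$(cat $PBS_NODEFILE | wc -l)&lt;br /&gt;
mpirun -np $np -hostfile $PBS_NODEFILE ./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;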
&lt;br /&gt;
====Submitting Collections of Serial Jobs====&lt;br /&gt;
&lt;br /&gt;
SciNet-approved methods for running collections of serial jobs can be found on the [[User_Serial|serial run wiki page]].&lt;br /&gt;
&lt;br /&gt;
====Batch Submission Script: OpenMP====&lt;br /&gt;
&lt;br /&gt;
For running OpenMP jobs, the procedure is similar to that for MPI jobs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (OpenMP)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=8&lt;br /&gt;
./a.out&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that [[Introduction_To_Performance#Throughput | in some circumstances]] it can be more efficient to run (say) two jobs each running on four threads than one job running on eight threads.   In that case you can use the same `ampersand-and-wait' technique outlined for serial jobs (see [[User_Serial|serial run wiki page]]) for less-than-eight-core OpenMP jobs.&lt;br /&gt;
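&lt;br /&gt;
A minimal sketch of that approach for two four-thread runs on one node follows. The directories &amp;lt;tt&amp;gt;run1&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;run2&amp;lt;/tt&amp;gt; and the job name are hypothetical, and thread placement is left to the OpenMP runtime:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l nodes=1:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N two-omp-runs&lt;br /&gt;
&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
export OMP_NUM_THREADS=4&lt;br /&gt;
&lt;br /&gt;
# start both runs in the background, each in its own (hypothetical) directory&lt;br /&gt;
(cd run1 &amp;amp;&amp;amp; ../a.out) &amp;amp;&lt;br /&gt;
(cd run2 &amp;amp;&amp;amp; ../a.out) &amp;amp;&lt;br /&gt;
&lt;br /&gt;
# do not let the script (and hence the job) end before both runs finish&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;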
&lt;br /&gt;
====Hybrid MPI/OpenMP jobs====&lt;br /&gt;
&lt;br /&gt;
=====Using Intel MPI=====&lt;br /&gt;
Here is how to run hybrid codes using intelmpi:&lt;br /&gt;
&lt;br /&gt;
http://software.intel.com/en-us/articles/hybrid-applications-intelmpi-openmp/&lt;br /&gt;
&lt;br /&gt;
Make sure you compile with the -mt_mpi option to the compilers to use the thread safe libraries. &lt;br /&gt;
Set the environment variable I_MPI_PIN_DOMAIN&lt;br /&gt;
&lt;br /&gt;
    $ export I_MPI_PIN_DOMAIN=omp&lt;br /&gt;
&lt;br /&gt;
This will set the process pinning domain size to be equal to OMP_NUM_THREADS (which you should set to the desired number of threads per MPI process). Therefore, each MPI process can create $OMP_NUM_THREADS child threads that run within the corresponding domain. If OMP_NUM_THREADS is not set, each node is treated as a separate domain (which will allow as many threads per MPI process as there are cores).&lt;br /&gt;
&lt;br /&gt;
In addition, when invoking mpirun, you should add the argument &amp;quot;-ppn X&amp;quot;, where X is the number of MPI processes per node.&lt;br /&gt;
For example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     mpirun -r ssh -ppn 2 -np 8  ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
would start 2 mpi processes per node for a total of 8 processes, so mpirun will try to run mpi processes on 4 nodes&lt;br /&gt;
(OMP_NUM_THREADS is then probably best set at 4).&lt;br /&gt;
Your job script should still ask for these 4 nodes with the line&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     #PBS -l nodes=4:ppn=8,walltime=....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(ppn=8 is not a mistake here; the ppn parameter has a different meaning for PBS and for mpirun)&lt;br /&gt;
&lt;br /&gt;
''The ppn parameter to mpirun is very important! Without it, all eight MPI processes would be bunched on the first node in this example, leaving 3 nodes unused.''&lt;br /&gt;
&lt;br /&gt;
NOTE: In order to pin OpenMP threads inside the domain, use the corresponding OpenMP feature by setting the KMP_AFFINITY environment variable; see [http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/lin/compiler_f/optaps/common/optaps_openmp_thread_affinity.htm#KMP_AFFINITY_Environment_Variable Intel's Compiler User and Reference Guide].&lt;br /&gt;
&lt;br /&gt;
The IntelMPI manual is referenced on the front page of our wiki:&lt;br /&gt;
&lt;br /&gt;
http://software.intel.com/sites/products/documentation/hpc/mpi/linux/reference_manual.pdf&lt;br /&gt;
&lt;br /&gt;
For the above example of a total of 8 processes on 4 nodes, you could use the following script:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (hybrid job)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=4:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# SET THE NUMBER OF THREADS PER PROCESS:&lt;br /&gt;
export OMP_NUM_THREADS=4&lt;br /&gt;
&lt;br /&gt;
# PIN THE MPI DOMAINS ACCORDING TO OMP&lt;br /&gt;
export I_MPI_PIN_DOMAIN=omp&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; -np = nodes * (MPI processes per node, i.e. the -ppn value given to mpirun)&lt;br /&gt;
mpirun -r ssh -ppn 2 -np 8 ./a.out&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=====Using Open MPI=====&lt;br /&gt;
&lt;br /&gt;
For mixed MPI/OpenMP jobs using OpenMPI, which is the default for many users, the procedure is similar, but details differ.&lt;br /&gt;
&lt;br /&gt;
* Request the number of nodes in the PBS script.&lt;br /&gt;
* Set OMP_NUM_THREADS to the number of threads per MPI process.&lt;br /&gt;
* In addition to the -np parameter for mpirun, add the argument &amp;lt;tt&amp;gt;--bynode&amp;lt;/tt&amp;gt;, so that the mpi processes are not bunched up.&lt;br /&gt;
&lt;br /&gt;
So for example, to start a total of 8 processes on 4 nodes, you could use the following script&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# MOAB/Torque submission script for SciNet GPC (hybrid job)&lt;br /&gt;
#&lt;br /&gt;
#PBS -l nodes=4:ppn=8,walltime=1:00:00&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from&lt;br /&gt;
cd $PBS_O_WORKDIR&lt;br /&gt;
&lt;br /&gt;
# SET THE NUMBER OF THREADS PER PROCESS:&lt;br /&gt;
export OMP_NUM_THREADS=4&lt;br /&gt;
&lt;br /&gt;
# EXECUTION COMMAND; -np = nodes*processes_per_node; --bynode forces a round robin of nodes.&lt;br /&gt;
mpirun -np 8 --bynode -hostfile $PBS_NODEFILE ./a.out&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
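&lt;br /&gt;
The process counts used in the two hybrid scripts above follow from simple arithmetic. The sketch below is only an illustration: the variable names NODES, CORES_PER_NODE, PROCS_PER_NODE and NP are made up for this example and are not set by the scheduler.&lt;br /&gt;

```shell
#!/bin/bash
# Illustrative values matching the example above (assumptions, not scheduler variables).
NODES=4              # nodes requested with "#PBS -l nodes=4:ppn=8"
CORES_PER_NODE=8     # physical cores on a GPC node
OMP_NUM_THREADS=4    # OpenMP threads per MPI process

# Each MPI process occupies OMP_NUM_THREADS cores, so this many fit on one node.
PROCS_PER_NODE=$(( CORES_PER_NODE / OMP_NUM_THREADS ))

# Total number of MPI processes, i.e. the value passed to mpirun -np.
NP=$(( NODES * PROCS_PER_NODE ))

echo "processes per node: $PROCS_PER_NODE"
echo "total MPI processes: $NP"
```

With these values the sketch reports 2 processes per node and 8 in total, which corresponds to -ppn 2 -np 8 in the Intel MPI script and -np 8 --bynode in the Open MPI script.&lt;br /&gt;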
&lt;br /&gt;
===Submitting an Interactive Job===&lt;br /&gt;
&lt;br /&gt;
It is sometimes convenient to run a job interactively; this can be very handy for debugging.  In this case, you type a &amp;lt;tt&amp;gt;qsub&amp;lt;/tt&amp;gt; command which submits an interactive job to the queue; when the scheduler selects this job to run, it starts a shell on the first node of the job, which connects to your terminal.  You can then type any series of commands (for instance, the same commands as listed in the batch submission script above) to run a job interactively.&lt;br /&gt;
&lt;br /&gt;
For example, to start the same sort of job as in the batch submission script above, but interactively, one would type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -I -l nodes=2:ppn=8,walltime=1:00:00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is exactly the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line in the batch script above (which requests all 8 processors on each of 2 nodes for one hour), but prepended with a &amp;lt;tt&amp;gt;-I&amp;lt;/tt&amp;gt; for `interactive'.   When this job begins, your terminal will show you as being logged in to one of the compute nodes, and you can type in any shell command, run &amp;lt;tt&amp;gt;mpirun&amp;lt;/tt&amp;gt;, etc.   When you exit the shell, the job will end.  Interactive jobs can be used with any of the [[ Moab#GPC | GPC queues ]]; however, there is a short,&lt;br /&gt;
high-turnover queue called [[ Moab#debug | debug ]] which can be especially useful when the system is busy.&lt;br /&gt;
&lt;br /&gt;
===Ethernet vs. Infiniband===&lt;br /&gt;
&lt;br /&gt;
About 1/4 of the GPC (862 nodes or 6896 cores) is connected with a high-bandwidth, low-latency fabric called&lt;br /&gt;
[http://en.wikipedia.org/wiki/InfiniBand InfiniBand].  Many jobs which require tight coupling to scale well benefit greatly from this interconnect;&lt;br /&gt;
other types of jobs, which have relatively modest communication needs, do not require it and run fine on Gigabit Ethernet.&lt;br /&gt;
&lt;br /&gt;
Jobs which require the InfiniBand for good performance can request the nodes that have the `&amp;lt;tt&amp;gt;ib&amp;lt;/tt&amp;gt;' feature in the &amp;lt;tt&amp;gt;#PBS -l&amp;lt;/tt&amp;gt; line,&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -l nodes=2:ib:ppn=8,walltime=1:00:00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Because there are a limited number of these nodes, your job will start sooner if you do not request them (e.g. if you use the scripts as shown above), as this increases the number of nodes available to run your job. In fact, the InfiniBand nodes are to be used only for jobs that are known to scale well and will benefit from this type of interconnect. As such, at least 2 nodes must be requested, as single-node jobs will not benefit from using an&lt;br /&gt;
InfiniBand node. The MPI libraries provided by SciNet automatically use the correct interconnect, InfiniBand or Ethernet, depending on which nodes your job runs on.&lt;br /&gt;
&lt;br /&gt;
===HyperThreading===&lt;br /&gt;
&lt;br /&gt;
Each GPC compute node has 8 Nehalem cores (Intel Xeon E5540 @ 2.53&lt;br /&gt;
GHz). Thus, fully utilizing the node requires at least 8&lt;br /&gt;
tasks. We say `at least' because the Nehalem cores support&lt;br /&gt;
Hyper-Threading, which means it is as if there are twice as many cores, although the number of execution units is unchanged. Because most applications spend a lot of time waiting for data&lt;br /&gt;
from memory, this allows one task to use the processor&lt;br /&gt;
while another is stuck waiting for memory, and vice versa.&lt;br /&gt;
This requires no changes to the code, only running 16 rather than&lt;br /&gt;
8 tasks on the node (e.g. &amp;lt;tt&amp;gt;export OMP_NUM_THREADS=16&amp;lt;/tt&amp;gt;).  The resulting performance gain depends highly on the application.  Beware that MPI runs should still be limited to 8 MPI processes per node.&lt;br /&gt;
&lt;br /&gt;
Note: the &amp;lt;tt&amp;gt;ppn=8&amp;lt;/tt&amp;gt; part of the job requirement specification should '''not''' be modified because of HyperThreading.&lt;br /&gt;
&lt;br /&gt;
===Memory Configuration===&lt;br /&gt;
&lt;br /&gt;
==== 16G ====&lt;br /&gt;
&lt;br /&gt;
There are 3756 nodes with 16G of memory; this is the primary configuration in the GPC. These nodes will be used by default.&lt;br /&gt;
&lt;br /&gt;
==== 18G ====&lt;br /&gt;
&lt;br /&gt;
There are 24 Infiniband nodes which have 18G of memory. These nodes have a fully populated memory configuration that maximizes memory bandwidth. To&lt;br /&gt;
request these nodes use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=2:ib:m18g:ppn=8,walltime=1:00:00 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== 32G ====&lt;br /&gt;
&lt;br /&gt;
There are 84 Infiniband nodes which have 32G of memory. To request these nodes use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=2:ib:m32g:ppn=8,walltime=1:00:00 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== 128G ====&lt;br /&gt;
There are two stand-alone large memory (128GB) nodes, &amp;lt;tt&amp;gt;gpc-lrgmem01&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;gpc-lrgmem02&amp;lt;/tt&amp;gt;, which are primarily to be used for data analysis of runs.  They have 16 cores and are Intel machines running Linux, but they are not the same architecture (Nehalem) as the GPC compute nodes, so codes may have to be compiled separately for these machines.  They can be accessed using a specific &amp;lt;tt&amp;gt;largemem&amp;lt;/tt&amp;gt; queue:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -l nodes=2:ppn=8,walltime=1:00:00 -q largemem -I&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Ram Disk===&lt;br /&gt;
&lt;br /&gt;
On the GPC nodes, there is a `ram disk' available - up to half of the memory on the node may be used as a temporary file system.  This is particularly useful in the early stages of migrating desktop-computing codes to a High Performance Computing platform such as the GPC.    It is much faster than real disk and does not require network traffic; however, each node sees its own ramdisk and cannot see files on that of other nodes.   This is a very easy way to cache writes (by writing them to fast ram disk instead of slow `real' disk); one would then periodically copy the files to /scratch or /project so that they are available after the job has completed.&lt;br /&gt;
&lt;br /&gt;
To use the ramdisk, create, read from, and write to files in /dev/shm/.. just as one would in (e.g.) /scratch/USER/.  Only the amount of RAM needed to store the files will be taken up by the temporary file system; thus if you have 8 serial jobs each requiring 1 GB of RAM, and 1GB is taken up by various OS services, you would still have approximately 7GB available to use as ramdisk on a 16GB node.   However, if you were to write 8 GB of data to the RAM disk, this would exceed available memory and your job would likely crash.&lt;br /&gt;
   &lt;br /&gt;
NOTE: For good form, you should delete your files from ram disk at the end of your job. In any case, the ramdisk is purged between jobs, so the next user of that node will always have the full amount of RAM available.&lt;br /&gt;
&lt;br /&gt;
More details on how to set up your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk wiki page]].&lt;br /&gt;
&lt;br /&gt;
=== Managing jobs on the Queuing system ===&lt;br /&gt;
Information on checking available resources, starting, viewing, managing and canceling jobs on [[Moab | Moab/Torque]]&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=User_Tips&amp;diff=1309</id>
		<title>User Tips</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=User_Tips&amp;diff=1309"/>
		<updated>2010-07-01T16:44:07Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Checking on the remaining walltime from within a job */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__FORCETOC__&lt;br /&gt;
==Running single node MPI jobs==&lt;br /&gt;
In order to run GROMACS on a single node, the following two things are '''essential'''. If you do not include these two things, then some of your jobs will run fine, but others will run slowly and others will produce only the beginning of a short log file and no further output, even though they will continue to occupy the resources fully.&lt;br /&gt;
&lt;br /&gt;
1. add :compute-eth: to your #PBS -l line&lt;br /&gt;
&amp;lt;source lang=&amp;quot;sh&amp;quot;&amp;gt;&lt;br /&gt;
#PBS -l nodes=1:compute-eth:ppn=8,walltime=3:00:00,os=centos53computeA&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
2. add -mca btl_sm_num_fifos 7 -np $(wc -l $PBS_NODEFILE | gawk '{print $1}') -mca btl self,sm to the mpirun arguments.&lt;br /&gt;
&lt;br /&gt;
It appears to be important that you put the -np argument between the two -mca arguments.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;sh&amp;quot;&amp;gt;&lt;br /&gt;
/scinet/gpc/mpi/openmpi/1.3.2-intel-v11.0-ofed/bin/mpirun -mca btl_sm_num_fifos 7 \&lt;br /&gt;
-np $(wc -l $PBS_NODEFILE | gawk '{print $1}') -mca btl self,sm -machinefile $PBS_NODEFILE \&lt;br /&gt;
/scratch/cneale/GPC/exe/intel/gromacs-4.0.5/exec/bin/mdrun_openmpi -deffnm test&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Historical Note:'' &lt;br /&gt;
Another solution is to use -mca btl self,tcp instead of what is listed above. This, however, forces your communication to go over sockets and is less efficient than using shared memory. If you want to try it, the code is below. However, for smaller systems (&amp;lt;100,000 atoms) you will see a 1% to 5% performance reduction in comparison to the code listed above.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;sh&amp;quot;&amp;gt;&lt;br /&gt;
/scinet/gpc/mpi/openmpi/1.3.2-intel-v11.0-ofed/bin/mpirun --mca btl self,tcp \&lt;br /&gt;
-np $(wc -l $PBS_NODEFILE | gawk '{print $1}') -machinefile $PBS_NODEFILE \&lt;br /&gt;
/scratch/cneale/GPC/exe/intel/gromacs-4.0.5/exec/bin/mdrun_openmpi -deffnm test&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are not exactly sure why this is required, or if it is required for programs other than GROMACS. However, we strongly recommend adding it to any such script, since it only ensures that you get the resources you intended to request anyway. Refer to the section entitled &amp;quot;Ensuring that you get non-IB nodes&amp;quot; below for more information about what these commands do.&lt;br /&gt;
&lt;br /&gt;
[[User:Cneale|cneale]] September 14 2009 (Updated September 22 by cneale)&lt;br /&gt;
&lt;br /&gt;
Currently the reason for the second point above is that the shared-memory communication in OpenMPI seems to be buggy, at least when the code is compiled with gcc, so instead of the (default) option &amp;quot;--mca btl self,sm&amp;quot;, which often causes the code to fail, we use the &amp;quot;--mca btl self,tcp&amp;quot; option, which forces communication to go via tcp.  The tcp option is slower than the sm option, but at least for now it works.&lt;br /&gt;
&lt;br /&gt;
[[User:Dgruner|dgruner]] September 21 2009&lt;br /&gt;
&lt;br /&gt;
Note:  This bugginess with the shared memory transport in OpenMPI 1.3.2 and 1.3.3 with gcc has been resolved with the new default openmpi, 1.4.1; please use that instead.  Also note that with the newest versions of openmpi, you do not need a -hostfile or -machinefile entry.&lt;br /&gt;
&lt;br /&gt;
[[User:Ljdursi|Ljdursi]] 17:16, 25 February 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
==Benchmarking==&lt;br /&gt;
===Ensuring that you get non-IB nodes===&lt;br /&gt;
You can specify gigE only nodes using a &amp;quot;compute-eth&amp;quot; flag&lt;br /&gt;
&lt;br /&gt;
nodes=2:compute-eth:ppn=8&lt;br /&gt;
&lt;br /&gt;
and this will only allow the code to run on &amp;quot;gigabit only&amp;quot;&lt;br /&gt;
nodes.  So even if IB nodes are available it will sit in the queue.&lt;br /&gt;
&lt;br /&gt;
By default (i.e. with no property feature for the node) the scheduler (Moab) is set up to use the gigE nodes first, then the IB nodes. The scheduler configuration is ongoing, but explicitly putting either &amp;quot;compute-eth&amp;quot; for Ethernet or &amp;quot;ib&amp;quot; for InfiniBand nodes will guarantee the right type of node is used.&lt;br /&gt;
&lt;br /&gt;
Also you can specify the type of interconnect directly on the mpirun line using mpirun --mca btl self,tcp for ethernet, so even if it was on an IB node it would still use ethernet for communication.  Since the nodes are exactly the same except for the IB card, any benchmarking would still be valid.&lt;br /&gt;
&lt;br /&gt;
[[User:northrup|Scott]] August 27 2009&lt;br /&gt;
&lt;br /&gt;
==Advanced interactions with PBS or MOAB==&lt;br /&gt;
===Checking on the remaining walltime from within a job===&lt;br /&gt;
&lt;br /&gt;
There are a number of options for doing this.&lt;br /&gt;
&lt;br /&gt;
1. Use start=$(date +%s) to capture the start time of your script and then calculate the number of seconds that have elapsed, like this:&lt;br /&gt;
&lt;br /&gt;
  #!/bin/bash&lt;br /&gt;
  start=$(date +%s)&lt;br /&gt;
  ...&lt;br /&gt;
  ...&lt;br /&gt;
  now=$(date +%s)&lt;br /&gt;
  timeUsed=$(echo &amp;quot;$now $start&amp;quot;|awk '{print $1-$2}')&lt;br /&gt;
  # bc is not available on nodes so must use awk&lt;br /&gt;
&lt;br /&gt;
2. One can use checkjob, but be aware that it may fail and gpc01 may be down, so you need to handle that condition in your script.&lt;br /&gt;
&lt;br /&gt;
  #This returns the seconds REMAINING:&lt;br /&gt;
  val=&amp;quot;&amp;quot;; &lt;br /&gt;
  while [ -z &amp;quot;$val&amp;quot; ]; do &lt;br /&gt;
    val=$(ssh gpc01 &amp;quot;checkjob $PBS_JOBID&amp;quot; 2&amp;gt;/dev/null|grep Reservation|awk '{print $5}'|awk -F ':' '{print $1*3600+$2*60+$3}'); &lt;br /&gt;
  done; &lt;br /&gt;
  echo &amp;quot;$val&amp;quot;&lt;br /&gt;
&lt;br /&gt;
3. qstat is better, because it fails less often than checkjob: checkjob is a Moab command, which can fail when Moab is busy scheduling a large number of jobs, whereas qstat is a PBS command. Nevertheless, qstat can also fail, so protect it like this:&lt;br /&gt;
&lt;br /&gt;
  #This returns the seconds USED (and it only updates every few minutes):&lt;br /&gt;
  val=&amp;quot;&amp;quot;; &lt;br /&gt;
  while [ -z &amp;quot;$val&amp;quot; ]; do &lt;br /&gt;
    val=$(qstat -f $PBS_JOBID 2&amp;gt;/dev/null|egrep resources_used.walltime|awk '{print $3}'|awk -F ':' '{print $1*3600+$2*60+$3}');&lt;br /&gt;
  done; &lt;br /&gt;
  echo &amp;quot;$val&amp;quot;&lt;br /&gt;
&lt;br /&gt;
4. To be independent of the qstat or checkjob commands, one possibility is to parse the output of ps (see man ps for more detail). For example,&lt;br /&gt;
&lt;br /&gt;
  ps -eo pid,etime,args|egrep /var/spool/torque/mom_priv/jobs/$PBS_JOBID | egrep -v egrep| ...&lt;br /&gt;
&lt;br /&gt;
Although we're still not exactly sure how to get the time out of this. If you know, then please add it!&lt;br /&gt;
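&lt;br /&gt;
One possible way to turn the etime column into seconds is sketched below. It assumes that ps prints the elapsed time as days-hours:minutes:seconds, where the days and hours parts are optional; the function name etime_to_seconds is hypothetical and this has not been verified on the GPC nodes.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: convert a ps "etime" value to seconds.
# Assumes the layout days-hours:minutes:seconds, with days and hours optional.
etime_to_seconds() {
  echo "$1" | awk -F'[-:]' '{
    if (NF == 4)      { print $1*86400 + $2*3600 + $3*60 + $4 }  # days, hours, minutes, seconds
    else if (NF == 3) { print $1*3600 + $2*60 + $3 }             # hours, minutes, seconds
    else              { print $1*60 + $2 }                       # minutes, seconds
  }'
}

# Example calls with hand-written etime strings.
etime_to_seconds "01:02:03"
etime_to_seconds "2-01:02:03"
```

The two example calls print 3723 and 176523. In a job script, the etime field selected from the ps output above could be passed to this function in place of the hand-written strings.&lt;br /&gt;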
&lt;br /&gt;
These are meant to be useful, but as always, please test before production runs.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[User:cneale|cneale]] July 1 2010&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=User_Tips&amp;diff=1308</id>
		<title>User Tips</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=User_Tips&amp;diff=1308"/>
		<updated>2010-07-01T16:43:27Z</updated>

		<summary type="html">&lt;p&gt;Cneale: /* Checking on the remaining walltime from within a job */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__FORCETOC__&lt;br /&gt;
==Running single node MPI jobs==&lt;br /&gt;
In order to run GROMACS on a single node, the following two things are '''essential'''. If you do not include these two things, then some of your jobs will run fine, but others will run slowly and others will produce only the beginning of a short log file and no further output, even though they will continue to occupy the resources fully.&lt;br /&gt;
&lt;br /&gt;
1. add :compute-eth: to your #PBS -l line&lt;br /&gt;
&amp;lt;source lang=&amp;quot;sh&amp;quot;&amp;gt;&lt;br /&gt;
#PBS -l nodes=1:compute-eth:ppn=8,walltime=3:00:00,os=centos53computeA&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
2. add -mca btl_sm_num_fifos 7 -np $(wc -l $PBS_NODEFILE | gawk '{print $1}') -mca btl self,sm to the mpirun arguments.&lt;br /&gt;
&lt;br /&gt;
It appears to be important that you put the -np argument between the two -mca arguments.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;sh&amp;quot;&amp;gt;&lt;br /&gt;
/scinet/gpc/mpi/openmpi/1.3.2-intel-v11.0-ofed/bin/mpirun -mca btl_sm_num_fifos 7 \&lt;br /&gt;
-np $(wc -l $PBS_NODEFILE | gawk '{print $1}') -mca btl self,sm -machinefile $PBS_NODEFILE \&lt;br /&gt;
/scratch/cneale/GPC/exe/intel/gromacs-4.0.5/exec/bin/mdrun_openmpi -deffnm test&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Historical Note:'' &lt;br /&gt;
Another solution is to use -mca btl self,tcp instead of what is listed above. This, however, forces your communication to go over sockets and is less efficient than using shared memory. If you want to try it, the code is below. However, for smaller systems (&amp;lt;100,000 atoms) you will see a 1% to 5% performance reduction in comparison to the code listed above.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;sh&amp;quot;&amp;gt;&lt;br /&gt;
/scinet/gpc/mpi/openmpi/1.3.2-intel-v11.0-ofed/bin/mpirun --mca btl self,tcp \&lt;br /&gt;
-np $(wc -l $PBS_NODEFILE | gawk '{print $1}') -machinefile $PBS_NODEFILE \&lt;br /&gt;
/scratch/cneale/GPC/exe/intel/gromacs-4.0.5/exec/bin/mdrun_openmpi -deffnm test&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are not exactly sure why this is required, or if it is required for programs other than GROMACS. However, we strongly recommend adding it to any such script, since it only ensures that you get the resources you intended to request anyway. Refer to the section entitled &amp;quot;Ensuring that you get non-IB nodes&amp;quot; below for more information about what these commands do.&lt;br /&gt;
&lt;br /&gt;
[[User:Cneale|cneale]] September 14 2009 (Updated September 22 by cneale)&lt;br /&gt;
&lt;br /&gt;
Currently the reason for the second point above is that the shared-memory communication in OpenMPI seems to be buggy, at least when the code is compiled with gcc, so instead of the (default) option &amp;quot;--mca btl self,sm&amp;quot;, which often causes the code to fail, we use the &amp;quot;--mca btl self,tcp&amp;quot; option, which forces communication to go via tcp.  The tcp option is slower than the sm option, but at least for now it works.&lt;br /&gt;
&lt;br /&gt;
[[User:Dgruner|dgruner]] September 21 2009&lt;br /&gt;
&lt;br /&gt;
Note:  This bugginess with the shared memory transport in OpenMPI 1.3.2 and 1.3.3 with gcc has been resolved with the new default openmpi, 1.4.1; please use that instead.  Also note that with the newest versions of openmpi, you do not need a -hostfile or -machinefile entry.&lt;br /&gt;
&lt;br /&gt;
[[User:Ljdursi|Ljdursi]] 17:16, 25 February 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
==Benchmarking==&lt;br /&gt;
===Ensuring that you get non-IB nodes===&lt;br /&gt;
You can specify gigE only nodes using a &amp;quot;compute-eth&amp;quot; flag&lt;br /&gt;
&lt;br /&gt;
nodes=2:compute-eth:ppn=8&lt;br /&gt;
&lt;br /&gt;
and this will only allow the code to run on &amp;quot;gigabit only&amp;quot;&lt;br /&gt;
nodes.  So even if IB nodes are available it will sit in the queue.&lt;br /&gt;
&lt;br /&gt;
By default (i.e. with no property feature for the node) the scheduler (Moab) is set up to use the gigE nodes first, then the IB nodes. The scheduler configuration is ongoing, but explicitly putting either &amp;quot;compute-eth&amp;quot; for Ethernet or &amp;quot;ib&amp;quot; for InfiniBand nodes will guarantee the right type of node is used.&lt;br /&gt;
&lt;br /&gt;
Also you can specify the type of interconnect directly on the mpirun line using mpirun --mca btl self,tcp for ethernet, so even if it was on an IB node it would still use ethernet for communication.  Since the nodes are exactly the same except for the IB card, any benchmarking would still be valid.&lt;br /&gt;
&lt;br /&gt;
[[User:northrup|Scott]] August 27 2009&lt;br /&gt;
&lt;br /&gt;
==Advanced interactions with PBS or MOAB==&lt;br /&gt;
===Checking on the remaining walltime from within a job===&lt;br /&gt;
&lt;br /&gt;
There are a number of options for doing this.&lt;br /&gt;
&lt;br /&gt;
1. Use start=$(date +%s) to capture the start time of your script and then calculate the number of seconds that have elapsed, like this:&lt;br /&gt;
&lt;br /&gt;
  #!/bin/bash&lt;br /&gt;
  start=$(date +%s)&lt;br /&gt;
  ...&lt;br /&gt;
  ...&lt;br /&gt;
  now=$(date +%s)&lt;br /&gt;
  timeUsed=$(echo &amp;quot;$now $start&amp;quot;|awk '{print $1-$2}')&lt;br /&gt;
  # bc is not available on nodes so must use awk&lt;br /&gt;
&lt;br /&gt;
2. One can use checkjob, but be aware that it may fail and gpc01 may be down, so you need to handle that condition in your script.&lt;br /&gt;
&lt;br /&gt;
  #This returns the seconds REMAINING:&lt;br /&gt;
  val=&amp;quot;&amp;quot;; &lt;br /&gt;
  while [ -z &amp;quot;$val&amp;quot; ]; do &lt;br /&gt;
    val=$(ssh gpc01 &amp;quot;checkjob $PBS_JOBID&amp;quot; 2&amp;gt;/dev/null|grep Reservation|awk '{print $5}'|awk -F ':' '{print $1*3600+$2*60+$3}'); &lt;br /&gt;
  done; &lt;br /&gt;
  echo &amp;quot;$val&amp;quot;&lt;br /&gt;
&lt;br /&gt;
3. qstat is better, because it fails less often than checkjob: checkjob is a Moab command, which can fail when Moab is busy scheduling a large number of jobs, whereas qstat is a PBS command. Nevertheless, qstat can also fail, so protect it like this:&lt;br /&gt;
&lt;br /&gt;
  #This returns the seconds USED (and it only updates every few minutes):&lt;br /&gt;
  val=&amp;quot;&amp;quot;; &lt;br /&gt;
  while [ -z &amp;quot;$val&amp;quot; ]; do &lt;br /&gt;
    val=$(qstat -f $PBS_JOBID 2&amp;gt;/dev/null|egrep resources_used.walltime|awk '{print $3}'|awk -F ':' '{print $1*3600+$2*60+$3}');&lt;br /&gt;
  done; &lt;br /&gt;
  echo &amp;quot;$val&amp;quot;&lt;br /&gt;
&lt;br /&gt;
4. To be independent of the qstat or checkjob commands, one possibility is to parse the output of ps (see man ps for more detail). For example,&lt;br /&gt;
&lt;br /&gt;
  ps -eo pid,etime,args|egrep /var/spool/torque/mom_priv/jobs/$PBS_JOBID | egrep -v egrep| ...&lt;br /&gt;
&lt;br /&gt;
Although we're still not exactly sure how to get the time out of this. If you know, then please add it!&lt;br /&gt;
&lt;br /&gt;
These are meant to be useful, but as always, please test before production runs.&lt;/div&gt;</summary>
		<author><name>Cneale</name></author>
	</entry>
</feed>