<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>https://oldwiki.scinet.utoronto.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Pinto</id>
	<title>oldwiki.scinet.utoronto.ca - User contributions [en-gb]</title>
	<link rel="self" type="application/atom+xml" href="https://oldwiki.scinet.utoronto.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Pinto"/>
	<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php/Special:Contributions/Pinto"/>
	<updated>2026-05-13T00:43:05Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.35.12</generator>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Data_Transfer&amp;diff=9423</id>
		<title>Data Transfer</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Data_Transfer&amp;diff=9423"/>
		<updated>2018-09-20T15:23:26Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Globus data transfer */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== General guidelines ===&lt;br /&gt;
&lt;br /&gt;
All traffic to and from the data centre has to go via [http://en.wikipedia.org/wiki/Secure_Shell SSH], or secure shell.&lt;br /&gt;
This is a protocol which sets up a secure connection between two sites.  In all cases, incoming connections to SciNet go through relatively low-speed connections to the login.scinet gateways, but there are many ways to copy files on top of the ssh protocol.&lt;br /&gt;
&lt;br /&gt;
What node to use for data transfer to and from SciNet depends mostly on the amount of data to transfer:&lt;br /&gt;
&lt;br /&gt;
==== Moving &amp;lt;10GB through the login nodes ====&lt;br /&gt;
&lt;br /&gt;
The login nodes are accessible from outside SciNet, which means that you can transfer data between your own office/home machine and SciNet using scp or rsync (see below). However, the login nodes impose a cpu_time limit of 5 minutes (cpu_time, not wall_time), so a transfer of more than 10GB is unlikely to succeed.  While the login nodes can be used for transfers of less than 10GB, using a datamover node would still be faster.&lt;br /&gt;
&lt;br /&gt;
Note that transfers through a login node will time out after a certain amount of cpu time (currently set to 5 minutes), so if you have a slow connection you may need to go through datamover1.&lt;br /&gt;
&lt;br /&gt;
==== Moving &amp;gt;10GB through the datamover1 node ====&lt;br /&gt;
&lt;br /&gt;
Serious moves of data (&amp;gt;10GB) to or from SciNet should be done from the &amp;lt;tt&amp;gt;datamover1&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;datamover2&amp;lt;/tt&amp;gt; nodes. From any of the interactive SciNet nodes, one should be able to &amp;lt;tt&amp;gt;ssh datamover1&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;ssh datamover2&amp;lt;/tt&amp;gt; to log in. These are the machines with the fastest network connections to the outside world (faster by a factor of 10: a 10Gb/s link vs 1Gb/s).  &lt;br /&gt;
&lt;br /&gt;
Transfers must be ''originated'' from &amp;lt;tt&amp;gt;datamover1&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;datamover2&amp;lt;/tt&amp;gt;; that is, one cannot copy files from the outside world directly to or from the datamovers; one has to log in to a datamover and copy the data to or from the outside network. Your local machine must also be reachable from the outside, either by its name or its IP address. If you are behind a firewall or a (wireless) router, this may not be possible. You may need to ask your network administrator to allow the datamovers to ssh to your machine. If you need to open a hole in your firewall, these are their IPs:&lt;br /&gt;
&lt;br /&gt;
 datamover1 142.150.188.121&lt;br /&gt;
 datamover2 142.150.188.122&lt;br /&gt;
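&lt;br /&gt;
For example, to push results out from SciNet one would log in to a datamover and originate the copy from there (a sketch; the remote host name and paths here are placeholders, so substitute your own):&lt;br /&gt;
&lt;br /&gt;
 $ ssh datamover1&lt;br /&gt;
 $ scp /scratch/G/GROUP/USER/results.tar jon@mymachine.mydept.utoronto.ca:/data/&lt;br /&gt;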
&lt;br /&gt;
==== Hpn-ssh ====&lt;br /&gt;
&lt;br /&gt;
The usual ssh protocols were not designed for speed.   On the &amp;lt;tt&amp;gt;datamover1&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;datamover2&amp;lt;/tt&amp;gt; nodes, we have installed hpn-ssh, or [http://www.psc.edu/networking/projects/hpn-ssh/ High-Performance-enabled ssh].   You use this higher-performance ssh/scp/sftp variant by loading the `hpnssh' module.   Hpn-ssh is backwards compatible with the `usual' ssh, but is capable of significantly higher speeds.   If you routinely have large data transfers to do, we recommend having your system administrator look into installing [http://www.psc.edu/networking/projects/hpn-ssh/ hpn-ssh] on your system.  &lt;br /&gt;
&lt;br /&gt;
Everything we discuss below, unless otherwise stated, will work regardless of whether you have hpn-ssh installed on your remote system.&lt;br /&gt;
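&lt;br /&gt;
For instance, a transfer from a datamover using the hpn-ssh variant of scp might look as follows (a sketch; the remote host and file names are placeholders):&lt;br /&gt;
&lt;br /&gt;
 $ module load hpnssh&lt;br /&gt;
 $ scp bigfile.tar jon@remote.system.com:/home/jon/&lt;br /&gt;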
&lt;br /&gt;
==== For Microsoft Windows users ====&lt;br /&gt;
&lt;br /&gt;
Linux-Windows transfers can be a bit more involved than Linux-to-Linux ones, but with [http://www.cygwin.com Cygwin] this should not be a problem. Make sure you install Cygwin with the openssh package.&lt;br /&gt;
&lt;br /&gt;
If you want to remain 100% a Windows environment, another very good tool is [http://winscp.net/eng/index.php WinSCP]. It will let you easily transfer and synchronize data between your Windows workstation and the login nodes using your ssh credentials (provided that it's not much more than 10GB on each sync pass).&lt;br /&gt;
&lt;br /&gt;
If you are going to use the [[Data_Management#Moving_.3E10GB_through_the_datamover1_node | datamover1 method]], and assuming your machine is not a wireless laptop (if it is, it is best to find a nearby wired computer and use a USB memory stick), you'll need the IP address of your machine, which you can find by typing &amp;quot;ipconfig /all&amp;quot; on your local Windows machine. You will also need to have the ssh daemon (sshd) running locally in Cygwin.&lt;br /&gt;
&lt;br /&gt;
Also note that your Windows user name does not have to be the same as on SciNet; this just depends on how your local Windows system was set up. &lt;br /&gt;
&lt;br /&gt;
All locations given to scp or rsync in Cygwin have to be in Unix format (using &amp;quot;/&amp;quot;, not &amp;quot;\&amp;quot;), and are relative to Cygwin's path, not Windows' (e.g.&lt;br /&gt;
use /cygdrive/c/...... to get to the Windows C: drive).&lt;br /&gt;
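&lt;br /&gt;
For example, to copy a file from SciNet to the Windows C: drive from within Cygwin (the user, host, and file names here are illustrative):&lt;br /&gt;
&lt;br /&gt;
 $ scp jon@login.scinet.utoronto.ca:bigdatafile.bin /cygdrive/c/data/&lt;br /&gt;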
&lt;br /&gt;
=== Ways to transfer data ===&lt;br /&gt;
&lt;br /&gt;
==== Globus data transfer ====&lt;br /&gt;
&lt;br /&gt;
Globus is a file-transfer service with an easy-to-use web interface.  To get started, please sign up for a Globus account at the [https://www.globus.org/ Globus website].  Once you have an account, go to [https://www.globus.org/xfer/StartTransfer this page] to start the file transfer.  Please enter computecanada#niagara as one endpoint for the file transfer. If you are trying to transfer data from a laptop or desktop, you will need to install the Globus Connect Personal software, available [https://www.globus.org/globus-connect-personal here], to set up an endpoint for the laptop or desktop and perform the transfer.&lt;br /&gt;
&lt;br /&gt;
Please see the following [https://docs.computecanada.ca/wiki/Globus page] on how to set up Globus to perform data transfers.&lt;br /&gt;
&lt;br /&gt;
==== scp ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;tt&amp;gt;scp&amp;lt;/tt&amp;gt;, or secure copy, is the easiest way to copy files, although we generally find rsync below to be faster.&lt;br /&gt;
&lt;br /&gt;
scp works like cp to copy files:&lt;br /&gt;
&lt;br /&gt;
 $ scp original_file  copy_file&lt;br /&gt;
&lt;br /&gt;
except that either the original or the copy can be on another system:&lt;br /&gt;
 &lt;br /&gt;
 $ scp jonsdatafile.bin jon@remote.system.com:/home/jon/bigdatadir/&lt;br /&gt;
&lt;br /&gt;
will copy the data file into the directory &amp;lt;tt&amp;gt;/home/jon/bigdatadir/&amp;lt;/tt&amp;gt; on &amp;lt;tt&amp;gt;remote.system.com&amp;lt;/tt&amp;gt; after logging in as &amp;lt;tt&amp;gt;jon&amp;lt;/tt&amp;gt;; you will be prompted for a password (unless you've set up ssh keys).&lt;br /&gt;
&lt;br /&gt;
Copying from remote systems works the same way:&lt;br /&gt;
&lt;br /&gt;
 $ scp jon@remote.system.com:/home/jon/bigdatadir/newdata.bin .&lt;br /&gt;
&lt;br /&gt;
And wildcards work as you'd expect, except that you have to quote wildcards meant for the remote system, since your local shell cannot expand them:&lt;br /&gt;
&lt;br /&gt;
 $ scp *.bin jon@remote.system.com:/home/jon/bigdatadir/&lt;br /&gt;
 $ scp jon@remote.system.com:&amp;quot;/home/jon/inputdata/*&amp;quot; .&lt;br /&gt;
&lt;br /&gt;
There are a few options worth knowing about:  &lt;br /&gt;
* &amp;lt;tt&amp;gt;scp -C&amp;lt;/tt&amp;gt; compresses the file before transmitting it; ''if'' the file compresses well, this can significantly increase the effective data transfer rate (though usually not as much as compressing the data, sending it, then uncompressing).  If the file doesn't compress well, then this adds CPU overhead without accomplishing much, and can slow down your data transfer.&lt;br /&gt;
* &amp;lt;tt&amp;gt;scp -oNoneEnabled=yes -oNoneSwitch=yes&amp;lt;/tt&amp;gt; -- This is an hpn-ssh-only option.  If CPU overhead is a significant bottleneck in the data transfer, we can avoid it by turning off the encryption of the data.  For many purposes this is fine, but not for sensitive data.  In either case, '''authentication''' remains secure; it is only the data transfer that is in plaintext.&lt;br /&gt;
&lt;br /&gt;
==== rsync ====&lt;br /&gt;
&lt;br /&gt;
[http://samba.anu.edu.au/rsync/ rsync] is a very powerful tool for mirroring directories of data.   &lt;br /&gt;
 $ rsync -av -e ssh scinetdatadir jon@remote.system.com:/home/jon/bigdatadir/&lt;br /&gt;
rsync has a dizzying number of options; the above syncs &amp;lt;tt&amp;gt;scinetdatadir&amp;lt;/tt&amp;gt; ''to'' the remote system; that is, any files that are newer on the local system are updated on the remote system.  The converse isn't true; if there were newer files on the remote system, you'd have to bring those over with&lt;br /&gt;
 $ rsync -av -e ssh jon@remote.system.com:/home/jon/bigdatadir/ scinetdatadir &lt;br /&gt;
The &amp;lt;tt&amp;gt;-av&amp;lt;/tt&amp;gt; options stand for verbose and `archive' mode; the latter preserves timestamps and permissions, which is normally what you want.  &amp;lt;tt&amp;gt;-e ssh&amp;lt;/tt&amp;gt; tells it to use ssh for the transfer.&lt;br /&gt;
&lt;br /&gt;
One of the powerful things about rsync is that it looks to see what files already exist before copying, so you can use it repeatedly as a data directory fills and it won't make unnecessary copies; similarly, if a (say) log file grows over time, it will only copy the difference between the files, further speeding things up.   This also means that it behaves well if a transfer is interrupted; a second invocation of rsync will continue where the other left off.&lt;br /&gt;
&lt;br /&gt;
As with &amp;lt;tt&amp;gt;scp -C&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;rsync -z&amp;lt;/tt&amp;gt; compresses on the fly, which can significantly enhance effective data transfer rates if the files compress well, or hurt it if not.&lt;br /&gt;
&lt;br /&gt;
As with scp, if both sides are running hpn-ssh one can disable encryption of the data stream should that prove to be a bottleneck:&lt;br /&gt;
 $ rsync -av -e &amp;quot;ssh -oNoneEnabled=yes -oNoneSwitch=yes&amp;quot; jon@remote.system.com:/home/jon/bigdatadir/ scinetdatadir&lt;br /&gt;
&lt;br /&gt;
SciNet's login nodes, 142.150.188.5[1-4], are publicly accessible and can be used for data transfer as long as your material does not consist of single large files (much more than 2-3GB per file). There is a 5-minute CPU-time limit on the login nodes, so the transfer process may be killed by the kernel before completion. The workaround is to transfer your data using an rsync loop that checks the rsync return code, assuming some files can be transferred before reaching the CPU limit. For example, in a bash shell:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  for i in {1..100}; do   ### try 100 times&lt;br /&gt;
    rsync ...&lt;br /&gt;
    [ &amp;quot;$?&amp;quot; == &amp;quot;0&amp;quot; ] &amp;amp;&amp;amp; break&lt;br /&gt;
  done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
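&lt;br /&gt;
A filled-in version of this loop might look as follows (the host and paths are placeholders); adding &amp;lt;tt&amp;gt;--partial&amp;lt;/tt&amp;gt; keeps partially-transferred files, so an interrupted transfer resumes rather than restarts:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  for i in {1..100}; do   ### try 100 times&lt;br /&gt;
    rsync -av --partial mydata/ jon@remote.system.com:/home/jon/mydata/ &amp;amp;&amp;amp; break&lt;br /&gt;
    sleep 5&lt;br /&gt;
  done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;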
&lt;br /&gt;
==== ssh tunnel ====&lt;br /&gt;
&lt;br /&gt;
Alternatively you may use a reverse ssh tunnel (ssh -R).&lt;br /&gt;
&lt;br /&gt;
If your transfer is above 10GB you will need to use one of SciNet's datamovers. If your workstation is behind a firewall (as the datamovers are), you'll need a node external to your firewall, on the edge of your organization's network, that can serve as a gateway and is accessible via ssh by both your workstation and the datamovers. Initiate an &amp;quot;ssh -R&amp;quot; connection from SciNet's datamover to that node. This node needs to have its ssh GatewayPorts option enabled so that your workstation can connect to the specified port on that node, which will forward the traffic back to SciNet's datamover.&lt;br /&gt;
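&lt;br /&gt;
As a sketch (the gateway host name and port number here are placeholders): on the datamover, set up the reverse tunnel; then, from your workstation, originate the transfer through the gateway's forwarded port, which lands on the datamover's sshd, so you authenticate with your SciNet credentials:&lt;br /&gt;
&lt;br /&gt;
 datamover1$ ssh -R 2222:localhost:22 user@gateway.example.edu&lt;br /&gt;
 workstation$ rsync -av -e &amp;quot;ssh -p 2222&amp;quot; mydata/ scinetuser@gateway.example.edu:/scratch/G/GROUP/USER/&lt;br /&gt;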
&lt;br /&gt;
=== Transfer speeds ===&lt;br /&gt;
&lt;br /&gt;
==== What transfer speeds could I expect? ====&lt;br /&gt;
&lt;br /&gt;
Below are some typical transfer numbers from datamover1 to another University of Toronto machine with a 1Gb/s link to the campus network:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; border=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!{{Hl2}}|  Mode&lt;br /&gt;
!{{Hl2}}|  With hpn-ssh&lt;br /&gt;
!{{Hl2}}|  Without&lt;br /&gt;
|-&lt;br /&gt;
|  rsync&lt;br /&gt;
|  60-80 MB/s&lt;br /&gt;
|  30-40 MB/s&lt;br /&gt;
|-&lt;br /&gt;
|  scp&lt;br /&gt;
|  50 MB/s&lt;br /&gt;
|  25 MB/s&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==== What can slow down my data transfer? ====&lt;br /&gt;
&lt;br /&gt;
To move data quickly, ''all'' of the stages in the process have to be fast: the file system you are reading data from, the CPU reading the data, the network connection between the sender and the receiver, and the recipient's CPU and disk.  The slowest element in that chain will slow down the entire transfer.&lt;br /&gt;
&lt;br /&gt;
On SciNet's side, our underlying filesystem is the high-performance [http://www-03.ibm.com/systems/software/gpfs/index.html GPFS] system, and the node we recommend you use (datamover1) has a high-speed connection to the network and fast CPUs.&lt;br /&gt;
&lt;br /&gt;
==== Why are my transfers so much slower? ====&lt;br /&gt;
&lt;br /&gt;
If you get numbers significantly lower than the above, then there is a bottleneck in the transfer.  The first thing to do is to run &amp;lt;tt&amp;gt;top&amp;lt;/tt&amp;gt; on datamover1; if other people are transferring large files at the same time as you, network congestion could result and you'll just have to wait until they are done.&lt;br /&gt;
&lt;br /&gt;
If nothing else is going on on datamover1, there are a number of possibilities:&lt;br /&gt;
* the network connection between SciNet and your machine - do you know the network connection of your remote machine?  Are your system's connections tuned for performance [http://www.psc.edu/networking/projects/tcptune]?&lt;br /&gt;
* is the remote server busy?&lt;br /&gt;
* are the remote server's disks busy, or known to be slow?&lt;br /&gt;
&lt;br /&gt;
For any further questions, contact us at [mailto:support@scinet.utoronto.ca Support @ SciNet]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Data_Management&amp;diff=9372</id>
		<title>Data Management</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Data_Management&amp;diff=9372"/>
		<updated>2018-06-25T20:27:04Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Performance */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=='''Storage Space'''==&lt;br /&gt;
SciNet's storage system is based on IBM's [http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] (General Parallel File System).  There are two main systems for user data: &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt;, a small, backed-up space where user home directories are located, and &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;, a large system for input or output data for jobs; data on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; is not backed up (a third storage system, /project, exists only for groups with LRAC/NRAC allocations). Data placed on scratch will be deleted if it has not been accessed in 3 months.  SciNet does not provide long-term storage for large data sets.  &lt;br /&gt;
&lt;br /&gt;
===Overview of the different file systems===&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! {{Hl2}} | file system &lt;br /&gt;
! {{Hl2}} | purpose &lt;br /&gt;
! {{Hl2}} | user quota &lt;br /&gt;
! {{Hl2}} | block size &lt;br /&gt;
! {{Hl2}} | backed up&lt;br /&gt;
! {{Hl2}} | purged&lt;br /&gt;
! {{Hl2}} | access &lt;br /&gt;
|- &lt;br /&gt;
| /home&lt;br /&gt;
| development&lt;br /&gt;
| 50 GB&lt;br /&gt;
| 256 KB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
| read-only on compute nodes (r/w on login, devel and datamover1) &lt;br /&gt;
|- &lt;br /&gt;
| /scratch&lt;br /&gt;
| computation&lt;br /&gt;
| first of (20 TB ; 1 million files)&lt;br /&gt;
| 4 MB&lt;br /&gt;
| no&lt;br /&gt;
| files &amp;gt; 3 months&lt;br /&gt;
| read/write on all nodes&lt;br /&gt;
|- &lt;br /&gt;
| /project&lt;br /&gt;
| computation&lt;br /&gt;
| by allocation&lt;br /&gt;
| 256 KB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
| read/write on all nodes &lt;br /&gt;
|}&lt;br /&gt;
project is included in scratch&lt;br /&gt;
&lt;br /&gt;
===Home Disk Space===&lt;br /&gt;
&lt;br /&gt;
Every SciNet user gets a 50GB directory on &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; in a directory &amp;lt;tt&amp;gt;/home/G/GROUP/USER&amp;lt;/tt&amp;gt;, which is regularly backed-up.   Home is visible from &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; nodes, and from the development nodes on [[GPC_Quickstart | GPC]] and the [[TCS_Quickstart | TCS]].  However, on the compute nodes of the GPC clusters -- as when jobs are running -- &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is mounted '''''read-only'''''; thus GPC jobs can read files in /home but cannot write to files there.   &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is a good place to put code, input files for runs, and anything else that needs to be kept to reproduce runs.  On the other hand, &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is not a good place to put many small files, since&lt;br /&gt;
the block size for the file system is 256KB; many small files would quickly exhaust your disk quota and make the backup system very slow.&lt;br /&gt;
&lt;br /&gt;
If your application absolutely insists on writing material to your home account and you can't find a way to instruct it to write somewhere else, an alternative is to create a link pointing from your account under /home to a location under /scratch.&lt;br /&gt;
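The idea can be sketched as follows (the directories below are stand-ins made with mktemp; on SciNet the real paths would be under /home/G/GROUP/USER and /scratch/G/GROUP/USER, and the &amp;lt;tt&amp;gt;app_output&amp;lt;/tt&amp;gt; name is hypothetical):&lt;br /&gt;

```shell
# Stand-in directories; on SciNet use your real /home and /scratch paths.
home=$(mktemp -d)      # stands in for your /home directory
scratch=$(mktemp -d)   # stands in for your /scratch directory

# Create the real output directory on "scratch" ...
mkdir -p "$scratch/app_output"
# ... and point a symbolic link at it from "home";
# writes made through the link actually land on scratch.
ln -s "$scratch/app_output" "$home/app_output"

echo test > "$home/app_output/run.log"   # stored under $scratch, not $home
```

An application that insists on writing to the linked directory under home now transparently writes to scratch instead.&lt;br /&gt;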
&lt;br /&gt;
===Scratch Disk Space===&lt;br /&gt;
&lt;br /&gt;
Every SciNet user also gets a directory in &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; called &amp;lt;tt&amp;gt;/scratch/G/GROUP/USER&amp;lt;/tt&amp;gt;.   Scratch is visible from the &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; nodes,  the development nodes on [[GPC_Quickstart | GPC]] and the [[TCS_Quickstart | TCS]], and on the compute nodes of the clusters, mounted as read-write.   Thus jobs would normally write their output somewhere in &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.  There are '''NO''' backups of anything on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
There is a large amount of space available on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;, but it is purged routinely so that all users running jobs and generating large outputs will have room to store their data temporarily.  Computational results which you want to keep longer than this must be copied (using &amp;lt;tt&amp;gt;scp&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;rsync&amp;lt;/tt&amp;gt;) off SciNet entirely, e.g. to your local system.   SciNet does not routinely provide long-term storage for large data sets. &lt;br /&gt;
&lt;br /&gt;
Also note that the shared parallel file system was not designed to do many small file transactions. For that reason, the number of files that any user can have on scratch is limited to 1 million. This limit should be thought of as a safeguard, not an invitation to create one million files. Please see [[File System and I/O dos and don'ts]]. &lt;br /&gt;
&lt;br /&gt;
===Scratch Disk Purging Policy===&lt;br /&gt;
&lt;br /&gt;
In order to ensure that there is always significant space available for running jobs, '''we automatically delete files in /scratch that have not been accessed or modified for more than 3 months as of the deletion day, the 15th of each month'''. Note that we recently changed the cut-off reference to the ''MostRecentOf(atime,mtime)''. This policy is subject to revision depending on its effectiveness. More details about the purging process, and about how users can check whether their files will be deleted, follow. If you have files scheduled for deletion you should move them to a more permanent location, such as your departmental server or your /project space (for PIs who have either been allocated disk space by the LRAC or have bought disk space).&lt;br /&gt;
&lt;br /&gt;
On the '''first''' of each month, a list of files scheduled for purging is produced, and an email notification is sent to each user on that list. Furthermore, on or about the '''12th''' of each month a second scan produces a more current assessment and another email notification is sent. This way users can double-check that they have indeed taken care of all the files they needed to relocate before the purging deadline. Those files will be automatically deleted on the '''15th''' of the same month unless they have been accessed or relocated in the interim. If you have files scheduled for deletion they will be listed in a file in /scratch/t/todelete/current, which has your userid and groupid in the filename. For example, if user xxyz wants to check whether they have files scheduled for deletion they can issue the following command on a system which mounts /scratch (e.g. a scinet login node): '''ls -1 /scratch/t/todelete/current |grep xxyz'''. In the example below, the name of this file indicates that user xxyz is part of group abc, has 9,560 files scheduled for deletion and that they take up 1.0TB of space:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 [xxyz@scinet04 ~]$ ls -1 /scratch/t/todelete/current |grep xxyz&lt;br /&gt;
 -rw-r----- 1 xxyz     root       1733059 Jan 12 11:46 10001___xxyz_______abc_________1.00T_____9560files&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The file itself contains a list of all files scheduled for deletion (in the last column) and can be viewed with standard commands like more/less/cat - e.g. '''more /scratch/t/todelete/current/10001___xxyz_______abc_________1.00T_____9560files'''&lt;br /&gt;
&lt;br /&gt;
Similarly, you can also check on the other users in your group by using the ls command with grep on your group name. For example: '''ls -1 /scratch/t/todelete/current |grep abc'''. That will list all other users in the same group as xxyz who have files to be purged on the 15th. Members of the same group have access to each other's contents.&lt;br /&gt;
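The fields in such a file name are simply padded with runs of underscores, so they can be pulled apart with standard tools; a small sketch, using the example name from above:&lt;br /&gt;

```shell
# The example todelete file name from above; fields are separated by
# runs of '_' used as padding.
name="10001___xxyz_______abc_________1.00T_____9560files"

# Translate '_' to spaces, squeezing repeats, then split into fields:
# $1=uid record, $2=user, $3=group, $4=total size, $5=file count
set -- $(echo "$name" | tr -s '_' ' ')
echo "user=$2 group=$3 size=$4 count=$5"
```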
&lt;br /&gt;
'''NOTE:''' Preparing these assessments takes several hours. If you change the access/modification time of a file in the interim, that will not be detected until the next cycle. To get immediate feedback, use the ''''ls -lu'''' command on the file to verify its atime, and ''''ls -la'''' for its mtime. If the file's atime/mtime has been updated in the meantime, it will no longer be deleted come the purging date on the 15th.&lt;br /&gt;
&lt;br /&gt;
===Project Disk Space===&lt;br /&gt;
&lt;br /&gt;
Investigators who have been granted allocations through the [http://wiki.scinethpc.ca/wiki/index.php/Application_Process LRAC/NRAC Application Process] may have been allocated disk space in addition to compute time.   For the period of time that the allocation is granted, they will have disk space on the &amp;lt;tt&amp;gt;/project&amp;lt;/tt&amp;gt; disk system.  Space on project is a subset of scratch, but is not purged and is backed up. All members of the investigator's group will have access to this disk system, which will be mounted read/write everywhere.&lt;br /&gt;
&lt;br /&gt;
===How much Disk Space Do I have left?===&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''/scinet/gpc/bin6/diskUsage'''&amp;lt;/tt&amp;gt; command, available on the login nodes, datamovers and the GPC devel nodes, reports on the home, scratch, and project file systems in a number of ways: for instance, how much disk space is being used by yourself and your group (with the -a option), how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), or plots of your usage over time. Please see the usage help below for more details.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-h|-?| [-a] [-u &amp;lt;user&amp;gt;] [-de|-plot]&lt;br /&gt;
       -h|-?: help&lt;br /&gt;
       -a: list usages of all members on the group&lt;br /&gt;
       -u &amp;lt;user&amp;gt;: as another user on your group&lt;br /&gt;
       -de: include delta information&lt;br /&gt;
       -plot: create plots of disk usages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Did you know that you can check which of your directories have more than 1000 files with the &amp;lt;tt&amp;gt;'''/scinet/gpc/bin6/topUserDirOver1000list'''&amp;lt;/tt&amp;gt; command and which have more than 1GB of material with the &amp;lt;tt&amp;gt;'''/scinet/gpc/bin6/topUserDirOver1GBlist'''&amp;lt;/tt&amp;gt; command?&lt;br /&gt;
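If those helper scripts are not on your path, a rough portable equivalent for the file-count check can be sketched with find (the starting path and the 1000-file threshold are parameters you would adjust):&lt;br /&gt;

```shell
# A rough stand-in for topUserDirOver1000list: list the subdirectories of
# a starting path that hold more than a threshold number of files.
over_threshold() {
  start=$1
  threshold=$2
  for d in "$start"/*/; do
    [ -d "$d" ] || continue
    n=$(find "$d" -type f | wc -l)      # count regular files, recursively
    [ "$n" -gt "$threshold" ] && echo "$n ${d%/}"
  done
  return 0
}

# e.g.:  over_threshold "$HOME" 1000
```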
&lt;br /&gt;
Notes:&lt;br /&gt;
* information on usage and quota is only updated hourly!&lt;br /&gt;
* contents of project count against space and #files limits on scratch&lt;br /&gt;
&lt;br /&gt;
===Performance===&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] is a high-performance filesystem which provides rapid reads and writes to large datasets in parallel from many nodes.  As a consequence of this design, however, '''the file system performs quite ''poorly'' at accessing data sets which consist of many small files.'''  For instance, you will find that reading data in from one 4MB file is enormously faster than from a hundred 40KB files. Such small files are also quite wasteful of space, as the block size for the scratch filesystem is 4MB. This is something you should keep in mind when planning your input/output strategy for runs on SciNet.&lt;br /&gt;
&lt;br /&gt;
For instance, if you run multi-process jobs, having each process write to a file of its own is not a scalable I/O solution. A directory gets locked by the first process accessing it, so all other processes have to wait for it. Not only does the code become considerably less parallel; chances are the file system will time out while waiting for your other processes, leading your program to crash mysteriously.&lt;br /&gt;
Consider using MPI-IO (part of the MPI-2 standard), which allows files to be opened simultaneously by different processes, or using a dedicated process for I/O to which all other processes send their data, and which subsequently writes this data to a single file.&lt;br /&gt;
&lt;br /&gt;
===Local Disk===&lt;br /&gt;
&lt;br /&gt;
The compute nodes on the GPC '''do not contain hard drives''', so there is no local disk available to use during your computation.  You can, however, use part of a compute node's RAM like a local disk ('ramdisk'), but this will reduce how much memory is available for your&lt;br /&gt;
program.  The ramdisk can be accessed using &amp;lt;tt&amp;gt;/dev/shm/&amp;lt;/tt&amp;gt; and is currently set to 8GB.  Anything written&lt;br /&gt;
to this location that you want to keep must be copied back to the &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; filesystem, as &amp;lt;tt&amp;gt;/dev/shm&amp;lt;/tt&amp;gt; is wiped after each job and, since it is in memory, will not survive a reboot of the node. More on ramdisk usage can be found [[User_Ramdisk | here]].&lt;br /&gt;
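A minimal sketch of the stage-in/compute/stage-out pattern follows (file names are made up, and the destination below is a mktemp stand-in; in a real job it would be a directory under /scratch):&lt;br /&gt;

```shell
# Sketch of the ramdisk pattern: stage in, compute, stage out.
# 'dest' stands in for a /scratch directory; file names are hypothetical.
dest=$(mktemp -d)                      # on SciNet: e.g. /scratch/G/GROUP/USER/run1
work=/dev/shm/${USER:-$(id -un)}_run   # ramdisk working area (uses node RAM)
mkdir -p "$work"

echo "some input" > "$work/input.dat"  # stage data into the ramdisk
# ... your program would read and write in $work at memory speed ...
tr a-z A-Z < "$work/input.dat" > "$work/output.dat"   # stand-in computation

cp "$work/output.dat" "$dest/"   # copy results off the ramdisk BEFORE the job ends
rm -rf "$work"                   # free the memory for whatever runs next
```

The essential point is the final copy: anything left in &amp;lt;tt&amp;gt;/dev/shm&amp;lt;/tt&amp;gt; when the job ends is lost.&lt;br /&gt;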
&lt;br /&gt;
Note that the absence of hard drives also means that the nodes cannot swap memory, so be sure that your computation fits within memory.&lt;br /&gt;
&lt;br /&gt;
===Buying storage space on GPFS or HPSS===&lt;br /&gt;
&lt;br /&gt;
Groups can buy space on GPFS or HPSS rather than rely on the [http://wiki.scinethpc.ca/wiki/index.php/Application_Process annual allocation process].  A good budgetary number would be:&lt;br /&gt;
&lt;br /&gt;
GPFS $400/TB&lt;br /&gt;
&lt;br /&gt;
HPSS $150/TB&lt;br /&gt;
 &lt;br /&gt;
This is a one-time cost. We have no formal, written data retention policy at this point but the intent is to keep any HPSS data (including migrating to new tape technologies) as long as SciNet is in operation. These numbers are for budgetary purposes only and subject to change (e.g. as markets and technologies evolve).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=='''Data Transfer'''==&lt;br /&gt;
{{:Data_Transfer}}&lt;br /&gt;
&lt;br /&gt;
=='''File/Ownership Management (ACL)'''==&lt;br /&gt;
* By default, at SciNet, users within the same group have read permission to each other's files (not write)&lt;br /&gt;
* You may use access control list ('''ACL''') to allow your supervisor (or another user within your group) to manage files for you (i.e., create, move, rename, delete), while still retaining your access and permission as the original owner of the files/directories.&lt;br /&gt;
* '''NOTE''': We highly recommend that you never give write permission to other users on the top level of your home directory (/home/G/GROUP/[owner]), since that would seriously compromise your privacy and disable ssh key authentication, among other things. If necessary, create specific sub-directories under your home directory from which other users can access and manipulate files.&lt;br /&gt;
* If you need to set up permissions across groups [mailto:support@scinet.utoronto.ca contact us] (and the other group's supervisor!).&lt;br /&gt;
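That recommendation can be sketched with plain unix permissions (the directories below are mktemp stand-ins; on SciNet the top level would be /home/G/GROUP/[owner], and the subdirectory name is hypothetical):&lt;br /&gt;

```shell
# Keep the top of the home directory closed to group writes, and share a
# dedicated subdirectory instead. 'top' stands in for /home/G/GROUP/owner.
top=$(mktemp -d)
mkdir "$top/shared"

chmod 750 "$top"          # group may enter and read, but never write, the top level
chmod 2770 "$top/shared"  # group may create/manage files here (setgid keeps the group)

# stat -c is GNU coreutils; used here only to display the resulting modes
stat -c '%a %n' "$top" "$top/shared"
```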
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&lt;br /&gt;
===Using  setfacl/getfacl===&lt;br /&gt;
* To allow [supervisor] to manage files in /project/g/group/[owner] using '''setfacl''' and '''getfacl''' commands, follow the 3-steps below as the [owner] account from a shell:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1) $ /scinet/gpc/bin/setfacl -d -m user:[supervisor]:rwx /project/g/group/[owner]&lt;br /&gt;
   (every *new* file/directory inside [owner] will inherit [supervisor] ownership by default from now on)&lt;br /&gt;
&lt;br /&gt;
2) $ /scinet/gpc/bin/setfacl -d -m user:[owner]:rwx /project/g/group/[owner]&lt;br /&gt;
   (but will also inherit [owner] ownership, ie, ownership of both by default, for files/directories created by [supervisor])&lt;br /&gt;
&lt;br /&gt;
3) $ /scinet/gpc/bin/setfacl -Rm user:[supervisor]:rwx /project/g/group/[owner]&lt;br /&gt;
   (recursively modify all *existing* files/directories inside [owner] to also be rwx by [supervisor])&lt;br /&gt;
&lt;br /&gt;
   $ /scinet/gpc/bin/getfacl /project/g/group/[owner]&lt;br /&gt;
   (to determine the current ACL attributes)&lt;br /&gt;
&lt;br /&gt;
   $ /scinet/gpc/bin/setfacl -b /project/g/group/[owner]&lt;br /&gt;
   (to remove any previously set ACL)&lt;br /&gt;
&lt;br /&gt;
PS: on the datamovers getfacl, setfacl and chacl will be on your path&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For more information on using [http://linux.die.net/man/1/setfacl &amp;lt;tt&amp;gt;setfacl&amp;lt;/tt&amp;gt;] or [http://linux.die.net/man/1/getfacl &amp;lt;tt&amp;gt;getfacl&amp;lt;/tt&amp;gt;] see their man pages.&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Using mmputacl/mmgetacl===&lt;br /&gt;
* You may use gpfs' native '''mmputacl''' and '''mmgetacl''' commands. The advantages are that you can set &amp;quot;control&amp;quot; permission and that [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm1160.html POSIX or NFS v4 style ACL] are supported. You will need first to create a /tmp/supervisor.acl file with the following contents:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user::rwxc&lt;br /&gt;
group::----&lt;br /&gt;
other::----&lt;br /&gt;
mask::rwxc&lt;br /&gt;
user:[owner]:rwxc&lt;br /&gt;
user:[supervisor]:rwxc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then issue the following 2 commands:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1) $ mmputacl -i /tmp/supervisor.acl /project/g/group/[owner]&lt;br /&gt;
2) $ mmputacl -d -i /tmp/supervisor.acl /project/g/group/[owner]&lt;br /&gt;
   (every *new* file/directory inside [owner] will inherit [supervisor] ownership by default as well as &lt;br /&gt;
   [owner] ownership, ie, ownership of both by default, for files/directories created by [supervisor])&lt;br /&gt;
&lt;br /&gt;
   $ mmgetacl /project/g/group/[owner]&lt;br /&gt;
   (to determine the current ACL attributes)&lt;br /&gt;
&lt;br /&gt;
   $ mmdelacl -d /project/g/group/[owner]&lt;br /&gt;
   (to remove any previously set ACL)&lt;br /&gt;
&lt;br /&gt;
   $ mmeditacl /project/g/group/[owner]&lt;br /&gt;
   (to create or change a GPFS access control list)&lt;br /&gt;
   (for this command to work set the EDITOR environment variable: export EDITOR=/usr/bin/vi)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
NOTES:&lt;br /&gt;
* There is no gpfs built-in command to recursively add or remove ACL attributes on existing files. You'll need to use the -i option as above for each file or directory individually. [[Data_Management#bash_script_that_you_may_adapt_to_recursively_add_or_remove_ACL_attributes_using_gpfs_built-in_commands | Here is a sample bash script you may use for that purpose]]&lt;br /&gt;
&lt;br /&gt;
* mmputacl/setfacl will not overwrite the original linux group permissions of a directory when it is copied to another directory that already has ACLs; hence the &amp;quot;#effective:r-x&amp;quot; note you may see from time to time with mmgetacl/getfacl. If you want to give rwx permissions to everyone in your group you should simply rely on the plain unix 'chmod g+rwx' command. You may do that before or after copying the original material to another folder with the ACLs.&lt;br /&gt;
&lt;br /&gt;
For more information on using [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm11120.html &amp;lt;tt&amp;gt;mmputacl&amp;lt;/tt&amp;gt;] or [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm11120.html &amp;lt;tt&amp;gt;mmgetacl&amp;lt;/tt&amp;gt;] see their man pages.&lt;br /&gt;
&lt;br /&gt;
===Appendix (ACL)===&lt;br /&gt;
====bash script that you may adapt to recursively add or remove ACL attributes using gpfs built-in commands====&lt;br /&gt;
Courtesy of Agata Disks (http://csngwinfo.in2p3.fr/mediawiki/index.php/GPFS_ACL)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# USAGE&lt;br /&gt;
#     - on one directory:     ./set_acl.sh dir_name&lt;br /&gt;
#     - on more directories:  ./set_acl.sh 'dir_nam*'&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
# Path of the file that contains the ACL&lt;br /&gt;
ACL_FILE_PATH=/agatadisks/data/acl_file.acl&lt;br /&gt;
&lt;br /&gt;
# Directories onto the ACLs have to be set&lt;br /&gt;
dirs=$1&lt;br /&gt;
&lt;br /&gt;
# Recursive function that sets ACL to files and directories&lt;br /&gt;
set_acl () {&lt;br /&gt;
  curr_dir=&amp;quot;$1&amp;quot;&lt;br /&gt;
  for args in &amp;quot;$curr_dir&amp;quot;/*&lt;br /&gt;
  do&lt;br /&gt;
    if [ -f &amp;quot;$args&amp;quot; ]; then&lt;br /&gt;
      echo &amp;quot;ACL set on file $args&amp;quot;&lt;br /&gt;
      mmputacl -i &amp;quot;$ACL_FILE_PATH&amp;quot; &amp;quot;$args&amp;quot;&lt;br /&gt;
      if [ $? -ne 0 ]; then&lt;br /&gt;
        echo &amp;quot;ERROR: ACL not set on $args&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
      fi&lt;br /&gt;
    fi&lt;br /&gt;
    if [ -d &amp;quot;$args&amp;quot; ]; then&lt;br /&gt;
      # Set default ACL on the directory (inherited by new files)&lt;br /&gt;
      mmputacl -d -i &amp;quot;$ACL_FILE_PATH&amp;quot; &amp;quot;$args&amp;quot;&lt;br /&gt;
      if [ $? -ne 0 ]; then&lt;br /&gt;
        echo &amp;quot;ERROR: Default ACL not set on $args&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      echo &amp;quot;Default ACL set on directory $args&amp;quot;&lt;br /&gt;
      # Set ACL on the directory itself&lt;br /&gt;
      mmputacl -i &amp;quot;$ACL_FILE_PATH&amp;quot; &amp;quot;$args&amp;quot;&lt;br /&gt;
      if [ $? -ne 0 ]; then&lt;br /&gt;
        echo &amp;quot;ERROR: ACL not set on $args&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      echo &amp;quot;ACL set on directory $args&amp;quot;&lt;br /&gt;
      set_acl &amp;quot;$args&amp;quot;&lt;br /&gt;
    fi&lt;br /&gt;
  done&lt;br /&gt;
}&lt;br /&gt;
# $dirs is left unquoted on purpose so that a quoted glob like 'dir_nam*'&lt;br /&gt;
# expands here&lt;br /&gt;
for dir in $dirs&lt;br /&gt;
do&lt;br /&gt;
  if [ ! -d &amp;quot;$dir&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;ERROR: $dir is not a directory&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
  fi&lt;br /&gt;
  set_acl &amp;quot;$dir&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
exit 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==[[HPSS|'''High Performance Storage System (HPSS)''']]==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==More questions on data management?==&lt;br /&gt;
&lt;br /&gt;
Check out the [[FAQ#Data_on_SciNet_disks|FAQ]].&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Data_Transfer&amp;diff=9319</id>
		<title>Data Transfer</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Data_Transfer&amp;diff=9319"/>
		<updated>2018-05-05T06:25:16Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Globus data transfer */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== General guidelines ===&lt;br /&gt;
&lt;br /&gt;
All traffic to and from the data centre has to go via [http://en.wikipedia.org/wiki/Secure_Shell SSH], or secure shell.&lt;br /&gt;
This is a protocol which sets up a secure connection between two sites.  In all cases, incoming connections to SciNet go through relatively low-speed connections to the login.scinet gateways, but there are many ways to copy files on top of the ssh protocol.&lt;br /&gt;
&lt;br /&gt;
What node to use for data transfer to and from SciNet depends mostly on the amount of data to transfer:&lt;br /&gt;
&lt;br /&gt;
==== Moving &amp;lt;10GB through the login nodes ====&lt;br /&gt;
&lt;br /&gt;
The login nodes are accessible from outside SciNet, which means that you can transfer data between your own office/home machine and SciNet using scp or rsync (see below). However, the login nodes have a cpu_time timeout of 5 minutes (emphasis on cpu_time, not wall_time), so a transfer of more than about 10GB will most likely not succeed.  While the login nodes can be used for transfers of less than 10GB, using a datamover node would still be faster.&lt;br /&gt;
&lt;br /&gt;
Note that transfers through a login node will timeout after a certain time (currently set to 5 minutes cpu_time), so if you have a slow connection you may need to go through datamover1.&lt;br /&gt;
&lt;br /&gt;
==== Moving &amp;gt;10GB through the datamover1 node ====&lt;br /&gt;
&lt;br /&gt;
Serious moves of data (&amp;gt;10GB) to or from SciNet should be done from the &amp;lt;tt&amp;gt;datamover1&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;datamover2&amp;lt;/tt&amp;gt; nodes. From any of the interactive SciNet nodes, one should be able to &amp;lt;tt&amp;gt;ssh datamover1&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;ssh datamover2&amp;lt;/tt&amp;gt; to log in. These are the machines with the fastest network connections to the outside world (faster by a factor of 10: a 10 Gb/s link vs 1 Gb/s).  &lt;br /&gt;
&lt;br /&gt;
Transfers must be ''originated'' from &amp;lt;tt&amp;gt;datamover1&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;datamover2&amp;lt;/tt&amp;gt;; that is, one cannot copy files from the outside world directly to or from the datamovers; one has to log in to a datamover and copy the data to or from the outside network. Your local machine must also be reachable from the outside, either by its name or its IP address. If you are behind a firewall or a (wireless) router, this may not be possible. You may need to ask your network administrator to allow the datamovers to ssh to your machine. If you need to open a hole in your firewall, their IPs are:&lt;br /&gt;
&lt;br /&gt;
 datamover1 142.150.188.121&lt;br /&gt;
 datamover2 142.150.188.122&lt;br /&gt;
&lt;br /&gt;
==== Hpn-ssh ====&lt;br /&gt;
&lt;br /&gt;
The usual ssh protocols were not designed for speed.   On the &amp;lt;tt&amp;gt;datamover1&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;datamover2&amp;lt;/tt&amp;gt; nodes, we have installed hpn-ssh, or [http://www.psc.edu/networking/projects/hpn-ssh/ High-Performance-enabled ssh].   You use this higher-performance ssh/scp/sftp variant by loading the `hpnssh' module.   Hpn-ssh is backwards compatible with the `usual' ssh, but is capable of significantly higher speeds.   If you routinely have large data transfers to do, we recommend having your system administrator look into installing [http://www.psc.edu/networking/projects/hpn-ssh/ hpn-ssh] on your system.  &lt;br /&gt;
&lt;br /&gt;
Everything we discuss below, unless otherwise stated, will work regardless of whether you have hpn-ssh installed on your remote system.&lt;br /&gt;
&lt;br /&gt;
==== For Microsoft Windows users ====&lt;br /&gt;
&lt;br /&gt;
Linux-windows transfers can be a bit more involved than linux-to-linux, but using [http://www.cygwin.com Cygwin], this should not be a problem. Make sure you install Cygwin with the openssh package.&lt;br /&gt;
&lt;br /&gt;
If you want to remain 100% a Windows environment, another very good tool is [http://winscp.net/eng/index.php WinSCP]. It will let you easily transfer and synchronize data between your Windows workstation and the login nodes using your ssh credentials (provided that it's not much more than 10GB on each sync pass).&lt;br /&gt;
&lt;br /&gt;
If you are going to use the [[Data_Management#Moving_.3E10GB_through_the_datamover1_node | datamover1 method]], and assuming your machine is not a wireless laptop (if it &lt;br /&gt;
is, it is best to find a nearby wired computer and use a usb &lt;br /&gt;
memory stick), you'll need the IP address of your machine, which you can find by &lt;br /&gt;
typing &amp;quot;ipconfig /all&amp;quot; on your local windows machine. You will also need the ssh daemon (sshd) running locally in Cygwin.&lt;br /&gt;
&lt;br /&gt;
Also note that your windows user name does not have to be the same as on SciNet; this just &lt;br /&gt;
depends on how your local windows system was set up. &lt;br /&gt;
&lt;br /&gt;
All locations given to scp or rsync in cygwin have to be in unix format (using &amp;quot;/&amp;quot; not &amp;quot;\&amp;quot;), and will be relative to cygwin's path, not windows (e.g.&lt;br /&gt;
use /cygdrive/c/...... to get to the windows C: drive).&lt;br /&gt;
&lt;br /&gt;
=== Ways to transfer data ===&lt;br /&gt;
&lt;br /&gt;
==== Globus data transfer ====&lt;br /&gt;
&lt;br /&gt;
Globus is a file-transfer service with an easy-to-use web interface for transferring files.  To get started, please sign up for a Globus account at the [https://www.globus.org/ Globus website].  Once you have an account, go to [https://www.globus.org/xfer/StartTransfer this page] to start a file transfer.  Enter computecanada#niagara or computecanada#gpc as one endpoint for the transfer.  The computecanada#gpc endpoint requires authentication with your scinet username and password.  If you are transferring data from a laptop or desktop, you will need to install the Globus Connect Personal software available [https://support.globus.org/entries/24044351 here] to set up an endpoint on that machine and perform the transfer.&lt;br /&gt;
&lt;br /&gt;
Please see the following [https://docs.computecanada.ca/wiki/Globus page] on how to setup Globus to perform data transfer.&lt;br /&gt;
&lt;br /&gt;
==== scp ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;tt&amp;gt;scp&amp;lt;/tt&amp;gt;, or secure copy, is the easiest way to copy files, although we generally find rsync below to be faster.&lt;br /&gt;
&lt;br /&gt;
scp works like cp to copy files:&lt;br /&gt;
&lt;br /&gt;
 $ scp original_file  copy_file&lt;br /&gt;
&lt;br /&gt;
except that either the original or the copy can be on another system:&lt;br /&gt;
 &lt;br /&gt;
 $ scp jonsdatafile.bin jon@remote.system.com:/home/jon/bigdatadir/&lt;br /&gt;
&lt;br /&gt;
will copy the data file into the directory &amp;lt;tt&amp;gt;/home/jon/bigdatadir/&amp;lt;/tt&amp;gt; on &amp;lt;tt&amp;gt;remote.system.com&amp;lt;/tt&amp;gt; after logging in as &amp;lt;tt&amp;gt;jon&amp;lt;/tt&amp;gt;; you will be prompted for a password (unless you've set up ssh keys).&lt;br /&gt;
&lt;br /&gt;
Copying from remote systems works the same way:&lt;br /&gt;
&lt;br /&gt;
 $ scp jon@remote.system.com:/home/jon/bigdatadir/newdata.bin .&lt;br /&gt;
&lt;br /&gt;
And wildcards work as you'd expect, except that you have to quote wildcards meant for the remote system, since they cannot be expanded correctly on the local side:&lt;br /&gt;
&lt;br /&gt;
 $ scp *.bin jon@remote.system.com:/home/jon/bigdatadir/&lt;br /&gt;
 $ scp jon@remote.system.com:&amp;quot;/home/jon/inputdata/*&amp;quot; .&lt;br /&gt;
&lt;br /&gt;
There are few options worth knowing about:  &lt;br /&gt;
* &amp;lt;tt&amp;gt;scp -C&amp;lt;/tt&amp;gt; compresses the file before transmitting it; ''if'' the file compresses well, this can significantly increase the effective data transfer rate (though usually not as much as compressing the data, then sending it, then uncompressing).  If the file doesn't compress well, then this adds CPU overhead without accomplishing much, and can slow down your data transfer.&lt;br /&gt;
* &amp;lt;tt&amp;gt;scp -oNoneEnabled=yes -oNoneSwitch=yes&amp;lt;/tt&amp;gt; -- This is an hpn-ssh-only option.  If CPU overhead is a significant bottleneck in the data transfer, it can be avoided by turning off the secure encryption of the data.   For most of us this is OK, but for others it is not.  In either case, '''authentication''' remains secure; only the data transfer is in plaintext.&lt;br /&gt;
&lt;br /&gt;
==== rsync ====&lt;br /&gt;
&lt;br /&gt;
[http://samba.anu.edu.au/rsync/ rsync] is a very powerful tool for mirroring directories of data.   &lt;br /&gt;
 $ rsync -av -e ssh scinetdatadir jon@remote.system.com:/home/jon/bigdatadir/&lt;br /&gt;
rsync has a dizzying number of options; the above syncs &amp;lt;tt&amp;gt;scinetdatadir&amp;lt;/tt&amp;gt; ''to'' the remote system; that is, any files that are newer on the local system are updated on the remote system.  The converse isn't true; if there were newer files on the remote system, you'd have to bring those over with&lt;br /&gt;
 $ rsync -av -e ssh jon@remote.system.com:/home/jon/bigdatadir/ scinetdatadir &lt;br /&gt;
The &amp;lt;tt&amp;gt;-av&amp;lt;/tt&amp;gt; options are for verbose and `archive' mode, which preserves timestamps and permissions, which is normally what you want.  &amp;lt;tt&amp;gt;-e ssh&amp;lt;/tt&amp;gt; tells it to use ssh for the transfer.&lt;br /&gt;
&lt;br /&gt;
One of the powerful things about rsync is that it looks to see what files already exist before copying, so you can use it repeatedly as a data directory fills and it won't make unnecessary copies; similarly, if a (say) log file grows over time, it will only copy the difference between the files, further speeding things up.   This also means that it behaves well if a transfer is interrupted; a second invocation of rsync will continue where the other left off.&lt;br /&gt;
&lt;br /&gt;
As with &amp;lt;tt&amp;gt;scp -C&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;rsync -z&amp;lt;/tt&amp;gt; compresses on the fly, which can significantly enhance effective data transfer rates if the files compress well, or hurt it if not.&lt;br /&gt;
&lt;br /&gt;
As with scp, if both sides are running hpn-ssh one can disable encryption of the data stream should that prove to be a bottleneck:&lt;br /&gt;
 $ rsync -av -e &amp;quot;ssh -oNoneEnabled=yes -oNoneSwitch=yes&amp;quot; jon@remote.system.com:/home/jon/bigdatadir/ scinetdatadir&lt;br /&gt;
&lt;br /&gt;
SciNet's login nodes, 142.150.188.5[1-4], are publicly accessible and can be used for data transfer as long as your material does not come as one big chunk (files of much more than 2-3GB each). We have a 5-minute CPU time limit on the login nodes, and the transfer process may be killed by the kernel before completion. The workaround is to transfer your data in an rsync loop, checking the rsync return code, assuming some files can be transferred before reaching the CPU limit. For example, in a bash shell:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  for i in {1..100}; do   ### try 100 times&lt;br /&gt;
    rsync ...&lt;br /&gt;
    [ &amp;quot;$?&amp;quot; == &amp;quot;0&amp;quot; ] &amp;amp;&amp;amp; break&lt;br /&gt;
  done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
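Fleshing out the loop above, here is a minimal self-contained sketch in which the function transfer is a hypothetical stand-in for the real rsync invocation (in this sketch it fails twice and then succeeds, purely to exercise the retry logic):

```shell
#!/bin/bash
# Hypothetical stand-in for the real transfer command, e.g.
#   rsync -av -e ssh jon@remote.system.com:/home/jon/bigdatadir/ scinetdatadir
# This fake version fails twice, then succeeds, to exercise the loop.
attempts=0
transfer() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]
}

for i in {1..100}; do        # try up to 100 times
    if transfer; then        # rsync returns 0 once everything is across
        break
    fi
    sleep 1                  # brief pause before retrying; tune as needed
done
echo "transfer completed after $attempts attempt(s)"
```

With the real rsync in place of the stand-in, each pass only copies what the previous, interrupted pass did not finish.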
&lt;br /&gt;
==== ssh tunnel ====&lt;br /&gt;
&lt;br /&gt;
Alternatively you may use a reverse ssh tunnel (ssh -R).&lt;br /&gt;
&lt;br /&gt;
If your transfer is above 10GB you will need to use one of SciNet's datamovers. If your workstation is behind a firewall (as the datamovers are), you'll need a node external to your firewall, on the edge of your organization's network, that will serve as a gateway and is accessible via ssh by both your workstation and the datamovers. Initiate an &amp;quot;ssh -R&amp;quot; connection from SciNet's datamover to that node. This node needs to have ssh GatewayPorts enabled so that your workstation can connect to the specified port on that node, which will forward the traffic back to SciNet's datamover.&lt;br /&gt;
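The steps above can be sketched as follows (the gateway hostname, usernames and port number are placeholders, and gateway.example.org must have GatewayPorts enabled in its sshd_config):

```shell
# On SciNet's datamover: open the reverse tunnel, making port 2222 on the
# gateway forward back to the datamover's own ssh port.
ssh -R 2222:localhost:22 user@gateway.example.org

# On your workstation: transfer through the gateway; connections to port
# 2222 are forwarded back to the datamover, so authenticate as your SciNet user.
rsync -av -e "ssh -p 2222" mydata/ scinetuser@gateway.example.org:destdir/
```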
&lt;br /&gt;
=== Transfer speeds ===&lt;br /&gt;
&lt;br /&gt;
==== What transfer speeds could I expect? ====&lt;br /&gt;
&lt;br /&gt;
Below are some typical transfer numbers from datamover1 to another University of Toronto machine with a 1Gb/s link to the campus network:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; border=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!{{Hl2}}|  Mode&lt;br /&gt;
!{{Hl2}}|  With hpn-ssh&lt;br /&gt;
!{{Hl2}}|  Without&lt;br /&gt;
|-&lt;br /&gt;
|  rsync&lt;br /&gt;
|  60-80 MB/s&lt;br /&gt;
|  30-40 MB/s&lt;br /&gt;
|-&lt;br /&gt;
|  scp&lt;br /&gt;
|  50 MB/s&lt;br /&gt;
|  25 MB/s&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==== What can slow down my data transfer? ====&lt;br /&gt;
&lt;br /&gt;
To move data quickly, ''all'' of the stages in the process have to be fast: the file system you are reading data from, the CPU reading the data, the network connection between the sender and the receiver, and the recipient's CPU and disk.  The slowest element in that chain will slow down the entire transfer.&lt;br /&gt;
&lt;br /&gt;
On SciNet's side, our underlying filesystem is the high-performance [http://www-03.ibm.com/systems/software/gpfs/index.html GPFS] system, and the node we recommend you use (datamover1) has a high-speed connection to the network and fast CPUs.&lt;br /&gt;
&lt;br /&gt;
==== Why are my transfers so much slower? ====&lt;br /&gt;
&lt;br /&gt;
If you get numbers significantly lower than above, then there is a bottleneck in the transfer.  The first thing to do is to run &amp;lt;tt&amp;gt;top&amp;lt;/tt&amp;gt; on datamover1; if other people are transferring large files at the same time as you, network congestion could result and you'll just have to wait until they are done.&lt;br /&gt;
&lt;br /&gt;
If nothing else is going on on datamover1, there are a number of possibilities:&lt;br /&gt;
* the network connection between SciNet and your machine - do you know the network connection of your remote machine?  Are your system's connections tuned for performance [http://www.psc.edu/networking/projects/tcptune]?&lt;br /&gt;
* is the remote server busy?&lt;br /&gt;
* are the remote server's disks busy, or known to be slow?&lt;br /&gt;
&lt;br /&gt;
For any further questions, contact us at [mailto:support@scinet.utoronto.ca Support @ SciNet]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=User:Pinto&amp;diff=9318</id>
		<title>User:Pinto</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=User:Pinto&amp;diff=9318"/>
		<updated>2018-05-05T06:17:30Z</updated>

		<summary type="html">&lt;p&gt;Pinto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Jaime Pinto - Storage Analyst &amp;lt;br&amp;gt;                    &lt;br /&gt;
SciNet HPC Consortium - Compute/Calcul Canada&amp;lt;br&amp;gt;&lt;br /&gt;
www.scinet.utoronto.ca - www.computecanada.ca&amp;lt;br&amp;gt;&lt;br /&gt;
University of Toronto&amp;lt;br&amp;gt;&lt;br /&gt;
661 University Ave. (MaRS), Suite 1140&amp;lt;br&amp;gt;&lt;br /&gt;
Toronto, ON, M5G1M1&amp;lt;br&amp;gt;&lt;br /&gt;
P: 416-978-2755&amp;lt;br&amp;gt;&lt;br /&gt;
C: 416-505-1477&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Oldwiki.scinet.utoronto.ca:System_Alerts&amp;diff=9317</id>
		<title>Oldwiki.scinet.utoronto.ca:System Alerts</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Oldwiki.scinet.utoronto.ca:System_Alerts&amp;diff=9317"/>
		<updated>2018-05-05T06:15:01Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== System Status==&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
  Notes for updating the system status:&lt;br /&gt;
&lt;br /&gt;
  -  When removing system status entries, please archive them to:&lt;br /&gt;
&lt;br /&gt;
     http://wiki.scinethpc.ca/wiki/index.php/Previous_messages:&lt;br /&gt;
&lt;br /&gt;
     (yes, the trailing colon is part of the url)&lt;br /&gt;
&lt;br /&gt;
  -  The 'status circles' can be one of the following files: &lt;br /&gt;
&lt;br /&gt;
     down.png   for down&lt;br /&gt;
     up25.png   for 25% up&lt;br /&gt;
     up50.png   for 50% up&lt;br /&gt;
     up75.png   for 75% up&lt;br /&gt;
     up.png     for 100% up&lt;br /&gt;
&lt;br /&gt;
 --&amp;gt;&lt;br /&gt;
{| &lt;br /&gt;
|[[File:up.png|up|link=https://docs.scinet.utoronto.ca/index.php/Main_Page]][https://docs.scinet.utoronto.ca Niagara]&lt;br /&gt;
|-&lt;br /&gt;
|[[File:up.png|up|link=BGQ]][[BGQ]]&lt;br /&gt;
|[[File:up.png|up|link=P7 Linux Cluster]][[P7 Linux Cluster|P7]]&lt;br /&gt;
|[[File:up.png|up|link=P8]][[P8]]&lt;br /&gt;
|-&lt;br /&gt;
|[[File:up.png|up|link=SOSCIP_GPU]][[SOSCIP_GPU|SGC]]&lt;br /&gt;
|[[File:up.png|up|link=Knights Landing]][[Knights Landing|KNL]]&lt;br /&gt;
|[[File:down.png|up|link=HPSS]][https://docs.scinet.utoronto.ca/index.php/HPSS HPSS]&lt;br /&gt;
|-&lt;br /&gt;
|[[File:up.png|up|]]File System&lt;br /&gt;
|[[File:up.png|up|]]External Network&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt; Mon 23 Apr 2018 &amp;lt;/b&amp;gt; GPC-compute is decommissioned, GPC-storage available until &amp;lt;font color=red&amp;gt;&amp;lt;b&amp;gt;9 May 2018&amp;lt;/b&amp;gt;&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt; Thu 18 Apr 2018 &amp;lt;/b&amp;gt;  The Niagara system will undergo an upgrade to its InfiniBand network between 9am and 12pm. This should be transparent to users; however, there is a chance of network interruption.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt; Fri 13 Apr 2018 &amp;lt;/b&amp;gt; HPSS system will be down for a few hours on &amp;lt;b&amp;gt;Mon, Apr/16, 9AM&amp;lt;/b&amp;gt;, for hardware upgrades, in preparation for the eventual move to the Niagara side.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt; Tue 10 Apr 2018 &amp;lt;/b&amp;gt; Niagara is open to users.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt; Wed 4 Apr 2018 &amp;lt;/b&amp;gt; We are very close to the production launch of Niagara, the new system installed at SciNet.&lt;br /&gt;
While the RAC allocation year officially starts today, April 4/18, the Niagara system is still undergoing some final tuning and software updates, so the plan is to officially open it to users next week.&lt;br /&gt;
&lt;br /&gt;
All active GPC users will have their accounts, $HOME, and $PROJECT, transferred to the new&lt;br /&gt;
Niagara system.  Those of you who are new to SciNet, but got RAC allocations on Niagara,&lt;br /&gt;
will have your accounts created and ready for you to login.&lt;br /&gt;
&lt;br /&gt;
We are planning an extended [https://support.scinet.utoronto.ca/education/go.php/370/index.php Intro to SciNet/Niagara session], available in person at our office, and webcast on Vidyo and possibly other means, on Wednesday April 11 at noon EST.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- [https://support.scinet.utoronto.ca/wiki/index.php/Previous_messages:] --&amp;gt;&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Oldwiki.scinet.utoronto.ca:System_Alerts&amp;diff=9316</id>
		<title>Oldwiki.scinet.utoronto.ca:System Alerts</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Oldwiki.scinet.utoronto.ca:System_Alerts&amp;diff=9316"/>
		<updated>2018-05-05T06:14:32Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== System Status==&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
  Notes for updating the system status:&lt;br /&gt;
&lt;br /&gt;
  -  When removing system status entries, please archive them to:&lt;br /&gt;
&lt;br /&gt;
     http://wiki.scinethpc.ca/wiki/index.php/Previous_messages:&lt;br /&gt;
&lt;br /&gt;
     (yes, the trailing colon is part of the url)&lt;br /&gt;
&lt;br /&gt;
  -  The 'status circles' can be one of the following files: &lt;br /&gt;
&lt;br /&gt;
     down.png   for down&lt;br /&gt;
     up25.png   for 25% up&lt;br /&gt;
     up50.png   for 50% up&lt;br /&gt;
     up75.png   for 75% up&lt;br /&gt;
     up.png     for 100% up&lt;br /&gt;
&lt;br /&gt;
 --&amp;gt;&lt;br /&gt;
{| &lt;br /&gt;
|[[File:up.png|up|link=https://docs.scinet.utoronto.ca/index.php/Main_Page]][https://docs.scinet.utoronto.ca Niagara]&lt;br /&gt;
|-&lt;br /&gt;
|[[File:up.png|up|link=BGQ]][[BGQ]]&lt;br /&gt;
|[[File:up.png|up|link=P7 Linux Cluster]][[P7 Linux Cluster|P7]]&lt;br /&gt;
|[[File:up.png|up|link=P8]][[P8]]&lt;br /&gt;
|-&lt;br /&gt;
|[[File:up.png|up|link=SOSCIP_GPU]][[SOSCIP_GPU|SGC]]&lt;br /&gt;
|[[File:up.png|up|link=Knights Landing]][[Knights Landing|KNL]]&lt;br /&gt;
|[[File:down.png|up|link=HPSS]][https://docs.scinet.utoronto.ca/index.php/HPSS]&lt;br /&gt;
|-&lt;br /&gt;
|[[File:up.png|up|]]File System&lt;br /&gt;
|[[File:up.png|up|]]External Network&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt; Mon 23 Apr 2018 &amp;lt;/b&amp;gt; GPC-compute is decommissioned, GPC-storage available until &amp;lt;font color=red&amp;gt;&amp;lt;b&amp;gt;9 May 2018&amp;lt;/b&amp;gt;&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt; Thu 18 Apr 2018 &amp;lt;/b&amp;gt;  The Niagara system will undergo an upgrade to its InfiniBand network between 9am and 12pm. This should be transparent to users; however, there is a chance of network interruption.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt; Fri 13 Apr 2018 &amp;lt;/b&amp;gt; HPSS system will be down for a few hours on &amp;lt;b&amp;gt;Mon, Apr/16, 9AM&amp;lt;/b&amp;gt;, for hardware upgrades, in preparation for the eventual move to the Niagara side.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt; Tue 10 Apr 2018 &amp;lt;/b&amp;gt; Niagara is open to users.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt; Wed 4 Apr 2018 &amp;lt;/b&amp;gt; We are very close to the production launch of Niagara, the new system installed at SciNet.&lt;br /&gt;
While the RAC allocation year officially starts today, April 4/18, the Niagara system is still undergoing some final tuning and software updates, so the plan is to officially open it to users next week.&lt;br /&gt;
&lt;br /&gt;
All active GPC users will have their accounts, $HOME, and $PROJECT, transferred to the new&lt;br /&gt;
Niagara system.  Those of you who are new to SciNet, but got RAC allocations on Niagara,&lt;br /&gt;
will have your accounts created and ready for you to login.&lt;br /&gt;
&lt;br /&gt;
We are planning an extended [https://support.scinet.utoronto.ca/education/go.php/370/index.php Intro to SciNet/Niagara session], available in person at our office, and webcast on Vidyo and possibly other means, on Wednesday April 11 at noon EST.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- [https://support.scinet.utoronto.ca/wiki/index.php/Previous_messages:] --&amp;gt;&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS-by-pomes&amp;diff=9314</id>
		<title>HPSS-by-pomes</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS-by-pomes&amp;diff=9314"/>
		<updated>2018-05-04T00:30:35Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* 4. Check HPSS data against the original */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Packing up large data sets and putting them on HPSS''' =&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
HPSS has the following limitations:&lt;br /&gt;
&lt;br /&gt;
* Hundreds of thousands of small files can be offloaded rapidly, but take weeks or months to recall&lt;br /&gt;
* No individual file can exceed 1 TB&lt;br /&gt;
* You must verify the integrity of your data throughout the data preparation and offload process&lt;br /&gt;
&lt;br /&gt;
With these limitations in mind, we have developed the following protocol for efficiently offloading data from scratch to HPSS.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== 1. Identify the subdirectories that contain &amp;gt; 1,000 files. ==&lt;br /&gt;
a. Create a directory called DU/ and place the following script in that directory:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# du.sh&lt;br /&gt;
for i in $(ls ../); do&lt;br /&gt;
    n=$(find ../$i |wc -l)&lt;br /&gt;
    s=$(du -hs ../$i  | awk '{print $1}')&lt;br /&gt;
    echo &amp;quot;$i $n $s&amp;quot;&lt;br /&gt;
done &amp;gt; my.du.dirs&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b. chmod +x du.sh&lt;br /&gt;
&lt;br /&gt;
c. nohup ./du.sh &amp;amp;&lt;br /&gt;
(This step may require hours or days to complete)&lt;br /&gt;
&lt;br /&gt;
d. Now my.du.dirs will contain a listing of the number of files and the total size of each directory.&lt;br /&gt;
&lt;br /&gt;
e. Identify the directories with many files and copy the DU/ directory there and then run du.sh again. Continue this process until you have a good understanding of which directories actually contain large numbers of files.&lt;br /&gt;
&lt;br /&gt;
== 2. Create tar files for these directories. ==&lt;br /&gt;
a. This should be scripted to ensure that your tarballs are completely written.&lt;br /&gt;
&lt;br /&gt;
b. Never script the removal of the original files.&lt;br /&gt;
&lt;br /&gt;
c. Here is an example script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
for i in dir1 dir2 dir3; do&lt;br /&gt;
    tar -cf ${i}.tar ${i}&lt;br /&gt;
    echo &amp;quot;tar $i returned $?&amp;quot;&lt;br /&gt;
done &amp;gt; my.tar.results&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
d. Note that the evaluation of $? must be done on the very next command after tar. Even inserting an additional echo statement will break the test.&lt;br /&gt;
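The point in (d) can be demonstrated in plain bash: $? holds the exit status of only the immediately preceding command, so it must be captured right away.

```shell
#!/bin/bash
false                # stands in for a failing tar; exit status 1
first=$?             # captured immediately: holds 1
echo "status right after the failing command: $first"
second=$?            # by now $? reflects the commands above, not false
echo "status after intervening commands: $second"
```

Here first is 1 (the failure was caught) but second is 0, because the intervening assignment and echo both succeeded and overwrote $?.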
&lt;br /&gt;
e. Once you are sure that the tar command was successful (return code equals zero), you should delete the originals. You should not script the deletion process, because a typo in an rm -rf command can be very costly. If you must script a removal, it is best to do it like this:&lt;br /&gt;
&lt;br /&gt;
mkdir TRASH; for i in $list; do mv $i TRASH; done&lt;br /&gt;
&lt;br /&gt;
then inspect TRASH and remove it manually. Note that even the above command could be costly if you make a mistake, since files with the same name may get overwritten inside TRASH. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== 3. Now upload to HPSS using HSI. ==&lt;br /&gt;
You should have fewer than 10,000 files per TB of uploaded data. If that is not the case, then go back and pack up your data some more before proceeding.&lt;br /&gt;
&lt;br /&gt;
a. It is recommended that you have a new directory structure in HPSS for your fully packed up data. This is because you may have put other things on HPSS in the past and the creation of a clean directory structure is a good way to denote that this data is the final copy. Here, we use FULL_DATA/&lt;br /&gt;
&lt;br /&gt;
b. An example of an HPSS offload script follows:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
## scratch files:  $SCRATCH/mydata&lt;br /&gt;
## HPSS files:     $ARCHIVE/FULL_DATA/mydata&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir -p FULL_DATA&lt;br /&gt;
cd FULL_DATA&lt;br /&gt;
cput -Rpuh $SCRATCH/mydata&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
    echo 'HSI returned non-zero code.'&lt;br /&gt;
    /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
    exit $status&lt;br /&gt;
else&lt;br /&gt;
    echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
c. After the above script has completed, check the output to ensure that your transfer was successful. If you had errors, or if it timed out, simply run the script again. If you continue to get the same errors, contact support@scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
== 4. Check HPSS data against the original ==&lt;br /&gt;
Now you must retrieve your data back to scratch so that you can check it against the original copy on scratch, which we have not yet deleted.&lt;br /&gt;
&lt;br /&gt;
a. Run diskUsage to ensure that you have space in your allocation to recall the data to scratch. If the recall will bring you close to your limit, advise your other group members how much space you will be recalling in case another user is also planning a large data recall.&lt;br /&gt;
&lt;br /&gt;
b. An example of an HPSS recall script follows:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
## original scratch files:  $SCRATCH/mydata&lt;br /&gt;
## HPSS files:     $ARCHIVE/FULL_DATA/mydata&lt;br /&gt;
## new copy scratch files:  $SCRATCH/RETREIVED_MODULES/FULL_DATA/mydata&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/RETREIVED_MODULES/FULL_DATA&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
lcd $SCRATCH/RETREIVED_MODULES/FULL_DATA/&lt;br /&gt;
cget -Rpuh $ARCHIVE/FULL_DATA/mydata&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
    echo 'HSI returned non-zero code.'&lt;br /&gt;
    /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
    exit $status&lt;br /&gt;
else&lt;br /&gt;
    echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== 5. Run an md5sum over the entire directory ==&lt;br /&gt;
Now that you have both the original and the cput/cget copy back from HPSS, run an md5sum over the entire directory. An example of such a check follows. Run this script under nohup from one of the datamover nodes:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
COMPUTE_DIFFERENCES=1&lt;br /&gt;
dir=$(pwd)&lt;br /&gt;
WAS=$SCRATCH/mydata&lt;br /&gt;
IS=$SCRATCH/RETREIVED_MODULES/FULL_DATA/mydata&lt;br /&gt;
if ((COMPUTE_DIFFERENCES)); then&lt;br /&gt;
    cd $WAS&lt;br /&gt;
    find . &amp;gt; ${dir}/tmp.was&lt;br /&gt;
    echo &amp;quot;find on was returned $?&amp;quot;&lt;br /&gt;
&lt;br /&gt;
    cd $IS&lt;br /&gt;
    find . &amp;gt; ${dir}/tmp.is&lt;br /&gt;
    echo &amp;quot;find on is returned $?&amp;quot;&lt;br /&gt;
&lt;br /&gt;
    cd $dir&lt;br /&gt;
&lt;br /&gt;
    sort tmp.was &amp;gt;a ; sort tmp.is &amp;gt;b&lt;br /&gt;
    changes=$(diff a b | wc -l)&lt;br /&gt;
    if ((changes!=0)); then&lt;br /&gt;
        echo &amp;quot;FILE LISTS DIFFER! diff reported $changes differing lines&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
    fi&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
for i in $(cat tmp.was); do&lt;br /&gt;
    if [ -f ${WAS}/${i} ]; then&lt;br /&gt;
        was=$(md5sum ${WAS}/$i |awk '{print $1}')&lt;br /&gt;
        is=$(md5sum ${IS}/$i |awk '{print $1}')&lt;br /&gt;
        same=$(echo $was $is | awk '{if($1==$2) print 1; else print 0}')&lt;br /&gt;
        if ((same==0)); then&lt;br /&gt;
            echo &amp;quot;FILES DIFFER -- $i $was $is&amp;quot;&lt;br /&gt;
        else&lt;br /&gt;
            echo &amp;quot;OK for $i&amp;quot;&lt;br /&gt;
        fi&lt;br /&gt;
    fi&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a. When that is done, grep for DIFFER in the output (in nohup.out, since you ran this script under nohup). Any match means there is a problem; contact SciNet support.&lt;br /&gt;
&lt;br /&gt;
b. If everything was a success, you can delete all of the copies that you recalled from HPSS to scratch. You can also delete your original copy in scratch if you would like, as you now have a complete copy in SciNet's HPSS.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS-by-pomes&amp;diff=9313</id>
		<title>HPSS-by-pomes</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS-by-pomes&amp;diff=9313"/>
		<updated>2018-05-04T00:29:48Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* 3. Now upload to HPSS using HSI. */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Packing up large data sets and putting them on HPSS''' =&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
HPSS has the following limitations:&lt;br /&gt;
&lt;br /&gt;
* Hundreds of thousands of small files can be offloaded rapidly, but take weeks or months to recall&lt;br /&gt;
* No individual file can exceed 1 TB&lt;br /&gt;
* You must verify the integrity of your data throughout the data preparation and offload process&lt;br /&gt;
&lt;br /&gt;
With these limitations in mind, we have developed the following protocol for efficiently offloading data from scratch to HPSS.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== 1. Identify the subdirectories that contain &amp;gt; 1,000 files. ==&lt;br /&gt;
a. Create a directory called DU/ and place the following script in that directory:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# du.sh&lt;br /&gt;
for i in $(ls ../); do&lt;br /&gt;
    n=$(find ../$i |wc -l)&lt;br /&gt;
    s=$(du -hs ../$i  | awk '{print $1}')&lt;br /&gt;
    echo &amp;quot;$i $n $s&amp;quot;&lt;br /&gt;
done &amp;gt; my.du.dirs&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b. chmod +x du.sh&lt;br /&gt;
&lt;br /&gt;
c. nohup ./du.sh &amp;amp;&lt;br /&gt;
(This step may require hours or days to complete)&lt;br /&gt;
&lt;br /&gt;
d. Now my.du.dirs will contain a listing of the number of files and the total size of each directory.&lt;br /&gt;
&lt;br /&gt;
e. Identify the directories with many files and copy the DU/ directory there and then run du.sh again. Continue this process until you have a good understanding of which directories actually contain large numbers of files.&lt;br /&gt;
&lt;br /&gt;
== 2. Create tar files for these directories. ==&lt;br /&gt;
a. This should be scripted to ensure that your tarballs are completely written.&lt;br /&gt;
&lt;br /&gt;
b. Never script the removal of the original files.&lt;br /&gt;
&lt;br /&gt;
c. Here is an example script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
for i in dir1 dir2 dir3; do&lt;br /&gt;
    tar -cf ${i}.tar ${i}&lt;br /&gt;
    echo &amp;quot;tar $i returned $?&amp;quot;&lt;br /&gt;
done &amp;gt; my.tar.results&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
d. Note that the evaluation of $? must be done on the very next command after tar. Even inserting an additional echo statement will break the test.&lt;br /&gt;
&lt;br /&gt;
e. Once you are sure that the tar command was successful (return code equals zero), you should delete the originals. You should not script the deletion process, because a typo in an rm -rf command can be very costly. If you must script a removal, it is best to do it like this:&lt;br /&gt;
&lt;br /&gt;
mkdir TRASH; for i in $list; do mv $i TRASH; done&lt;br /&gt;
&lt;br /&gt;
then inspect TRASH and remove it manually. Note that even the above command could be costly if you make a mistake, since files with the same name may get overwritten inside TRASH. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== 3. Now upload to HPSS using HSI. ==&lt;br /&gt;
You should have fewer than 10,000 files per TB of uploaded data. If that is not the case, then go back and pack up your data some more before proceeding.&lt;br /&gt;
&lt;br /&gt;
a. It is recommended that you have a new directory structure in HPSS for your fully packed up data. This is because you may have put other things on HPSS in the past and the creation of a clean directory structure is a good way to denote that this data is the final copy. Here, we use FULL_DATA/&lt;br /&gt;
&lt;br /&gt;
b. An example of an HPSS offload script follows:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
## scratch files:  $SCRATCH/mydata&lt;br /&gt;
## HPSS files:     $ARCHIVE/FULL_DATA/mydata&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir -p FULL_DATA&lt;br /&gt;
cd FULL_DATA&lt;br /&gt;
cput -Rpuh $SCRATCH/mydata&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
    echo 'HSI returned non-zero code.'&lt;br /&gt;
    /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
    exit $status&lt;br /&gt;
else&lt;br /&gt;
    echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
c. After the above script has completed, check the output to ensure that your transfer was successful. If you had errors, or if it timed out, simply run the script again. If you continue to get the same errors, contact support@scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
== 4. Check HPSS data against the original ==&lt;br /&gt;
Now you must retrieve your data back to scratch so that you can check it against the original copy on scratch, which we have not yet deleted.&lt;br /&gt;
&lt;br /&gt;
a. Run diskUsage to ensure that you have space in your allocation to recall the data to scratch. If the recall will bring you close to your limit, advise your other group members how much space you will be recalling in case another user is also planning a large data recall.&lt;br /&gt;
&lt;br /&gt;
b. An example of an HPSS recall script follows:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=70:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
## original scratch files:  $SCRATCH/mydata&lt;br /&gt;
## HPSS files:     $ARCHIVE/FULL_DATA/mydata&lt;br /&gt;
## new copy scratch files:  $SCRATCH/RETREIVED_MODULES/FULL_DATA/mydata&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/RETREIVED_MODULES/FULL_DATA&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
lcd $SCRATCH/RETREIVED_MODULES/FULL_DATA/&lt;br /&gt;
cget -Rpuh $ARCHIVE/FULL_DATA/mydata&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
    echo 'HSI returned non-zero code.'&lt;br /&gt;
    /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
    exit $status&lt;br /&gt;
else&lt;br /&gt;
    echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== 5. Run an md5sum over the entire directory ==&lt;br /&gt;
Now that you have both the original and the cput/cget copy back from HPSS, run an md5sum over the entire directory. An example of such a check follows. Run this script under nohup from one of the datamover nodes:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
COMPUTE_DIFFERENCES=1&lt;br /&gt;
dir=$(pwd)&lt;br /&gt;
WAS=$SCRATCH/mydata&lt;br /&gt;
IS=$SCRATCH/RETREIVED_MODULES/FULL_DATA/mydata&lt;br /&gt;
if ((COMPUTE_DIFFERENCES)); then&lt;br /&gt;
    cd $WAS&lt;br /&gt;
    find . &amp;gt; ${dir}/tmp.was&lt;br /&gt;
    echo &amp;quot;find on was returned $?&amp;quot;&lt;br /&gt;
&lt;br /&gt;
    cd $IS&lt;br /&gt;
    find . &amp;gt; ${dir}/tmp.is&lt;br /&gt;
    echo &amp;quot;find on is returned $?&amp;quot;&lt;br /&gt;
&lt;br /&gt;
    cd $dir&lt;br /&gt;
&lt;br /&gt;
    sort tmp.was &amp;gt;a ; sort tmp.is &amp;gt;b&lt;br /&gt;
    changes=$(diff a b | wc -l)&lt;br /&gt;
    if ((changes!=0)); then&lt;br /&gt;
        echo &amp;quot;FILE LISTS DIFFER! diff reported $changes differing lines&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
    fi&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
for i in $(cat tmp.was); do&lt;br /&gt;
    if [ -f ${WAS}/${i} ]; then&lt;br /&gt;
        was=$(md5sum ${WAS}/$i |awk '{print $1}')&lt;br /&gt;
        is=$(md5sum ${IS}/$i |awk '{print $1}')&lt;br /&gt;
        same=$(echo $was $is | awk '{if($1==$2) print 1; else print 0}')&lt;br /&gt;
        if ((same==0)); then&lt;br /&gt;
            echo &amp;quot;FILES DIFFER -- $i $was $is&amp;quot;&lt;br /&gt;
        else&lt;br /&gt;
            echo &amp;quot;OK for $i&amp;quot;&lt;br /&gt;
        fi&lt;br /&gt;
    fi&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a. When that is done, grep for DIFFER in the output (in nohup.out, since you ran this script under nohup). Any match means there is a problem; contact SciNet support.&lt;br /&gt;
&lt;br /&gt;
b. If everything was a success, you can delete all of the copies that you recalled from HPSS to scratch. You can also delete your original copy in scratch if you would like, as you now have a complete copy in SciNet's HPSS.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9312</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9312"/>
		<updated>2018-05-04T00:28:18Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Packing up large data sets and putting them on HPSS */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be given access to HPSS, up to 2TB per group, so that you can get familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities on the [http://www.top500.org “Top 500”] HPC list (plus some black sites).&lt;br /&gt;
* Over 2.5 exabytes of combined storage worldwide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL).&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, both to store Nav Canada data and to serve as their own archive; they currently have 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, reaching the 10PB level in 2018.&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tapes at SciNet).&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet, after a modest upgrade in 2017: ingest ~150 TB/day, recall ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
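As a quick sanity check before archiving, the small-file guideline above can be applied with a short shell sketch (a hypothetical helper, not a SciNet-provided tool; the ~200MB threshold is the one stated above):&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical pre-archive check: count files under ~200MB in a directory
# tree. If there are many, group them into a tarball with tar or htar
# before sending them to HPSS, per the guidelines above.
SRCDIR=${1:-.}
small=$(find "$SRCDIR" -type f -size -200M | wc -l)
echo "$small files under 200MB in $SRCDIR - consider grouping them into a tarball"
```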
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are here so you don't have to think &lt;br /&gt;
too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [https://docs.computecanada.ca/wiki/Niagara_Quickstart#Submitting_jobs NIA queue system].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs at the same time, and 10 jobs total in each queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
  OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind that you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency (see the [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
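For clarity, the same shortcut can be unrolled into two steps that capture the recall job's ID explicitly (a sketch; data-recall.sh and job-to-work-on-recalled-data.sh are the sample scripts referenced on this page, and we assume the usual &amp;quot;Submitted batch job NNNN&amp;quot; output from sbatch):&lt;br /&gt;

```shell
#!/bin/bash
# Two-step variant of the dependency shortcut above (a sketch).
# sbatch prints a line ending in the job ID; awk grabs the last field.
JOBID=$(sbatch data-recall.sh | awk '{print $NF}')
# the analysis job waits in the queue until the recall job finishes
sbatch --dependency=afterany:${JOBID} job-to-work-on-recalled-data.sh
```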
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into an archive file that conforms to the POSIX TAR specification. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, thereby achieving a high rate of performance. HTAR does not do gzip compression, but it does have a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any file larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all such files, so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR protocol [[(POSIX 1003.1 USTAR)]]. Note that HTAR will erroneously indicate success in its output, but will return exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can arise when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
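The cautions above can be checked before submitting an HTAR job with a short pre-flight sketch (hypothetical, not a SciNet-provided tool; the 68GB and 100-character limits are the ones listed above, and finished-job1 is a placeholder directory):&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical pre-flight check for the HTAR limits listed above.
SRCDIR=${1:-finished-job1}
# files larger than 68GB cannot go into an HTAR archive - use HSI for them
find "$SRCDIR" -type f -size +68G
# pathnames longer than 100 characters would be skipped by htar (exit code 70)
find "$SRCDIR" | awk 'length($0) > 100'
```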
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar file and the .idx file have read permission for other members of your group, use the umask option&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any file larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all such files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is likely the primary client with which users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves a GPFSpath file to HPSSpath, only if the HPSS copy does not exist or the GPFS version is newer or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are 3 peculiarities of HSI that you should keep in mind, which can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the syntax for cput/cget may not work as one would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and the HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process, where you do an &amp;quot;lcd&amp;quot; in GPFS first, and recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, the '-p' option to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms, and you may already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' In case of multiple operations in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the example above, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A trivial way to list the contents of HPSS is simply to submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, about 400,000 files can be listed in about an hour. Adjust the walltime accordingly, and be on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS: the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The index is placed in the directory /home/$(whoami)/.ish_register and can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate cget calls) allows HSI to optimize the recall, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and its sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before a 'rm' command; the results may be unpredictable.&lt;br /&gt;
* Avoid using the standalone wildcard character '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', to limit unintentional deletions.&lt;br /&gt;
* Avoid relative paths, and even the environment variable $ARCHIVE; it is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid recursive/looped deletion of $SCRATCH contents from archive job scripts. Even for $ARCHIVE contents, it may be better to run deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
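The recommendation to expand paths explicitly can be sketched as follows. This is an illustrative fragment only (the fallback path and the 'obsolete' directory are hypothetical, not your real archive layout):&lt;br /&gt;

```shell
# Illustrative sketch: expand the archive path explicitly before it is
# embedded in a deletion command, and refuse to proceed if it is empty.
# The fallback path and the 'obsolete' directory are hypothetical.
ARCHIVE="${ARCHIVE:-/archive/s/scinet/pinto}"

if [ -z "$ARCHIVE" ]; then
    echo "archive path is empty - refusing to build a deletion command" >&2
    exit 1
fi

# This fully expanded line is what would go into the hsi heredoc:
echo "rm ${ARCHIVE}/obsolete/*.tmp"
```

Checking the expanded string before it reaches 'hsi rm' is cheap insurance against a deletion running on an unintended path.&lt;br /&gt;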
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc -p archiveshort'' command (shown below) you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS.&lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI - for instance, you cannot use 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xvf -&lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* We also recommend indexing with ish (in the same script) immediately after the tarball is created, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail''. (By default a pipeline returns the status of its last command, which is not what you want here.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the size of the directory tree with 'du' before sending it into the tar+HSI pipeline.&lt;br /&gt;
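The ''set -o pipefail'' note above can be seen with a minimal shell sketch that needs no HPSS access ('false' simply stands in for a failing tar or hsi stage):&lt;br /&gt;

```shell
# Minimal demonstration of why 'set -o pipefail' matters: 'false'
# stands in for a failing stage such as tar or hsi.
set +o pipefail
if false | true; then
    # default behaviour: only the LAST command's status is reported,
    # so the failure of 'false' is silently masked
    echo "without pipefail: pipeline reports success"
fi

set -o pipefail
if false | true; then
    echo "unexpected"
else
    # with pipefail, the first failing stage propagates to the status
    echo "with pipefail: pipeline reports failure"
fi
```

Without pipefail, a tar failure feeding a successful 'hsi put' would leave $status at 0 and the job would wrongly report TRANSFER SUCCESSFUL.&lt;br /&gt;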
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We have compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is part of the &amp;quot;extras&amp;quot; module and can be used on any compute or devel node. This makes the previous version of the script much quicker than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive; in the sample below it is combined with '-Hverify=1' to verify the archive after creation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the md5 hash with the hsi 'lshash' command, as in the sample below. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation depends on several factors, such as processor speed, network transfer speed, and the speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Once re-enabled, you may transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suitable for light transfers to/from your personal computer and to navigate your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and login with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be used in a Dropbox-like fashion. To download the free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ==&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9311</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9311"/>
		<updated>2018-05-04T00:27:19Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Current HSI version - Checksum built-in */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25 year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL).&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive; it currently has 2 x 100PB of capacity installed.&lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, reaching the 10PB level in 2018.&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet).&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
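The tarball-size guideline above can be checked with a quick 'du' before archiving. A minimal sketch follows (the directory path is illustrative):&lt;br /&gt;

```shell
# Hypothetical sketch: compare a directory's total size against the
# ~500GB tarball guideline before archiving it. Path is illustrative.
dir="${SCRATCH:-/tmp}/demo-dir"
mkdir -p "$dir"

# du -sk reports the total size in KB; awk keeps only the number
size_kb=$(du -sk "$dir" | awk '{print $1}')
limit_kb=$((500 * 1024 * 1024))    # 500GB expressed in KB

if [ "$size_kb" -gt "$limit_kb" ]; then
    echo "over the guideline: split into multiple smaller tarballs"
else
    echo "within the guideline: ${size_kb} KB"
fi
```

Running this check in the job script before the htar or tar+hsi step avoids discovering an oversized tarball only after hours of ingestion.&lt;br /&gt;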
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email scinet support and request an HPSS account (or else you will get &amp;quot;Error - authentication/initialization failed&amp;quot; and 71 exit codes). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are here so you don't have to think too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; mechanism for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [https://docs.computecanada.ca/wiki/Niagara_Quickstart#Submitting_jobs NIA queue system].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to the 'archivelong' or the 'archiveshort' queue.&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs at the same time, and 10 jobs total in each queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
  OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive naming-space. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to propagate the exit code.&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
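sbatch normally prints a line of the form &amp;quot;Submitted batch job &amp;lt;id&amp;gt;&amp;quot;; the awk in the shortcut above is simply meant to extract that job id and build the dependency flag. A minimal sketch of the extraction, with a hard-coded sample line in place of a real sbatch call:&lt;br /&gt;

```shell
# sbatch prints e.g. "Submitted batch job 50918"; the job id is the 4th field
echo "Submitted batch job 50918" | awk '{print "--dependency=afterany:"$4}'
```
&lt;br /&gt;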
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification, and thereby achieves a high rate of performance. HTAR does not do gzip compression; however, it has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, although it will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
&lt;br /&gt;
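The file-size and pathname limits above can be checked in GPFS before submitting an htar job. A minimal pre-flight sketch (finished-job1/ is the hypothetical directory tree used in the samples below; the -size +68G test assumes GNU find):&lt;br /&gt;

```shell
SRC=finished-job1
# files larger than 68GB cannot be stored in an HTAR archive:
find "$SRC" -type f -size +68G
# pathnames longer than 100 characters will be skipped by htar:
find "$SRC" | awk 'length($0) > 100'
```

If either command prints anything, move or rename the offending files before submitting the job.&lt;br /&gt;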
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions for other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online.&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
For many users, HSI will be the primary client for interacting with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a GPFSpath file to HPSSpath if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* There are 3 distinctions about HSI that you should keep in mind, as they can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, therefore the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and it must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same for both cput and cget: GPFS first, HPSS second:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document (heredoc) such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following 2-step process: do an &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, and then transfer the files individually with the '*' wildcard. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside our recommended syntax, so '''we strongly urge you to use the sample scripts we provide as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code in case of multiple operations in the same hsi session. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' As in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit-code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A very trivial way to list the contents of HPSS would be to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, about 400,000 files can be listed in about an hour. Adjust the walltime accordingly, to be on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register, which can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate cgets) allows HSI to optimize the recall, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J file_management_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with the use of 'cd' commands to non-existing directories before the 'rm' command. Results may be unpredictable.&lt;br /&gt;
* Avoid using the stand-alone wildcard '''*'''. If necessary, whenever possible bind it to common patterns, such as '*.tmp', so as to limit unintentional mishaps.&lt;br /&gt;
* Avoid using relative paths, even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid using recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even on $ARCHIVE contents, it may be better to do this as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc -p archiveshort'' command you'll get a standard shell prompt on an archive node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI; for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'INDEXING SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xvf - &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of stored files that we have with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail''. (The default is to return the status of the last command in the pipeline, which is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the total amount of data in the directory tree with 'du' before sending it to the tar+HSI pipeline.&lt;br /&gt;
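The effect of ''set -o pipefail'' can be seen with a minimal, generic shell sketch (plain 'false' and 'true' stand in for the tar/hsi pipeline):&lt;br /&gt;

```shell
# Without pipefail, the pipeline reports the status of its LAST command:
set +o pipefail
false | true
echo "without pipefail: $?"    # prints 0 - the failure is masked

# With pipefail, a failure anywhere in the pipeline is reported:
set -o pipefail
false | true
echo "with pipefail: $?"       # prints 1 - a tar/hsi error is not masked
```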
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module, and can also be used on any compute or devel node. This makes the previous version of the script run much quicker than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The ''-Hcrc'' option specifies that HTAR should generate CRC checksums when creating the archive, and ''-Hverify=1'' makes it verify them afterwards.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you can query the MD5 hash with the hsi command 'lshash'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
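As a rough sense of what the hashing involves, the GPFS-side half of such a verification can be sketched locally with 'md5sum' (the file path is a placeholder; 'hsi' itself only runs on the archive nodes):&lt;br /&gt;

```shell
# Compute the MD5 of a file locally; after an ingestion with '-c on',
# 'hsi lshash' should report the same value for the stored copy.
printf 'example payload' > /tmp/demo.dat
md5sum /tmp/demo.dat | awk '{print $1}'    # 32-character hex digest
```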
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'CHECKSUM QUERY SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE | tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
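The named-pipe trick above (read the file from GPFS once, while checksumming and transferring in parallel) can be exercised locally; here plain files stand in for the GPFS source and the HPSS destination:&lt;br /&gt;

```shell
# One read of the source feeds both the checksum and the "transfer":
# tee sends one copy to a local file (standing in for 'hsi put') and
# one copy to md5sum through the FIFO.
PIPE=$(mktemp -u)                      # unique pathname for the FIFO
mkfifo "$PIPE"
printf 'payload to archive' > /tmp/payload.dat

md5sum "$PIPE" > /tmp/payload.md5 &    # reader must be started first
cat /tmp/payload.dat | tee "$PIPE" > /tmp/transferred.dat
wait
rm -f "$PIPE"

# Point the checksum file at the transferred copy and verify it:
sed "s|$PIPE|/tmp/transferred.dat|" /tmp/payload.md5 > /tmp/check.md5
md5sum -c /tmp/check.md5               # prints: /tmp/transferred.dat: OK
```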
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and login with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9310</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9310"/>
		<updated>2018-05-04T00:24:15Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Deleting with an interactive HSI session */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25 year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL).&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive. It currently has 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
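A quick way to apply the size guideline above before archiving is to measure the tree with 'du'. A local sketch (the demo directory and file name are fabricated for illustration):&lt;br /&gt;

```shell
# Create a small stand-in directory tree for the demo.
DIR=$(mktemp -d)
dd if=/dev/zero of="$DIR/data.bin" bs=1024 count=10 2>/dev/null

# Total size in bytes (GNU du), compared against the 500GB guideline.
SIZE=$(du -sb "$DIR" | awk '{print $1}')
LIMIT=$((500 * 1024 * 1024 * 1024))
if [ "$SIZE" -gt "$LIMIT" ]; then
    echo "larger than 500GB - split this tree before archiving"
else
    echo "within the 500GB guideline"
fi
```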
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (or else you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and 71 exit codes). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first because of all the job script templates made available below (they are there so you don't have to think too much -- just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most Linux shell commands have an equivalent in HSI)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [https://docs.computecanada.ca/wiki/Niagara_Quickstart#Submitting_jobs NIA queue system].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs at the same time, and 10 jobs total on each queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be held in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
  OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive naming-space. Keep in mind that you are restricted to a 1-hour walltime.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code.&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
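The shortcut works because ''sbatch'' replies with a line like &amp;quot;Submitted batch job 50359&amp;quot;, whose 4th field is the job id. The awk step can be checked locally by simulating that reply (echo stands in for sbatch, which only exists on the cluster):&lt;br /&gt;

```shell
# Turn the scheduler's reply into a dependency flag for the next job.
echo "Submitted batch job 50359" \
  | awk '{print "--dependency=afterany:"$4}'
# prints: --dependency=afterany:50359
```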
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files and directories, using a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification and thereby achieving a high rate of performance. HTAR does not do gzip compression; however, it has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR protocol [[(POSIX 1003.1 USTAR)]] -- note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
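Since over-long pathnames only show up as a warning and exit code 70 after the fact, a pre-flight check with 'find' before running htar can save a failed job. A local sketch (the deep directory here is fabricated for the demo):&lt;br /&gt;

```shell
# List any paths longer than 100 characters, which htar would skip.
BASE=$(mktemp -d)
LONG="$BASE/$(printf 'a%.0s' $(seq 1 120))"   # a 120-character component
mkdir -p "$LONG"

find "$BASE" | awk 'length($0) > 100 {print length($0), $0}'
```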
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions for other members of your group, use the umask option&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
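Before submitting an HTAR creation job, it can help to scan for oversize members up front. The sketch below is a minimal, hypothetical pre-check (the 68GB figure is the per-member limit quoted above; the path in the example is illustrative):&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical pre-check for the htar 68GB per-member limit:
# list any files that htar would reject, so they can be moved
# aside and transferred with HSI instead.
list_oversize() {
  # -size +68G matches files strictly larger than 68GiB
  find "$1" -type f -size +68G -print
}

# Example (path is illustrative):
# list_oversize "$SCRATCH/workarea/finished-job1"
```

If the function prints nothing, the directory should be safe to pass to htar as far as member sizes are concerned.&lt;br /&gt;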
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI may be the primary client through which some users interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves a GPFSpath file to HPSSpath, only if the HPSS copy does not exist or the GPFS version is newer&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are three aspects of HSI to keep in mind, which can cause a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the cput/cget syntax may not work as one would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and it must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order for referring to files in HSI syntax differs from FTP. In HSI the general format is always the same, GPFS first and HPSS second, for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document (heredoc) such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do an &amp;quot;lcd&amp;quot; into the GPFSpath first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to preserve symbolic links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, then transfer the files individually with the '*' wildcard. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms, and you may already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside our recommended syntax, so '''we strongly urge you to use the sample scripts we provide as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code when there are multiple operations in the same hsi session. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code of each hsi session independently. And remember, you can make the exit codes more informative by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A trivial way to list the contents of HPSS is to submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete; about 400,000 files can be listed in roughly an hour. Adjust the walltime accordingly, erring on the generous side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register, which can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize the retrieval, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and its sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* you may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J file_management_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before an 'rm' command; the results may be unpredictable.&lt;br /&gt;
* Avoid using the stand-alone wildcard '''*'''. Whenever possible, bind it to common patterns, such as '*.tmp', to limit unintentional mishaps.&lt;br /&gt;
* Avoid relative paths, even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid running recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even for $ARCHIVE contents, it may be better to do this as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you're restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc -p archiveshort'' command (as in the example below) you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an HSI equivalent; for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi get - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, is not noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status use: ''set -o pipefail'' (The default is to return the status of the last command in the pipeline and this is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
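The 'du' check suggested above can be scripted; this is a minimal, hypothetical sketch (the 500GB figure is the guideline quoted above, and the example path is illustrative):&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical pre-check for the ~500GB single-tarball guideline:
# report the apparent size of a directory tree and warn if it is
# too large to ship as one tarball via the tar+hsi pipeline.
check_tarball_size() {
  local limit=$((500 * 1024 * 1024 * 1024))   # 500GB in bytes
  local size
  size=$(du -sb "$1" | cut -f1)               # -b reports apparent size in bytes
  if [ "$size" -gt "$limit" ]; then
    echo "WARNING: $1 is $size bytes; consider splitting it into smaller tarballs"
    return 1
  fi
  return 0
}

# Example (path is illustrative):
# check_tarball_size "$SCRATCH/mydir" || echo "split before archiving"
```

A non-zero return flags a directory that should be split before being sent through the tar+HSI pipeline.&lt;br /&gt;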
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It's now part of the &amp;quot;extras&amp;quot; module and can be used on any compute or devel node. This makes the previous version of the script execute much more quickly than using 'tar -czf'. In addition, by piggy-backing ISH at the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive; '-Hverify=1' verifies the archive contents after creation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the md5 hash with the hsi command 'lshash'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'CHECKSUM QUERY SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Once re-enabled, you may transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for browsing your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured to work like Dropbox. To download the free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9309</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9309"/>
		<updated>2018-05-04T00:23:51Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Deleting with an interactive HSI session */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be granted access to HPSS, up to 2TB per group, to get familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the US DOE labs, and is used by about 45 facilities on the [http://www.top500.org “Top 500”] HPC list (plus some black sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL).&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive; it currently has 2 x 100PB of capacity installed.&lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, reaching the 10PB level in 2018.&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tapes at SciNet).&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - is ~150 TB/day ingest and ~45 TB/day recall (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process.&lt;br /&gt;
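The small-file guideline above can be followed with plain tar on /scratch before the transfer; a minimal sketch (the directory and file names here are hypothetical):

```shell
# Group many small files into one tarball so that a single large
# file, rather than thousands of tiny ones, is what lands on tape.
mkdir -p smallfiles
echo "sample" > smallfiles/data1.txt
tar -cf smallfiles.tar smallfiles/
tar -tf smallfiles.tar    # list the members to confirm the packing
```

The resulting smallfiles.tar would then be archived and verified like any other tarball in the scripts below.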
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and 71 exit codes).&lt;br /&gt;
&lt;br /&gt;
This set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first because of all the job script templates we make available below (they are there so you don't have to think too much -- just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [https://docs.computecanada.ca/wiki/Niagara_Quickstart#Submitting_jobs NIA queue system].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to the 'archivelong' or the 'archiveshort' queue.&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs running at the same time, and 10 jobs in total on each queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
  OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind that you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
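The job-ID parsing inside that short cut can be sketched in isolation; the reply string below is a stand-in for real sbatch output ('Submitted batch job' followed by the ID as the fourth field):

```shell
# Stand-in for: reply=$(sbatch data-recall.sh)
reply="Submitted batch job 12345"

# Extract the numeric job ID (fourth whitespace-separated field).
jobid=$(echo "$reply" | awk '{print $4}')

# This is the flag the analysis job would be submitted with.
echo "--dependency=afterany:$jobid"
```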
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into an archive file that conforms to the POSIX TAR specification, using a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS and thereby achieving a high rate of performance. HTAR does not do gzip compression, but it does have a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR protocol [[(POSIX 1003.1 USTAR)]] -- note that HTAR will erroneously indicate success while producing exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, which are &amp;quot;prompt before overwrite&amp;quot;, overwriting is the default with (h)tar. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
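The two failure signals above (Warning lines in the log, exit code 70) can both be checked after an htar job; in this sketch the log line and exit status are simulated placeholders rather than real htar output:

```shell
# Simulated htar job log and exit status -- in a real job, my.output
# is the job's output file and $status comes from htar itself.
echo "HTAR: Warning: pathname too long, file skipped" > my.output
(exit 70); status=$?

if [ $status -eq 70 ] || grep -q Warning my.output; then
    echo "htar skipped some files - do not delete the originals yet"
fi
```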
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permission for other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
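As a sanity check on that mask: 0137 clears the execute bits, group write, and all 'other' permissions, leaving 0640 (rw-r-----) on the .tar and .idx files, so group members keep read access. The arithmetic:

```shell
# Permissions granted = full mode 0777 with the masked bits removed.
printf '%o\n' $(( 0777 & ~0137 ))    # prints 640, i.e. rw-r-----
```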
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which most users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a GPFSpath file to HPSSpath if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are 3 distinctions about HSI that you should keep in mind, as they can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, cput or cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and keep the contents of HPSS organized; the default HSI directory is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do an &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms, or you may already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' when there are multiple operations in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls, ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A very trivial way to list the contents of HPSS is simply to submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, listing about 400,000 files takes about an hour. Adjust the walltime accordingly, to be on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS: the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register, which can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please consult the [[ISH|documentation]] or the built-in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize the transfer, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and its sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J management_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with the use of 'cd' commands to non-existing directories before an 'rm' command; results may be unpredictable&lt;br /&gt;
* Avoid using the stand-alone wildcard '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', so as to limit unintentional mishaps&lt;br /&gt;
* Avoid relative paths, and even the env variable $ARCHIVE; it is better to explicitly expand the full paths in your scripts&lt;br /&gt;
* Avoid recursive/looped deletion instructions on $SCRATCH contents from archive job scripts. Even for $ARCHIVE contents, it is better to run deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you are restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc -p archiveshort'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), just as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an HSI equivalent - for instance, you cannot use 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'INDEXING SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi get - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of stored files that we have with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail''. (The default is to return the status of the last command in the pipeline, which is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the total amount of data in the directory tree with 'du' before sending it to the tar+HSI pipe.&lt;br /&gt;
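The ''set -o pipefail'' behaviour can be illustrated with a minimal bash sketch (not an HPSS transfer; ''false'' and ''true'' stand in for a failing and a succeeding pipeline stage):&lt;br /&gt;

```shell
#!/bin/bash
# Without pipefail, a pipeline reports only the LAST command's status,
# so a failed tar stage piped into a successful hsi stage would look fine.
( false | true ); status_default=$?

# With pipefail, the rightmost non-zero status in the pipeline is reported.
set -o pipefail
( false | true ); status_pipefail=$?

echo "default=$status_default pipefail=$status_pipefail"
```

This is why the sample offload scripts above set the option before the tar+hsi pipe: otherwise a failed tar would be silently masked by hsi's exit code.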
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We have compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module, and can be used on any compute or devel node. This makes the previous version of the script execute much more quickly than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive; '-Hverify=1' makes HTAR verify the archive after creation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the md5 hash with the hsi command 'hashli'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE | tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suitable for light transfers to/from your personal computer and to navigate your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and login with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9308</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9308"/>
		<updated>2018-05-04T00:22:15Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Sample data list */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be granted access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the US DOE labs, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL).&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017 to store Nav Canada data as well as to serve as its own archive; it currently has 2 x 100PB of capacity installed.&lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, reaching the 10PB level in 2018.&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tapes at SciNet).&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet (after a modest upgrade in 2017) is roughly 150 TB/day ingest and 45 TB/day recall (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape, a medium that is not suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
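The small-files guideline above can be sketched as a pre-ingest step. This is a hypothetical example (small-files-dir/ is just an illustrative name), assuming a tree of small files under $SCRATCH/workarea:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Bundle a directory full of small files into a single tarball,&lt;br /&gt;
# then confirm that the result stays below the ~500GB recommendation.&lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
tar -cf small-files.tar small-files-dir/&lt;br /&gt;
ls -lh small-files.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;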
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
This set of instructions on the wiki is the most concise &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first because of all the job script templates we make available below (they are there so you don't have to think too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [https://docs.computecanada.ca/wiki/Niagara_Quickstart#Submitting_jobs NIA queue system].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to the 'archivelong' or the 'archiveshort' queue.&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs at the same time, and 10 jobs in total on each queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
  OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind that you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Performance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or the ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs to catch abnormal terminations, and be sure to return the exit code.&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency (see the [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into an archive file that conforms to the POSIX TAR specification. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, thereby achieving a high rate of performance. HTAR does not do gzip compression, but it does have a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success in this case, but will produce exit code 70. For now, you can check for this type of error by running &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can arise when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
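Given the cautions above, it can pay to pre-check a directory before running htar. This is a hedged sketch (finished-job1/ is just an illustrative name): it lists files over the 68GB HTAR limit, which should go through HSI instead, and pathnames over 100 characters, which htar would skip:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
# files too large for htar (move these with hsi instead)&lt;br /&gt;
find finished-job1/ -type f -size +68G&lt;br /&gt;
# pathnames longer than 100 characters (htar would skip these)&lt;br /&gt;
find finished-job1/ -type f | awk 'length &amp;gt; 100'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;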
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions to other members in your group use the umask option&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI may be the primary client with which some users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves a copy of a GPFS file into HPSS, only if the HPSS copy does not exist or the GPFS version is newer&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* There are 3 peculiarities of HSI that you should keep in mind, as they can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the syntax for cput/cget may not work as one would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, cput or cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do an &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, then transfer the files individually with the '*' wildcard. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms, or you may already be familiar with HPSS/HSI from other HPC facilities whose procedures may or may not be similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we provide as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' When multiple operations are performed in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the example above, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. Remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A very trivial way to list the contents of HPSS would be to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete; about 400,000 files can be listed in roughly an hour. Adjust the walltime accordingly, to be on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register that can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do optimization, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer the files and subdirectories individually, using the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existent directories before an 'rm' command; the results may be unpredictable&lt;br /&gt;
* Avoid using the standalone wildcard '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', to limit unintentional deletions&lt;br /&gt;
* Avoid relative paths, and even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts&lt;br /&gt;
* Avoid using recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even on $ARCHIVE contents, it may be better to do it as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc -p archiveshort'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI - for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xvf - &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The end result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail''. (The default is to return the status of the last command in the pipeline, which is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
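To see concretely why ''set -o pipefail'' matters here, the sketch below (not a real transfer; ''false | cat'' stands in for a pipeline whose first stage fails) compares the exit status with and without it:

```shell
#!/bin/bash
# Demonstration of why the scripts above set 'set -o pipefail' before
# piping tar into hsi. 'false | cat' simulates a pipeline whose first
# stage (the tar side) fails while the last stage succeeds.

set -o pipefail
false | cat
s1=$?            # non-zero: the failed first stage is reported

set +o pipefail  # shell default: only the last command's status counts
false | cat
s2=$?            # zero: the failure is silently masked

echo "with pipefail: $s1   without: $s2"
```

Before sending a directory tree into the tar+HSI pipe, a quick ''du -sh $SCRATCH/mydir'' confirms that the total stays within the recommended 500GB.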
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It's now part of the &amp;quot;extras&amp;quot; module, and can also be used on any compute or devel node. This makes the previous script run much more quickly than it would with single-threaded 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive; '-Hverify=1' requests verification after the archive is written.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the MD5 hash with the hsi command 'lshash'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4)&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)  # name used for the checksum file in /tmp&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and login with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9307</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9307"/>
		<updated>2018-05-04T00:17:50Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Sample tarball list */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the World report (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive. It currently has 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are here so you don't have to think too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; mechanism for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [https://docs.computecanada.ca/wiki/Niagara_Quickstart#Submitting_jobs NIA queue system].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' queue or the 'archiveshort'&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
*  Users are limited to only 2 long jobs and 2 short jobs at the same time, and 10 jobs total in each queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold until slots free up. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
  OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive naming-space. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
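The trap/exit-code pattern used in the script above can be distilled into a minimal, generic skeleton (''true'' is a placeholder for the real hsi/htar command):

```shell
#!/bin/bash
# Minimal skeleton of the trap + exit-code pattern used in the job scripts.
trap "echo 'Job script not completed'; exit 129" TERM INT

true             # placeholder: replace with the actual hsi/htar command
status=$?

trap - TERM INT  # clear the trap once the transfer has finished

if [ "$status" -ne 0 ]; then
   echo 'Transfer returned non-zero code.'
   exit $status
else
   echo 'TRANSFER SUCCESSFUL'
fi
```

If the scheduler sends TERM or INT before the transfer finishes, the trap makes the job exit with code 129 instead of silently reporting success.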
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency (see the [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
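The mechanics of that shortcut can be sketched step by step. In the snippet below a literal string stands in for the message sbatch prints, so the snippet runs anywhere; the job ID 12345 and 'analysis.sh' are hypothetical examples.

```shell
# Hedged sketch of building the --dependency flag.  A literal string
# stands in for sbatch's "Submitted batch job <id>" message so this
# runs without a scheduler; 'analysis.sh' is a hypothetical job script.
submit_output="Submitted batch job 12345"
JOBID=$(echo "$submit_output" | awk '{print $4}')
dep="--dependency=afterany:$JOBID"
echo "$dep"
# In a real session you would then run:
#   sbatch $dep analysis.sh
```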
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into an archive file that conforms to the POSIX TAR specification. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, achieving a high rate of performance. HTAR does not do gzip compression, but it does have a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error by running &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. The same can happen when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
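Given the cautions above, a hedged pre-flight check before submitting an htar job could look like the sketch below. It builds a throwaway demo tree so the commands are self-contained; in a real job you would point SRC at your directory under $SCRATCH.

```shell
# Sketch of a pre-flight check for the two htar limits above: member
# files over 68GB and pathnames over 100 characters.  The demo tree
# created here is purely illustrative.
SRC=$(mktemp -d)/finished-job1
mkdir -p "$SRC/a-very-long-subdirectory-name-that-pushes-the-total-pathname-well-past-the-one-hundred-character-limit"
touch "$SRC/small-file"

# files larger than 68GB would make the whole htar session fail
oversize=$(find "$SRC" -type f -size +68G | wc -l)

# pathnames longer than 100 characters would be silently skipped (exit code 70)
longpaths=$(find "$SRC" -mindepth 1 | awk 'length > 100' | wc -l)

echo "oversize=$oversize longpaths=$longpaths"
```

If either count is non-zero, transfer the offending files with HSI instead, or shorten the paths before archiving.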
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar file and its .idx file are readable by other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
For some users, HSI will be the primary client for interacting with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a file from GPFSpath into HPSSpath, but only if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are 3 distinctions about HSI that you should keep in mind, since they can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the syntax for cput/cget may not work as one would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI has a &amp;quot;:&amp;quot; operator which separates the GPFSpath and the HPSSpath, and it must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, since the default HSI directory placement is $ARCHIVE; however, we recommend that you always use full paths and keep the contents of HPSS organized:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However the syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following 2-step process: do an &amp;quot;lcd&amp;quot; into GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, then transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms, or you may already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge you to use the sample scripts we provide as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code in case of multiple operations in the same hsi session. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' As in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit-code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A trivial way to list the contents of HPSS is to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, listing about 400,000 files takes about an hour. Adjust the walltime accordingly, to be on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register that can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do optimization, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J file_management&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with the use of 'cd' commands to non-existing directories before the 'rm' command. Results may be unpredictable.&lt;br /&gt;
* Avoid the use of the stand-alone wildcard character '''*'''. If necessary, whenever possible bind it to common patterns, such as '*.tmp', so as to limit unintentional mishaps.&lt;br /&gt;
* Avoid using relative paths, even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid using recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even for $ARCHIVE contents, it may be better to do deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you are restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc -p archiveshort'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI - for instance, you cannot use 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'INDEXING SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, is not noticeably slower than a recursive HSI put that stores each file one by one; reading the files back from tape in this format, however, will be many times faster. It also overcomes the current 68GB limit on the size of individual files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing the tarball with ish (in the same script) immediately after it is created, while it still resides in the HPSS cache. The result is then browsable just as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status use: ''set -o pipefail'' (The default is to return the status of the last command in the pipeline and this is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
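A minimal sketch of that 'du' pre-flight check (the 500GB threshold comes from the guideline above; the directory here is a throwaway stand-in for your real $SCRATCH data):&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: verify a directory tree is under the recommended 500GB
# before piping it to tar+HSI. A temporary directory with one small
# file stands in for $SCRATCH/mydir in this illustration.
mydir=$(mktemp -d)
echo "sample data" > "$mydir/file1"

maxkb=$((500 * 1024 * 1024))                 # 500GB expressed in KB
sizekb=$(du -sk "$mydir" | awk '{print $1}')

if [ "$sizekb" -gt "$maxkb" ]; then
    echo "TOO LARGE: split into multiple tarballs of 500GB or less"
else
    echo "SIZE OK: ${sizekb} KB"
fi
```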
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module and can also be used on any compute or devel node. It makes the previous script run much more quickly than using 'tar -czf' would. In addition, by piggy-backing ISH at the end of the script, ISH will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
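To recall and unpack such a compressed tarball the pipeline runs in reverse, e.g. ''hsi cget - : $ARCHIVE/mydir.tar.gz | pigz -d | tar -x''. The sketch below exercises the same round trip locally, with gzip standing in for pigz (pigz is a drop-in parallel replacement) and an ordinary file standing in for the HPSS copy:&lt;br /&gt;

```shell
#!/bin/bash
# Local round-trip sketch of the compressed pipeline. gzip stands in
# for pigz, and an ordinary file stands in for the HPSS archive side.
workdir=$(mktemp -d)
cd "$workdir"
mkdir mydir
echo "hello" > mydir/file1

# "put" side (on HPSS: tar -c mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz)
tar -c mydir | gzip > mydir.tar.gz

# "get" side (on HPSS: hsi cget - : $ARCHIVE/mydir.tar.gz | pigz -d | tar -x)
mkdir restore
cd restore
gzip -dc ../mydir.tar.gz | tar -x
cat mydir/file1
```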
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The -Hcrc option specifies that HTAR should generate CRC checksums when creating the archive; -Hverify=1 verifies them after creation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you can query the md5 hash with the hsi command 'lshash'. The value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS since version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process (the capture must come&lt;br /&gt;
# immediately after the hsi command, before any other command)&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'CHECKSUM QUERY SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and login with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9306</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9306"/>
		<updated>2018-05-04T00:12:24Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* HTAR */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL).&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive; it currently has 2 x 100PB of capacity installed.&lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, reaching the 10PB level in 2018.&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tapes at SciNet).&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge you to use the sample scripts we provide as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71).&lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are there so you don't have to think &lt;br /&gt;
too much - just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [https://docs.computecanada.ca/wiki/Niagara_Quickstart#Submitting_jobs NIA queue system].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' queue or the 'archiveshort'&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs at the same time, and 10 jobs in total on each queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be held until a slot opens. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
  OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive naming-space. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
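The shortcut works because sbatch prints ''Submitted batch job &amp;lt;id&amp;gt;'', whose 4th field is the job ID. A minimal sketch of the parsing, with the sbatch output simulated by echo:&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: build the --dependency flag from sbatch's output line.
# The submission message is simulated here with echo.
submit_msg="Submitted batch job 50359"   # what 'sbatch data-recall.sh' prints
depflag=$(echo "$submit_msg" | awk '{print "--dependency=afterany:"$4}')
echo "$depflag"
# the real follow-up would be:
#   sbatch $depflag job-to-work-on-recalled-data.sh
```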
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive that conforms to the POSIX TAR specification, and thereby achieves a high rate of performance. HTAR does not do gzip compression, but it does have a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR protocol (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error by running &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike HSI's conditional cput/cget, (h)tar does not protect against overwrites by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
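Since skipped pathnames only surface as warnings, it can pay to scan for over-long paths before running htar. A sketch, using a throwaway tree with one deliberately long name (run the find from your real archive directory instead):&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: list relative pathnames longer than 100 characters, which
# htar would skip. A temporary tree with one over-long file name is
# created here purely for illustration.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p finished-job1
longname=$(printf 'x%.0s' $(seq 1 110))   # a 110-character file name
touch finished-job1/ok "finished-job1/$longname"

# awk pattern with no action prints every matching line
toolong=$(find finished-job1 | awk 'length($0) > 100')
echo "$toolong"
```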
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions to other members in your group use the umask option&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
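Such failures can be caught ahead of time by checking for oversize members before submitting. A sketch (the 70GB file below is sparse, created with truncate purely for illustration):&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: find member files over htar's 68GB limit so they can be
# routed to plain HSI instead. Sizes are simulated with sparse files.
workdir=$(mktemp -d)
cd "$workdir"
mkdir finished-job1
truncate -s 1M  finished-job1/small
truncate -s 70G finished-job1/huge     # sparse: no real disk space used

oversize=$(find finished-job1 -type f -size +68G)
echo "$oversize"
```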
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI may be the primary client through which some users interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a GPFSpath file into HPSSpath, only if the HPSS copy does not exist or the GPFS version has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
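As a purely local illustration (not an HSI command), cput's 'store only if missing or newer' behaviour can be sketched in plain bash:&lt;br /&gt;

```shell
#!/bin/bash
# Local sketch of HSI cput semantics: copy src to dst only when dst
# is missing or older than src. Purely illustrative - the real cput
# compares against the copy stored in HPSS, not a local file.
conditional_put() {
    local src=$1 dst=$2
    if [ ! -e "$dst" ] || [ "$src" -nt "$dst" ]; then
        cp -p "$src" "$dst"
        echo "stored $dst"
    else
        echo "skipped $dst (up to date)"
    fi
}

# demo on throwaway files
work=$(mktemp -d)
echo data > "$work/tarball"
first=$(conditional_put "$work/tarball" "$work/tarball.copy")
second=$(conditional_put "$work/tarball" "$work/tarball.copy")
```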
&lt;br /&gt;
&lt;br /&gt;
*There are three peculiarities of HSI that you should keep in mind, as they can cause a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the cput/cget syntax may not work as you would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI uses the operator &amp;quot;:&amp;quot; to separate the GPFSpath from the HPSSpath; it must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, for both cput and cget: GPFS first, HPSS second:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; note that the default HSI directory is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues with renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do an &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep symbolic links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, then transfer the files individually with the '*' wildcard. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge you to use the sample scripts we provide as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' in case of multiple operations in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
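The need to capture the status immediately after the session can be illustrated with plain bash standing in for hsi (a local sketch, not an actual transfer):&lt;br /&gt;

```shell
#!/bin/bash
# Why 'status=$?' must come immediately after the hsi heredoc:
# any later command overwrites $?. Plain bash stands in for hsi here.
bash <<'EOF'
echo "pretend transfer"
exit 64            # simulate a non-zero hsi exit code
EOF
status=$?          # captured right away - still 64

echo "an unrelated command runs fine"
late=$?            # $? now reflects the echo above, not the session
echo "saved status: $status, stale \$?: $late"
```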
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls, ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A trivial way to list the contents of HPSS is to simply submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. As a rule of thumb, about 400,000 files can be listed in an hour. Adjust the walltime accordingly, erring on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory $HOME/.ish_register, which can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize the recall, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J file_management&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before the 'rm' command; the results may be unpredictable.&lt;br /&gt;
* Avoid using the standalone wildcard '''*'''. Whenever possible, bind it to a common pattern such as '*.tmp', so as to limit unintentional mishaps.&lt;br /&gt;
* Avoid relative paths, and even the env variable $ARCHIVE. It is better to explicitly expand full paths in your scripts.&lt;br /&gt;
* Avoid recursive/looped deletion instructions on $SCRATCH contents from within archive job scripts. Even for $ARCHIVE contents, it is better to do deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
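The recommendations above can be folded into a defensive pattern; this is a local bash sketch (with hsi, the same checks apply before the 'rm' inside the heredoc; the function name is ours, for illustration only):&lt;br /&gt;

```shell
#!/bin/bash
# Defensive deletion: refuse to act on a non-existing directory, and
# bind the wildcard to a pattern (*.tmp) rather than using a bare '*'.
safe_clean_tmp() {
    local dir=$1
    if [ ! -d "$dir" ]; then
        echo "refusing to delete: $dir does not exist" >&2
        return 1
    fi
    ( cd "$dir" || exit 1
      shopt -s nullglob       # expands *.tmp to nothing when no match
      rm -f -- *.tmp )
}

# demo on a throwaway directory
demo=$(mktemp -d)
touch "$demo/a.tmp" "$demo/keep.dat"
safe_clean_tmp "$demo"
```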
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you are restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc -p archiveshort'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an HSI equivalent - for instance, there is no 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi get - : $ARCHIVE/mydir.tar | tar -xvf -&lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. Moreover, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail'' (the default is to return the status of the last command in the pipeline, which is not what you want).&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the total amount of data in the directory tree with 'du' before sending it to the tar+HSI piping.&lt;br /&gt;
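The effect of ''set -o pipefail'' is easy to verify in plain bash:&lt;br /&gt;

```shell
#!/bin/bash
# Without pipefail a pipeline reports the status of its LAST command,
# so a failing producer (tar) hidden behind a succeeding consumer (hsi)
# would go unnoticed. pipefail propagates the failed stage instead.
false | true
without=$?          # 0 - the failure of 'false' is masked

set -o pipefail
false | true
with=$?             # 1 - the failure now shows up
echo "without pipefail: $without, with pipefail: $with"
```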
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We have compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module and can be used on any compute or devel node. It makes the previous version of the script run much faster than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive; '-Hverify=1' verifies them after the transfer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the MD5 hash with the hsi command 'hashlist'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly (-c on) and capture the exit code immediately&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly using a named pipe so that the file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE | tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'VERIFICATION SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your content on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9305</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9305"/>
		<updated>2018-05-04T00:08:17Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Access Through an Interactive HSI session */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be granted access to HPSS, up to 2TB per group, so that you can become familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25 year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as its own archive. It currently has 2 x 100PB of capacity installed.&lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, reaching the 10PB level in 2018&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tape at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
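For a directory tree that would exceed the recommended tarball size, the tar stream can be cut into fixed-size pieces with ''split'' before ingestion. A minimal sketch, run on a throwaway demo directory with a tiny chunk size so it is easy to try; for HPSS you would use ''-b 500G'' (or less) and then ''hsi cput'' each piece:&lt;br /&gt;

```shell
# Pack a directory into fixed-size pieces: dir.tar.part_aa, dir.tar.part_ab, ...
# Demo uses a temporary directory and a 32k chunk; for HPSS use e.g. -b 500G.
srcdir=$(mktemp -d)
dd if=/dev/zero of="$srcdir/sample.dat" bs=1024 count=64 2>/dev/null

outdir=$(mktemp -d)
tar cf - -C "$srcdir" . | split -b 32k - "$outdir/dir.tar.part_"

ls "$outdir"   # the pieces; reassemble later with: cat dir.tar.part_* | tar xf -
```

Because the pieces are plain byte-slices of one tar stream, they must all be recalled and concatenated in order before extraction.&lt;br /&gt;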
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and 71 exit codes). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are there so you don't have to think too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [https://docs.computecanada.ca/wiki/Niagara_Quickstart#Submitting_jobs NIA queue system].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs at the same time, and 10 jobs total on each queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; additional submissions will be held in the queue. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
  OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive naming-space. Keep in mind that you're restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or the ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
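Slurm's ''--parsable'' flag can make the job ID easier to capture than parsing the &amp;quot;Submitted batch job&amp;quot; message. A sketch of building the dependency flag; the job ID is simulated here so the snippet runs anywhere, with the real ''sbatch'' calls shown as comments (''data-recall.sh'' and ''job-to-work-on-recalled-data.sh'' are your own scripts):&lt;br /&gt;

```shell
# Simulated: suppose the recall job was submitted and returned this ID.
jid=123456   # in practice: jid=$(sbatch --parsable data-recall.sh)

# Build the dependency flag for the follow-up analysis job.
dep="--dependency=afterok:${jid}"
echo "$dep"

# In practice, queue the analysis job behind the recall:
# sbatch "$dep" job-to-work-on-recalled-data.sh
```

''afterok'' starts the analysis only if the recall job finished successfully, whereas ''afterany'' (as in the one-liner above) starts it regardless of the recall's exit status.&lt;br /&gt;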
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into an archive file that conforms to the POSIX TAR specification. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, thereby achieving a high rate of performance. HTAR does not do gzip compression; however, it has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR protocol [[(POSIX 1003.1 USTAR)]] -- note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error by running &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike with cput/cget in HSI, &amp;quot;prompt before overwrite&amp;quot; is not the default with (h)tar. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
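Both the 68GB member limit and the 100-character pathname limit can be checked before submitting an HTAR job. A minimal sketch, demonstrated on a throwaway directory (''target'' is a placeholder; point it at your real directory tree):&lt;br /&gt;

```shell
# Pre-flight checks before an HTAR job: oversize members and over-long pathnames.
# 'target' is a throwaway demo directory; point it at your real tree instead.
target=$(mktemp -d)
touch "$target/ok.dat"
longname=$(printf 'x%.0s' $(seq 1 120))    # a 120-character file name for the demo
touch "$target/$longname.dat"

# Members larger than 68GB would make the whole HTAR session fail:
find "$target" -type f -size +68G

# Pathnames longer than 100 characters (relative to the archive root) are skipped:
( cd "$target" && find . -mindepth 1 | awk 'length > 100' )
```

Anything printed by either ''find'' needs to be dealt with (transferred via HSI, or renamed/re-rooted) before running HTAR.&lt;br /&gt;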
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions to other members in your group use the umask option&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
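A umask works by clearing bits from the maximal 0666 file mode, so -Humask=0137 yields mode 0640: read/write for the owner, read for the group, nothing for others. A quick way to check the arithmetic in the shell:&lt;br /&gt;

```shell
# A umask clears permission bits: resulting mode = 0666 & ~umask.
# With umask 0137: 0666 & ~0137 = 0640 (rw owner, r group, none for others).
printf '%04o\n' $(( 0666 & ~0137 ))   # prints 0640
```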
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is likely the primary client with which most users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves a GPFSpath file to HPSSpath, only if the HPSS copy does not exist or the GPFS version has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are 3 peculiarities of HSI that you should keep in mind, which can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, therefore the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same -- GPFS first, HPSS second -- for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and keep the contents of HPSS organized; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process, where you first do an &amp;quot;lcd&amp;quot; in GPFS and then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code, in case of multiple operations in the same hsi session. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code of each hsi session independently. You may also make the exit codes more informative by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A very trivial way to list the contents of HPSS would be to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, about 400,000 files can be listed in about an hour. Adjust the walltime accordingly, and err on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register, which can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or the built-in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize the recall, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and its sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J move_copy_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful when using 'cd' to a non-existing directory before an 'rm' command; the results may be unpredictable&lt;br /&gt;
* Avoid using the stand-alone wildcard character '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', so as to limit unintentional deletions&lt;br /&gt;
* Avoid relative paths, and even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts&lt;br /&gt;
* Avoid using recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even on $ARCHIVE contents, it may be better to do it as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you are restricted to one hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc -p archiveshort'' command (as in the transcript below) you'll get a standard shell prompt on an archive execution node (hpss-archive02), just as you would on any compute node. However you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an HSI equivalent - for instance, you cannot 'vi' or 'cat' a file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# Make the pipeline below return a non-zero status if any stage fails&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on stored file size that we have with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status use: ''set -o pipefail'' (The default is to return the status of the last command in the pipeline and this is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
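The 'du' check suggested above can be sketched as a small pre-flight snippet. This is only an illustration, not one of the official SciNet sample scripts; the path $SCRATCH/mydir and the variable names are assumptions for the example.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical pre-flight check (sketch only): verify that the directory tree
# you are about to pipe through tar+hsi stays within the ~500GB guideline.
# "$SCRATCH/mydir" is a placeholder path; adjust it to your own work area.
limit_kb=$((500 * 1024 * 1024))          # 500 GB expressed in KB ('du -sk' units)

dir="${SCRATCH:-/tmp}/mydir"
size_kb=$(du -sk "$dir" 2>/dev/null | awk '{print $1}')
size_kb=${size_kb:-0}

if [ "$size_kb" -gt "$limit_kb" ]; then
    echo "Tree exceeds the 500GB guideline; split it into smaller tarballs first."
    exit 1
fi
echo "Size check passed: ${size_kb} KB"
```

Running a check like this at the top of the archiving job lets it fail fast, before any tape resources are consumed.&lt;br /&gt;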
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It's now part of the &amp;quot;extras&amp;quot; module, and can also be used on any compute or devel node. This makes the previous version of the script much quicker than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# Make the pipeline below return a non-zero status if any stage fails&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
Specifies that HTAR should generate CRC checksums when creating the archive.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option, you should be able to query the md5 hash with the hsi 'lshash' command (as in the sample script below). That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation depends on several factors, such as processor speed, network transfer speed, and the speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When enabled, you may transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9304</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9304"/>
		<updated>2018-05-04T00:08:00Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Access Through an Interactive HSI session */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be granted access to HPSS, up to 2TB per group, so that you may become familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25 year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the World report (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive. It currently has 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
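As a quick local illustration of the small-file guideline, the sketch below groups a directory of small files into a single tarball and checks the exit code before trusting the result (the directory name and file layout are made up for the example):&lt;br /&gt;

```shell
# Hypothetical layout: many small result files under run-output/.
mkdir -p run-output
for i in 1 2 3; do echo "data $i" > run-output/part$i.dat; done

# Group the small files into one tarball before offloading to HPSS,
# and check tar's exit code before trusting the result.
tar -cf run-output.tar run-output/
if [ $? -ne 0 ]; then
    echo "TAR FAILED"
    exit 1
fi
echo "TARBALL OK: $(tar -tf run-output.tar | wc -l) entries"
```

On the archive nodes you would then offload run-output.tar with hsi, or skip the local tar entirely and let htar do the aggregation.&lt;br /&gt;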
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first because of all the job-script templates we make available below (they are here so you don't have to think &lt;br /&gt;
too much -- just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [https://docs.computecanada.ca/wiki/Niagara_Quickstart#Submitting_jobs NIA queue system].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to the 'archivelong' or the 'archiveshort' queue.&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to only 2 long jobs and 2 short jobs at the same time, and 10 jobs in total on each queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
  OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind that you are restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
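The same chaining can also be done in two explicit steps. The sketch below is runnable anywhere because 'sbatch' is stubbed out; on Niagara you would use the real sbatch, whose --parsable flag prints only the job id (the script names are the hypothetical ones from the samples):&lt;br /&gt;

```shell
# Stub standing in for Slurm's sbatch; the real command with
# --parsable prints just the numeric job id.
sbatch() { echo "12345"; }

# Step 1: submit the recall job and capture its id.
jobid=$(sbatch --parsable data-recall.sh)

# Step 2: submit the analysis job, held until the recall finishes.
# (Shown as an echo here; on the cluster, run the command itself.)
echo "sbatch --dependency=afterany:$jobid job-to-work-on-recalled-data.sh"
```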
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into an archive file that conforms to the POSIX TAR specification. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, thereby achieving a high rate of performance. HTAR does not do gzip compression; however, it has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files so that you can transfer them with HSI instead.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification [[(POSIX 1003.1 USTAR)]]. Note that HTAR will erroneously indicate success but will produce exit code 70. For now, you can check for this type of error by running &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, &amp;quot;prompt before overwrite&amp;quot; is not the default with (h)tar. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
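To avoid a failed HTAR session, you can pre-check a directory for members over the 68 GB limit with find before submitting. A minimal sketch (the directory name is hypothetical, and the tiny test tree is created just so the check has something to scan):&lt;br /&gt;

```shell
# Create a tiny test tree; nothing here exceeds htar's 68 GB
# per-member limit, so the check should find nothing.
mkdir -p finished-job1
echo "small" > finished-job1/file1

# List any files htar cannot handle (> 68 GB); transfer those
# separately with hsi before running htar on the directory.
oversize=$(find finished-job1/ -type f -size +68G)
if [ -z "$oversize" ]; then
    echo "NO OVERSIZE FILES - safe to htar"
else
    printf 'transfer these with hsi instead:\n%s\n' "$oversize"
fi
```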
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar file and the .idx file have read permissions for other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI may be the primary client with which some users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a GPFSpath file into HPSSpath if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are 3 peculiarities of HSI that you should keep in mind, which can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the cput/cget syntax may not work as one would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and the HPSSpath, and it must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same -- GPFS first, HPSS second -- for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However the syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do an &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside our recommended syntax, so '''we strongly urge that you use the sample scripts we provide as the basis''' for your job submissions&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' In case of multiple operations in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages&lt;br /&gt;
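The translation pattern looks like the sketch below. exit2msg exists only on the Niagara archive nodes, so it is stubbed here (with a made-up message format) to keep the sketch runnable anywhere:&lt;br /&gt;

```shell
# Stub for /scinet/niagara/bin/exit2msg, which maps HSI/HTAR exit
# codes to human-readable messages on the archive nodes.
exit2msg() { echo "exit code $1 (see the HSI exit-codes table)"; }

status=70   # e.g. htar's pathname-too-long warning mentioned above
if [ "$status" -ne 0 ]; then
    echo 'HSI returned non-zero code.'
    exit2msg "$status"
fi
```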
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A very trivial way to list the contents of HPSS would be to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, listing about 400,000 files takes about an hour. Adjust the walltime accordingly and be on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS: the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The index is placed in the directory /home/$(whoami)/.ish_register and can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate cgets) allows HSI to optimize the recall, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and its sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer the files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J file_management_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before an 'rm' command; the results may be unpredictable&lt;br /&gt;
* Avoid the stand-alone wildcard character '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', so as to limit unintentional mishaps&lt;br /&gt;
* Avoid relative paths, and even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts&lt;br /&gt;
* Avoid using recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even on $ARCHIVE contents, it may be better to do it as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
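Following the recommendation to expand paths explicitly, here is a minimal sketch (all paths are hypothetical) that builds the deletion commands from a fully expanded path and prints them for review before they are ever fed to hsi:&lt;br /&gt;

```shell
# A minimal sketch (paths are illustrative): expand $ARCHIVE into a literal
# path up front and build the deletion commands from it, so the job log
# records exactly what would be removed before anything is piped to hsi.
ARCHIVE_PATH="${ARCHIVE:-/archive/s/scinet/pinto}"   # hypothetical full path
DELETE_CMDS="rm ${ARCHIVE_PATH}/*.tmp
rm -R ${ARCHIVE_PATH}/obsolete"
# Review the fully expanded commands before feeding them to hsi:
echo "$DELETE_CMDS"
```
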
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc -p archiveshort'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI - for instance, there is no 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than the recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status use: ''set -o pipefail'' (The default is to return the status of the last command in the pipeline and this is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
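The 'du' check mentioned above can be scripted ahead of the transfer; a minimal sketch (the 500GB figure comes from the guideline above, the demo directory and file are made up):&lt;br /&gt;

```shell
# Check a directory's total size before piping it through tar+hsi.
# The 500GB threshold is the tarball-size guideline; the directory is a demo.
DIR="${SCRATCH:-/tmp}/mydir-demo"
mkdir -p "$DIR" && echo "sample data" > "$DIR/part1.dat"
LIMIT=$((500 * 1024 * 1024 * 1024))      # 500GB in bytes
SIZE=$(du -sb "$DIR" | awk '{print $1}') # total bytes under $DIR
if [ "$SIZE" -gt "$LIMIT" ]; then
    echo "WARNING: $DIR exceeds 500GB; split it before archiving"
else
    echo "OK: $DIR fits in a single tarball ($SIZE bytes)"
fi
```
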
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It's now part of the &amp;quot;extras&amp;quot; module, and can be used on any compute or devel node. Piping through pigz makes the compressed version of the previous script much quicker than using 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The -Hcrc option specifies that HTAR should generate CRC checksums when creating the archive.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you can query the MD5 hash with the hsi command 'lshash'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS since version 7.4)&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly, using a named pipe so that the file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
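The named-pipe technique above can be tried locally without HPSS; in this sketch 'wc -c' stands in for the 'hsi put' end of the pipeline, and all paths are made up:&lt;br /&gt;

```shell
# Local illustration of the named-pipe trick: the source file is read from
# disk only once; tee duplicates the stream so md5sum can hash one copy
# while the main pipeline (here 'wc -c' standing in for 'hsi put') consumes
# the other.
echo "archive payload" > /tmp/demo-file.dat
FIFO=/tmp/demo-npipe
rm -f "$FIFO" && mkfifo "$FIFO"
cat /tmp/demo-file.dat | tee "$FIFO" | wc -c > /tmp/demo-bytes.txt &
pid=$!
md5sum "$FIFO" > /tmp/demo-file.md5   # hashes the duplicated stream
wait $pid                             # collect the pipeline's exit status
status=$?
rm -f "$FIFO"
```
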
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and login with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9303</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9303"/>
		<updated>2018-05-04T00:07:15Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Access Through an Interactive HSI session */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25 year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive. It currently has 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (or else you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most compressed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are here so you don't have to think &lt;br /&gt;
too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; mechanism for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most Linux shell commands have an equivalent in HSI)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [https://docs.computecanada.ca/wiki/Niagara_Quickstart#Submitting_jobs NIA queue system].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs at the same time, and 10 jobs total on each queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
  OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive naming-space. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or the ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
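The shortcut above relies on extracting the job ID from sbatch's output. A minimal local sketch of that extraction (the job number is made up; with the default &amp;quot;Submitted batch job NNN&amp;quot; message the ID is the fourth field, and 'sbatch --parsable' prints the bare ID instead):&lt;br /&gt;

```shell
# With sbatch's default "Submitted batch job NNN" message, the job ID is the
# fourth whitespace-separated field. The job number below is made up.
SBATCH_OUTPUT="Submitted batch job 50359"
DEP=$(echo "$SBATCH_OUTPUT" | awk '{print "--dependency=afterany:"$4}')
echo "$DEP"   # prints --dependency=afterany:50359
```
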
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files and directories, by using a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification, thereby achieving a high rate of performance. HTAR does not do gzip compression, however it already has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR protocol [[(POSIX 1003.1 USTAR)]]. Note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike with cput/cget in HSI, &amp;quot;prompt before overwrite&amp;quot; is not the default with (h)tar. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
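The first two cautions above can be screened for before submitting the htar job. Here is a hedged pre-flight sketch (the function name is illustrative; it relies on GNU find, and it checks the full on-disk path length, whereas HTAR's 100-character limit applies to the member pathname as stored in the archive):&lt;br /&gt;

```bash
# Sketch of a pre-flight check before running htar on a directory:
# report files over the 68GB member limit and pathnames over 100 characters.
check_htar_limits() {
    local dir=$1
    # files too large to be stored as HTAR archive members
    find "$dir" -type f -size +68G -printf 'TOO LARGE: %p (%s bytes)\n'
    # pathnames too long for the TAR (POSIX 1003.1 USTAR) header
    find "$dir" -printf '%p\n' | awk 'length($0) > 100 {print "PATH TOO LONG: " $0}'
}

# e.g. check_htar_limits $SCRATCH/workarea/finished-job1
```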
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions for other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI may be the primary client some users will use to interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores or replaces a GPFSpath file into HPSSpath if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are three aspects of HSI that you should keep in mind, as they can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, therefore the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do an &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, whose procedures may or may not be similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we provide as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code, in case of multiple operations in the same hsi session. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. Remember, you can make the exit code more informative by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A trivial way to list the contents of HPSS is to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, listing about 400,000 files takes about an hour. Adjust the walltime accordingly, to be on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register that can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do optimization, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J file_management_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with the use of 'cd' commands to non-existing directories before the 'rm' command. Results may be unpredictable&lt;br /&gt;
* Avoid the use of the stand-alone wildcard '''*'''. Whenever possible, bind it to common patterns, such as '*.tmp', so as to limit unintentional mishaps.&lt;br /&gt;
* Avoid using relative paths, even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid using recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even on $ARCHIVE contents, it may be better to do it as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session, and proceeding with your deletions that way. Keep in mind, you're restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc -p archiveshort'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), just as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS.&lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an HSI equivalent - for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi get - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. Reading the files back from tape in this format, however, will be many times faster. It also overcomes the current 68GB limit on the size of stored files that we have with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline is reflected in the returned status, use ''set -o pipefail'' (by default a pipeline returns the status of its last command, which is not what you want here).&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the total size of the directory tree with 'du' before sending it to the tar+HSI pipeline.&lt;br /&gt;
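The du pre-flight check suggested above can be scripted. Here is a minimal sketch; the 500GB threshold reflects the guideline above, but the directory argument and messages are illustrative, not a SciNet-provided tool:&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical pre-flight check: measure a directory tree with du before
# piping it into tar+hsi, and refuse if it exceeds the 500GB guideline.
dir="${1:-.}"
limit=$((500 * 1024 * 1024 * 1024))   # 500GB guideline, in bytes

# du -sb reports the total apparent size of the tree in bytes (GNU coreutils)
size=$(du -sb "$dir" | awk '{print $1}')

if [ "$size" -gt "$limit" ]; then
    echo "TOO LARGE: $dir holds $size bytes; split it into tarballs of 500GB or less"
    exit 1
fi
echo "OK: $dir holds $size bytes"
```
&lt;br /&gt;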
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We have compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module and can be used on any compute or devel node. Piping through pigz makes creating a compressed tarball much quicker than using 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive; '-Hverify=1' verifies the archive contents afterwards.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the MD5 hash with the hsi command 'lshash'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS since version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
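&lt;br /&gt;
The named-pipe pattern above can be tried locally before using it against HPSS. This sketch substitutes a plain 'cat' for the 'hsi put' stage (everything else is standard coreutils), so the source file is still read from disk only once while tee feeds both the &amp;quot;transfer&amp;quot; and the checksum:&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Local dry run of the named-pipe checksum pattern: 'cat > stored' stands in
# for 'hsi put' so the logic can be tested without HPSS.
set -o pipefail

workdir=$(mktemp -d)
echo "some archive payload" > "$workdir/thefile"

mkfifo "$workdir/NPIPE"
cat "$workdir/thefile" | tee "$workdir/NPIPE" | cat > "$workdir/stored" &
pid=$!
md5sum "$workdir/NPIPE" > "$workdir/thefile.md5"

# Check the exit code of the background "transfer"
wait $pid && echo "TRANSFER SUCCESSFUL"

# The stored copy must hash to the value captured in flight
inflight=$(awk '{print $1}' "$workdir/thefile.md5")
stored=$(md5sum "$workdir/stored" | awk '{print $1}')
[ "$inflight" = "$stored" ] && echo "CHECKSUMS MATCH"
rm -rf "$workdir"
```
&lt;br /&gt;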
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9302</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9302"/>
		<updated>2018-05-04T00:06:12Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Access Through an Interactive HSI session */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be granted access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities on the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive. It currently has 2 x 100PB of capacity installed.&lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, reaching the 10PB level in 2018&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a media that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; and exit code 71). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first because of all the job script templates we make available below (they are there so you don't have to think&lt;br /&gt;
too much; just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [https://docs.computecanada.ca/wiki/Niagara_Quickstart#Submitting_jobs NIA queue system].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to either the 'archivelong' or the 'archiveshort' queue.&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs at the same time, and 10 jobs in total on each queue.&lt;br /&gt;
* At most 5 long jobs can run at any given time overall; remaining submissions are placed on hold in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
  OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive name space. Keep in mind that you're restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session delete files from HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap abnormal terminations in your job scripts, and be sure to return the exit code&lt;br /&gt;
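&lt;br /&gt;
The trap + exit-status pattern used in these sample scripts can be factored into a small helper. This is a sketch only; the exit2msg call is SciNet/Niagara-specific and is left commented out so the sketch runs anywhere:&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Reusable version of the trap + exit-status checks from the sample scripts.
trap "echo 'Job script not completed'; exit 129" TERM INT

check_status () {
    local status=$1 label=$2
    if [ "$status" -ne 0 ]; then
        echo "$label returned non-zero code."
        # /scinet/niagara/bin/exit2msg "$status"   # translate the code on Niagara
        exit "$status"
    fi
    echo 'TRANSFER SUCCESSFUL'
}

# Replace 'true' with the real htar/hsi command in an actual job script
true
check_status $? HTAR

trap - TERM INT
```
&lt;br /&gt;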
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility used for aggregating a set of files and directories, using a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification and thereby achieving a high rate of performance. HTAR does not do gzip compression, but it has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68GB cannot be stored in an HTAR archive. If you attempt a transfer that includes any files larger than 68GB, the whole HTAR session will fail and you'll get a notification listing all those files, so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR protocol (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success but will return exit code 70. For now, you can check for this type of error by running &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, which transfer conditionally, (h)tar will overwrite an existing destination file by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can arise when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions to other members in your group use the umask option&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
For many users, HSI will be the primary client for interacting with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a GPFSpath file into HPSSpath only if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are three aspects of HSI that you should keep in mind, since they can cause confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the cput/cget syntax may not work as one would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI has a &amp;quot;:&amp;quot; operator separating the GPFSpath and the HPSSpath; it must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order in which files are referred to in HSI syntax differs from FTP. In HSI the general format is always the same, GPFS first, HPSS second, for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store a tarball from GPFS into HPSS and then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a heredoc such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However the syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do an &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, then transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms, and you may already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside our recommended syntax, so '''we strongly urge you to use the sample scripts we provide as the basis''' for your job submissions&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code when multiple operations are performed in the same hsi session. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code of each hsi session independently. And remember, you can make your exit reporting more informative by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A very trivial way to list the contents of HPSS would be to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, listing about 400,000 files takes roughly an hour. Adjust the walltime accordingly, and be on the safe side.''&lt;br /&gt;
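As a rough rule of thumb, using the ~400,000 files/hour listing rate quoted in the warning above, the walltime to request can be estimated with a quick ceiling division (the file count below is purely hypothetical):&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: estimate walltime for an HSI 'ls -R' listing job.
# Assumption (from the warning above): roughly 400,000 files listed per hour.
nfiles=1200000                            # hypothetical file count
rate=400000                               # files listed per hour (approximate)
hours=$(( (nfiles + rate - 1) / rate ))   # integer ceiling division
echo "Request at least ${hours} hours of walltime"
```

For 1,200,000 files this suggests requesting at least 3 hours; round up further to stay on the safe side.&lt;br /&gt;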
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS: the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register and can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do optimization, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* you may use 'mv' or 'cp' in the same way as their Linux equivalents.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J file_management&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before the 'rm' command; the results may be unpredictable&lt;br /&gt;
* Avoid the standalone wildcard character '''*'''. Whenever possible, bind it to common patterns, such as '*.tmp', to limit unintentional mishaps&lt;br /&gt;
* Avoid relative paths, even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts&lt;br /&gt;
* Avoid recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even for $ARCHIVE contents, it may be better to delete in an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you're restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc -p archiveshort'' command (as in the session below) you'll get a standard shell prompt on an archive execution node (hpss-archive02), just as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an HSI equivalent; for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of stored files that we have with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail''. (The default is to return the status of the last command in the pipeline, which is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the total amount of data in the directory tree with 'du' before sending it to the tar+HSI pipe.&lt;br /&gt;
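The two shell-level recommendations above, ''set -o pipefail'' and the 'du' size check, can be sketched in isolation as follows; the 500GB threshold comes from the note above, while the default directory path is purely illustrative:&lt;br /&gt;

```shell
#!/bin/bash
# Without 'set -o pipefail', a pipeline reports the status of its LAST
# command only, so a failure in 'tar' piped into 'hsi' would go unnoticed.
set -o pipefail

# Pre-flight check with 'du' before tar+HSI piping (500GB cap, per the note above)
dir=${1:-.}        # directory you intend to archive ('.' is a placeholder default)
max_gb=500
size_gb=$(du -s --block-size=1G "$dir" | cut -f1)
if [ "$size_gb" -gt "$max_gb" ]; then
    echo "TOO LARGE: ${size_gb}GB exceeds ${max_gb}GB, split into smaller tarballs"
    exit 1
fi
echo "OK to archive: ${size_gb}GB"
```

Run it against the directory you intend to archive before submitting the actual transfer job.&lt;br /&gt;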
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We have compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It's now part of the &amp;quot;extras&amp;quot; module and can be used on any compute or devel node. This makes the previous version of the script much quicker than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option in the script below specifies that HTAR should generate CRC checksums when creating the archive.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the md5 hash with the hsi command 'hashli'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4)&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly using a named pipe so that the file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
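The named-pipe pattern in the script above can be tried locally, with a plain 'cat' standing in for the 'hsi put'/'get' calls. All paths here are temporary stand-ins, but the sketch shows how the source file is read from disk only once while both the copy and its checksum are produced:

```shell
# Local dry run of the named-pipe checksum pattern; 'cat' stands in for hsi.
work=$(mktemp -d)
echo 'payload' > "$work/thefile"
mkfifo "$work/NPIPE"

# tee feeds the transfer (here: a plain copy) and the checksum at once,
# so the source file is read only a single time.
cat "$work/thefile" | tee "$work/NPIPE" | cat > "$work/copy" &
pid=$!
md5sum "$work/NPIPE" > "$work/thefile.md5"
rm -f "$work/NPIPE"
wait $pid

# Point the checksum entry at stdin, then verify the copy against it,
# just as the recipe above verifies the recalled HPSS file.
sed -i "s+$work/NPIPE+-+" "$work/thefile.md5"
cat "$work/copy" | md5sum -c "$work/thefile.md5"
```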
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When enabled, this service lets you transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your content on HPSS&lt;br /&gt;
* Follow the link below using a web browser and login with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9301</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9301"/>
		<updated>2018-05-04T00:03:30Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Access Through the Queue System */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL).&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive. It currently has 2 x 100PB of capacity installed.&lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already reaching the 10PB level in 2018.&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tapes at SciNet).&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
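As an illustration of the first two guidelines, many small files can be grouped into per-directory tarballs before being moved to HPSS. This is only a generic sketch with throwaway stand-in directories; on Niagara you would point SRC at your own tree under $SCRATCH and size each bundle well below 500GB:

```shell
# Sketch: bundle small files into one tarball per subdirectory before
# archiving. SRC/OUT are throwaway stand-ins for your real directories.
SRC=$(mktemp -d)   # stand-in for a tree of many small files
OUT=$(mktemp -d)   # stand-in for the tarball staging area
mkdir -p "$SRC/run001" "$SRC/run002"
echo 'sample' > "$SRC/run001/out.log"
echo 'sample' > "$SRC/run002/out.log"

# -p preserves permissions, mirroring the htar -cpf usage elsewhere on
# this page; each subdirectory becomes one tarball.
for d in "$SRC"/*/ ; do
    name=$(basename "$d")
    tar -cpf "$OUT/$name.tar" -C "$SRC" "$name"
done
ls "$OUT"    # lists run001.tar and run002.tar
```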
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email scinet support and request an HPSS account (or else you will get &amp;quot;Error - authentication/initialization failed&amp;quot; and 71 exit codes). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are here so you don't have to think too much; just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; mechanism for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [https://docs.computecanada.ca/wiki/Niagara_Quickstart#Submitting_jobs NIA queue system].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to the 'archivelong' or the 'archiveshort' queue.&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs at the same time, and 10 jobs total on each queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold for the time being. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
  OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind that you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or the ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
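For reference, sbatch reports a new submission as 'Submitted batch job NNNN', so the job id is the fourth field of that output. A quick check of the awk extraction, using a canned string in place of real sbatch output:

```shell
# sbatch normally prints: Submitted batch job NNNN
# The job id is therefore field 4 of that line.
fake_sbatch_output='Submitted batch job 50918'
dep=$(echo "$fake_sbatch_output" | awk '{print "--dependency=afterany:" $4}')
echo "$dep"    # --dependency=afterany:50918
```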
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files and directories, by using a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification, thereby achieving a high rate of performance. HTAR does not do gzip compression, however it already has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt a transfer that includes any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR protocol (POSIX 1003.1 USTAR) -- note that HTAR will erroneously indicate success in its log, however it will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, &amp;quot;prompt before overwrite&amp;quot; is not the default behaviour with (h)tar. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
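The last two cautions can be automated after a job completes. The log lines below are fabricated stand-ins for real HTAR output; the real file to scan is your job's stdout (e.g. my.output):

```shell
# Scan an HTAR job log for skipped-file warnings even when the tool
# reported success. The log content here is a fabricated stand-in.
log=$(mktemp)
printf '%s\n' \
  'HTAR: a   finished-job1/ok.dat' \
  '###WARNING  htar returned non-zero exit status' \
  'HTAR: HTAR SUCCESSFUL' > "$log"

if grep -qi warning "$log"; then
    echo 'Some files may have been skipped; inspect the log before deleting originals.'
else
    echo 'No warnings found.'
fi
```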
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files are readable by other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files so that you can transfer them with HSI.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is likely the primary client most users will use to interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves a GPFSpath file to HPSSpath, only if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are three aspects of HSI that you should keep in mind, as they can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, therefore the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, cput or cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document ('heredoc') such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process, where you do an &amp;quot;lcd&amp;quot; in GPFS first, and recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code, in case of multiple operations in the same hsi session. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
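The status check above is repeated verbatim after every hsi/htar invocation in the sample scripts. If you prefer, it can be factored into a small shell function; this is only a sketch (the exit2msg call is commented out so the snippet also runs off the archive nodes):&lt;br /&gt;

```shell
#!/bin/bash
# check_status STATUS LABEL -- report a transfer result and abort on failure.
# A factoring of the repeated if-block from the sample scripts (a sketch).
check_status() {
    local status=$1 label=$2
    if [ "$status" -ne 0 ]; then
        echo "$label returned non-zero code."
        # /scinet/niagara/bin/exit2msg "$status"   # uncomment on the archive nodes
        exit "$status"
    else
        echo 'TRANSFER SUCCESSFUL'
    fi
}

# Usage after each hsi or htar session:
#   check_status $? HSI
check_status 0 HSI
```

With this helper each sample script shrinks to one line per transfer check.&lt;br /&gt;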
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
The simplest way to list the contents of HPSS is to submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, about 400,000 files can be listed in about an hour. Adjust the walltime accordingly, and be on the safe side.''&lt;br /&gt;
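Using the rough rate above (about 400,000 files listed per hour), you can size the walltime before submitting; a sketch with a hypothetical file count:&lt;br /&gt;

```shell
#!/bin/bash
# Rough walltime estimate for an HPSS 'ls -R', using the ~400,000 files/hour
# rate quoted above, padded by a factor of 2 to be on the safe side.
nfiles=1000000                            # hypothetical file count
rate=400000                               # files listed per hour (approximate)
hours=$(( (nfiles + rate - 1) / rate ))   # ceiling division
padded=$(( hours * 2 ))
echo "request about ${padded}h of walltime"
```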
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS: the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The index is placed in the directory /home/$(whoami)/.ish_register, which can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or the built-in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize the recall, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and its sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer the files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* you may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J management_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before an 'rm' command; the results may be unpredictable&lt;br /&gt;
* Avoid using the wildcard '''*''' on its own. Whenever possible, bind it to a common pattern, such as '*.tmp', to limit unintentional mishaps&lt;br /&gt;
* Avoid relative paths, including the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts&lt;br /&gt;
* Avoid recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even on $ARCHIVE contents, it is better to run deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
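The full-path recommendation can be enforced mechanically: expand the target first, and refuse to build the 'rm' command unless the result is an absolute /archive path. A sketch (the $ARCHIVE default shown here is a hypothetical stand-in; on the archive nodes $ARCHIVE is already set for you):&lt;br /&gt;

```shell
#!/bin/bash
# Guard a deletion target: only proceed when the expanded path is absolute
# and lives under /archive. The default value below is a hypothetical
# stand-in for illustration only.
ARCHIVE=${ARCHIVE:-/archive/s/scinet/pinto}
target="$ARCHIVE/obsolete"

case "$target" in
    /archive/*) echo "ok to pass to 'hsi rm -R': $target" ;;
    *)          echo "refusing suspicious path: $target"; exit 1 ;;
esac
```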
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you are restricted to 1 hour of walltime.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc -p archiveshort'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI - for instance, you cannot use 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi get - : $ARCHIVE/mydir.tar | tar -xvf - &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, is not noticeably slower than the recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of stored files that we have with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail''. (By default a pipeline returns the status of its last command, and this is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the total amount of data in the directory tree with 'du' before sending it to the tar+HSI pipe.&lt;br /&gt;
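The ''set -o pipefail'' behaviour can be checked with ordinary commands, no HPSS access needed; here ''false | true'' stands in for a failing tar feeding a succeeding hsi:&lt;br /&gt;

```shell
#!/bin/bash
# Without pipefail, a pipeline reports the status of its LAST command only,
# so a tar failure at the head of the pipe would be silently masked.
false | true
echo "without pipefail: $?"   # prints: without pipefail: 0

# With pipefail, the rightmost non-zero status in the pipe is returned,
# so the failure propagates to the $status check in the job script.
set -o pipefail
false | true
echo "with pipefail: $?"      # prints: with pipefail: 1
set +o pipefail
```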
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We have compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module and can be used on any compute or devel node. This makes the previous version of the script much quicker than using 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
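Note that pigz produces standard gzip streams, so a tarball compressed with pigz on ingest can be decompressed with plain gunzip on recall (and vice versa). A quick local check, falling back to gzip where pigz is not installed:&lt;br /&gt;

```shell
#!/bin/bash
# pigz output is ordinary gzip format: compress with whichever tool is
# available and decompress with plain gunzip.
GZ=$(command -v pigz || command -v gzip)
echo "hello hpss" | "$GZ" | gunzip   # prints: hello hpss
```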
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive; '-Hverify=1' verifies them after creation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the md5 hash with the hsi command 'hashlist'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4)&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
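The named-pipe/tee pattern above (read the file from GPFS once, checksumming it while it streams) can be tried locally without HPSS; in this sketch ''wc -c'' merely stands in for the ''hsi put -'' consumer:&lt;br /&gt;

```shell
#!/bin/bash
# Stream a file once while checksumming it on the fly via a named pipe.
# 'wc -c' stands in for 'hsi put -' here; it just consumes the stream.
tmp=$(mktemp -d)
printf 'payload\n' | tee "$tmp/thefile" | wc -c   # create a small test file

mkfifo "$tmp/NPIPE"
cat "$tmp/thefile" | tee "$tmp/NPIPE" | wc -c &   # the "transfer", backgrounded
pid=$!
md5sum "$tmp/NPIPE"                               # checksum read from the pipe
wait $pid                                         # collect the transfer status
echo "transfer status: $?"
rm -rf "$tmp"
```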
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suitable for light transfers to/from your personal computer and to navigate your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and login with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9300</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9300"/>
		<updated>2018-05-04T00:02:07Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Access Through the Queue System */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the World report (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, both to store Nav Canada data and to serve as their own archive. It currently has 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a media that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email scinet support and request an HPSS account (or else you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are here so you don't have to think too much - just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; mechanism for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [https://docs.computecanada.ca/wiki/Niagara_Quickstart#Submitting_jobs NIA queue system].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to the 'archivelong' or the 'archiveshort' queue.&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs running at the same time, and 10 jobs in total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be held in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
  OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind that you're restricted to a 1-hour walltime.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi          (don't forget to start HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*    Welcome to HPSS@SciNet - High Performance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session delete files using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code.&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into an archive file that conforms to the POSIX TAR specification. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, achieving a high rate of performance. HTAR does not do gzip compression, but it does have a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI instead.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success in this case, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
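Both limits above can be checked with ordinary shell tools before submitting an HTAR job. A minimal pre-flight sketch, assuming a hypothetical finished-job1/ directory tree (the setup lines only fabricate an example tree):&lt;br /&gt;

```shell
#!/bin/bash
# Illustrative setup only: a tree with one deliberately long pathname
mkdir -p finished-job1/a_subdirectory_with_quite_a_long_name/and_an_even_longer_nested_name
touch finished-job1/a_subdirectory_with_quite_a_long_name/and_an_even_longer_nested_name/result_with_a_long_name.dat

# 1) HTAR cannot store files larger than 68 GB; list any offenders
#    so they can be transferred with HSI instead
find finished-job1 -type f -size +68G

# 2) Pathnames over 100 characters will be skipped (POSIX USTAR limit)
find finished-job1 | awk 'length($0) > 100 { print "TOO LONG:", $0 }'
```

Running such a check on $SCRATCH before the archive job avoids discovering these problems only from the job's output log.&lt;br /&gt;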
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar file and the .idx file have read permission for other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is likely the primary client with which most users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves a GPFSpath file to HPSSpath, only if the HPSS copy does not exist or the GPFS version has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are 3 peculiarities of HSI that you should keep in mind, which can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, therefore the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same -- GPFS first, HPSS second -- for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a heredoc such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and keep the contents of HPSS organized; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However the syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following 2-step process, where you do an &amp;quot;lcd&amp;quot; in GPFS first, and recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard. This approach lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code in the case of multiple operations in the same hsi session. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls, ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A trivial way to list the contents of HPSS is to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, about 400,000 files can be listed in about an hour. Adjust the walltime accordingly, erring on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the /home/$(whoami)/.ish_register directory, which can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or the built-in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do optimization, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wild character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existent directories before an 'rm' command; the results may be unpredictable&lt;br /&gt;
* Avoid the stand-alone wildcard character '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', so as to limit unintended deletions&lt;br /&gt;
* Avoid relative paths, and even the env variable $ARCHIVE; it is better to explicitly expand full paths in your scripts&lt;br /&gt;
* Avoid recursive/looped deletion instructions on $SCRATCH contents from archive job scripts. Even for $ARCHIVE contents, it is better to run deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and doing your deletions that way. Keep in mind that you are restricted to 1 hour of walltime.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc -p archiveshort'' command you'll get a standard shell prompt on an archive execution node (e.g. hpss-archive02), just as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an HSI equivalent; for instance, there is no 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, is not noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail''. (The default is to return the status of the last command in the pipeline, which is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the total size of the directory tree with 'du' before sending it to the tar+HSI pipe.&lt;br /&gt;
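The 'du' check recommended above can be scripted as a guard at the top of the job script, before the tar+hsi pipeline. This is a minimal sketch under stated assumptions: the source directory and the 500GB threshold are illustrative, and the guard simply refuses to proceed when the tree is too large.&lt;br /&gt;

```shell
# Guard: refuse to pipe a directory tree into a single tarball if it
# exceeds the recommended ~500GB size. SRC and MAX_KB are illustrative.
SRC="${SCRATCH:-/tmp}/mydir"
MAX_KB=$((500 * 1024 * 1024))   # 500GB expressed in KB (du -sk units)

size_kb=$(du -sk "$SRC" 2>/dev/null | awk '{print $1}')
size_kb=${size_kb:-0}           # treat a missing tree as size 0

if [ "$size_kb" -gt "$MAX_KB" ]; then
    echo "Tree is ${size_kb}KB; split it into smaller tarballs first."
    exit 1
fi
echo "Size check passed: ${size_kb}KB"
```

Place this before the 'tar -c ... | hsi put' line so an oversized tree never reaches the pipe.&lt;br /&gt;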
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module and can also be used on any compute or devel node. This makes the previous version of the script run much quicker than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The -Hcrc option specifies that HTAR should generate CRC checksums when creating the archive; -Hverify=1 verifies the archive after creation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option, you can query the MD5 hash with the hsi command 'lshash'. The value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS since version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly (-c on), then check the HSI exit code&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When enabled, this lets you transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy (SME) is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and login with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9299</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9299"/>
		<updated>2018-05-04T00:00:36Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Access Through the Queue System */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities on the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the World report (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as its own archive. It currently has 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet (after a modest upgrade in 2017) is ~150 TB/day ingest and ~45 TB/day recall (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape, a medium that is not suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
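The small-files guideline above can be checked quickly before choosing between hsi and htar. A minimal sketch (the directory path is illustrative, and the 200MB threshold is taken from the guideline):&lt;br /&gt;

```shell
# Count how many files in a tree fall under the ~200MB threshold.
# If most files are small, aggregate them with tar/htar rather than
# storing them individually on tape. TREE is illustrative.
TREE="${TREE:-.}"

small=$(find "$TREE" -type f -size -200M 2>/dev/null | wc -l)
total=$(find "$TREE" -type f 2>/dev/null | wc -l)

echo "$small of $total files are under 200MB"
```

If nearly all files are under the threshold, pack the tree into a tarball before ingesting it.&lt;br /&gt;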
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are here so you don't have to think &lt;br /&gt;
too much; just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [https://docs.computecanada.ca/wiki/Niagara_Quickstart#Submitting_jobs NIA queue system].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1 hour of walltime by default. Long jobs (&amp;gt; 1 hour) are limited to 72 hours of walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs at the same time, and 10 jobs total in the queue.&lt;br /&gt;
* Only 5 long jobs may run at any given time overall; remaining submissions will be placed on hold. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind that you are restricted to 1 hour of walltime.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap your jobs against abnormal terminations, and be sure to return the exit code.&lt;br /&gt;
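The trap-and-status pattern used throughout these sample scripts reduces to the skeleton below; 'true' stands in for the real hsi/htar call, and the SciNet-specific exit2msg helper is omitted:&lt;br /&gt;

```shell
#!/bin/bash
# Skeleton of the trap/exit-code pattern: report abnormal termination
# and propagate the transfer command's exit code to the scheduler.
trap "echo 'Job script not completed'; exit 129" TERM INT

true            # <- stand-in for the real hsi/htar invocation
status=$?

trap - TERM INT

if [ "$status" -ne 0 ]; then
    echo "Transfer command returned non-zero code $status."
    exit $status
else
    echo 'TRANSFER SUCCESSFUL'
fi
```

Because $status is captured immediately after the transfer command, nothing in between can mask a failure.&lt;br /&gt;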
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility used to aggregate a set of files and directories. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive that conforms to the POSIX TAR specification and thereby achieving a high rate of performance. HTAR does not do gzip compression, but it does have a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt a transfer that includes any file larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR protocol (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can arise when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
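Both the 68GB and the 100-character caveats can be checked up front with standard tools; a minimal sketch (the demo directory is hypothetical, standing in for your GPFS directory):&lt;br /&gt;

```shell
# Pre-check a directory before running htar on it.
# /tmp/htar-precheck-demo is a made-up stand-in for your GPFS directory.
DIR=/tmp/htar-precheck-demo
mkdir -p "$DIR"
touch "$DIR/$(printf 'a%.0s' $(seq 1 120))"   # a pathname well over 100 chars

# files larger than 68GB (htar cannot store these):
find "$DIR" -type f -size +68G

# pathnames longer than 100 characters (htar would skip these);
# rough check using the full path length:
find "$DIR" -type f | awk 'length > 100'
```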
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the directory ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar file and its .idx file are readable by other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
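To see what mask 0137 implies for newly created files, here is a quick local sanity check (assuming GNU coreutils ''stat''; the file path is made up):&lt;br /&gt;

```shell
# 0666 & ~0137 = 0640 (rw-r-----): owner read/write, group read-only
umask 0137
rm -f /tmp/umask-demo-file
touch /tmp/umask-demo-file
stat -c %a /tmp/umask-demo-file
# → 640
```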
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI will be the primary client with which many users interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a GPFSpath file to HPSSpath if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* There are three peculiarities of HSI that you should keep in mind, which can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, therefore the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has a &amp;quot;:&amp;quot; operator, which separates the GPFSpath and the HPSSpath, and must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first and HPSS second, for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do an &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, then transfer the files individually with the '*' wildcard. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code, in case of multiple operations in the same hsi session. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code of each hsi session independently. And remember, you can make your exit codes more informative by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
The simplest way to list the contents of HPSS is to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, about 400,000 files can be listed in about an hour. Adjust the walltime accordingly, and be on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register that can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do optimization, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in much the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J management_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with the use of 'cd' commands to non-existing directories before an 'rm' command. Results may be unpredictable.&lt;br /&gt;
* Avoid the use of the stand-alone wildcard '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', so as to limit unintended deletions.&lt;br /&gt;
* Avoid using relative paths, or even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid using recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even on $ARCHIVE contents, it may be better to do it as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
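A toy illustration of binding the wildcard to a pattern (the directory and file names are hypothetical):&lt;br /&gt;

```shell
mkdir -p /tmp/del-demo
touch /tmp/del-demo/a.tmp /tmp/del-demo/b.tmp /tmp/del-demo/results.dat
# '*.tmp' matches only the scratch files; a bare '*' would also take results.dat
ls /tmp/del-demo/*.tmp
# → /tmp/del-demo/a.tmp
# → /tmp/del-demo/b.tmp
```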
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you're restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc -p archiveshort'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an HSI equivalent - for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this, catch failures from any stage:&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xvf - &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status use: ''set -o pipefail'' (The default is to return the status of the last command in the pipeline and this is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
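The pipefail point is easy to demonstrate with a toy pipeline, where ''false'' stands in for a failing ''tar'' stage:&lt;br /&gt;

```shell
# Without pipefail the pipeline reports the status of the LAST command only
bash -c 'false | true; echo "without pipefail: $?"'
# → without pipefail: 0

# With pipefail any failing stage makes the whole pipeline fail
bash -c 'set -o pipefail; false | true; echo "with pipefail: $?"'
# → with pipefail: 1
```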
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We have compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is part of the &amp;quot;extras&amp;quot; module and can be used on any compute or devel node. It makes the previous version of the script run much faster than 'tar -czf' would. In addition, by appending an ISH indexing step to the end of the script, ISH will know what to do with the newly created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The -Hcrc option specifies that HTAR should generate CRC checksums when creating the archive, and -Hverify=1 verifies them after the transfer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you can query the MD5 hash with the hsi command 'lshash'. The value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS since version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate the checksum on the fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI command&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)   # name used for the checksum file below&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE | tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy (SME) is an enterprise cloud portal adopted by SciNet to give our users access to HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for browsing your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9298</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9298"/>
		<updated>2018-05-04T00:00:10Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Access Through the Queue System */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be granted access to HPSS (up to 2TB per group) in order to become familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the US DoE labs, and is used by about 45 facilities on the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL).&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, both to store Nav Canada data and to serve as its own archive; it currently has 2 x 100PB of capacity installed.&lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, reaching the 10PB level in 2018.&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tape at SciNet).&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
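The two size guidelines above can be checked with standard tools before submitting an archiving job; a minimal sketch (the directory is a placeholder example):&lt;br /&gt;

```shell
#!/bin/bash
# Pre-flight checks before an archiving job.
# The directory to examine; "." is just a placeholder example.
dir=${1:-.}

# Total size of the tree: aim for tarballs of 500GB or less.
du -sh "$dir"

# Files smaller than ~200MB should be aggregated into a tarball
# rather than stored on tape individually.
echo "files under 200MB: $(find "$dir" -type f -size -200M | wc -l)"
```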
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
This set of wiki instructions is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first because of all the job script templates made available below (they are there so you don't have to think too much -- just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most Linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [https://docs.computecanada.ca/wiki/Niagara_Quickstart#Submitting_jobs Niagara queue system].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs running at the same time, and 10 jobs total in the queue.&lt;br /&gt;
* Only 5 long jobs may run at any given time overall; remaining submissions are placed on hold until a slot frees up. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind that you're restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the HPSS should be scripted into jobs and submitted to the ''archivelong'' queue or the ''archiveshort'' . See generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into a single archive file that conforms to the POSIX TAR specification. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, thereby achieving a high rate of performance. HTAR does not do gzip compression, but it does have a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68GB cannot be stored in an HTAR archive. If you attempt to start a transfer containing any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR protocol (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success in this case, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar has no &amp;quot;prompt before overwrite&amp;quot; protection by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS, overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
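The first two cautions can be screened for in advance; a minimal sketch (the directory is a placeholder example):&lt;br /&gt;

```shell
#!/bin/bash
# Screen a directory tree before calling htar.
# "." is a placeholder; point this at the tree you intend to archive.
dir=${1:-.}

# htar cannot store member files larger than 68GB -- transfer these with HSI.
find "$dir" -type f -size +68G

# htar skips members whose pathname exceeds 100 characters.
find "$dir" | awk 'length($0) > 100'
```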
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions to other members in your group use the umask option&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI may be the primary client with which some users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a GPFSpath file into HPSSpath, only if the HPSS copy does not exist or the GPFS version is newer&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are three aspects of HSI that you should keep in mind, as they can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on the fly during transfers, so the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, cput or cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store a tarball from GPFS into HPSS and then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: first do an &amp;quot;lcd&amp;quot; in GPFS, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep symbolic links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we provide as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' when multiple operations are performed in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
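As a minimal, hypothetical illustration (no HSI involved): the exit status must be captured immediately after the command of interest, since any intervening command, even a plain variable assignment, resets $?.&lt;br /&gt;

```shell
# Sketch: capture the exit status right away.
false            # stands in for an hsi session that fails with code 1
status=$?        # must come immediately after the command
echo "captured exit code: $status"   # prints: captured exit code: 1
```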
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the example above, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. Remember, you can make your exit codes more intelligible by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A very trivial way to list the contents of HPSS would be to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, about 400,000 files can be listed in about an hour. Adjust the walltime accordingly, and be on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register that can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize the tape access, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and its subdirectories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer the files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J file_management&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before an 'rm' command; the results may be unpredictable.&lt;br /&gt;
* Avoid using the standalone wildcard character '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', to limit unintentional deletions.&lt;br /&gt;
* Avoid using relative paths, and even the environment variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid recursive or looped deletion instructions on $SCRATCH contents from within archive job scripts. Even for $ARCHIVE contents, it is better to run deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you are restricted to one hour.&lt;br /&gt;
&lt;br /&gt;
* After requesting an interactive session with ''salloc'' you'll get a standard shell prompt on an archive execution node (hpss-archive02), just as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI; for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. Reading the files back from tape in this format, however, will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail''. (The default is to return the status of the last command in the pipeline, which is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
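A minimal, hypothetical sketch (no HPSS involved) of why ''set -o pipefail'' matters:&lt;br /&gt;

```shell
# Without pipefail, a pipeline's status is that of its last command,
# so a failure early in the pipeline is silently masked.
false | true
echo "without pipefail: $?"   # prints 0 -- the failure is hidden

set -o pipefail
false | true
echo "with pipefail: $?"      # prints 1 -- the failure is reported
```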
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module, and can be used on any compute or devel node. This makes the execution of the previous version of the script much quicker than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive; '-Hverify=1' makes it verify them after the transfer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option, you can query the MD5 hash with the hsi command 'hashli'. The value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)   # define fname, used to name the checksum file below&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
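The tee + named-pipe pattern above can be tried locally. The hypothetical sketch below (no HPSS; 'cat' stands in for 'hsi put') shows how a single read of the source file feeds both the transfer and the checksum:&lt;br /&gt;

```shell
# Hypothetical local sketch of the tee + named-pipe pattern:
# one read of in.dat feeds both the copy and the MD5 computation.
tmpdir=$(mktemp -d)
echo "payload" > "$tmpdir/in.dat"
mkfifo "$tmpdir/np"
# 'cat > out.dat' stands in for the 'hsi put' consumer
cat "$tmpdir/in.dat" | tee "$tmpdir/np" > "$tmpdir/out.dat" &
md5sum "$tmpdir/np" > "$tmpdir/in.md5"   # checksum from the same stream
wait
cmp -s "$tmpdir/in.dat" "$tmpdir/out.dat" && echo "copy intact"
rm -rf "$tmpdir"
```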
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When enabled, Globus lets you transfer data between SciNet's HPSS and an external source.&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your content on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9297</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9297"/>
		<updated>2018-05-03T23:59:09Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Access Through the Queue System */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL).&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017 to store Nav Canada data as well as to serve as their own archive. It currently has 2 x 100PB of capacity installed.&lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, reaching the 10PB level in 2018.&lt;br /&gt;
* Very reliable, data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
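As a minimal sketch (the directory and file names here are hypothetical), small files can be bundled into a single tarball on GPFS before being sent to HPSS:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd $SCRATCH/workarea&lt;br /&gt;
    # bundle the small files into one tarball, preserving permissions&lt;br /&gt;
    tar -cpf small-files.tar small-files-dir/&lt;br /&gt;
    # verify that the tarball lists correctly before archiving it&lt;br /&gt;
    tar -tf small-files.tar &amp;gt; /dev/null &amp;amp;&amp;amp; echo 'TARBALL OK'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;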
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and 71 exit codes). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first because of all the job script templates we make available below (they are there so you don't have to think &lt;br /&gt;
too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; mechanism for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [https://docs.computecanada.ca/wiki/Niagara_Quickstart#Submitting_jobs Niagara queue system].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs at the same time, and 10 jobs total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind that you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*    Welcome to HPSS@SciNet - High Performance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session delete files in HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification, thereby achieving a high rate of performance. HTAR does not do gzip compression; however, it has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR protocol [[(POSIX 1003.1 USTAR)]] -- note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike with cput/cget in HSI, &amp;quot;prompt before overwrite&amp;quot; is not the default with (h)tar. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions for other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which most users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a GPFSpath file into HPSSpath if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are three distinctions about HSI that you should keep in mind, which can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, therefore the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, cput or cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However the syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do a &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do a &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code in case of multiple operations in the same hsi session. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A trivial way to list the contents of HPSS is to submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete; listing about 400,000 files takes roughly an hour. Adjust the walltime accordingly, erring on the generous side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more convenient way to explore the contents of HPSS: the inventory shell [[ISH]]. The example below creates an index of all the files in a user's portion of the namespace. The index is placed in the directory /home/$(whoami)/.ish_register, which can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize the retrieval, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and its sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J file_management_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before the 'rm' command; the results may be unpredictable.&lt;br /&gt;
* Avoid using the stand-alone wildcard character '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', to limit unintentional deletions.&lt;br /&gt;
* Avoid relative paths, and even the environment variable $ARCHIVE; it is better to explicitly expand full paths in your scripts.&lt;br /&gt;
* Avoid recursive/looped deletion of $SCRATCH contents from archive job scripts. Even for $ARCHIVE contents, it is better to run deletions as an independent job, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that interactive sessions are restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc -p archiveshort'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an HSI equivalent; for instance, you cannot use 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi get - : $ARCHIVE/mydir.tar | tar -xvf -&lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, is not noticeably slower than a recursive put with HSI, which stores each file one by one. Reading the files back from tape in this format, however, will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* In addition, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail''. (By default the shell returns the status of the last command in the pipeline, which is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the total size of the directory tree with 'du' before sending it through the tar+HSI pipe.&lt;br /&gt;
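The 'du' check and ''set -o pipefail'' advice above can be sketched as a small pre-flight test. This is only a sketch: the 500GB threshold mirrors the guideline above, and the target directory is an illustrative assumption.&lt;br /&gt;

```bash
#!/bin/bash
# Sketch of a pre-flight check before a tar+HSI pipeline.
# The 500GB limit mirrors the tarball-size guideline; the target
# directory (first argument, defaulting to ".") is illustrative.
set -o pipefail            # surface failures from any pipeline stage

LIMIT=$((500 * 1024 * 1024 * 1024))   # 500GB in bytes
dir=${1:-.}

# du -sb reports the total size of the tree in bytes (GNU du)
bytes=$(du -sb "$dir" | awk '{print $1}')

if [ "$bytes" -gt "$LIMIT" ]; then
   echo "$dir exceeds the 500GB tarball guideline; split it first."
   exit 1
else
   echo "$dir is within the 500GB tarball guideline."
fi
```

Running a check like this in the job script, before the tar+hsi step, avoids discovering an oversized tree only after hours of streaming to tape.&lt;br /&gt;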
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module and can be used on any compute or devel node. This makes the previous version of the script much quicker than using 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The -Hcrc option specifies that HTAR should generate CRC checksums when creating the archive; combined with -Hverify=1, the archive is verified after creation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option, you can query the md5 hash with the hsi command 'hashli'. The value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS since version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation depends on several factors, such as processor speed, network transfer speed, and the speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9296</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9296"/>
		<updated>2018-05-03T23:58:38Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Access Through the Queue System */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, to get familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive; it currently has 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, reaching the 10PB level in 2018&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tape at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
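As a minimal illustration of the small-files guideline above, many small files can be grouped into one tarball before ingestion. This is only a sketch; the directory and file names are hypothetical stand-ins for a real work area.&lt;br /&gt;

```bash
#!/bin/bash
# Sketch: bundle many small files into a single tarball so HPSS
# receives one large object instead of many tiny ones.
# All paths below are hypothetical stand-ins for a real work area.
set -e

workdir=$(mktemp -d)                 # stand-in for a $SCRATCH area
mkdir -p "$workdir/small-files"
for i in 1 2 3; do
    echo "result $i" > "$workdir/small-files/result-$i.txt"
done

# One tarball instead of many small files on tape
tar -cf "$workdir/bundle.tar" -C "$workdir" small-files

# Always sanity-check the tarball before deleting the originals
tar -tf "$workdir/bundle.tar" > "$workdir/contents.txt"
echo "entries in bundle: $(grep -c txt "$workdir/contents.txt")"
```

The resulting bundle.tar is then a suitable candidate for ingestion with hsi or htar, as shown in the sample scripts elsewhere on this page.&lt;br /&gt;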
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are there so you don't have to think &lt;br /&gt;
too much; just copy and paste), but if you treat the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most Linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Submitting_jobs|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to the 'archivelong' or 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
*  Users are limited to only 2 long jobs and 2 short jobs at the same time, and 10 jobs total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running overall at any given time; remaining submissions will be placed on hold until a slot opens. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
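Since these per-user limits are enforced, it can be handy to count how many archive jobs you already have queued before submitting more. The sketch below is a hypothetical convenience (the helper name and parsing are ours, not a SciNet tool); it only assumes squeue's default output of one header line followed by one line per job.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not a SciNet-provided tool): count the job lines in
# squeue-style output, e.g.  count_queued_jobs "$(squeue -u $USER -p archivelong)"
count_queued_jobs() {
    # $1 = squeue output: one header line, then one line per job
    local output="$1"
    # drop the header, then count the remaining non-empty lines
    echo "$output" | awk 'NR > 1' | grep -c . || true
}
```
&lt;br /&gt;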
&lt;br /&gt;
The status of pending jobs can be monitored with squeue by specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Performance Storage System   *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session You may also delete files from within an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs to catch abnormal terminations, and be sure to return the exit code&lt;br /&gt;
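The trap-and-check pattern used throughout the sample scripts can be distilled into the minimal sketch below. Here 'do_transfer' is a stand-in for the real htar/hsi invocation, and the function name is ours, not part of any SciNet tooling.&lt;br /&gt;
&lt;br /&gt;
```shell
# Minimal sketch of the trap pattern from the sample scripts.
# 'do_transfer' is a placeholder for the real htar/hsi command.
run_guarded() {
    # report a distinctive code if the job is killed mid-transfer
    trap "echo 'Job script not completed'; exit 129" TERM INT
    do_transfer
    local status=$?
    trap - TERM INT

    # propagate the transfer tool's own exit code to the queue system
    if [ "$status" -ne 0 ]; then
        echo "Transfer returned non-zero code $status"
    else
        echo 'TRANSFER SUCCESSFUL'
    fi
    return $status
}
```
&lt;br /&gt;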
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
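If the analysis job should start only when the recall actually ''succeeded'', 'afterok' can be used instead of 'afterany' (which releases the dependent job regardless of exit status). The sketch below is hypothetical (the helper name is ours) and assumes sbatch's standard &amp;quot;Submitted batch job 12345&amp;quot; message, in which the job id is the last field.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: build a --dependency flag from sbatch's standard
# "Submitted batch job 12345" message.  afterok releases the dependent job
# only if the recall job exits successfully; afterany releases it regardless.
make_dependency() {
    # $1 = the sbatch output line; the job id is the last field
    echo "$1" | awk '{print "--dependency=afterok:" $NF}'
}

# e.g.:  sbatch $(make_dependency "$(sbatch data-recall.sh)") analysis-job.sh
```
&lt;br /&gt;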
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification, and thereby achieves a high rate of performance. HTAR does not do gzip compression, but it does have a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can arise when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
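The two hard limits above can be checked before the archiving job is even submitted. The sketch below is a local convenience, not part of HTAR (the function name is ours); it scans a directory tree for member files over 68GB and pathnames over 100 characters. Run it with the same relative path you would pass to htar, since pathname length is measured as htar would see it.&lt;br /&gt;
&lt;br /&gt;
```shell
# Pre-flight sketch for the HTAR cautions above (not an official tool):
# flag member files over the 68GB limit and pathnames over 100 characters
# before submitting the htar job.
preflight_htar() {
    local dir="$1"
    local limit_bytes=$((68 * 1024 * 1024 * 1024))   # 68GB member-file limit
    local oversize toolong

    # files too large for an HTAR archive: transfer these with HSI instead
    oversize=$(find "$dir" -type f -size +"$limit_bytes"c | wc -l)
    # pathnames over 100 characters would be skipped (htar exit code 70)
    toolong=$(find "$dir" | awk 'length($0) > 100' | wc -l)

    if [ "$oversize" -gt 0 ] || [ "$toolong" -gt 0 ]; then
        echo "WARNING: $oversize oversize file(s), $toolong overlong pathname(s)"
        return 1
    fi
    echo "OK for htar"
}
```
&lt;br /&gt;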
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar file and the .idx file are readable by other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is likely the primary client with which most users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a GPFSpath file to HPSSpath if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are 3 distinctions about HSI that you should keep in mind, and that can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, therefore the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, cput or cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following 2-step process, where you do an &amp;quot;lcd&amp;quot; in GPFS first, and recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code, in case of multiple operations in the same hsi session. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A trivial way to list the contents of HPSS is to simply submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, listing about 400,000 files takes about an hour. Adjust the walltime accordingly, erring on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS: the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register, which can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do optimization, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wild character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J file_management&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful when using 'cd' to a non-existing directory before an 'rm' command; the results may be unpredictable&lt;br /&gt;
* Avoid the stand-alone wildcard character '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', so as to limit unintentional mishaps&lt;br /&gt;
* Avoid relative paths, and even the env variable $ARCHIVE; it is better to explicitly expand the full paths in your scripts&lt;br /&gt;
* Avoid using recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even on $ARCHIVE contents, it may be better to do it as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc -p archiveshort'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), just as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate using commands such as 'ls', 'cd', 'pwd', etc ... NOTE: not every bash command has an equivalent in HSI - for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xvf - &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than the recursive put with HSI that stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of stored files that we have with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status use: ''set -o pipefail'' (The default is to return the status of the last command in the pipeline and this is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
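As a minimal sketch of that 'du' pre-check (the directory path and the exact 500GB cutoff below are illustrative assumptions, to be adapted to your own tree):&lt;br /&gt;

```shell
# Sketch: verify a directory's total size before the tar+hsi piping.
# The path below is illustrative; point it at your real source directory.
dir="${SCRATCH:-/tmp}/mydir"
mkdir -p "$dir"

limit_kb=$((500 * 1024 * 1024))              # 500GB expressed in 1K blocks

size_kb=$(du -sk "$dir" | awk '{print $1}')  # total size in 1K blocks
if [ "$size_kb" -gt "$limit_kb" ]; then
    echo "TOO LARGE: split $dir into smaller trees before archiving"
else
    echo "OK: $dir is ${size_kb}KB; safe to pipe into hsi"
fi
```

If the check fails, split the tree into several tarballs rather than sending one oversized pipeline.&lt;br /&gt;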
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module, and can also be used on any compute or devel node. This makes the previous version of the script execute much more quickly than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive, as in the sample below.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the MD5 hash with the hsi command 'lshash'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When enabled, this allows you to transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be used much like a DropBox. To download the free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9295</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9295"/>
		<updated>2018-05-03T23:40:28Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* New to the System? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be given access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive. It currently has 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and 71 exit codes). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are here so you don't have to think too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; mechanism for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://wiki.scinet.utoronto.ca/wiki/index.php/HPSS#Access_Through_an_Interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most Linux shell commands have an equivalent in HSI)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs at the same time, and 10 jobs total in the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive naming-space. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files and directories, by using a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification, thereby achieving a high rate of performance. HTAR does not do gzip compression, however it already has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI instead.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
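The cautions above can be checked up front. Here is a minimal sketch (the source path is an illustrative assumption, and the 100-character test is approximate, since it measures the full path rather than the archive member name):&lt;br /&gt;

```shell
# Sketch: pre-flight checks before running htar on a directory tree.
# The path below is illustrative; point it at your real source directory.
srcdir="${SCRATCH:-/tmp}/workarea/finished-job1"
mkdir -p "$srcdir"

# Files larger than 68GB would make the whole HTAR session fail:
big=$(find "$srcdir" -type f -size +68G | wc -l)

# Pathnames longer than ~100 characters would be skipped (exit code 70):
long=$(find "$srcdir" | awk 'length($0) > 100' | wc -l)

if [ "$big" -eq 0 ] && [ "$long" -eq 0 ]; then
    echo "PRECHECK PASSED: safe to htar $srcdir"
else
    echo "PRECHECK FAILED: $big oversized file(s), $long long pathname(s)"
fi
```

Oversized files flagged here can be sent individually with HSI, as described above.&lt;br /&gt;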
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the directory ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions to other members in your group use the umask option&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI will likely be the primary client through which many users interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves a GPFSpath file into HPSSpath, replacing the HPSS copy only if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are three distinctive aspects of HSI that you should keep in mind, and that can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the cput/cget syntax may not work as one would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, cput or cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS. The default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do an &amp;quot;lcd&amp;quot; into GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do a &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, then transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms, or you may already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' in case of multiple operations in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. You may also make your exit codes more informative by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
The simplest way to list the contents of HPSS is to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, listing about 400,000 files takes about an hour. Adjust the walltime accordingly, to be on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register that can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate cgets) allows HSI to optimize the recall, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and its sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux versions.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J management_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before an 'rm' command. Results may be unpredictable.&lt;br /&gt;
* Avoid using the standalone wildcard character '''*'''. If necessary, bind it to common patterns whenever possible, such as '*.tmp', to limit unintentional mishaps.&lt;br /&gt;
* Avoid relative paths, and even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even for $ARCHIVE contents, it may be better to run deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
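As a sketch of the last recommendation, the removal of GPFS originals can be gated on a successful verification of the archived copy. The function below is an illustration only, not an official SciNet utility; its first argument is assumed to be the exit code of a preceding 'htar -tf' or 'hsi ls' on the HPSS copy.&lt;br /&gt;

```shell
# Remove a GPFS directory only after the archived copy has been
# verified (illustrative only).
# $1: exit code of a preceding 'htar -tf' or 'hsi ls' on the HPSS copy
# $2: GPFS directory holding the originals
delete_if_archived() {
    local list_status="$1" dir="$2"
    if [ "$list_status" -ne 0 ]; then
        echo "HPSS copy of $dir not verified; keeping originals"
        return 1
    fi
    # Verification succeeded, so the originals can go.
    rm -rf -- "$dir"
    echo "Removed originals in $dir"
}
```

In a deletion job this would run after the listing step, e.g. ''hsi ls $ARCHIVE/finished-job1.tar; delete_if_archived $? $SCRATCH/workarea/finished-job1''.&lt;br /&gt;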
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you're restricted to one hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc -p archiveshort'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an HSI equivalent - for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than the recursive put with HSI that stores each file one by one. Moreover, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status use: ''set -o pipefail'' (The default is to return the status of the last command in the pipeline and this is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
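The size guideline above can be enforced with a quick 'du' check before starting the tar+HSI pipeline. This is a minimal sketch with an assumed threshold variable, not an official utility; adjust MAX_KB if the recommendation changes.&lt;br /&gt;

```shell
# Refuse to build a tarball if the source tree exceeds the
# recommended 500GB ceiling (sizes in KB, as reported by 'du -sk';
# the limit value follows the guideline above).
MAX_KB=$((500 * 1024 * 1024))   # 500 GB expressed in KB

check_tarball_size() {
    local dir="$1"
    local kb
    kb=$(du -sk -- "$dir" | awk '{print $1}')
    if [ "$kb" -gt "$MAX_KB" ]; then
        echo "$dir is ${kb}KB; split it before archiving"
        return 1
    fi
    echo "$dir is ${kb}KB; OK to archive"
}
```

A job script would call ''check_tarball_size $SCRATCH/mydir'' and exit early on a non-zero return, before the tar+HSI piping begins.&lt;br /&gt;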
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It's now part of the &amp;quot;extras&amp;quot; module and can be used on any compute or devel node. This makes the execution of the previous version of the script much quicker than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# Make an error at any stage of the pipeline show up in the returned status&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The -Hcrc option specifies that HTAR should generate CRC checksums when creating the archive; -Hverify=1 then verifies them.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the MD5 hash with the hsi command 'hashli'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)  # name used for the checksum file below&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly using a named pipe, so the file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Once access is re-enabled, you may transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy (SME) is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Why_not_tarballs_too_large&amp;diff=9294</id>
		<title>Why not tarballs too large</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Why_not_tarballs_too_large&amp;diff=9294"/>
		<updated>2018-05-03T23:37:16Z</updated>

		<summary type="html">&lt;p&gt;Pinto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The recommendation is to not generate tarballs larger than 500GB&lt;br /&gt;
&lt;br /&gt;
The simplest way to understand why is by watching this [https://www.youtube.com/watch?v=_1x99bOX7Yo domino chain reaction video]: the only valid record requires all the pieces to fall in sequence, uninterrupted.&lt;br /&gt;
* The HPSS system has many moving parts (computer nodes, databases, network, switches, cables, disks, tape drives, robot arms in the library, etc). There are a number of minor hiccups that can happen in the pipeline as data is transferred from GPFS to HPSS, and back from tape to GPFS later on. &lt;br /&gt;
* If one of these hiccups happens to a large tarball, the whole transfer process is compromised, and you will have to start the transfer again from square one (that is, reassemble the whole domino sequence). The same type of hiccup may happen to ONE small tarball as well (just a few loops in the domino spiral), but the probability of it affecting ONE very large tarball is much higher, and so is the time wasted if you have to restart the process and run it again trouble-free. &lt;br /&gt;
* htar, for instance, does not have a built-in retry feature; it is not resilient to external problems, and it will not pick up from where a failed transfer left off.&lt;br /&gt;
* Besides, our LTO7 tapes can only hold 6TB. It is easier to fit several 500GB files onto those tapes without wastage at the end than, say, a 3TB file. Although it is possible, we prefer not to split a file over multiple tapes. By design, we do not stripe files over multiple tapes either.&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Why_not_tarballs_too_large&amp;diff=9293</id>
		<title>Why not tarballs too large</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Why_not_tarballs_too_large&amp;diff=9293"/>
		<updated>2018-05-03T23:35:47Z</updated>

		<summary type="html">&lt;p&gt;Pinto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The recommendation is to not generate tarballs larger than 500GB&lt;br /&gt;
&lt;br /&gt;
The simplest way to understand why is by watching this [https://www.youtube.com/watch?v=_1x99bOX7Yo domino chain reaction video]: the only valid record requires all the pieces to fall in sequence, uninterrupted.&lt;br /&gt;
* The HPSS system has many moving parts (computer nodes, databases, network, switches, cables, disks, tape drives, robot arms in the library, etc). There are a number of minor hiccups that can happen in the pipeline as data is transferred from GPFS to HPSS, and back from tape to GPFS later on. &lt;br /&gt;
* If one of these hiccups happens to a large tarball, the whole transfer process is compromised, and you will have to start the transfer again from square one (that is, reassemble the whole domino stack). The same type of hiccup may happen to ONE small tarball as well (just a few loops in the domino spiral), but the probability of it affecting ONE very large tarball is much higher, and so is the time wasted if you have to restart the process and run it again trouble-free. &lt;br /&gt;
* htar, for instance, does not have a built-in retry feature; it is not resilient to external problems, and it will not pick up from where a failed transfer left off.&lt;br /&gt;
* Besides, our LTO7 tapes can only hold 6TB. It is easier to fit several 500GB files onto those tapes without wastage at the end than, say, a 3TB file. Although it is possible, we prefer not to split a file over multiple tapes. By design, we do not stripe files over multiple tapes either.&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9292</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9292"/>
		<updated>2018-05-03T22:09:45Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Sample tarball extract */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be granted access to HPSS, up to 2TB per group, so that you may become familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v7.3.3 patch 6, and HSI/HTAR version 4.0.1.2.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL).&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive; it currently has 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, reaching the 10PB level in 2018.&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tapes at SciNet).&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet, after a modest upgrade in 2017: ingest ~150 TB/day, recall ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and the returned logs for errors after any data transfer or tarball-creation process.&lt;br /&gt;
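As a quick illustration of the small-files guideline above (TREE is a placeholder path), you can count how many files in a tree fall under the ~200MB threshold; if there are many, pack the tree into a single tarball with tar or htar rather than storing the files individually:&lt;br /&gt;

```shell
# Sketch: count files under ~200MB in a tree. TREE is a placeholder.
# A large count suggests the tree should be aggregated into one tarball
# before being sent to tape.
TREE=${TREE:-.}
nsmall=$(find "$TREE" -type f -size -200M | wc -l)
echo "$nsmall files under 200MB in $TREE"
```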
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are there so you don't have to think too much -- just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; mechanism for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs at the same time, and 10 jobs total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be held until a slot frees up. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind that you are restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs to catch abnormal terminations, and be sure to return the exit code.&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency (see the [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files and directories, by using a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification, thereby achieving a high rate of performance. HTAR does not do gzip compression, however it already has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any file larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all such files so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, &amp;quot;prompt before overwrite&amp;quot; is not the default with (h)tar. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can arise when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
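The 100-character pathname limit above can also be checked before the htar job runs. This is only a sketch (TREE is a placeholder for the tree you intend to archive); it counts member pathnames that would be skipped:&lt;br /&gt;

```shell
# Sketch: count member pathnames longer than the 100-character USTAR limit
# before running htar. TREE is a placeholder for the directory to archive;
# paths are measured relative to it, as htar would see them.
TREE=${TREE:-.}
nlong=$(cd "$TREE"; find . -print | grep -cE '.{101,}')
echo "$nlong pathnames exceed 100 characters"
```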
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions to other members in your group use the umask option&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any file larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all such files so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
For many users, HSI will be the primary client for interacting with HPSS. It provides an FTP-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves a GPFSpath file into HPSS as HPSSpath, only if the HPSS copy does not already exist or the GPFS version is newer.&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are three aspects of HSI to keep in mind that can cause some confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the cput/cget syntax may not work as one would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI uses the &amp;quot;:&amp;quot; operator to separate the GPFSpath from the HPSSpath; it must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order for referring to files in HSI syntax differs from FTP. In HSI the general format is always the same for both cput and cget: GPFS first, HPSS second:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and keep the contents of HPSS organized; the default HSI directory is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues with renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: first do an &amp;quot;lcd&amp;quot; into GPFS, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep symbolic links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, then transfer the files individually with the '*' wildcard. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms, and you may already be familiar with HPSS/HSI from other HPC facilities, whose procedures may or may not be similar to ours. However, HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge you to use the sample scripts we provide as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' In case of multiple operations in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
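The status-capture pattern used throughout the sample scripts below (run a command fed by a here-document, then read $? immediately) can be exercised without HPSS at all; in this sketch, plain bash stands in for hsi:

```shell
#!/bin/bash
# Sketch of the exit-status capture used by the hsi job scripts.
# 'bash' stands in for hsi here; $? must be read on the very next line,
# before any other command overwrites it.
bash <<'EOF'
echo "doing some archive work"
exit 3
EOF
status=$?

echo "captured status: $status"   # prints: captured status: 3
```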
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls, ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' As in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. Remember, you can make your exit codes more informative by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A trivial way to list the contents of HPSS is to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete; as a guideline, about 400,000 files can be listed in about an hour. Adjust the walltime accordingly, to be on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register that can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or the built-in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate cgets) allows HSI to optimize the recall, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J move_rename_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before the 'rm' command; the results may be unpredictable.&lt;br /&gt;
* Avoid using the standalone wildcard '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', to limit unintentional mishaps.&lt;br /&gt;
* Avoid relative paths, including the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even for $ARCHIVE contents, it may be better to run deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
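The bounded-wildcard recommendation can be illustrated in a throwaway local directory (plain shell; hsi's rm treats patterns analogously for this purpose, and the file names here are made up):

```shell
#!/bin/bash
# Why '*.tmp' is safer than a bare '*': only files matching the bound
# pattern are removed, and everything else survives.
set -u
DIR=$(mktemp -d)
touch "$DIR/run1.tmp" "$DIR/run2.tmp" "$DIR/results.tar"

cd "$DIR"
rm -- *.tmp      # bound to a pattern: only the temporaries go
ls               # prints: results.tar

cd / && rm -rf "$DIR"
```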
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you're restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc -p archiveshort'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), just as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an HSI equivalent; for instance, you cannot use 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this, make any stage's failure show up in $?&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xvf - &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of stored files that we have with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The end result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail''. (The default is to return the status of the last command in the pipeline, and this is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the total amount of data in the directory tree with 'du' before sending it to the tar+HSI pipe.&lt;br /&gt;
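The pipefail point can be verified directly in any shell; here 'false' stands in for a failing tar stage and 'cat' for a succeeding hsi stage:

```shell
#!/bin/bash
# Without pipefail, the pipeline reports the status of its LAST command,
# so an early failure is masked; with pipefail, it propagates.
false | cat > /dev/null
echo "default status: $?"    # prints: default status: 0

set -o pipefail
false | cat > /dev/null
echo "pipefail status: $?"   # prints: pipefail status: 1
```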
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We have compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It's now part of the &amp;quot;extras&amp;quot; module, and can also be used on any compute or devel node. This makes the previous version of the script execute much more quickly than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this, make any stage's failure show up in $?&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
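Since pigz is command-line compatible with gzip, the shape of this pipeline can be rehearsed locally before pointing it at hsi. The sketch below uses gzip and a local file in place of the pigz/HPSS side; the paths and sample data are made up:

```shell
#!/bin/bash
# Round-trip check of the tar | compress pipeline. gzip stands in for pigz,
# and a local file stands in for 'hsi put - : $ARCHIVE/mydir.tar.gz'.
set -e -o pipefail
SRC=$(mktemp -d); DEST=$(mktemp -d)
echo "hello" > "$SRC/data.txt"

# put side of the pipeline
tar -C "$SRC" -cf - . | gzip > "$DEST/mydir.tar.gz"

# get side: what would be 'hsi get - : $ARCHIVE/mydir.tar.gz | tar ...'
mkdir "$DEST/recalled"
gunzip -c "$DEST/mydir.tar.gz" | tar -C "$DEST/recalled" -xf -

cmp "$SRC/data.txt" "$DEST/recalled/data.txt" && echo "ROUNDTRIP OK"
rm -rf "$SRC" "$DEST"
```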
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive; '-Hverify=1' makes HTAR read the archive back and verify them.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the MD5 hash with the hsi command 'lshash'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process (read immediately after the command)&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'CHECKSUM QUERY SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate the checksum on the fly, using a named pipe so that the file is only read from GPFS once&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE | tee /tmp/$fname.md5&lt;br /&gt;
rm -f /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
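The named-pipe trick above reads the file from GPFS only once, streaming it simultaneously to the transfer command and to md5sum. The same pattern can be exercised locally without HSI; in this sketch a plain file copy stands in for the hsi put, and all paths are illustrative temporaries:&lt;br /&gt;
&lt;br /&gt;
```shell
# Demonstrate the tee + named-pipe pattern used above, without HSI:
# the input is read once and fed both to a checksum and to a copy
# (the copy stands in for "hsi put").
work=$(mktemp -d)
printf 'payload\n' > "$work/input"

mkfifo "$work/NPIPE"
md5sum "$work/NPIPE" > "$work/input.md5" &   # reader must start first
cat "$work/input" | tee "$work/NPIPE" > "$work/copy"
wait
rm -f "$work/NPIPE"

# The in-flight hash matches a hash of the delivered copy.
awk '{print $1}' "$work/input.md5"
md5sum "$work/copy" | awk '{print $1}'
```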
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may transfer data between SciNet's HPSS and an external source (once the service is re-enabled)&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer, and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9291</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9291"/>
		<updated>2018-05-03T22:05:06Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Deleting with an interactive HSI session */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be granted access to HPSS, up to 2TB per group, so that you can become familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive. It currently has 2 x 100PB of capacity installed.&lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
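As a sketch of the first two guidelines, creating one tarball per subdirectory keeps small files aggregated while bounding the size of each archive. The directory layout below is a made-up example; on the real system you would check the resulting sizes against the 500GB guideline before ingesting:&lt;br /&gt;
&lt;br /&gt;
```shell
# Pack each subdirectory of a work area into its own tarball, so
# many small files travel to HPSS as a few large archives.
# Directory names here are hypothetical demo data.
workarea=$(mktemp -d)    # stand-in for e.g. $SCRATCH/results
staging=$(mktemp -d)     # tarballs to be ingested later

for d in job1 job2; do   # build a tiny demo tree
    mkdir -p "$workarea/$d"
    echo data > "$workarea/$d/out.txt"
done

cd "$workarea"
for d in */ ; do
    name=${d%/}
    tar -cf "$staging/${name}.tar" "$name"
done

ls "$staging"            # one tarball per subdirectory
```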
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are there so you don't have to think too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; mechanism for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs running at the same time, and 10 jobs total in the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind that you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Performance Storage System   *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or the ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
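The job ID needed for that flag comes from the submission output. A minimal sketch of extracting it (sbatch itself is not run here; the &amp;quot;Submitted batch job&amp;quot; line it normally prints is simulated):&lt;br /&gt;
&lt;br /&gt;
```shell
# sbatch prints a line of the form "Submitted batch job <ID>".
# Simulate that output, pull out the ID (fourth field), and build
# the dependency flag for the follow-up analysis job.
submit_output="Submitted batch job 50918"      # simulated sbatch output
jobid=$(echo "$submit_output" | awk '{print $4}')
depflag="--dependency=afterany:$jobid"
echo "$depflag"      # prints --dependency=afterany:50918
```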
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files and directories, by using a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification, thereby achieving a high rate of performance. HTAR does not do gzip compression, however it already has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
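The 68 GB limit in the first caution can be checked before submitting the job. A sketch using find; the exact byte threshold is an assumption here (2^36 bytes, roughly 68 GB), and the demo tree uses sparse files so no real disk space is consumed:&lt;br /&gt;
&lt;br /&gt;
```shell
# Pre-flight check for HTAR's per-member limit: list any files
# larger than ~68 GB (2^36 bytes assumed); such files must be
# transferred with HSI instead.
demo=$(mktemp -d)
truncate -s 80G "$demo/huge.dat"   # sparse file: oversize for htar
truncate -s 1M  "$demo/ok.dat"     # sparse file: fine for htar

find "$demo" -type f -size +68719476736c -print
```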
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the directory ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions for other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI may be the primary client some users use to interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a GPFSpath file to HPSSpath if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* There are three characteristics of HSI that you should keep in mind, which can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on the fly during transfers, so the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, cput or cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document, such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do an &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms, and you may already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' in case of multiple operations in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code of each hsi session independently. You may also make your exit codes more informative by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A trivial way to list the contents of HPSS is simply to submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. As a rule of thumb, about 400,000 files can be listed in an hour. Adjust the walltime accordingly, erring on the safe side.''&lt;br /&gt;
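As a rough sketch of that rule of thumb (the 400,000-files-per-hour rate is only an estimate; adjust it for your own experience):&lt;br /&gt;

```shell
#!/bin/bash
# Rough walltime estimate for an HPSS 'ls -R' job, assuming the
# rule of thumb above (~400,000 files listed per hour).
nfiles=1200000                           # number of files you expect to list
hours=$(( (nfiles + 399999) / 400000 ))  # round up to whole hours
echo "request at least ${hours}:00:00 of walltime"
```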
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS: the inventory shell [[ISH]]. The example below creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register, where it can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-index.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or the built-in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize the retrieval, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and its subdirectories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* you may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J management_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before an 'rm' command; the results may be unpredictable.&lt;br /&gt;
* Avoid using the standalone wildcard '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', to limit unintentional mishaps.&lt;br /&gt;
* Avoid relative paths, and even the env variable $ARCHIVE; it is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid recursive/looped deletion instructions on $SCRATCH contents from archive job scripts. Even for $ARCHIVE contents, it is better to run deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you are restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''salloc'' command (as shown in the example below) you'll get a standard shell prompt on an archive execution node (hpss-archive02), just as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS.&lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an HSI equivalent - for instance, you cannot 'vi' or 'cat' files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50359&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing the tarball with ish (in the same script) immediately after its creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail'' (the default is to return the status of the last command in the pipeline, which is not what you want).&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
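The effect of ''set -o pipefail'' can be illustrated with a plain-bash sketch (no HPSS commands involved):&lt;br /&gt;

```shell
#!/bin/bash
# Without pipefail, a pipeline's status is that of its LAST command,
# so a failure early in a pipe such as 'tar ... | hsi put ...' is masked.
false | true
echo "default status: $?"    # 0 -- the failure of 'false' is hidden

set -o pipefail
false | true
echo "pipefail status: $?"   # 1 -- the failure is now reported
```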
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module and can be used on any compute or devel node. This makes the previous version of the script run much more quickly than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive; '-Hverify=1' verifies the archive after creation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the md5 hash with the hsi command 'hashlist'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly using a named pipe, so that the file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE | tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When enabled, you may transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9290</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9290"/>
		<updated>2018-05-03T22:02:38Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Sample data list */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 exabytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL).&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, both to store Nav Canada data and to serve as their own archive; it currently has 2 x 100PB of capacity installed.&lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, reaching the 10PB level in 2018.&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tape at SciNet).&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and high availability.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - is ~150 TB/day ingest and ~45 TB/day recall (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process.&lt;br /&gt;
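The tarball guideline above can be sketched in plain shell. This is a minimal, hedged example: the ''workarea/finished-job1'' path is a hypothetical stand-in for your own job output, and on the archive nodes you would normally use htar (as in the sample scripts below) rather than tar, but the exit-code check is the same.&lt;br /&gt;

```shell
#!/bin/bash
# Minimal sketch (hypothetical paths): bundle many small result files into
# one tarball before moving anything to HPSS, and check the exit code.
workdir="${SCRATCH:-/tmp}/workarea"
mkdir -p "$workdir/finished-job1"        # stand-in for your job output
cd "$workdir"
tar -cf small-files.tar finished-job1/
status=$?
if [ "$status" -ne 0 ]; then
    echo "tar returned non-zero code: $status"
    exit "$status"
fi
echo "tarball created"
```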
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71).&lt;br /&gt;
&lt;br /&gt;
This set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are there so you don't have to think &lt;br /&gt;
too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; mechanism for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time, BGQ users have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to the 'archivelong' or the 'archiveshort' queue.&lt;br /&gt;
* Short jobs are limited to 1H walltime by default; long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs running at the same time, and 10 jobs total in the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive naming-space. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Performance Storage System   *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code.&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
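The awk step in the shortcut above only extracts the job ID from sbatch's confirmation message and turns it into a dependency flag. Here is a minimal, hedged sketch of that parsing step; it assumes sbatch's default &amp;quot;Submitted batch job &amp;lt;id&amp;gt;&amp;quot; confirmation line, and the ID used is made up.&lt;br /&gt;

```shell
#!/bin/bash
# Hedged sketch: build a --dependency flag from sbatch's default
# confirmation line. The hard-coded line below is a stand-in for:
#   line=$(sbatch data-recall.sh)
line="Submitted batch job 50918"
jobid=$(echo "$line" | awk '{print $4}')   # fourth field is the job ID
echo "--dependency=afterany:$jobid"
```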
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files and directories, by using a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification, thereby achieving a high rate of performance. HTAR does not do gzip compression, however it already has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI instead.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success in its log, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file for errors before removing any files from the GPFS active filesystems.&lt;br /&gt;
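The &amp;quot;grep Warning&amp;quot; check mentioned above can be scripted as a post-transfer step. A minimal, hedged sketch follows: ''my.output'' is a hypothetical job log name, and a stand-in log is fabricated here so the snippet is self-contained.&lt;br /&gt;

```shell
#!/bin/bash
# Hedged sketch: scan a saved HTAR job log for "Warning" lines about
# skipped files before deleting anything from GPFS.
# "my.output" is a hypothetical log name; its content is fabricated
# for this demo and is not real HTAR output.
printf 'HTAR: a somefile.dat\nWarning: pathname too long, file skipped\n' > my.output
if grep -q Warning my.output; then
    echo "some files were skipped - inspect the log before removing originals"
fi
```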
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write directory ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar file and the .idx file are readable by other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online.&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is likely the primary client with which users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a GPFSpath file into HPSSpath, only if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* There are 3 peculiarities of HSI that you should keep in mind, and that can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, therefore the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a &amp;quot;here document&amp;quot; such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do an &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we provide as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code in case of multiple operations in the same hsi session. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A trivial way to list the contents of HPSS is to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, listing about 400,000 files takes about an hour. Adjust the walltime accordingly, erring on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register that can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate cgets) allows HSI to do optimization, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and its sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer the files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J file_management_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existent directories before an 'rm' command; the results may be unpredictable&lt;br /&gt;
* Avoid the stand-alone wildcard '''*'''. If you must use it, bind it to a common pattern such as '*.tmp' whenever possible, to limit unintentional deletions&lt;br /&gt;
* Avoid relative paths, and even the env variable $ARCHIVE; it is better to explicitly expand the full paths in your scripts&lt;br /&gt;
* Avoid recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even for $ARCHIVE contents, it is better to run deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI - for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. Reading the files back from tape in this format, however, will be many times faster. It also overcomes the current 68GB limit on the size of stored files that we have with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail'' (the default is to return the status of the last command in the pipeline, which is not what you want).&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
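The effect of ''set -o pipefail'' can be seen with a minimal plain-shell sketch (no HPSS tools involved; ''false | cat'' stands in for a transfer pipeline whose first stage fails):&lt;br /&gt;

```shell
# Without pipefail, a pipeline's exit status is that of its LAST command,
# so an early failure (here 'false') is silently masked by 'cat'.
false | cat
default_status=$?

# With pipefail, the status of the first failing command is propagated.
set -o pipefail
false | cat
pipefail_status=$?

echo "default=$default_status pipefail=$pipefail_status"
```

This is why the sample scripts above set the option before the tar+hsi pipeline.&lt;br /&gt;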
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module and can be used on any compute or devel node. This makes the compressed variant of the previous script much quicker than if you were to use single-threaded 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, ISH will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive; '-Hverify=1' then verifies them.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the md5 hash with the hsi command 'lshash'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4)&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate the checksum on the fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI command&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate the checksum on the fly, using a named pipe so that the file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE | tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
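The single-read pattern above can be exercised locally without HPSS. In this hedged sketch, ''tee'' writing a plain copy stands in for the ''hsi put'' branch, while md5sum hashes the same stream downstream, so the source file is read from disk only once:&lt;br /&gt;

```shell
# Local sketch of the single-read checksum pattern (no HPSS involved):
# tee writes the "transferred" copy while md5sum hashes the same bytes.
workdir=$(mktemp -d)
printf 'sample data\n' > "$workdir/thefile"

# hash computed in-flight, while the copy is being written
cat "$workdir/thefile" | tee "$workdir/copy" | md5sum | cut -d' ' -f1 > "$workdir/inflight.md5"

# hash recomputed from the landed copy; the two must agree
md5sum "$workdir/copy" | cut -d' ' -f1 > "$workdir/copy.md5"
if cmp -s "$workdir/inflight.md5" "$workdir/copy.md5"; then echo "checksum match"; fi
```

The production script above does the same thing with a named pipe, so that hsi, rather than a local copy, consumes one branch of the stream.&lt;br /&gt;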
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9289</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9289"/>
		<updated>2018-05-03T22:01:50Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Documentation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, to become familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25 year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the World report (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017 to store Nav Canada data as well as to serve as their own archive; they currently have 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
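A quick size check with 'du' before creating a tarball helps keep aggregates under the recommended 500GB. This is a minimal sketch assuming GNU du; the directory path is a placeholder:&lt;br /&gt;

```shell
# Check a tree's total size before archiving it as a single tarball.
dir=$(mktemp -d)        # placeholder for the directory you intend to archive
maxgb=500

# GNU du: -s summarizes, -B G reports in rounded-up gigabyte units, e.g. "1G"
sizegb=$(du -s -B G "$dir" | cut -f1 | tr -d 'G')

if [ "$sizegb" -gt "$maxgb" ]; then
    echo "split $dir into smaller tarballs before archiving"
else
    echo "$dir fits in a single tarball (${sizegb}GB)"
fi
```

If the total exceeds the limit, split the tree and create several tarballs instead.&lt;br /&gt;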
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are here so you don't have to think&lt;br /&gt;
too much; just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most Linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs at the same time, and 10 jobs total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive naming-space. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code.&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
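A quick way to convince yourself of what the awk substitution does is to run it on a canned line. A stock Slurm sbatch prints &amp;quot;Submitted batch job JOBID&amp;quot;, in which case the job id is the 4th field; the shortcut above assumes a setup where sbatch prints only the id. This sketch uses a made-up job id:&lt;br /&gt;

```shell
# Sketch (hypothetical job id 12345): turn standard sbatch output into
# a --dependency flag for the follow-up job.
echo "Submitted batch job 12345" | awk '{print "--dependency=afterany:"$4}'
# prints: --dependency=afterany:12345
```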
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into an archive file that conforms to the POSIX TAR specification. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, thereby achieving a high rate of performance. HTAR does not do gzip compression; however, it has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, but it will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot;. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can arise when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
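Since a single oversize file aborts the whole HTAR session, it can save a wasted job to scan for such files beforehand. A minimal sketch, assuming a placeholder directory name finished-job1 (find's 'G' unit is gibibytes, close enough to htar's limit for a pre-flight check):&lt;br /&gt;

```shell
# Sketch: list files over htar's 68 GB per-file limit before archiving,
# so they can be set aside and transferred with HSI instead.
# "finished-job1" is a placeholder directory name.
dir=finished-job1
if [ -d "$dir" ]; then
    find "$dir" -type f -size +68G
fi
```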
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the directory ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions for other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
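To see why 0137 is the right mask here: a umask clears permission bits from the default 0666 file mode, and 0666 masked with 0137 leaves 0640, i.e. read/write for you and read-only for your group. A one-line arithmetic check:&lt;br /&gt;

```shell
# Arithmetic sketch: the file mode that results from umask 0137.
printf '%o\n' $(( 0666 & ~0137 ))
# prints: 640
```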
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
For many users, HSI will be the primary client for interacting with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a GPFSpath file into HPSSpath if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are three distinctions about HSI to keep in mind; they can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the syntax for cput/cget may not work as one would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI has a &amp;quot;:&amp;quot; operator which separates the GPFSpath and the HPSSpath, and it must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, whether cput or cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and keep the contents of HPSS organized; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do an &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to preserve links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, then transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' in case of multiple operations in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A very trivial way to list the contents of HPSS is simply to submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, listing about 400,000 files takes about an hour. Adjust the walltime accordingly to be on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/gpc/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do optimization, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before the 'rm' command; the results may be unpredictable.&lt;br /&gt;
* Avoid using the stand-alone wildcard character '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', so as to limit unintentional deletions.&lt;br /&gt;
* Avoid using relative paths, and even the env variable $ARCHIVE; it is better to explicitly expand full paths in your scripts.&lt;br /&gt;
* Avoid using recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even for $ARCHIVE contents, it may be better to do the deletion as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
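These recommendations can be condensed into a small guard placed ahead of any destructive hsi call. This is only a sketch: the path below is a placeholder and the hsi command is echoed rather than executed, so adapt it to your own scripts:&lt;br /&gt;

```shell
# Sketch: refuse to delete unless the target looks like a full /archive
# path. This protects against an empty or unexpanded variable turning
# "rm -R $TARGET" into something far more destructive.
TARGET="/archive/s/scinet/pinto/obsolete"   # placeholder path
case "$TARGET" in
    /archive/?*/?*/?*)
        echo "would run: hsi rm -R $TARGET"
        ;;
    *)
        echo "refusing to delete: '$TARGET' is not a full /archive path" >&2
        exit 1
        ;;
esac
```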
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you're restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), just as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS.&lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an HSI equivalent; for instance, you cannot use 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this, propagate a failure at any stage to the exit status&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of stored files that we have with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status use: ''set -o pipefail'' (The default is to return the status of the last command in the pipeline and this is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
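The 'du' check suggested above can be scripted ahead of the tar+HSI pipe. The sketch below is not a SciNet-provided tool; ''demo-mydir'' is a hypothetical stand-in for a directory such as $SCRATCH/mydir, and the 500GB threshold follows the guideline on this page.

```shell
# Sketch (assumed paths): verify a tree is within the recommended 500GB
# before piping it through tar+hsi. "demo-mydir" is a hypothetical stand-in.
limit_kb=$((500 * 1024 * 1024))                    # 500GB expressed in KB
mkdir -p demo-mydir && printf 'x' > demo-mydir/f   # tiny demo tree
size_kb=$(du -sk demo-mydir | awk '{print $1}')    # total size in KB
if [ "$size_kb" -gt "$limit_kb" ]; then
    echo "too large: split into tarballs of 500GB or less"
else
    echo "OK to archive ($size_kb KB)"
fi
```

If the tree exceeds the limit, split it into several tarballs (for instance one per subdirectory) rather than one oversized archive.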
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module and can be used on any compute or devel node. This makes the previous script much quicker than using 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this, propagate a failure at any stage to the exit status&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
Specifies that HTAR should generate CRC checksums when creating the archive.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option, you should be able to query the md5 hash with the hsi command 'hashli'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as processor speed, network transfer speed, and the speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)   # used to name the checksum file in /tmp&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When enabled, you may transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suitable for light transfers to/from your personal computer and to navigate your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and login with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9288</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9288"/>
		<updated>2018-05-03T22:01:24Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Sample tarball list */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the World report (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive. It currently has 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
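As a quick way to apply the small-file guideline above, the sketch below counts how many files in a tree fall under the ~200MB threshold and should therefore be aggregated before archiving. This is not a SciNet-provided tool; ''demo-dataset'' is a hypothetical stand-in for your output directory.

```shell
# Sketch (assumed paths): count files under the ~200MB threshold.
# "demo-dataset" is a hypothetical stand-in for your output directory.
mkdir -p demo-dataset
printf 'hello' > demo-dataset/small-output.txt     # stand-in small file
small=$(find demo-dataset -type f -size -200M | wc -l)
echo "files under 200MB (aggregate these before archiving): $small"
```

If the count is large, group those files into tarballs with tar or htar before sending anything to HPSS.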
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first because of all the job script templates we make available below (they are here so you don't have to think &lt;br /&gt;
too much; just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
*  Users are limited to only 2 long jobs and 2 short jobs at the same time, and 10 jobs total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive naming-space. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility used for aggregating a set of files and directories, using a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification and thereby achieving a high rate of performance. HTAR does not do gzip compression; however, it has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar will not protect you from overwriting an existing file. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
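The pathname caution above can be checked before submitting an HTAR job. The sketch below is not a SciNet-provided tool; ''demo-tree'' is a hypothetical stand-in for the directory you plan to archive, and it prints any member path over the 100-character limit that htar would skip.

```shell
# Sketch (assumed layout): flag member paths over the 100-character
# USTAR name limit before running htar. "demo-tree" is hypothetical.
mkdir -p demo-tree
longname=$(printf 'a%.0s' $(seq 1 120))        # deliberately long file name
touch "demo-tree/$longname"
touch demo-tree/short.txt
find demo-tree -type f | awk 'length($0) > 100'   # paths htar would skip
```

Run the find from the same directory you would run htar in, so the printed paths match the relative member names that go into the archive; rename or re-root anything that appears in the output.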
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the directory ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar file and the .idx file have read permission for other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client through which most users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a GPFSpath file to HPSSpath if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are three distinctions about HSI that you should keep in mind, and that can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and which must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, whether for cput or cget: GPFS first, HPSS second:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and keep the contents of HPSS organized; the default HSI directory is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do an &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, then transfer the files individually with the '*' wildcard. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we provide as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code in case of multiple operations in the same hsi session. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code of each hsi session independently. And remember, you may make the exit codes more intelligible by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
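The 'trap' pair that brackets the transfers in these scripts is what produces the 'Job script not completed' message: if the scheduler kills the job (TERM) or it is interrupted (INT) mid-transfer, the script reports it and exits with code 129; 'trap - TERM INT' then restores default signal handling. A minimal self-contained sketch of the pattern:&lt;br /&gt;

```shell
#!/bin/bash
# Sketch of the trap pattern used throughout the sample scripts.
# A TERM or INT during the guarded section prints a message and
# exits 129; afterwards default signal handling is restored.
trap "echo 'Job script not completed'; exit 129" TERM INT

sleep 1   # stands in for the hsi/htar transfer being guarded

trap - TERM INT
echo 'guarded section finished'
```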
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
The simplest way to list the contents of HPSS is to submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, listing about 400,000 files takes roughly an hour. Adjust the walltime accordingly, to be on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register, and can be inspected from the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate cgets) allows HSI to optimize the transfer, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and its subdirectories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J move_rename_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with the use of 'cd' commands to non-existent directories before the 'rm' command; results may be unpredictable.&lt;br /&gt;
* Avoid using the stand-alone wildcard '''*'''. If necessary, whenever possible bind it to common patterns, such as '*.tmp', so as to limit unintentional mishaps.&lt;br /&gt;
* Avoid using relative paths, even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid using recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even on $ARCHIVE contents, it may be better to do this as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind, you're restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI - for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail'' (the default is to return the status of the last command in the pipeline, which is not what you want).&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the total size of the directory tree with 'du' before sending it to the tar+HSI pipe.&lt;br /&gt;
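The effect of ''set -o pipefail'' on the returned status can be seen with a toy pipeline:&lt;br /&gt;

```shell
#!/bin/bash
# Without pipefail, a pipeline's exit status is that of its LAST
# command, so a failure early in 'tar ... | hsi put ...' is hidden.
false | true
no_pipefail_status=$?   # 0: the failure of 'false' goes unnoticed

set -o pipefail
false | true
pipefail_status=$?      # 1: the failure now propagates

echo "without pipefail: $no_pipefail_status, with: $pipefail_status"
```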
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It's now part of the &amp;quot;extras&amp;quot; module, and can be used on any compute or devel node. This makes the previous version of the script much quicker than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive, and '-Hverify=1' verifies them afterwards.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the MD5 hash with the hsi command 'lshash'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
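As a rough illustration of that cost (a sketch; numbers vary by node), you can time the MD5 hashing of a scratch file on its own:&lt;br /&gt;

```shell
#!/bin/bash
# Gauge the CPU cost of MD5 hashing alone; this time is added on top
# of any transfer made with '-c on'. (Sketch with a 256MB dummy file.)
dd if=/dev/zero of=/tmp/md5-sample.bin bs=1M count=256 2>/dev/null

start=$SECONDS
hash=$(md5sum /tmp/md5-sample.bin | awk '{print $1}')
echo "hashed 256MB in $((SECONDS - start))s: $hash"

rm -f /tmp/md5-sample.bin
```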
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'CHECKSUM LISTED OK'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'RETRIEVAL SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
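The named-pipe pattern above can be sketched in isolation. Below is a minimal, hedged demo in which plain 'cat' stands in for the hsi transfer (hsi is only available on the archive nodes); it shows how tee lets a single read of the file feed both the transfer and the checksum.&lt;br /&gt;

```shell
# Demo of the single-read named-pipe pattern, with 'cat' standing in
# for the hsi transfer command (an assumption for local testing).
workdir=$(mktemp -d)
printf 'example payload\n' > "$workdir/thefile"

mkfifo "$workdir/NPIPE"
# tee sends one copy of the stream into the pipe (for md5sum) while
# the other copy flows on to the (stand-in) transfer command.
cat "$workdir/thefile" | tee "$workdir/NPIPE" | cat > "$workdir/transferred" &
pid=$!
pipesum=$(md5sum < "$workdir/NPIPE" | awk '{print $1}')
wait $pid

# The checksum computed from the pipe matches the file on disk,
# even though GPFS (here: the temp dir) was only read once.
filesum=$(md5sum < "$workdir/thefile" | awk '{print $1}')
echo "$pipesum"
```

In the real script, the third stage of the pipeline is `hsi -q put - : $storedfile`, so the same bytes are hashed and archived in one pass.&lt;br /&gt;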
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your content on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9287</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9287"/>
		<updated>2018-05-03T22:01:05Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Sample tarball create */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top three sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL).&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive; it currently has 2 x 100PB of capacity installed.&lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email scinet support and request an HPSS account (or else you will get &amp;quot;Error - authentication/initialization failed&amp;quot; and 71 exit codes). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first because of all the job script templates we make available below (they are there so you don't have to think&lt;br /&gt;
too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; mechanism for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' queue or the 'archiveshort' queue.&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
*  Users are limited to only 2 long jobs and 2 short jobs at the same time, and 10 jobs total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall. Remaining submissions will be placed on hold for the time being. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
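The shortcut above works by parsing sbatch's confirmation line (&amp;quot;Submitted batch job &amp;lt;id&amp;gt;&amp;quot;). A minimal sketch of just that parsing step, with echo standing in for the real sbatch call and a made-up job id:&lt;br /&gt;

```shell
# Build the --dependency flag from sbatch's standard confirmation line.
# 'echo' stands in for a real sbatch invocation; 50918 is a made-up id.
submit_output="Submitted batch job 50918"
depflag=$(echo "$submit_output" | awk '{print "--dependency=afterany:"$4}')
echo "$depflag"
```

The resulting string (e.g. --dependency=afterany:50918) is then passed to the second sbatch call so the analysis job waits for the recall job.&lt;br /&gt;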
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files and directories, by using a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification, thereby achieving a high rate of performance. HTAR does not do gzip compression; however, it has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, which prompt before overwriting, (h)tar does not do so by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions to other members in your group use the umask option&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
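As a quick sanity check of what that umask yields: new files are created with mode 0640 (666 masked by 0137), i.e. readable by your group but not by the world. A small local demo, no HPSS involved:&lt;br /&gt;

```shell
# Show the file mode produced under umask 0137, the value passed to
# htar via -Humask=0137. Runs entirely in a local temp directory.
workdir=$(mktemp -d)
(
  umask 0137          # new files: 666 & ~0137 = 0640
  touch "$workdir/sample.tar"
)
mode=$(stat -c '%a' "$workdir/sample.tar")
echo "$mode"
```

Both the .tar and the .idx file written by htar get this mode, so group members can read them.&lt;br /&gt;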
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
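You can screen a directory for oversize members before submitting the htar job. A minimal local sketch using find; sparse files created with truncate keep the demo cheap, and the 68GB threshold matches the htar limit above:&lt;br /&gt;

```shell
# List files that htar cannot handle (>68GB) before submitting.
# Sparse files report their full size without using disk space,
# so this demo is cheap to run locally.
workdir=$(mktemp -d)
mkdir "$workdir/finished-job1"
truncate -s 69G "$workdir/finished-job1/too-big"   # over the limit
truncate -s 1M  "$workdir/finished-job1/fine"      # under the limit

oversize=$(find "$workdir/finished-job1" -type f -size +68G)
echo "$oversize"
```

Any files listed should be transferred individually with HSI instead of being included in the htar archive.&lt;br /&gt;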
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
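The -m flag used above stamps extracted files with the time of extraction rather than the archived modification time. A hedged local demo of the same flag semantics, using plain GNU tar as a stand-in for htar:&lt;br /&gt;

```shell
# Demonstrate -m (touch on extract) with GNU tar standing in for htar.
workdir=$(mktemp -d)
mkdir "$workdir/finished-job1"
touch -d '2000-01-01' "$workdir/finished-job1/result.dat"   # old mtime
tar -C "$workdir" -cf "$workdir/finished-job1.tar" finished-job1/
rm -r "$workdir/finished-job1"

# -m: do not restore the archived mtime; use extraction time instead
tar -C "$workdir" -xmf "$workdir/finished-job1.tar"
year=$(date -r "$workdir/finished-job1/result.dat" +%Y)
echo "$year"
```

This is useful when recalled files feed scratch-purge or make-style tooling that keys off modification times.&lt;br /&gt;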
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI may be the primary client with which some users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves a GPFSpath file to HPSSpath, only if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* There are three things to keep in mind about HSI that can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, therefore the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, cput or cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However the syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do an &amp;quot;lcd&amp;quot; in GPFS first, and recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms, and you may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code when multiple operations are performed in the same hsi session. You may use '/scinet/gpc/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code of each hsi session independently. And remember, you may make your exit-code reporting more informative by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A very trivial way to list the contents of HPSS would be to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete; as a rough guide, about 400,000 files can be listed in an hour. Adjust the walltime accordingly, and err on the safe side.''&lt;br /&gt;
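As a back-of-envelope illustration of the listing rate mentioned above (~400,000 files per hour), the walltime to request can be estimated with a small shell sketch (the file count below is purely illustrative):&lt;br /&gt;

```shell
# Rough walltime estimate for a recursive HPSS listing, based on the
# ~400,000 files/hour rate quoted on this page (file count illustrative).
NFILES=1200000
RATE=400000                               # files listed per hour
HOURS=$(( (NFILES + RATE - 1) / RATE ))   # integer ceiling division
echo "Request at least ${HOURS}h of walltime, plus a safety margin"
```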
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS: the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register and can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/gpc/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or the built-in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize the transfer, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J file_management&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before an 'rm' command; the results may be unpredictable.&lt;br /&gt;
* Avoid using the stand-alone wildcard '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', to limit unintentional mishaps.&lt;br /&gt;
* Avoid relative paths, and even the env variable $ARCHIVE; it is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid recursive/looped deletion instructions on $SCRATCH contents from archive job scripts. Even for $ARCHIVE contents, it may be better to run deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
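The &amp;quot;explicit full paths&amp;quot; recommendation can be enforced mechanically; here is a minimal, hedged sketch that refuses to build an 'rm' command unless the target is an absolute path under /archive (the path used is illustrative only):&lt;br /&gt;

```shell
# Only build a deletion command for fully-expanded /archive paths;
# anything else (relative paths, $ARCHIVE left unexpanded) is rejected.
TARGET=/archive/s/scinet/pinto/obsolete   # illustrative path

case "$TARGET" in
    /archive/*) cmd="rm -R $TARGET" ;;    # safe: absolute, under /archive
    *)          echo "refusing: not an absolute /archive path"
                exit 1 ;;
esac
echo "$cmd"
```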
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session, and proceeding with your deletions that way. Keep in mind that you are restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), just as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an HSI equivalent - for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than the recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache; the result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status use: ''set -o pipefail'' (The default is to return the status of the last command in the pipeline and this is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
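To see concretely why ''set -o pipefail'' is needed, here is a minimal shell demonstration, independent of HPSS:&lt;br /&gt;

```shell
# By default a pipeline reports only the status of its LAST command,
# so a failure of 'tar' feeding 'hsi' would be silently masked.
false | true
default_status=$?          # 0 : the failure of 'false' is masked

set -o pipefail
if false | true; then      # 'if' guards against errexit environments
    pipefail_status=0
else
    pipefail_status=$?     # 1 : the failure now propagates
fi
set +o pipefail

echo "default=$default_status pipefail=$pipefail_status"
```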
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module, and can also be used on any compute or devel node. This makes the previous version of the script much quicker to execute than if you were to use 'tar -czf'. In addition, by piggy-backing ISH at the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive, and '-Hverify=1' verifies the archive after creation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the md5 hash with the hsi command 'hashlist'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4)&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)   # used to name the checksum file below&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When enabled, this service lets you transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* It is best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9286</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9286"/>
		<updated>2018-05-03T22:00:47Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Scripted File Transfers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25 year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL).&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive; it currently has 2 x 100PB of capacity installed.&lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at the 10PB level in 2018.&lt;br /&gt;
* Very reliable, data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
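The small-file guideline above can be sketched as follows. This is a minimal local illustration, not a SciNet-specific procedure: the directory comes from mktemp and the file names are made up; on the clusters you would work under $SCRATCH.

```shell
# Sketch: group many small output files into one tarball before archiving.
WORK=$(mktemp -d)                     # illustrative work area (real work lives in $SCRATCH)
mkdir -p "$WORK/small-files"
for i in 1 2 3; do echo "data $i" > "$WORK/small-files/out-$i.txt"; done

# One tarball instead of many tiny files; keep each tarball <= 500GB per the guidelines.
tar -czf "$WORK/small-files.tar.gz" -C "$WORK" small-files/
tar -tzf "$WORK/small-files.tar.gz"   # verify the members before trusting the archive
```

The same verification habit applies after htar: always list the archive back and check the exit code before deleting anything from GPFS.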
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email scinet support and request an HPSS account (or else you will get &amp;quot;Error - authentication/initialization failed&amp;quot; and 71 exit codes). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are there so you don't have to think too much, just copy and paste), but if you treat the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to either the 'archiveshort' or the 'archivelong' queue.&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
*  Users are limited to only 2 long jobs and 2 short jobs at the same time, and 10 jobs total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind that you're restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
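The same dependency can be built in two explicit steps. The sketch below simulates sbatch with a shell function so the flag construction itself can be checked; on Niagara you would call the real sbatch, and --parsable is a standard Slurm option that prints only the job ID. The script names are illustrative.

```shell
# Stand-in for Slurm's sbatch --parsable, which prints just the job ID.
# This function exists ONLY so the example runs outside the cluster.
sbatch() { echo "50918"; }

RECALL_ID=$(sbatch --parsable data-recall.sh)   # real usage on Niagara
DEP="--dependency=afterany:${RECALL_ID}"        # flag for the follow-up job
echo "$DEP"
```

With the real sbatch, the second submission would then be `sbatch $DEP job-to-work-on-recalled-data.sh`; using afterok instead of afterany makes the analysis job start only if the recall succeeded.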
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that aggregates a set of files and directories, using a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification and thereby achieving a high rate of performance. HTAR does not do gzip compression; however, it has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI instead.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
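A pre-flight check for the 68 GB limit above can be done with find before submitting the htar job. This sketch fabricates a sparse oversize file so it is self-contained; the directory and file names are illustrative, and on SciNet you would point find at your $SCRATCH work area instead.

```shell
# List anything over the 68GB htar per-file limit before archiving.
WORK=$(mktemp -d)
truncate -s 1M  "$WORK/small.dat"   # stand-in for normal output files
truncate -s 70G "$WORK/huge.dat"    # sparse file, occupies no real disk space
find "$WORK" -type f -size +68G     # prints only the oversize file
```

Any files this prints should be moved aside and transferred with HSI instead of being included in the htar member list.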
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permission for other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
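To see what mode the 0137 mask produces, one can work it out with plain shell arithmetic (this is just octal arithmetic, not an HPSS command):

```shell
# A 0137 umask clears group write and all 'other' bits from the default
# 0666 file mode, leaving 640 (rw-r-----): readable by you and your group.
printf '%o\n' $(( 0666 & ~0137 ))
```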
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo 'File $DEST already exists. Nothing has been done'&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI may be the primary client with which some users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves a GPFSpath file to HPSSpath, only if the HPSS copy does not already exist or the GPFS version is newer&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are 3 characteristics of HSI that you should keep in mind, which can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, therefore the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first and HPSS second, whether cput or cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However the syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following 2-step process, where you do an &amp;quot;lcd&amp;quot; in GPFS first and recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we provide as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code in the case of multiple operations in the same hsi session. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A very trivial way to list the contents of HPSS would be to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, about 400,000 files can be listed in about an hour. Adjust the walltime accordingly, and be on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register and can be inspected from the login nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do optimization, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it is not possible to rename directories on the fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and its subdirectories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' much as you would the corresponding Linux commands.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before an 'rm' command; the results may be unpredictable&lt;br /&gt;
* Avoid the stand-alone wildcard character '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', to limit unintentional deletions&lt;br /&gt;
* Avoid relative paths, and even the environment variable $ARCHIVE; it is better to explicitly expand the full paths in your scripts&lt;br /&gt;
* Avoid recursive/looped deletion instructions on $SCRATCH contents from archive job scripts. Even for $ARCHIVE contents, it may be better to run deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session, and proceeding with your deletions that way. Keep in mind that you are restricted to one hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an HSI equivalent - for instance, you cannot 'vi' or 'cat' files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, is not noticeably slower than a recursive put with HSI, which stores each file one by one. Reading the files back from tape in this format, however, will be many times faster. It also overcomes the current 68GB limit on the size of stored files that we have with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail'' (by default a pipeline returns the status of its last command, which is not what you want here).&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the total amount of data in the directory tree with 'du' before sending it to the tar+HSI pipe.&lt;br /&gt;
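The effect of ''set -o pipefail'' can be seen in a minimal sketch, where the hypothetical commands ''false'' and ''true'' stand in for the tar producer and the hsi consumer:

```shell
#!/bin/bash
# Without pipefail a pipeline returns the status of its LAST command only,
# so a failed producer (here 'false', standing in for tar) is masked.
false | true
echo "default pipeline status: $?"     # prints 0: the failure is hidden

# With pipefail the pipeline returns the rightmost non-zero status,
# so the $status checks in the scripts above actually catch the error.
set -o pipefail
false | true
echo "pipefail pipeline status: $?"    # prints 1: the failure is visible
```

The same reasoning applies to the tar+pigz+hsi pipeline below: without pipefail, a tar failure would be silently overwritten by hsi's exit code.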
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module and can also be used on any compute or devel node. This makes the previous script much quicker than it would be with 'tar -czf'. In addition, by piggy-backing ISH at the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive, and '-Hverify=1' verifies them after the transfer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you can query the MD5 hash with the hsi command 'lshash'. The value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as processor speed, network transfer speed, and the speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter '''Compute Canada HPSS''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for browsing your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9285</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9285"/>
		<updated>2018-05-03T22:00:11Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Prior to HSI version 4.0.1.1 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be granted access to HPSS (up to 2TB per group) so that you may get familiar with the system; just email support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25 year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the World report (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as its own archive. It currently has 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
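The size guidelines above can be checked with 'du' before deciding how to package a directory. Below is a minimal sketch; the directory is a stand-in created only for the demonstration (in practice you would point it at your data on $SCRATCH):

```shell
#!/bin/bash
# Check a directory's total size before archiving, per the guidelines:
# aggregate anything under ~200MB, and keep tarballs at 500GB or less.
dir=$(mktemp -d)                      # stand-in for e.g. $SCRATCH/finished-job1
dd if=/dev/zero of="$dir/sample.dat" bs=1M count=5 2>/dev/null

size_kb=$(du -sk "$dir" | cut -f1)    # total size in kilobytes

if [ "$size_kb" -lt $((200 * 1024)) ]; then
    echo "under ~200MB: aggregate into a tarball (tar/htar) before archiving"
elif [ "$size_kb" -gt $((500 * 1024 * 1024)) ]; then
    echo "over 500GB: split into several tarballs before ingestion"
else
    echo "size is fine for a single tarball"
fi
rm -rf "$dir"
```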
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates made available below (they are there so you don't have to think too much; just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
*  Users are limited to 2 long jobs and 2 short jobs at the same time, and 10 jobs in total in the queue.&lt;br /&gt;
* Overall, at most 5 long jobs can run at any given time; further submissions are placed on hold in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind that you are restricted to one hour.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code.&lt;br /&gt;
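The trap-and-exit-code pattern used in the scripts above can be exercised in plain bash without HPSS. A minimal sketch; run_transfer is a hypothetical stand-in for an htar or hsi invocation:&lt;br /&gt;

```shell
# Minimal trap + exit-status pattern used by the archive job scripts.
run_transfer() { return 2; }   # stand-in: pretend the transfer failed with code 2

trap "echo 'Job script not completed'" TERM INT

run_transfer
status=$?

trap - TERM INT   # clear the trap once the command has finished

if [ "$status" -ne 0 ]; then
   msg="transfer returned non-zero code $status"
else
   msg="TRANSFER SUCCESSFUL"
fi
echo "$msg"
```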
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
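The awk one-liner above assumes the scheduler prints just the job ID; with stock Slurm output (&amp;quot;Submitted batch job 12345&amp;quot;) the ID is the fourth field. A runnable sketch, with echo standing in for sbatch:&lt;br /&gt;

```shell
# echo stands in for 'sbatch data-recall.sh' here; the real command
# would print its job ID, which awk turns into a --dependency flag.
dep=$(echo "12345" | awk '{print "--dependency=afterany:"$1}')
echo "$dep"

# With stock Slurm output the ID is field 4:
dep2=$(echo "Submitted batch job 12345" | awk '{print "--dependency=afterany:"$4}')
echo "$dep2"
```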
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into an archive file that conforms to the POSIX TAR specification. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, achieving a high rate of performance. HTAR does not do gzip compression, but it has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68 GB, the whole HTAR session will fail, and you'll get a notification listing all those files so that you can transfer them with HSI instead.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR protocol (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success while producing exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. The same can happen when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
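The 68 GB caution above can be checked up front. A minimal sketch in plain bash, using a throwaway directory in place of your real work area:&lt;br /&gt;

```shell
# Count files above the 68 GB HTAR limit in a directory tree before
# submitting an htar job. A tiny temp file stands in for real data.
LIMIT_BYTES=$((68 * 1024 * 1024 * 1024))

workdir=$(mktemp -d)
echo "small test file" > "$workdir/small.dat"

# find's -size with a 'c' suffix measures in bytes
oversize=$(find "$workdir" -type f -size +"${LIMIT_BYTES}"c | wc -l)
echo "files over the htar limit: $oversize"

rm -rf "$workdir"
```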
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permission for other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
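As a sanity check, the file mode produced by -Humask=0137 can be computed in plain bash (a new file starts from mode 0666 before the umask is applied):&lt;br /&gt;

```shell
# file mode = 0666 & ~umask; 0137 clears write/execute for the group
# and all permissions for others, leaving group read-only access
mode=$(printf '%o' $(( 0666 & ~0137 )))
echo "resulting file mode: $mode"
```

So the owner keeps read/write, group members can read both the .tar and the .idx, and everyone else gets no access.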
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online.&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_extract_tarball_from_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is likely the primary client most users will use to interact with HPSS. It provides an FTP-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves a GPFSpath file to HPSSpath, or replaces it if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are three aspects of HSI that you should keep in mind, since they can cause a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the syntax for cput/cget may not work as one would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and the HPSSpath; it must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, whether cput or cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a heredoc excerpt such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues with renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process, where you do an &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms, or you may already be familiar with HPSS/HSI from other HPC facilities, whose procedures may or may not be similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code in case of multiple operations in the same hsi session. You may use '/scinet/gpc/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
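The capture-after-session pattern in the note above works with any command. A runnable sketch without HPSS; fake_hsi is a hypothetical stand-in that exits with the code it reads from its input (piped here in place of a heredoc):&lt;br /&gt;

```shell
# Capture the exit status of an hsi-style session into 'status';
# fake_hsi stands in for hsi and exits with the code it is given.
fake_hsi() { read -r code; return "$code"; }

echo 3 | fake_hsi
status=$?
echo "session exit code: $status"
```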
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you can make your exit codes more intelligible by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A trivial way to list the contents of HPSS is to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, about 400,000 files can be listed in about an hour. Adjust the walltime accordingly, and be on the safe side.''&lt;br /&gt;
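Given the ~400,000 files/hour figure in the warning above, a rough walltime estimate can be computed with ceiling division (the file count here is illustrative):&lt;br /&gt;

```shell
# Ceiling division: hours needed to list nfiles at ~400,000 files/hour
nfiles=1000000
rate=400000
hours=$(( (nfiles + rate - 1) / rate ))
echo "request at least ${hours}h of walltime"
```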
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/gpc/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate cgets) allows HSI to optimize the retrieval, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J file_management&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with the use of 'cd' commands to non-existing directories before the 'rm' command. Results may be unpredictable&lt;br /&gt;
* Avoid the use of the standalone wildcard character '''*'''. Whenever possible, bind it to common patterns, such as '*.tmp', so as to limit unintentional mishaps.&lt;br /&gt;
* Avoid using relative paths, and even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid using recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even on $ARCHIVE contents, it may be better to do it as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
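The recommendations above can be wrapped in a small guard. A hedged sketch in plain bash rather than HSI (safe_rm is a hypothetical helper; HSI itself provides no such guard):&lt;br /&gt;

```shell
# Refuse relative paths and missing targets before deleting anything.
safe_rm() {
  target=$1
  case $target in
    /*) ;;                                           # absolute path: proceed
    *)  echo "refusing relative path: $target"; return 1 ;;
  esac
  [ -e "$target" ] || { echo "no such path: $target"; return 1; }
  rm -rf -- "$target"
}

tmp=$(mktemp -d)
mkdir "$tmp/obsolete"

safe_rm "$tmp/obsolete"; r1=$?     # absolute, exists: deleted
safe_rm "relative/junk"; r2=$?     # relative: refused

rmdir "$tmp"                       # succeeds only because 'obsolete' is gone
```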
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session, and proceeding with your deletions that way. Keep in mind that you're restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI - for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, is not noticeably slower than a recursive put with HSI, which stores each file one by one. Reading the files back from tape in this format will be many times faster, and it also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status use: ''set -o pipefail'' (The default is to return the status of the last command in the pipeline and this is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
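The effect of ''set -o pipefail'' can be seen in isolation with a minimal illustration (nothing HPSS-specific is assumed here):&lt;br /&gt;

```shell
# Without pipefail, a pipeline's exit status is that of the LAST command,
# so the failure of 'false' is masked by the success of 'true'.
set +o pipefail
false | true
echo "without pipefail: $?"    # prints 0 -- the failure is hidden

# With pipefail, the pipeline's status is that of the rightmost command
# that failed, so a tar or hsi error is no longer hidden by the pipe.
set -o pipefail
false | true || echo "with pipefail: $?"    # prints 1
```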
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It's now part of the &amp;quot;extras&amp;quot; module, and can also be used on any compute or devel node. This makes the previous version of the script run much quicker than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive; '-Hverify=1' verifies them afterwards.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the md5 hash with the hsi command 'hashli'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)  # name used for the checksum file in /tmp&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE | tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
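The checksum round-trip used above (record the hash of the stream with '-' as the filename, then verify the recalled stream against it) can be sketched without the hsi transfer; here the string 'hello' stands in for the real data:&lt;br /&gt;

```shell
tmp=$(mktemp -d)
cd "$tmp"

# md5sum on stdin records '-' as the filename field in the checksum file,
# which is exactly what the sed step in the script above arranges.
printf 'hello' | md5sum | tee stream.md5

# later: verify a second copy of the stream against the stored checksum,
# as the script does with 'hsi get - : ... | md5sum -c'
printf 'hello' | md5sum -c stream.md5    # prints '-: OK'
```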
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Once re-enabled, you may transfer data between SciNet's HPSS and an external source.&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS.&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS.&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9284</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9284"/>
		<updated>2018-05-03T21:59:24Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Current HSI version - Checksum built-in */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v7.3.3 patch 6, and HSI/HTAR version 4.0.1.2.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL).&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017 to store Nav Canada data as well as to serve as their own archive. It currently has 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, reaching the 10PB level in 2018.&lt;br /&gt;
* Very reliable, data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
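A hypothetical helper (not a SciNet-provided tool) to check whether a directory tree needs aggregating before it goes to HPSS:&lt;br /&gt;

```shell
# Count the files under a directory that fall below the ~200MB threshold;
# 209715200 bytes = 200MB, and the 'c' suffix makes find count in bytes.
count_small_files() {
    find "$1" -type f -size -209715200c | wc -l
}

# e.g.: count_small_files $SCRATCH/mydir
```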
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (or else you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first because of all the job script templates we make available below (they are there &lt;br /&gt;
so you don't have to think too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; mechanism for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most Linux shell commands have an equivalent in HSI)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' or the 'archiveshort' queue.&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
*  Users are limited to only 2 long jobs and 2 short jobs at the same time, and 10 jobs total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall. Remaining submissions will be placed on hold for the time being. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive naming-space. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or the ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap abnormal terminations in your job scripts, and be sure to return the exit code.&lt;br /&gt;
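The trap convention used in all the sample scripts, shown in isolation as a minimal sketch:&lt;br /&gt;

```shell
# Convert a TERM (scheduler kill at walltime) or INT into a distinctive
# message and exit code instead of letting the job die silently.
trap "echo 'Job script not completed'; exit 129" TERM INT

# ... the long-running htar/hsi commands would go here ...

# clear the trap once the critical section has completed
trap - TERM INT
```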
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files and directories, by using a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification, thereby achieving a high rate of performance. HTAR does not do gzip compression, however it already has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail and you'll get a notification listing all those files, so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames greater than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, however it will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, &amp;quot;prompt before overwrite&amp;quot; is not the default with (h)tar. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS, overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
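A hypothetical post-job check along those lines, assuming the job's output landed in a file called my.output:&lt;br /&gt;

```shell
# Exit code 70 plus 'Warning' lines in the output means HTAR silently
# skipped files with over-long pathnames -- do not delete the originals.
if grep -q Warning my.output; then
    echo 'HTAR skipped some files - inspect my.output before cleaning up'
fi
```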
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the directory ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions to other members in your group use the umask option&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
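As a local illustration of the same arithmetic, a umask of 0137 masks the default file-creation mode 0666 down to 0640 (rw-r-----): group-readable, but not group-writable or world-readable:&lt;br /&gt;

```shell
tmp=$(mktemp -d)
# a new file gets mode 0666 with the umask bits cleared: 0666 minus the
# 0137 bits leaves 0640
( umask 0137; touch "$tmp/demo"; stat -c '%a' "$tmp/demo" )    # prints 640
rm -rf "$tmp"
```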
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_extract_tarball_from_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI may be the primary client through which some users interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a GPFSpath file to HPSSpath if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* There are 3 peculiarities of HSI that you should keep in mind, as they can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, therefore the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and it must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, whether for cput or cget: GPFS first, HPSS second:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following 2-step process, where you do an &amp;quot;lcd&amp;quot; in GPFS first, and recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, then transfer the files individually with the '*' wildcard. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' In case of multiple operations in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/gpc/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
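The exit-status handling used throughout the sample scripts on this page can be sketched in a few lines of plain bash. Here ''false'' stands in for an hsi session that fails; the call to exit2msg is shown only as a comment, since it is site-specific:

```shell
#!/bin/bash
# Minimal sketch of the exit-status pattern used in the sample scripts.
# `false` stands in for a failing hsi session; capture $? immediately,
# before any other command can overwrite it.
false
status=$?

if [ "$status" -ne 0 ]; then
    # On SciNet you would then run: /scinet/gpc/bin/exit2msg $status
    echo "HSI returned non-zero code: $status"
else
    echo "TRANSFER SUCCESSFUL"
fi
```

The important detail is that $? holds the status of the most recently executed command, so it must be saved into a variable right away.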
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A trivial way to list the contents of HPSS is to submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, about 400,000 files can be listed in about an hour. Adjust the walltime accordingly, and err on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register, which can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/gpc/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built-in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize the transfer, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J file_management_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with the use of 'cd' commands to non-existing directories before the 'rm' command. Results may be unpredictable.&lt;br /&gt;
* Avoid the use of the stand-alone wildcard '''*'''. If necessary, whenever possible bind it to common patterns, such as '*.tmp', so as to limit unintentional mishaps.&lt;br /&gt;
* Avoid using relative paths, even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid using recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even on $ARCHIVE contents, it may be better to do it as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you're restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI - for instance, you cannot use 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this, make sure failures in any stage are propagated&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than the recursive put with HSI that stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of stored files that we have with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail'' (the default is to return the status of the last command in the pipeline, which is not what you want).&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
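The ''set -o pipefail'' behaviour mentioned in the notes above can be demonstrated with plain bash, no HPSS commands involved; this is only an illustration of the shell mechanics that the tar+hsi pipeline relies on:

```shell
#!/bin/bash
# Without pipefail, a pipeline's exit status is that of its LAST stage,
# so a failure in an earlier stage (e.g. tar or pigz) would go unnoticed.
false | true
echo "default behaviour: $?"

# With pipefail, the pipeline fails if ANY stage fails.
set -o pipefail
false | true
echo "with pipefail: $?"
```

Running this prints status 0 for the default case and 1 once pipefail is in effect, which is why the sample scripts set it before the tar+hsi pipelines.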
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It's now part of the &amp;quot;extras&amp;quot; module, and can also be used on any compute or devel node. This makes the execution of the previous version of the script much quicker than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this, make sure failures in any stage are propagated&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the MD5 hash with the hsi command 'hashlist'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J MD5_checksum_verified_transfer&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'CHECKSUM QUERY SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE | tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
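The named-pipe pattern above can be exercised locally before trusting it with real data. In this sketch a plain `cat` into a sink file stands in for the `hsi put` and `hsi get` calls, and the file name `thefile` is hypothetical, so only the read-once/checksum logic is demonstrated:

```shell
# Local sketch of the named-pipe checksum pattern (hsi replaced by cat).
set -e
workdir=$(mktemp -d)
cd "$workdir"
echo "sample payload" > thefile
fname=thefile

mkfifo NPIPE
# Read the source once; tee feeds both the checksum and the "transfer".
cat $fname | tee NPIPE | cat > stored &
pid=$!
md5sum NPIPE | tee $fname.md5
rm -f NPIPE

# Check the exit code of the backgrounded "transfer" pipeline
wait $pid
status=$?
[ $status -eq 0 ] && echo 'TRANSFER SUCCESSFUL'

# Rewrite the pipe name to '-' so the checksum can be verified from stdin,
# just as the job script does before the verification step.
sed -i.1 "s+NPIPE+-+" $fname.md5
md5sum -c $fname.md5 < stored
```

The same structure carries over to the job script: only the two `cat` sinks are replaced by `hsi -q put - : $storedfile` and `hsi -q get - : $storedfile`.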
&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured like a Dropbox. To download the free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9283</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9283"/>
		<updated>2018-05-03T21:58:35Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* HTAR CRC checksums */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities on the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 exabytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world (ECMWF, UKMO and BNL) reported having 360PB, 220PB and 125PB in production as of fall 2017.&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive. It currently has 2 x 100PB of capacity installed.&lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, reaching the 10PB level in 2018.&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tape at SciNet).&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
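The first guideline above can be sketched locally. Here a handful of tiny files (with hypothetical names) are packed into one tarball, which is what you would then offload with hsi, or create directly in HPSS with htar:

```shell
# Group many small files into a single tarball before archiving.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p demo/run01
for i in 1 2 3; do echo "data $i" > demo/run01/part$i.dat; done

# Pack the whole run directory into one tarball (-C keeps paths relative).
tar -cf run01.tar -C demo run01

# Verify the members before considering the originals for deletion.
tar -tf run01.tar
```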
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email scinet support and request an HPSS account (or else you will get &amp;quot;Error - authentication/initialization failed&amp;quot; and 71 exit codes). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are there so you don't have to think too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
*  Users are limited to only 2 long jobs and 2 short jobs at the same time, and 10 jobs total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall. Remaining submissions will be placed on hold for the time being. So far we have not seen a need for overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive naming-space. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
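The trap/exit-code skeleton can be tried locally before wrapping a real transfer. In this sketch a harmless `false` stands in for the htar call, so the error branch can be exercised on any machine:

```shell
# Local sketch of the trap / exit-code pattern used in the job scripts.
trap "echo 'Job script not completed'; exit 129" TERM INT

false    # stand-in for: htar -Humask=0137 -cpf $DEST finished-job1/
status=$?

trap - TERM INT

if [ ! $status == 0 ]; then
   echo "stand-in command returned non-zero code ($status)."
else
   echo 'TRANSFER SUCCESSFUL'
fi
```

Capturing `$?` on the very next line after the command matters: any intervening command (even a variable assignment) resets the exit status.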
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
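The flag-building step in the shortcut above can be checked without a scheduler. Here a plain `echo` stands in for `sbatch data-recall.sh`, whose output on many Slurm installations ends with the job id (the id 50918 is hypothetical):

```shell
# Sketch: assemble the --dependency flag from a submission message.
submit_msg="Submitted batch job 50918"
dep=$(echo "$submit_msg" | awk '{print "--dependency=afterany:"$NF}')
echo "$dep"    # --dependency=afterany:50918
```

If your sbatch supports it, `sbatch --parsable` prints the bare job id and avoids the awk step entirely.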
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility used for aggregating a set of files and directories. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification and thereby achieving a high rate of performance. HTAR does not do gzip compression, but it has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR protocol [[(POSIX 1003.1 USTAR)]] -- note that HTAR will erroneously indicate success, yet produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS, overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the directory ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions for other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
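What that umask value does can be checked locally. A 0137 umask clears group-write and all other-user bits, so newly created files come out as mode 0640 (rw-r-----), which is what lets group members read the .tar and .idx files:

```shell
# Sketch: effect of a 0137 umask on a newly created file.
old=$(umask)
umask 0137
tmpdir=$(mktemp -d)
touch "$tmpdir/example"
stat -c '%a' "$tmpdir/example"    # typically 640 on a default filesystem
umask "$old"
```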
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'LISTING SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_extract_tarball_from_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI may be the primary client through which some users interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces the GPFSpath file into HPSSpath, if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are 3 peculiarities of HSI that you should keep in mind, as they can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on the fly during transfers, therefore the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and the HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a &amp;quot;here document&amp;quot; such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However the syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process, where you do an &amp;quot;lcd&amp;quot; in GPFS first, and then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we provide as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' in case of multiple operations in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls, ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A trivial way to list the contents of HPSS is to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, about 400,000 files can be listed in about an hour. Adjust the walltime accordingly, erring on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS: the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register and can be inspected from the devel nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize the retrieval, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and its sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J file_management_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with the use of 'cd' commands to non-existing directories before the 'rm' command; results may be unpredictable.&lt;br /&gt;
* Avoid the use of the stand-alone wildcard character '''*'''. If necessary, whenever possible bind it to common patterns, such as '*.tmp', so as to limit unintentional mishaps.&lt;br /&gt;
* Avoid using relative paths, and even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid using recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even on $ARCHIVE contents, it may be better to do it as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind, you're restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI - for instance, you cannot use 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this, propagate a failure at any stage&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xvf - &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than the recursive put with HSI that stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of stored files that we have with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result would be as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status use: ''set -o pipefail'' (The default is to return the status of the last command in the pipeline and this is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the total amount of data in the directory tree with 'du' before sending it to the tar+HSI piping.&lt;br /&gt;
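That 'du' pre-check can be sketched in plain bash. This is only an illustrative sketch, not a SciNet-provided tool: the directory path and the 500GB ceiling encoded below are assumptions taken from the guideline above.&lt;br /&gt;

```shell
# Hypothetical pre-flight check: measure a directory tree with 'du' and
# only proceed with the tar+HSI piping when it fits the recommended
# 500GB tarball size. Paths here are illustrative.
LIMIT_KB=$((500 * 1024 * 1024))   # 500GB expressed in KB, the unit 'du -sk' reports

check_under_limit() {
    # succeed (exit 0) if the given size in KB is at most the 500GB ceiling
    [ "$1" -le "$LIMIT_KB" ]
}

DIR="${SCRATCH:-/tmp}/mydir"      # directory you intend to archive (assumed)
mkdir -p "$DIR"
size_kb=$(du -sk "$DIR" | awk '{print $1}')
if check_under_limit "$size_kb"; then
    echo "Size OK: ${size_kb} KB; safe to pipe into HPSS"
else
    echo "Tree exceeds 500GB; split it into smaller tarballs first"
fi
```

In a real job script the else branch would exit before the hsi step instead of just printing.&lt;br /&gt;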
&lt;br /&gt;
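The reason for ''set -o pipefail'' can be demonstrated with plain bash, independent of HPSS (a minimal sketch; 'false' stands in for a failing producer such as tar):&lt;br /&gt;

```shell
# A pipeline's default status is that of its LAST command, which can
# mask a failing producer feeding hsi; with pipefail the pipeline
# reports the rightmost non-zero status anywhere in it instead.
status_without=$(bash -c 'false | true; echo $?')
status_with=$(bash -c 'set -o pipefail; false | true; echo $?')
# prints "default: 0  pipefail: 1"
echo "default: $status_without  pipefail: $status_with"
```

This is why the sample piping scripts set the option before checking $status.&lt;br /&gt;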
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It's now part of the &amp;quot;extras&amp;quot; module and can also be used on any compute or devel node. This makes the execution of the previous version of the script much quicker than if you were to use 'tar -cz'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this, propagate a failure at any stage&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option below specifies that HTAR should generate CRC checksums when creating the archive, and '-Hverify=1' verifies them after the transfer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the md5 hash with the hsi command 'lshash'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N MD5_checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)  # basename used to name the checksum file below&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Once re-enabled, this lets you transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suitable for light transfers to/from your personal computer and to navigate your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and login with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9282</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9282"/>
		<updated>2018-05-03T21:57:49Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Multi-threaded gzip'ed compression with pigz */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be granted access to HPSS, up to 2TB per group, so that they may get familiar with the system (just email support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v7.3.3 patch 6, and HSI/HTAR version 4.0.1.2.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the US DOE labs, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black sites).&lt;br /&gt;
* Over 2.5 exabytes of combined storage worldwide.&lt;br /&gt;
* The top 3 sites worldwide reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL).&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, both to store Nav Canada data and to serve as their own archive; they currently have 2 x 100PB of capacity installed.&lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, reaching the 10PB level in 2018.&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tapes at SciNet).&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet, after a modest upgrade in 2017: ingest ~150 TB/day, recall ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape, a medium that is not well suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
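Before creating a tarball, it can help to verify that a directory tree fits the guideline above. A minimal sketch, assuming GNU coreutils; the directory name and contents are placeholders created only for the demonstration:&lt;br /&gt;

```shell
# Sketch: pre-check a directory against the ~500GB tarball guideline.
# "finished-job1" is a placeholder; it is created here only for the demo.
SRC=finished-job1
mkdir -p "$SRC" && echo data > "$SRC/file1"

LIMIT=$((500 * 1024 * 1024 * 1024))        # 500 GB in bytes
BYTES=$(du -sb "$SRC" | awk '{print $1}')  # total size in bytes (GNU du)

if [ "$BYTES" -gt "$LIMIT" ]; then
  echo "split $SRC into smaller pieces before calling htar"
else
  echo "OK: $SRC fits in a single tarball"
fi
```

Dropping a check like this into the job script right before the htar call keeps oversized tarballs from ever reaching the tape system.&lt;br /&gt;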
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
This set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first because of all the job script templates we make available below (they are there so you don't have to think&lt;br /&gt;
too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most Linux shell commands have an equivalent in HSI)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' or the 'archiveshort' queue.&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
*  Users are limited to only 2 long jobs and 2 short jobs at the same time, and 10 jobs total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall. Remaining submissions will be placed on hold in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind that you're restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Performance Storage System   *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs to catch abnormal terminations, and be sure to return the exit code.&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
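The chaining pattern itself can be checked anywhere by stubbing out sbatch. A minimal sketch; the function below merely mimics Slurm's &amp;quot;Submitted batch job&amp;quot; output, and the script names are placeholders:&lt;br /&gt;

```shell
# Stand-in for sbatch so the dependency string can be tested without a
# scheduler; real Slurm prints "Submitted batch job <jobid>".
sbatch() { echo "Submitted batch job 50918"; }

# Build the dependency flag from sbatch's output (the job ID is field 4):
DEP=$(sbatch data-recall.sh | awk '{print "--dependency=afterany:"$4}')
echo "$DEP"   # --dependency=afterany:50918
```

On Niagara you would drop the stub and pass the flag straight to the second sbatch call.&lt;br /&gt;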
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification, and thereby achieves a high rate of performance. HTAR does not do gzip compression; however, it has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt a transfer that includes any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
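Both size caveats can be caught with a quick pre-scan before the job is submitted. A minimal sketch, assuming GNU find; the directory name is a placeholder created only for the demonstration:&lt;br /&gt;

```shell
# Sketch: flag files htar would reject (over 68GB) or silently skip
# (pathnames over 100 characters). "finished-job1" is a demo placeholder.
SRC=finished-job1
mkdir -p "$SRC" && echo data > "$SRC/small-file"

OVERSIZE=$(find "$SRC" -type f -size +68G)             # transfer via HSI instead
LONGPATHS=$(find "$SRC" -type f | awk 'length > 100')  # htar would skip these

[ -z "$OVERSIZE" ]  && echo "no files over 68GB"
[ -z "$LONGPATHS" ] && echo "no pathnames over 100 characters"
```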
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar file and the .idx file are readable by other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'LISTING SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI will likely be the primary client with which users interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces the GPFSpath file into HPSSpath if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* There are 3 peculiarities of HSI that you should keep in mind; they can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, therefore the syntax for cput/cget may not work as one would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first and HPSS second, for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this (note that the closing EOF must be at the start of the line):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir LargeFilesDir&lt;br /&gt;
  cd LargeFilesDir&lt;br /&gt;
  cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
  lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
  cput -Ruph *&lt;br /&gt;
  end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However the syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following 2-step process, where you do an &amp;quot;lcd&amp;quot; in GPFS first, and recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  lcd $SCRATCH&lt;br /&gt;
  cget -Ruph LargeFilesDir&lt;br /&gt;
  end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
  mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
  cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
  cput -Ruph *&lt;br /&gt;
  end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside our recommended syntax, so '''we strongly urge that you use the sample scripts we provide as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' In case of multiple operations in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the example above, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you can make your exit codes more intelligible by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A very trivial way to list the contents of HPSS would be to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, about 400,000 files can be listed in about an hour. Adjust the walltime accordingly, and be on the safe side.''&lt;br /&gt;
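That rule of thumb translates directly into a walltime request. A minimal sketch; the file count is a placeholder and the rate is the approximate figure quoted above:&lt;br /&gt;

```shell
# Rough walltime estimate for a recursive listing, assuming ~400,000
# files listed per hour; round up, then pad to be on the safe side.
NFILES=1200000   # placeholder: how many files your namespace holds
RATE=400000      # approximate files listed per hour
HOURS=$(( (NFILES + RATE - 1) / RATE ))   # ceiling division
echo "request at least ${HOURS}h of walltime"
```

Doubling the estimate is a cheap way to follow the &amp;quot;be on the safe side&amp;quot; advice.&lt;br /&gt;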
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register and can be inspected from the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate cgets) allows HSI to optimize the recall, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on the fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and its sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existent directories before an 'rm' command; the results may be unpredictable.&lt;br /&gt;
* Avoid the stand-alone wildcard character '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', to limit unintended deletions.&lt;br /&gt;
* Avoid relative paths, and even the environment variable $ARCHIVE. It is better to explicitly expand full paths in your scripts.&lt;br /&gt;
* Avoid recursive/looped deletion instructions on $SCRATCH contents from archive job scripts. Even on $ARCHIVE contents, it may be better to run deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you are restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS.&lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI - for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'INDEXING SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xvf -&lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, is not noticeably slower than a recursive put with HSI that stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status use: ''set -o pipefail'' (The default is to return the status of the last command in the pipeline and this is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
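As an illustration of the ''pipefail'' note above, here is a toy pipeline (plain bash, independent of HPSS) showing how the default pipeline status masks an early failure:

```shell
#!/bin/bash
# Demonstrate why 'set -o pipefail' matters: without it, the exit status
# of a pipeline is that of the LAST command, so a failure early in the
# pipe (e.g. in tar, before hsi) would be masked.
s_default=$(bash -c 'false | true; echo $?')
s_pipefail=$(bash -c 'set -o pipefail; false | true; echo $?')

echo "without pipefail: $s_default"   # prints 0 -- the failure is hidden
echo "with pipefail:    $s_pipefail"  # prints 1 -- the failure propagates
```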
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We have compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module and can be used on any compute or devel node. This makes the previous version of the script run much more quickly than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive; '-Hverify=1' verifies them after the transfer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the md5 hash with the hsi command 'hashlist'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N MD5_checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
# derive the checksum file name from the GPFS file name&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When enabled, you may transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and login with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9281</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9281"/>
		<updated>2018-05-03T21:56:59Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Efficient alternative to htar */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the World report (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, both to store Nav Canada data and to serve as their own archive. It currently has 2 x 100PB of capacity installed.&lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium not suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
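As a sketch of the small-file guideline above (the paths and file names are placeholders, not SciNet conventions), small files can be bundled into a single tarball on GPFS before archiving:

```shell
#!/bin/bash
# Sketch: bundle a directory of small files into one tarball first, then
# archive the single tarball instead of many small files.
# $SCRATCH/smallfiles is a placeholder; a demo tree is created here so
# the script is self-contained.
SCRATCH=${SCRATCH:-$(mktemp -d)}
mkdir -p "$SCRATCH/smallfiles"
echo "example" > "$SCRATCH/smallfiles/data-001.txt"

# One tarball instead of many small files:
tar -czf "$SCRATCH/smallfiles.tar.gz" -C "$SCRATCH" smallfiles/

# Sanity-check that the tarball is readable before sending it to HPSS:
if tar -tzf "$SCRATCH/smallfiles.tar.gz" > /dev/null; then
   echo "tarball OK"
fi
```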
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and 71 exit codes).&lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are there so you don't have to think&lt;br /&gt;
too much; just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; mechanism for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1 hour of walltime by default. Long jobs (&amp;gt; 1 hour) are limited to 72 hours of walltime.&lt;br /&gt;
* Users are limited to only 2 long jobs and 2 short jobs at the same time, and 10 jobs in total in the queue.&lt;br /&gt;
* Only 5 long jobs can run at any given time overall; remaining submissions will be held in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive naming-space. Keep in mind that you are restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or the ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap abnormal terminations in your job scripts, and be sure to return the exit code.&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
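&lt;br /&gt;
The awk step in the shortcut simply reshapes sbatch's &amp;quot;Submitted batch job NNN&amp;quot; message into a dependency flag; since the job ID is the fourth field of that message, the extraction can be sanity-checked without the scheduler (the job ID below is made up):&lt;br /&gt;

```bash
#!/bin/bash
# Simulated sbatch output; the real command prints a line of this shape.
msg="Submitted batch job 123456"

# The job ID is the 4th field; prepend the dependency type.
flag=$(echo "$msg" | awk '{print "--dependency=afterany:"$4}')
echo "$flag"    # --dependency=afterany:123456
```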
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into an archive file that conforms to the POSIX TAR specification. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, thereby achieving a high rate of performance. HTAR does not do gzip compression; however, it has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68 GB, the whole HTAR session will fail, and you'll get a notification listing all such files so that you can transfer them with HSI instead.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. The same can happen when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
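&lt;br /&gt;
Both the 68 GB and the 100-character limits above can be screened for on the GPFS side before the job is ever submitted. A minimal sketch (the directory name is hypothetical); nothing is printed when the tree is clean:&lt;br /&gt;

```bash
#!/bin/bash
# Pre-flight checks for the two htar limits described above.
DIR=finished-job1   # hypothetical directory about to be archived

# Member files over htar's 68 GB limit (move these with HSI instead):
find "$DIR" -type f -size +68G -print

# Pathnames over the 100-character USTAR limit (these would be skipped):
find "$DIR" -print | awk 'length($0) > 100'
```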
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions for other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
For some users, HSI will be the primary client for interacting with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves a GPFSpath file to HPSSpath only if the HPSS copy does not exist or the GPFS version is newer or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* There are three distinctions about HSI that you should keep in mind, as they can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the cput/cget syntax may not work as one would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same for both cput and cget: GPFS first, HPSS second:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a heredoc such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However the syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process, where you do a &amp;quot;lcd&amp;quot; in GPFS first and recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do a &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms, or you may already be familiar with HPSS/HSI from other HPC facilities, whose procedures may or may not be similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' In case of multiple operations in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/gpc/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' As in the above example, we recommend that you capture the (highest-numbered) exit code of each hsi session independently. You may also improve your exit-code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A trivial way to list the contents of HPSS is to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. As a rough guide, about 400,000 files can be listed in an hour; adjust the walltime accordingly to be on the safe side.''&lt;br /&gt;
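&lt;br /&gt;
The quoted rate also gives a quick back-of-envelope way to size the walltime request; a sketch with a hypothetical file count:&lt;br /&gt;

```bash
#!/bin/bash
# Rough walltime estimate for 'ls -R' in HPSS, assuming the
# ~400,000 files/hour listing rate quoted above.
NFILES=1200000                  # hypothetical number of files in HPSS
RATE=400000                     # approximate files listed per hour
HOURS=$(( (NFILES + RATE - 1) / RATE ))   # round up to whole hours
echo "request at least ${HOURS}h of walltime"
```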
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/gpc/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do optimization, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as the Linux versions.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J file_management&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with the use of 'cd' commands to non-existing directories before an 'rm' command; the results may be unpredictable.&lt;br /&gt;
* Avoid using the standalone wildcard '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', so as to limit unintentional mishaps.&lt;br /&gt;
* Avoid using relative paths, even via the env variable $ARCHIVE; it is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid using recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even on $ARCHIVE contents, it may be better to do it as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you are restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI - for instance, you cannot use 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xvf - &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive HSI put that stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of individual files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing the tarball with ish (in the same script) immediately after creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status use: ''set -o pipefail'' (The default is to return the status of the last command in the pipeline and this is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
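The effect of ''set -o pipefail'' can be demonstrated with a stand-in pipeline, with 'false' playing the role of a failing tar (a minimal sketch; no HPSS commands involved):&lt;br /&gt;

```shell
# Default: a pipeline's exit status is that of its LAST command,
# so an early failure ('false', standing in for a failing tar) is masked.
status=0
( false | true ) || status=$?
echo "without pipefail: $status"   # 0 - the failure is invisible

# With pipefail, the rightmost non-zero status is reported instead.
status=0
( set -o pipefail; false | true ) || status=$?
echo "with pipefail: $status"      # 1 - the failure propagates
```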
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It's now part of the &amp;quot;extras&amp;quot; module and can also be used on any compute or devel node. This makes the script run much more quickly than if you were to compress with single-threaded gzip ('tar -czf'). In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The -Hcrc option specifies that HTAR should generate CRC checksums when creating the archive; combined with -Hverify it verifies the archive after creation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the md5 hash with the hsi command 'hashli'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N MD5_checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)   # used below to name the checksum file&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE | tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
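The named-pipe trick above can be exercised locally without HPSS. In this sketch 'wc -c' stands in for 'hsi put -' as the consumer, and the file and path names are illustrative only:&lt;br /&gt;

```shell
# tee duplicates the stream: one copy goes on to the consumer
# ('wc -c' here, standing in for 'hsi put -'), the other into the
# FIFO that md5sum reads, so the source file is read only once.
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/data"
mkfifo "$tmp/np"
cat "$tmp/data" | tee "$tmp/np" | wc -c > "$tmp/bytes" &
md5sum "$tmp/np" | awk '{print $1}' > "$tmp/md5"
wait
echo "bytes sent: $(tr -d ' ' < $tmp/bytes)"   # bytes sent: 6
echo "md5: $(cat $tmp/md5)"
rm -rf "$tmp"
```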
&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured to work like a DropBox. To download the free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9280</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9280"/>
		<updated>2018-05-03T21:55:37Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Efficient alternative to htar */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25 year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world report (as of fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017 to store Nav Canada data as well as to serve as their own archive. They currently have 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
This set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first because of all the job script templates made available below (they are here so you don't have to think too much - just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' queue or the 'archiveshort'&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
*  Users are limited to only 2 long jobs and 2 short jobs at the same time, and 10 jobs total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be held until a slot frees up. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive naming-space. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
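The shortcut above parses sbatch's one-line confirmation message. A standalone sketch of just that parsing step, using a captured sample message rather than a real submission:&lt;br /&gt;

```shell
# On success sbatch prints "Submitted batch job <id>"; the job id is
# field 4, which awk turns into a --dependency flag for the next job.
msg="Submitted batch job 11611291"   # sample output, not a real submission
dep=$(echo "$msg" | awk '{print "--dependency=afterany:"$4}')
echo "$dep"   # --dependency=afterany:11611291
```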
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification while achieving a high rate of performance. HTAR does not do gzip compression; however, it has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68GB cannot be stored in an HTAR archive. If you attempt to start a transfer containing any such files the whole HTAR session will fail, and you'll get a notification listing all of them so that you can transfer them with HSI instead.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification [[(POSIX 1003.1 USTAR)]]. Note that HTAR will erroneously indicate success but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not prompt before overwriting. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. The same can happen when extracting material back into GPFS over the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions to other members in your group use the umask option&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
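A umask works subtractively: permission bits set in the mask are removed from the default creation mode. Assuming the usual 0666 default for files, a quick check of what 0137 yields:&lt;br /&gt;

```shell
# 0137 clears x for owner, wx for group and rwx for others, so a
# file created with the default 0666 mode ends up as 0640 (rw-r-----):
# owner read/write, group read-only, no access for others.
printf 'resulting mode: %04o\n' $(( 0666 & ~0137 ))   # resulting mode: 0640
```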
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_extract_tarball_from_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is likely the primary client with which most users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a GPFSpath file into HPSS (as HPSSpath), only if the HPSS copy does not exist or the GPFS version is newer&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* There are three peculiarities of HSI that you should keep in mind; they can generate a bit of confusion when you are first learning how to use it:&lt;br /&gt;
** HSI does not currently support renaming directory paths on-the-fly during transfers, so the cput/cget syntax may not work as one would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI has a &amp;quot;:&amp;quot; operator, which separates the GPFSpath and the HPSSpath and must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order of the file arguments in HSI differs from FTP. In HSI the general format is always the same, GPFS first and HPSS second, for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document, such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do an &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep symbolic links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms, and you may already be familiar with HPSS/HSI from other HPC facilities, whose procedures may or may not be similar to ours. HSI does not always work as expected when you go outside of our recommended syntax, so '''we strongly urge you to use the sample scripts we provide as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code when multiple operations are performed in the same hsi session. You may use '/scinet/gpc/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls, ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code of each hsi session independently. You may also improve the verbosity of your exit codes by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
The simplest way to list the contents of HPSS is just to submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete; roughly 400,000 files can be listed per hour. Adjust the walltime accordingly, to be on the safe side.''&lt;br /&gt;
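The rough rate above translates into a walltime estimate like this (NFILES is a hypothetical file count, not a site value):&lt;br /&gt;

```shell
# Back-of-envelope walltime estimate for the HSI 'ls -R' job,
# using the ~400,000 files/hour rate quoted above.
NFILES=1200000          # hypothetical file count
RATE=400000             # files listed per hour (approximate)
HOURS=$(( (NFILES + RATE - 1) / RATE ))   # integer division, rounded up
echo "request at least ${HOURS}h of walltime"
```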
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS: the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The index is placed in the directory /home/$(whoami)/.ish_register and can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/gpc/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
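ISH itself is site-specific; as a rough local analogue, an index of an ordinary tarball can be built and searched with plain tar (the file names here are invented for illustration):&lt;br /&gt;

```shell
# Build a small example tree and archive it
mkdir -p somefiles
echo data > somefiles/run1.out

tar -cf sample.tar somefiles/
tar -tf sample.tar > sample.idx   # one line per member name
grep run1 sample.idx              # search the index instead of the archive
```

Searching a saved index is much cheaper than re-reading the archive, which is the same idea ISH applies to HPSS contents.&lt;br /&gt;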
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize the retrieval, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and subdirectories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J file_management_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before an 'rm' command; the results may be unpredictable.&lt;br /&gt;
* Avoid using the '''*''' wildcard on its own. Whenever possible, bind it to a common pattern, such as '*.tmp', to limit unintended deletions.&lt;br /&gt;
* Avoid relative paths, including the env variable $ARCHIVE. It is better to explicitly expand full paths in your scripts.&lt;br /&gt;
* Avoid recursive/looped deletion instructions on $SCRATCH contents from archive job scripts. Even for $ARCHIVE contents, it is better to run deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
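The first recommendation can be illustrated with a local-filesystem analogue: guard a recursive delete on the target actually existing, instead of cd'ing blindly (the path is invented for this sketch):&lt;br /&gt;

```shell
# Guarded recursive delete: the rm only runs if the directory exists,
# so a typo in the path cannot silently remove the wrong contents.
target=./obsolete-example
mkdir -p "$target"            # set up the demo directory

if [ -d "$target" ]; then
    rm -r "$target"
    echo "removed $target"
else
    echo "no such directory: $target"
fi
```

The same pattern carries over to HSI sessions: verify the HPSS path with 'ls' before issuing 'rm -R'.&lt;br /&gt;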
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session, and proceeding with your deletions that way. Keep in mind that interactive sessions are restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you will get a standard shell prompt on an archive execution node (hpss-archive02), just as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an HSI equivalent; for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get (cget will fail, as with cput above)&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi get - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail''. (The default is to return the status of the last command in the pipeline, which is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | WHY?]]). Be sure to check the total amount of data in the directory tree with 'du' before sending it to the tar+HSI pipe.&lt;br /&gt;
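The effect of ''set -o pipefail'' can be seen in a two-line experiment (no HPSS involved; 'false' stands in for a failing tar on the left-hand side of the pipe):&lt;br /&gt;

```shell
# Without pipefail, the pipeline's status is that of the LAST command,
# so a failing left-hand command would be masked by a succeeding 'hsi'.
false | true
echo "default:  status=$?"        # prints: default:  status=0

set -o pipefail
status=0
false | true || status=$?
echo "pipefail: status=$status"   # prints: pipefail: status=1
```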
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We have compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module and can also be used on any compute or devel node. It makes the previous script run much quicker than using 'tar -czf'. In addition, by piggy-backing ISH at the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option tells HTAR to generate CRC checksums when creating the archive; combined with '-Hverify=1', the archive contents are verified after creation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option, you should be able to query the md5 hash with the hsi command 'lshash'. The value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N MD5_checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'CHECKSUM LISTED SUCCESSFULLY'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE | tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
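The named-pipe technique above can be exercised locally, without HPSS; in this sketch a plain 'cat' consumer stands in for the 'hsi put -' side of the pipeline (the temporary paths are illustrative):&lt;br /&gt;

```shell
# Local demonstration of the tee/named-pipe checksum pattern:
# the file is read once, hashed, and "transferred" in a single pass.
tmpdir=$(mktemp -d)
echo "sample archive payload" > "$tmpdir/tarball"

mkfifo "$tmpdir/NPIPE"
# Background consumer stands in for: hsi -q put - : storedfile
cat "$tmpdir/NPIPE" > /dev/null &
pid=$!

# tee feeds both the consumer and md5sum, so the source is read only once
cat "$tmpdir/tarball" | tee "$tmpdir/NPIPE" | md5sum > "$tmpdir/tarball.md5"
wait $pid
rm -f "$tmpdir/NPIPE"

# md5sum recorded the filename as '-'; point it at the real file so
# that 'md5sum -c' can verify a re-read (mirrors the sed step above)
sed -i "s|-\$|$tmpdir/tarball|" "$tmpdir/tarball.md5"
md5sum -c "$tmpdir/tarball.md5"
```

In the real script, 'md5sum -c' instead reads its data from 'hsi get -', verifying the stored copy against the hash computed at ingestion time.&lt;br /&gt;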
&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When enabled, you may transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer, and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9279</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9279"/>
		<updated>2018-05-03T21:54:48Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Typical example */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be granted access to HPSS (up to 2TB per group), so that you can become familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the World report (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive. It currently has 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several other HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
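As a hedged illustration of the size guideline above, a job script could check a directory's size before archiving it (the 500-byte limit and toy file below exist only so this self-contained demo triggers; the real cap would be 500GB):&lt;br /&gt;

```shell
# Pre-archiving size check; thresholds here are demo-sized stand-ins.
tmpdir=$(mktemp -d)
dd if=/dev/zero of="$tmpdir/data.bin" bs=1024 count=64 2>/dev/null

limit=500                   # for real use: limit=$((500 * 1024**3))
size=$(du -sb "$tmpdir" | awk '{print $1}')
if [ "$size" -gt "$limit" ]; then
    echo "WARNING: $size bytes; consider splitting into smaller tarballs"
else
    echo "OK to archive as a single tarball"
fi
```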
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
This set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are there so you don't have to think&lt;br /&gt;
too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; mechanism for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' queue or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
*  Users are limited to only 2 long jobs and 2 short jobs at the same time, and 10 jobs total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session, and navigate the archive namespace. Keep in mind that you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
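The trap pattern is small enough to demonstrate on its own; in this sketch 'true' stands in for the htar/hsi transfer:&lt;br /&gt;

```shell
# Set the handler before the long-running transfer so an aborted job
# leaves a clear message and a distinctive exit code (129).
trap "echo 'Job script not completed'; exit 129" TERM INT

true            # stand-in for the htar/hsi transfer
status=$?

# Clear the handler once the exit status has been captured
trap - TERM INT

if [ ! "$status" == 0 ]; then
    echo "transfer returned non-zero code $status"
    exit $status
else
    echo 'TRANSFER SUCCESSFUL'
fi
```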
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
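The command substitution above works because sbatch prints a line of the form 'Submitted batch job 12345', whose fourth field is the job ID. A self-contained sketch of just the parsing step, with sbatch mocked (the job number is made up):&lt;br /&gt;

```shell
# Mocked sbatch: real sbatch prints "Submitted batch job NNNN"
mock_sbatch() { echo "Submitted batch job 50918"; }

# Field 4 of that line is the job ID the dependency flag needs
dep=$(mock_sbatch | awk '{print "--dependency=afterany:"$4}')
echo "$dep"
```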
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into an archive file that conforms to the POSIX TAR specification. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, thereby achieving a high rate of performance. HTAR does not do gzip compression; however, it has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, however it will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike with cput/cget in HSI, &amp;quot;prompt before overwrite&amp;quot; is not the default with (h)tar. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can arise when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
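A hypothetical pre-flight check (not an htar feature) can list over-long pathnames before archiving, rather than discovering them through exit code 70; the directory layout below is fabricated for the demo:&lt;br /&gt;

```shell
# Build a toy tree with one path over the 100-character USTAR limit
tmpdir=$(mktemp -d)
longdir="$tmpdir/$(printf 'd%.0s' $(seq 1 70))"
mkdir -p "$longdir"
touch "$longdir/averylongfilenamethatpushesthepathoverthelimit.dat"
touch "$tmpdir/short.txt"

# Report relative pathnames htar would skip (longer than 100 chars)
cd "$tmpdir"
toolong=$(find . -type f | awk 'length($0) > 100')
echo "$toolong"
```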
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar file and the .idx file are readable by other members of your group, use the umask option&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
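A similar pre-flight check can find oversize members before htar runs; this sketch uses a 1KB threshold on toy files so it actually triggers (substitute '-size +68G' for real data):&lt;br /&gt;

```shell
# Toy data: one file over the demo threshold, one under it
tmpdir=$(mktemp -d)
dd if=/dev/zero of="$tmpdir/big.bin" bs=1024 count=4 2>/dev/null
touch "$tmpdir/small.txt"

# For real use: find . -type f -size +68G
oversize=$(find "$tmpdir" -type f -size +1k)
echo "$oversize"
```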
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is likely the primary client with which most users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a file from GPFSpath to HPSSpath, only if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are 3 distinctions about HSI that you should keep in mind, and that can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, therefore the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first and HPSS second, for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document, such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process, where you do a &amp;quot;lcd&amp;quot; in GPFS first, and recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do a &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' when multiple operations occur in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/gpc/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you can make your exit codes more intelligible by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A very trivial way to list the contents of HPSS would be to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete; roughly 400,000 files can be listed per hour. Adjust the walltime accordingly, erring on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS: the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register, which can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/gpc/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or the built-in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize the retrievals, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* you may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J move_rename_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before an 'rm' command; the results may be unpredictable.&lt;br /&gt;
* Avoid using the stand-alone wildcard character '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', to limit unintentional mishaps.&lt;br /&gt;
* Avoid relative paths, and even the env variable $ARCHIVE; it is better to explicitly expand full paths in your scripts.&lt;br /&gt;
* Avoid recursive/looped deletion instructions on $SCRATCH contents from archive job scripts. Even for $ARCHIVE contents, it is better to run deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
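The wildcard-scoping advice can be rehearsed with ordinary shell globbing before committing it to an HSI session (HSI expands patterns in a similar spirit). This is a local bash sketch, with no HPSS involved:

```shell
#!/bin/bash
# Throwaway directory with a mix of temporary and valuable files.
workdir=$(mktemp -d)
touch "$workdir/a.tmp" "$workdir/b.tmp" "$workdir/results.dat"

# Bound wildcard: '*.tmp' matches only the temporary files,
# so results.dat is never at risk.
rm -f "$workdir"/*.tmp

remaining=$(ls "$workdir")
echo "left behind: $remaining"
```

An unbound `rm -f "$workdir"/*` would have removed results.dat as well; scoping the pattern is what limits the blast radius.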
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that interactive sessions are restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI; for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail''. (The default is to return the status of the last command in the pipeline, which is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | WHY?]]). Be sure to check the total size of the directory tree with 'du' before sending it through the tar+HSI pipe.&lt;br /&gt;
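The `set -o pipefail` behaviour can be verified in any bash shell, independently of HPSS. This sketch shows the status of a pipeline whose first stage fails:

```shell
#!/bin/bash
# Without pipefail the pipeline reports the status of the LAST command only.
false | true
status_default=$?      # 0: the failure of 'false' is hidden

# With pipefail the pipeline reports the rightmost non-zero status instead.
set -o pipefail
false | true
status_pipefail=$?     # 1: the failure now propagates

echo "default=$status_default pipefail=$status_pipefail"
```

This is why, in the tar+HSI pipelines above, a failing tar stage would otherwise be masked by a successful hsi stage.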
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It's now part of the &amp;quot;extras&amp;quot; module and can be used on any compute or devel node. This makes the script run much quicker than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
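The shape of the pipeline above can be rehearsed locally without pigz or HSI: gzip is a single-threaded stand-in for pigz (pigz parallelises the same stream format), and a plain file stands in for the `hsi put -` destination. A minimal sketch:

```shell
#!/bin/bash
set -o pipefail

# Small directory tree to archive.
src=$(mktemp -d)
echo "sample data" > "$src/file1.txt"
mkdir "$src/sub"
echo "more data" > "$src/sub/file2.txt"

# Same pipeline shape as: tar -c mydir | pigz | hsi put - : mydir.tar.gz
dest=$(mktemp -d)
tar -C "$src" -c . | gzip > "$dest/mydir.tar.gz"
status=$?

# Verify the tarball is readable before trusting the 'archive'.
tar -tzf "$dest/mydir.tar.gz" > /dev/null
verify=$?
echo "pipeline=$status verify=$verify"
```

Swapping `gzip` for `pigz` (after `module load extras`) changes only the compression speed, not the output format.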
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive; '-Hverify=1' verifies them afterwards.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the md5 hash with the hsi command 'lshash'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N MD5_checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)   # used to name the checksum scratch file below&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
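The tee-into-a-named-pipe trick above (reading the source from GPFS only once) can be verified with purely local commands; here `cat > "$storedfile"` stands in for `hsi put -`:

```shell
#!/bin/bash
set -o pipefail

workdir=$(mktemp -d)
thefile="$workdir/original.dat"
storedfile="$workdir/stored.dat"
echo "payload to archive" > "$thefile"

# Read the source once; tee feeds both the 'transfer' and the checksum.
mkfifo "$workdir/NPIPE"
cat "$thefile" | tee "$workdir/NPIPE" | cat > "$storedfile" &
pid=$!
src_md5=$(md5sum "$workdir/NPIPE" | awk '{print $1}')
wait $pid
rm -f "$workdir/NPIPE"

# Independently checksum the stored copy and compare.
dst_md5=$(md5sum "$storedfile" | awk '{print $1}')
echo "src=$src_md5 dst=$dst_md5"
```

The FIFO reader (md5sum) and the writer (tee) rendezvous automatically, so the source file is read from disk exactly once while producing both the stored copy and its checksum.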
&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and login with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9278</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9278"/>
		<updated>2018-05-03T21:53:45Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Moving/renaming */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be granted access to HPSS, up to 2TB per group, so that you may become familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL).&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, both to store Nav Canada data and to serve as their own archive. They currently have 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, and had already reached the 10PB level by 2018.&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tapes at SciNet).&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and the returned logs for errors after any data transfer or tarball creation process.&lt;br /&gt;
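The grouping guideline above can be sketched as follows. This is a minimal, hedged example -- the directory and tarball names are placeholders, and the 200MB threshold mirrors the figure above:&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: bundle files smaller than ~200MB into one tarball for tape,
# leaving larger files to be transferred individually with HSI.
# SRC and OUT are placeholder names for illustration.
set -e
SRC=small-files-dir
OUT=small-files.tar
mkdir -p "$SRC"                      # demo setup; in practice SRC already exists
echo demo > "$SRC/example.txt"

# Collect only the small files and feed them to tar via a NUL-separated list.
find "$SRC" -type f -size -200M -print0 | tar -cf "$OUT" --null -T -

# Verify the tarball can be read back before trusting it.
tar -tf "$OUT" > /dev/null && echo "tarball OK"
```

Files above the threshold should be left out of the tarball and transferred individually with HSI.&lt;br /&gt;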
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email scinet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most compact &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are there so you don't have to think &lt;br /&gt;
too much -- just copy and paste), but if you treat the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to the 'archivelong' or the 'archiveshort' queue.&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
*  Users are limited to only 2 long jobs and 2 short jobs at the same time, and 10 jobs total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive naming-space. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification, thereby achieving a high rate of performance. HTAR does not do gzip compression, but it does have a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI instead.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR protocol [[(POSIX 1003.1 USTAR)]]. Note that HTAR will erroneously indicate success, yet produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot;. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. The same can happen when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
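The long-pathname caution above can be checked mechanically after each job. A minimal sketch, assuming the job log was captured to a file (the name my.output is a placeholder):&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: after an HTAR job completes, scan the captured job log for the
# skipped-file warnings described above before deleting any originals.
# LOG is a placeholder name for your job's captured output file.
LOG=my.output
touch "$LOG"                         # demo only; your job writes this file

if grep -q "Warning" "$LOG"; then
    echo "HTAR skipped files - inspect $LOG before removing originals"
    exit 70                          # mirror the exit code HTAR produces
fi
echo "no skipped-file warnings found"
```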
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions for other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is likely the primary client with which you will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a copy of a GPFSpath file at HPSSpath, only if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* There are 3 distinctive aspects of HSI that you should keep in mind, which can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the syntax for cput/cget may not work as one would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, cput or cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following 2-step process, where you first do an &amp;quot;lcd&amp;quot; into GPFS, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we provide as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' In case of multiple operations in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A very trivial way to list the contents of HPSS would be to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, listing about 400,000 files takes about an hour. Adjust the walltime accordingly, and be on the safe side.''&lt;br /&gt;
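Based on the figure above (roughly 400,000 files listed per hour), a rough walltime request can be computed before submitting. A back-of-the-envelope sketch; the file count is a placeholder:&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: round up a walltime request for an 'ls -R' from an estimated
# file count, using the ~400,000 files/hour rate quoted above.
NFILES=1200000                       # placeholder: your estimated file count
HOURS=$(( (NFILES + 399999) / 400000 ))
echo "request at least ${HOURS}h of walltime"
# prints: request at least 3h of walltime
```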
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS: the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register, which can be inspected from the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize the recall, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and its sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J deletion_script&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before the 'rm' command; the results may be unpredictable.&lt;br /&gt;
* Avoid using the stand-alone wildcard character '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', to limit unintentional mishaps.&lt;br /&gt;
* Avoid using relative paths, even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid using recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even on $ARCHIVE contents, it may be better to do it as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N deletion_script&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you're restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS.&lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI; for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status use: ''set -o pipefail'' (The default is to return the status of the last command in the pipeline and this is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | WHY?]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
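As a standalone illustration of the ''pipefail'' note above (plain bash, with ''false | true'' as a stand-in for a tar+hsi pipeline):&lt;br /&gt;

```shell
# Without pipefail, a pipeline reports the exit status of its LAST
# command only, so an early failure (e.g. tar dying mid-pipe) is masked.
false | true
echo "without pipefail: $?"    # the failing 'false' is not reflected here

# With pipefail, the pipeline reports the rightmost non-zero status,
# so the 'status=$?' checks in the scripts above will catch the error.
set -o pipefail
false | true
echo "with pipefail: $?"
```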
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module and can be used on any compute or devel node. This makes the previous version of the script run much more quickly than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The ''-Hcrc'' option specifies that HTAR should generate CRC checksums when creating the archive; ''-Hverify=1'' requests verification after the archive is created.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard Hashing Algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the md5 hash with the hsi command 'hashli'. That value is stored as an UDA (User Defined Attribute) for each file (a feature of HPSS starting with 7.4)&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N MD5_checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
# $fname names the checksum file below&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
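The named-pipe trick in the script above can be tried on its own. In this sketch (an illustration only: a temporary directory stands in for the fixed /tmp paths, and ''cat &amp;gt; /dev/null'' plays the role of ''hsi put -''), ''tee'' duplicates the stream into the FIFO so md5sum hashes the data while the file is read from disk only once:&lt;br /&gt;

```shell
# Create a scratch area and a small sample file (placeholder data).
tmpdir=$(mktemp -d)
printf 'hello hpss\n' > "$tmpdir/thefile"

# Reader side: hash whatever comes through the FIFO, in the background.
mkfifo "$tmpdir/NPIPE"
md5sum < "$tmpdir/NPIPE" > "$tmpdir/onfly.md5" &

# Writer side: tee duplicates the stream into the FIFO; 'cat > /dev/null'
# stands in for 'hsi put -' from the real script.
cat "$tmpdir/thefile" | tee "$tmpdir/NPIPE" > /dev/null
wait

# The on-the-fly hash matches a direct hash of the file.
diff <(md5sum < "$tmpdir/thefile") "$tmpdir/onfly.md5" && echo "checksums match"
rm -rf "$tmpdir"
```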
&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suitable for light transfers to/from your personal computer and to navigate your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9277</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9277"/>
		<updated>2018-05-03T21:52:24Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Moving/renaming */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25 year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world report (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017 to store Nav Canada data as well as to serve as their own archive. It currently has 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
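A quick pre-flight check for the tarball-size guideline above (a sketch only, assuming GNU du; the path is a placeholder):&lt;br /&gt;

```shell
# Placeholder path; replace with the directory you intend to archive.
dir="${SCRATCH:-/tmp}/mydir"
mkdir -p "$dir"

# Total size of the tree, rounded up to whole GB (GNU du).
size_gb=$(du -s --block-size=1G "$dir" | awk '{print $1}')

if [ "$size_gb" -gt 500 ]; then
    echo "too large ($size_gb GB): split into smaller tarballs"
else
    echo "within the 500GB guideline ($size_gb GB)"
fi
```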
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (or else you will get &amp;quot;Error - authentication/initialization failed&amp;quot; and 71 exit codes).&lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are here so you don't have to think too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; mechanism for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
*  Users are limited to only 2 long jobs and 2 short jobs at the same time, and 10 jobs total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall. Remaining submissions will be placed on hold for the time being. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive naming-space. Keep in mind that you're restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or the ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap abnormal terminations in your job scripts, and be sure to return the exit code.&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
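The shortcut relies on turning the first submission's job ID into a ''--dependency'' flag with awk. A stand-alone sketch of just that step (the job ID 12345 is made up, and we assume the submission prints only the ID, as ''sbatch --parsable'' does):&lt;br /&gt;

```shell
# Sketch: build the --dependency flag from a submission's output.
# "12345" stands in for the ID printed by `sbatch --parsable data-recall.sh`.
jobid="12345"
dep=$(echo "$jobid" | awk '{print "--dependency=afterany:"$1}')
echo "$dep"
```

The resulting flag is then passed to the second sbatch call exactly as in the one-liner above.&lt;br /&gt;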
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into an archive file that conforms to the POSIX TAR specification. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, thereby achieving a high rate of performance. HTAR does not do gzip compression, but it does have a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt a transfer that includes any files larger than 68GB, the whole HTAR session will fail and you will get a notification listing those files, so that you can transfer them with HSI instead.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification [[(POSIX 1003.1 USTAR)]]. Note that HTAR will erroneously indicate success in this case, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. The same can happen when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
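The pathname warning above can be checked mechanically before you trust an archive. A minimal sketch of that post-job check (the file ''my.output'' and its contents are fabricated stand-ins for your real job output file):&lt;br /&gt;

```shell
# Create a fake job-output file standing in for the real one.
printf 'HTAR: a finished-job1/file1\nWarning: pathname too long, skipped\n' > my.output

# Refuse to trust the archive if any Warning lines are present.
if grep -q Warning my.output; then
    echo "archive incomplete: some files were skipped"
fi
```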
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files are readable by other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer that includes any files larger than 68GB, the whole HTAR session will fail and you'll get a notification listing those files, so that you can transfer them with HSI instead. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_extract_tarball_from_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is likely the primary client with which you will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a file from GPFSpath to HPSSpath only if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* There are three distinctions about HSI that you should keep in mind, which can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the syntax for cput/cget may not work as one would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and the HPSSpath, and which must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same for both cput and cget: GPFS first, HPSS second:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However the syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do an &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms, or you may already be familiar with HPSS/HSI from other HPC facilities, whose procedures may or may not be similar to ours. HSI doesn't always work as expected outside of our recommended syntax, so '''we strongly urge you to use the sample scripts we provide as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' In case of multiple operations in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
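That highest-code convention can be mimicked in plain shell when you chain several operations yourself. The sketch below uses ordinary commands (true/false) as stand-ins, not hsi itself:&lt;br /&gt;

```shell
# Keep the worst (highest) exit status across a sequence of commands,
# mirroring how an hsi session reports the highest-numbered code.
max=0
for cmd in true false true; do
    $cmd
    s=$?
    if [ "$s" -gt "$max" ]; then max=$s; fi
done
echo "highest status: $max"
```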
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' As in the above example, we recommend that you capture the (highest-numbered) exit code of each hsi session independently. And remember, you may improve the verbosity of your exit codes by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
The simplest way to list the contents of HPSS is to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete; roughly 400,000 files can be listed per hour. Adjust the walltime accordingly and stay on the safe side.''&lt;br /&gt;
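At a rate of roughly 400,000 files listed per hour, the walltime to request can be estimated up front. A sketch of that arithmetic (the file count is a made-up example):&lt;br /&gt;

```shell
# Estimate walltime for an HSI "ls -R", assuming ~400,000 entries
# listed per hour; round up to stay on the safe side.
nfiles=1200000
hours=$(( (nfiles + 399999) / 400000 ))
echo "request at least ${hours}h of walltime"
```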
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS: the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register, which can be inspected from the devel nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do optimization, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and subdirectories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J file_management&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before the 'rm' command; results may be unpredictable.&lt;br /&gt;
* Avoid the use of the stand-alone wildcard '''*'''. Whenever possible, bind it to common patterns, such as '*.tmp', so as to limit unintentional mishaps.&lt;br /&gt;
* Avoid using relative paths, even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid using recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even on $ARCHIVE contents, it may be better to do this as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N deletion_script&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session, and proceeding with your deletions that way. Keep in mind that you are restricted to one hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), just as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI; for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'INDEXING SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xvf - &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive HSI put, which stores each file one by one. Reading the files back from tape in this format, however, will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* We also recommend indexing the tarball with ish (in the same script) immediately after its creation, while it still resides in the HPSS cache. The end result is the same as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail''. (By default a pipeline returns the status of its last command, which is not what you want here.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | WHY?]]). Be sure to check the total size of the directory tree with 'du' before sending it to the tar+HSI pipe.&lt;br /&gt;
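The 'du' size check above can be scripted before the archive job is submitted. A minimal sketch, assuming GNU 'du'; the demo directory is a throwaway placeholder:&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: refuse to build a single tarball from a directory larger than the
# recommended 500GB limit. The demo directory is a placeholder; in a real
# job point this at e.g. $SCRATCH/mydir.
check_tarball_size() {
    local dir=$1
    local limit=$((500 * 1024 * 1024 * 1024))   # 500 GB in bytes
    local bytes
    bytes=$(du -sb "$dir" | awk '{print $1}') || return 2
    if [ "$bytes" -gt "$limit" ]; then
        echo "TOO LARGE: split $dir into smaller tarballs"
        return 1
    fi
    echo "OK: $dir is $bytes bytes"
}

demo=$(mktemp -d)
echo "test data" > "$demo/file"
check_tarball_size "$demo"
```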
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We have compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module and can also be used on any compute or devel node. This makes the script above run much more quickly than 'tar -czf' would. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the newly created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
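To recall such a compressed tarball, the pipeline is simply reversed: hsi streams the stored file to stdout, pigz decompresses it, and tar unpacks it. The sketch below shows the recall command as a comment (it needs an archive node), then exercises the same piping locally; gzip stands in if pigz is not installed, and all paths are temporary stand-ins:&lt;br /&gt;

```shell
#!/bin/bash
# On an archive node the recall would be (illustrative paths):
#   cd $SCRATCH && hsi get - : $ARCHIVE/mydir.tar.gz | pigz -dc | tar -xvf -
# Below: the same tar | compress | decompress | tar piping, run locally.
set -o pipefail
ZIP=$(command -v pigz || command -v gzip)   # gzip stands in when pigz is absent
src=$(mktemp -d); dst=$(mktemp -d)
echo "payload" > "$src/data.txt"
tar -C "$src" -cf - . | "$ZIP" -c | "$ZIP" -dc | tar -C "$dst" -xf -
status=$?
echo "pipeline exit code: $status"
```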
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive; '-Hverify=1' makes it verify them after the transfer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the MD5 hash with the hsi command 'lshash'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N MD5_checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
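The tee/mkfifo trick above can be tried out locally: the stream is checksummed by md5sum while it is being sent, so the source is read from disk only once. In the sketch below a plain 'cat &gt; file' stands in for 'hsi put', and all paths are temporary stand-ins (GNU sed and md5sum assumed):&lt;br /&gt;

```shell
#!/bin/bash
# Local sketch of the named-pipe pattern: checksum a stream with tee + mkfifo
# while sending it, reading the source only once. A plain 'cat > file'
# stands in for 'hsi put'.
work=$(mktemp -d)
mkfifo "$work/NPIPE"
printf 'example payload\n' > "$work/src"
cat "$work/src" | tee "$work/NPIPE" | cat > "$work/sent" &   # stand-in for hsi put
pid=$!
md5sum "$work/NPIPE" > "$work/src.md5"
wait "$pid"
# rewrite the pipe name to '-' so md5sum -c can later verify a stream
sed -i "s+$work/NPIPE+-+" "$work/src.md5"
cat "$work/sent" | md5sum -c "$work/src.md5"
```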
&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When Globus access is available, you may transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy (SME) is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for browsing your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9276</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9276"/>
		<updated>2018-05-03T21:51:43Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Sample transferring directories */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be granted access to HPSS, up to 2TB per group, to get familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world report (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as its own archive; it currently has 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, reaching the 10PB level in 2018&lt;br /&gt;
* Very reliable, with data redundancy and data insurance built in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - is ~150 TB/day ingest and ~45 TB/day recall (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
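A quick way to apply the first guideline is to count how many files in a tree fall under the ~200MB threshold before deciding to aggregate them. A sketch; the demo tree is a throwaway placeholder:&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: count files smaller than ~200MB in a directory tree; if the count
# is large, aggregate the tree with tar/htar before sending it to HPSS.
count_small_files() {
    find "$1" -type f -size -200M | wc -l
}

# Demo on a throwaway tree; in practice point this at e.g. $SCRATCH/mydir.
demo=$(mktemp -d)
touch "$demo/a" "$demo/b" "$demo/c"
count_small_files "$demo"
```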
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
This set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are here so you don't have to think too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HSI)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be made to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
* Users are limited to 2 long jobs and 2 short jobs at the same time, and 10 jobs total in the queue.&lt;br /&gt;
* At most 5 long jobs may run at any given time overall; remaining submissions will be held until slots free up. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind that you are restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recall job that must finish before the analysis job can start (use ''afterok'' to require that it finished successfully, or ''afterany'' to start regardless of its exit status).&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification, and thereby achieves a high rate of performance. HTAR does not do gzip compression, but it does have a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files so that you can transfer them with HSI instead.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification [[(POSIX 1003.1 USTAR)]] -- note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
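The first two cautions can be checked up front. A pre-flight sketch; the 100-character test on the full path is an approximation, since htar's limit applies to the member pathname stored in the archive, and the demo tree is a throwaway placeholder:&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: before submitting an htar job, list files over 68GB (which htar
# rejects) and pathnames over 100 characters (which htar skips).
preflight() {
    local dir=$1
    echo "-- files larger than 68GB:"
    find "$dir" -type f -size +68G
    echo "-- pathnames longer than 100 characters:"
    find "$dir" | awk 'length($0) > 100'
}

# Demo on a throwaway tree; in practice run this on the directory to archive.
demo=$(mktemp -d)
touch "$demo/ok-file"
preflight "$demo"
```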
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions to other members in your group use the umask option&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_extract_tarball_from_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which most users will interact with HPSS. It provides an FTP-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a GPFSpath file as HPSSpath, only if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are 3 peculiarities of HSI that you should keep in mind, as they can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the syntax for cput/cget may not work as you would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath from the HPSSpath, and it must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first and HPSS second, for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process, where you do an &amp;quot;lcd&amp;quot; in GPFS first and then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, then transfer the files individually with the '*' wildcard. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms, or you may already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside our recommended syntax, so '''we strongly urge that you use the sample scripts we provide as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' when there are multiple operations in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/gpc/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. Remember, you can improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A very trivial way to list the contents of HPSS would be to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, about 400,000 files can be listed in about an hour. Adjust the walltime accordingly, erring on the safe side.''&lt;br /&gt;
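As a rough guide, the figure above (about 400,000 files listed per hour) can be turned into a walltime estimate; NFILES below is a hypothetical file count, not a measured value.&lt;br /&gt;

```shell
#!/bin/bash
# Back-of-envelope walltime estimate from the ~400,000 files/hour
# listing rate quoted above. NFILES is a hypothetical file count.
NFILES=${NFILES:-1200000}
RATE=400000                               # approximate files listed per hour
hours=$(( (NFILES + RATE - 1) / RATE ))   # round up to whole hours
echo "request at least ${hours}h of walltime"
```

For 1,200,000 files this suggests requesting at least 3 hours; pad the request further to stay on the safe side.&lt;br /&gt;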
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register, which can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/gpc/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize the retrieval, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and its sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N deletion_script&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with the use of 'cd' commands to non-existing directories before the 'rm' command; the results may be unpredictable.&lt;br /&gt;
* Avoid using the standalone wildcard '''*'''. Whenever possible, bind it to common patterns, such as '*.tmp', so as to limit unintentional mishaps.&lt;br /&gt;
* Avoid using relative paths, even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid using recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even on $ARCHIVE contents, it may be better to do it as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
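As an illustration of binding the wildcard to a pattern, it costs little to preview the match list before issuing the 'rm' in HSI. The directory and file names below are hypothetical stand-ins, and the commented hsi line is the only HPSS-specific step.&lt;br /&gt;

```shell
#!/bin/bash
# Illustrative only: bind the wildcard to a pattern ('*.tmp') and preview
# the matches before deleting. 'preview-demo' and its files are stand-ins.
mkdir -p preview-demo
touch preview-demo/a.tmp preview-demo/b.dat
ls preview-demo/*.tmp      # review this list first
# then, only after confirming the list is what you expect:
# hsi "rm /archive/s/scinet/pinto/*.tmp"
```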
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N deletion_script&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that interactive sessions are restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI; for instance, you cannot 'vi' or 'cat' a file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive HSI put that stores each file one by one; reading the files back from tape in this format, however, will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS disk cache, much as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail'' (the default is to return the status of the last command in the pipeline, which is not what you want).&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | WHY?]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
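The effect of ''set -o pipefail'' mentioned above can be seen with a trivial two-command pipeline; this is a minimal local demonstration, not HPSS-specific.&lt;br /&gt;

```shell
#!/bin/bash
# Minimal demonstration of 'set -o pipefail': without it, a failure early
# in the pipeline is masked by the exit status of the last command.
false | true
echo "without pipefail: $?"     # prints 0: the failure of 'false' is hidden

set -o pipefail
false | true
echo "with pipefail: $?"        # non-zero: the failure now propagates
```

This is why the tar+hsi scripts above would otherwise report TRANSFER SUCCESSFUL even when tar itself failed.&lt;br /&gt;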
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It's now part of the &amp;quot;extras&amp;quot; module, and it can also be used on any compute or devel node. This makes the compressed variant of the previous script much quicker to execute than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -c $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive, and '-Hverify=1' verifies the archive after it is created.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option, you should be able to query the MD5 hash with the hsi command 'lshash' (as in the script below). That value is stored as a UDA (User Defined Attribute) for each file, a feature of HPSS starting with version 7.4.&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N MD5_checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process (hsi runs in the foreground here,&lt;br /&gt;
# so capture $? directly; an intervening 'pid=$!' assignment would reset it)&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
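The same verify-after-recall pattern can be exercised locally without HPSS. The sketch below is illustrative only: a plain cp stands in for the hsi put/get, and all paths are made up.&lt;br /&gt;

```shell
# Record the MD5 of the source file, "transfer" it (cp stands in for
# hsi put/get), then verify the copy against the recorded checksum.
echo "important results" > /tmp/thefile
md5sum /tmp/thefile > /tmp/thefile.md5

cp /tmp/thefile /tmp/recalled

# point the checksum file at the recalled copy, then verify
sed 's+/tmp/thefile+/tmp/recalled+' /tmp/thefile.md5 | md5sum -c -
```
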
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE | tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When enabled, you may transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy (SME) is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9275</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9275"/>
		<updated>2018-05-03T21:49:55Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Sample data recall */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be granted access to HPSS, up to 2TB per group, so that you can get familiar with the system (just email support@scinet.utoronto.ca).&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities on the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the World report (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive. It currently has 2 × 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
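To illustrate the first guideline, small files can be packed into a single tarball with plain tar before ingestion. The directory and file names below are hypothetical; on the real system you would work under $SCRATCH.&lt;br /&gt;

```shell
# Create a few small files standing in for job output, then aggregate
# them into one tarball (-p preserves permissions, as in the htar examples).
mkdir -p /tmp/finished-job1
for i in 1 2 3; do
    echo "data $i" > /tmp/finished-job1/part$i.dat
done

cd /tmp
tar -cpf finished-job1.tar finished-job1/
tar -tf finished-job1.tar     # list members to sanity-check the tarball
```
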
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first because of all the job script templates we make available below (they are there so you don't have to think too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
*  Users are limited to only 2 long jobs and 2 short jobs at the same time, and 10 jobs total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive naming-space. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
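The dependency string can also be built explicitly in a script. sbatch prints ''Submitted batch job &amp;lt;ID&amp;gt;'' on success, so the job ID is the fourth field of its output. A runnable sketch (the sbatch output is simulated here so no Slurm is needed):&lt;br /&gt;

```shell
# Stand-in for: sbatch_output=$(sbatch data-recall.sh)
sbatch_output="Submitted batch job 50918"

# The job ID is the fourth whitespace-separated field
recall_id=$(echo "$sbatch_output" | awk '{print $4}')

# This flag would be passed to the follow-up job, e.g.
#   sbatch --dependency=afterok:$recall_id job-to-work-on-recalled-data.sh
echo "--dependency=afterok:${recall_id}"
```
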
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files and directories, by using a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification, thereby achieving a high rate of performance. HTAR does not do gzip compression, however it already has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, there is no &amp;quot;prompt before overwrite&amp;quot; safeguard by default with (h)tar. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
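For example, a job log can be scanned for such warnings before any originals are removed from GPFS. The log file and its contents below are fabricated for illustration; use your own job's output file.&lt;br /&gt;

```shell
# Fabricated HTAR job log with one skipped-file warning
printf '%s\n' \
    'HTAR: a finished-job1/data.bin' \
    'Warning: pathname too long, skipping member' \
    'HTAR: HTAR SUCCESSFUL' > /tmp/htar_job.output

# Flag the job for review if any warnings appear in the log
if grep -q Warning /tmp/htar_job.output; then
    echo "HTAR skipped some files: check the log before removing originals"
fi
```
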
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the directory ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permission for other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
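One way to catch oversize files before submitting is a small find wrapper. This is a sketch only: it is demonstrated with tiny files and a 1-kilobyte threshold so it runs anywhere, but on real data you would pass +68G.&lt;br /&gt;

```shell
# List files in a directory tree above a size threshold; anything over
# 68GB must be transferred with HSI instead of HTAR.
list_oversize() {
    find "$1" -type f -size "$2" -print
}

# Demo with tiny files and a 1-kilobyte threshold
mkdir -p /tmp/size-demo
head -c 100  /dev/zero > /tmp/size-demo/small.dat
head -c 2048 /dev/zero > /tmp/size-demo/big.dat

list_oversize /tmp/size-demo +1k   # on real data: list_oversize finished-job1/ +68G
```
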
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_extract_tarball_from_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is likely the primary client through which most users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a GPFSpath file to HPSSpath if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* There are 3 distinctive features of HSI that you should keep in mind, which can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document, such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process, where you do an &amp;quot;lcd&amp;quot; in GPFS first and recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms, and you may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code, in case of multiple operations in the same hsi session. You may use '/scinet/gpc/bin/exit2msg $status' to translate those codes into intelligible messages&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend capturing the (highest-numbered) exit code of each hsi session independently. Remember, you can make your exit codes more informative by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
The simplest way to list the contents of HPSS is to submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, about 400,000 files can be listed in about an hour. Adjust the walltime accordingly, erring on the side of caution.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/gpc/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or the built-in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize tape access, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_directories&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_directories&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N management_script&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before an 'rm' command; the results may be unpredictable.&lt;br /&gt;
* Avoid the standalone wildcard character '''*'''. If necessary, whenever possible bind it to common patterns, such as '*.tmp', so as to limit unintended deletions.&lt;br /&gt;
* Avoid relative paths, including the environment variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even on $ARCHIVE contents, it may be better to run deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N deletion_script&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that interactive sessions are restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS.&lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an HSI equivalent - for instance, there is no 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, is not noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* Finally, we recommend indexing with ish (in the same script) immediately after the tarball creation, while the tarball still resides in the HPSS cache. The result is much as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status use ''set -o pipefail''. (The default is to return the status of the last command in the pipeline, which is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | WHY?]]). Be sure to check the total amount of data in the directory tree with 'du' before sending it through the tar+HSI pipe.&lt;br /&gt;
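The effect of ''set -o pipefail'' can be seen in a plain bash session, with no HPSS commands involved (the commands below are purely illustrative):&lt;br /&gt;

```shell
#!/bin/bash
# Without pipefail, a pipeline's exit status is that of its LAST command,
# so a failure early in the pipe (e.g. in tar or hsi) is silently masked.
false | true
echo "without pipefail: $?"   # prints 0 -- the failure of 'false' is hidden

# With pipefail, the pipeline's status is that of the rightmost command
# that failed, so the error survives into the usual $status check.
set -o pipefail
false | true
echo "with pipefail: $?"      # prints 1 -- the failure is reported
```

This is why the tar+HSI piping scripts set the option before the pipeline and only then inspect the exit status.&lt;br /&gt;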
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It's now part of the &amp;quot;extras&amp;quot; module, and can also be used on any compute or devel node. This makes the execution of the previous version of the script much quicker than if you were to use 'tar -czf'. In addition, by piggy-backing ISH at the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive; '-Hverify=1' then verifies them after creation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions done with the '-c on' option, you should be able to query the MD5 hash with the hsi command 'hashli'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as processor speed, network transfer speed, and the speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N MD5_checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE | tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
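The named-pipe trick above can be rehearsed locally without touching HPSS; in this sketch, 'cat &amp;gt; /dev/null' stands in for the 'hsi put' step, and the file and directory names are purely illustrative:&lt;br /&gt;

```shell
#!/bin/bash
# Demonstrate reading a file from disk only once while computing its
# checksum on the side, via tee and a named pipe.
workdir=$(mktemp -d)
cd "$workdir"
echo "sample payload" > thefile
fname=thefile

# tee duplicates the stream: one copy goes to the (simulated) transfer,
# the other to the named pipe for md5sum
mkfifo NPIPE
cat $fname | tee NPIPE | cat > /dev/null &
pid=$!
md5sum NPIPE > $fname.md5
wait $pid

# rewrite the pipe name so the checksum file refers to the real file,
# then verify, just as the real script does against the 'hsi get' stream
sed -i "s+NPIPE+$fname+" $fname.md5
md5sum -c $fname.md5
```

A successful run prints 'thefile: OK', confirming that the bytes that went through the pipe match what md5sum recorded.&lt;br /&gt;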
&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When enabled, Globus lets you transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9274</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9274"/>
		<updated>2018-05-03T21:48:12Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Sample data list */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25 year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the World report (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, both to store Nav Canada data and to serve as their own archive. It currently has 2 x 100PB of capacity installed.&lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, reaching the 10PB level in 2018&lt;br /&gt;
* Very reliable, data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
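The grouping and exit-code checks described above can be sketched with plain tar (a hedged illustration; on HPSS you would use htar the same way, and the directory and file names below are hypothetical):

```shell
# Group many small output files into a single tarball before archiving.
mkdir -p smallfiles
for i in 1 2 3; do
    echo "data $i" > smallfiles/part$i.dat
done

tar -cpf smallfiles.tar smallfiles/
status=$?

# Never delete the originals before checking the exit code.
if [ $status -ne 0 ]; then
    echo "tar returned non-zero code $status"
    exit $status
fi
echo "ARCHIVE CREATED"
```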
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; and exit code 71). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are here so you don't have to think too much, just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; mechanism for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most Linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
*  Users are limited to only 2 long jobs and 2 short jobs at the same time, and 10 jobs total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive naming-space. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs to catch abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency (see the [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
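The shortcut above relies on parsing sbatch's usual confirmation line, &quot;Submitted batch job &amp;lt;id&amp;gt;&quot;. A hedged sketch of that parsing step (the job ID 12345 below is a hypothetical stand-in for real sbatch output):

```shell
# Extract the job ID from sbatch's confirmation line and build the
# dependency flag for the follow-up analysis job.
sbatch_output="Submitted batch job 12345"   # normally: $(sbatch data-recall.sh)
depflag=$(echo "$sbatch_output" | awk '{print "--dependency=afterany:"$4}')
echo "$depflag"

# The analysis job would then be submitted as:
#   sbatch "$depflag" job-to-work-on-recalled-data.sh
```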
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, creating an archive file that conforms to the POSIX TAR specification and thereby achieving a high rate of performance. HTAR does not do gzip compression, but it does have a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68GB cannot be stored in an HTAR archive. If you attempt to start a transfer that includes any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing those files so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, to conform with the TAR protocol (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, but will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
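The log check recommended above can be sketched as follows (a hedged illustration: &quot;my.output&quot; and its contents are hypothetical stand-ins for a real job log):

```shell
# Scan a saved HTAR job log for warnings (e.g. skipped files)
# before removing anything from the GPFS active filesystems.
printf 'HTAR: a   finished-job1/file1\n###WARNING  htar skipped 1 file\n' > my.output

if grep -qi warning my.output; then
    echo "HTAR log contains warnings: inspect my.output before removing originals"
fi
```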
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions to other members in your group use the umask option&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is likely the primary client with which users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces a GPFSpath file in HPSSpath if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are three aspects of HSI to keep in mind; they can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the cput/cget syntax may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same (GPFS first, HPSS second), for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; note that the default HSI directory is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process, where you do an &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, then transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms, and you may already be familiar with HPSS/HSI from other HPC facilities, whose procedures may or may not be similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code in case of multiple operations in the same hsi session. You may use '/scinet/niagara/bin/exit2msg $status' to translate those codes into intelligible messages&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A trivial way to list the contents of HPSS is to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_ls&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, about 400,000 files can be listed in about an hour. Adjust the walltime accordingly, to be on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The index is placed in the directory /home/$(whoami)/.ish_register and can be inspected from the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#SBATCH -t 1:00:00&lt;br /&gt;
#SBATCH -p archiveshort&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J hpss_index&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/niagara/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$  /scinet/niagara/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do optimization, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_files_optimized&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J recall_directories&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_directories&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N deletion_script&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with the use of 'cd' commands to non-existent directories before the 'rm' command; the results may be unpredictable&lt;br /&gt;
* Avoid using the standalone wildcard '''*'''. If necessary, whenever possible bind it to a common pattern, such as '*.tmp', to limit unintended deletions&lt;br /&gt;
* Avoid using relative paths, and even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts&lt;br /&gt;
* Avoid using recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even on $ARCHIVE contents, it may be better to do it as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
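Following these recommendations, a useful habit is a local dry run: review exactly what a pattern expands to before issuing the actual 'rm'. A minimal sketch (paths and pattern are illustrative, using plain shell globbing as an analogue of hsi's):&lt;br /&gt;

```shell
#!/bin/bash
# Dry-run habit: expand and review exactly what a pattern matches
# before deleting. Directory and pattern are illustrative.
mkdir -p /tmp/cleanup-demo
cd /tmp/cleanup-demo
touch a.tmp b.tmp keep.dat

# Review the expansion first...
ls -- *.tmp

# ...then delete only after confirming the list looks right.
rm -- *.tmp
ls
```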
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N deletion_script&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you are restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), just as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI; for instance, you cannot 'vi' or 'cat' a file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi get - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than the recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of stored files that we have with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail'' (by default a pipeline returns the status of its last command, which is not what you want).&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | WHY?]]). Be sure to check the contents of the directory tree with 'du' for the total amount of data before  sending them to the tar+HSI piping.&lt;br /&gt;
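The ''set -o pipefail'' behaviour above is easy to demonstrate with plain shell commands, where 'false' and 'cat' are stand-ins for a failing tar and the hsi client:&lt;br /&gt;

```shell
#!/bin/bash
# Without pipefail, a pipeline's status is that of its LAST command,
# so a failure early in the pipe is silently masked.
false | cat
echo "without pipefail: status=$?"   # prints status=0, failure masked

# With pipefail, the pipeline returns the rightmost non-zero status.
set -o pipefail
false | cat
echo "with pipefail: status=$?"      # prints status=1, failure detected
```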
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module and can be used on any compute or devel node. It makes the previous version of the script run much quicker than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The -Hcrc option specifies that HTAR should generate CRC checksums when creating the archive.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the md5 hash with the hsi command 'hashlist'. That value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N MD5_checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
# basename of the file, used below to name the checksum file in /tmp&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on the fly using a named pipe so that the file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
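The same named-pipe trick works whenever one read of a file must feed two consumers. A minimal local sketch, with a plain 'cat' redirect standing in for the hsi upload (all paths illustrative), stores and checksums the file in a single pass:&lt;br /&gt;

```shell
#!/bin/bash
# One read of the source feeds two consumers via a named pipe:
# a stand-in "upload" (cat > stored.bin, in place of hsi put) and md5sum.
set -e
mkdir -p /tmp/npipe-demo
cd /tmp/npipe-demo
printf 'some archival data\n' > source.bin

rm -f npipe
mkfifo npipe
cat source.bin | tee npipe | cat > stored.bin &   # background the "upload"
md5sum npipe > checksum.md5                       # drains the fifo
wait                                              # wait for the upload to finish
rm -f npipe

# Point the checksum file at the stored copy and verify it.
sed -i 's+npipe+stored.bin+' checksum.md5
md5sum -c checksum.md5
```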
&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9273</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9273"/>
		<updated>2018-05-03T21:45:20Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Sample data offload */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25 year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black-sites).&lt;br /&gt;
* Over 2.5 ExaBytes of combined storage world-wide.&lt;br /&gt;
* The top 3 sites in the world report (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as their own archive. It currently has 2 x 100PB of capacity installed. &lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* The HSI/HTAR clients are also very reliable and are used at several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
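As a sketch of the 'du' check suggested in the guidelines (the 500GB threshold is the site guideline; the temp directory stands in for your real data tree):&lt;br /&gt;

```shell
#!/bin/bash
# Check a directory's total size against the ~500GB tarball guideline
# before handing it to tar+hsi or htar. The directory is illustrative.
limit_kb=$((500 * 1024 * 1024))     # 500GB, in KB as reported by 'du -sk'

dir=$(mktemp -d)                    # stand-in for e.g. $SCRATCH/mydir
echo "sample data" > "$dir/file.dat"

size_kb=$(du -sk "$dir" | awk '{print $1}')
if [ "$size_kb" -gt "$limit_kb" ]; then
    echo "too big: split $dir into smaller tarballs"
else
    echo "OK: $dir fits in a single tarball ($size_kb KB)"
fi
```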
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
This set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first because of all the job script templates we make available below (they are here so you don't have to think too much; just copy and paste), but if you approach the index at the top as a &amp;quot;case switch&amp;quot; mechanism for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
*  Users are limited to only 2 long jobs and 2 short jobs at the same time, and 10 jobs total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall. Remaining submissions will be placed on hold for the time being. So far we have not seen a need for overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind that you are restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or the ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for the data recalls to finish before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency (see the [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
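If you want to see how the dependency flag is assembled without touching the scheduler, the sketch below simulates the submission step. It assumes the standard sbatch output format (&amp;quot;Submitted batch job JOBID&amp;quot;) and a hypothetical job ID; on the real system you would replace the simulated string with the actual ''sbatch data-recall.sh'' call:&lt;br /&gt;

```shell
#!/bin/bash
# Simulated output of "sbatch data-recall.sh"; standard sbatch prints
# "Submitted batch job JOBID" (the job ID 12345 here is hypothetical).
sbatch_output="Submitted batch job 12345"

# Extract the job ID (the last field) and build the dependency flag.
dep=$(echo "$sbatch_output" | awk '{print "--dependency=afterany:"$NF}')
echo "$dep"

# The follow-up submission would then be:
#   sbatch $dep job-to-work-on-recalled-data.sh
```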
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into an archive file that conforms to the POSIX TAR specification. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, achieving a high rate of performance. HTAR does not perform gzip compression, but it does have a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR protocol [[(POSIX 1003.1 USTAR)]]. Note that HTAR will erroneously indicate success in this case, although it will produce exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can occur when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
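The size and pathname limits above can be checked in GPFS before submitting the job. Below is a sketch of such a pre-flight check (plain ''find''/''awk'', no HPSS access required; the directory name ''finished-job1'' is just an example):&lt;br /&gt;

```shell
#!/bin/bash
# Pre-flight checks on a directory tree before handing it to htar.
dir=${1:-finished-job1}   # tree to be archived (example name)

# Files larger than 68GB would make the whole HTAR session fail.
toobig=$(find "$dir" -type f -size +68G 2>/dev/null | wc -l)

# Pathnames longer than 100 characters would be silently skipped by htar.
toolong=$(find "$dir" 2>/dev/null | awk 'length($0) > 100' | wc -l)

echo "oversize files: $toobig, overlong paths: $toolong"
```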
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the directory ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files are readable by other members of your group, use the umask option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online.&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong&lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_extract_tarball_from_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is likely to be the primary client with which you will interact with HPSS. It provides an FTP-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves or replaces the GPFSpath file into HPSSpath only if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* There are three characteristics of HSI that you should keep in mind, as they can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, so the syntax for cput/cget may not work as one would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and it must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same, GPFS first, HPSS second, for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS; the default HSI directory is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do an &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, then transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code in case of multiple operations in the same hsi session. You may use '/scinet/gpc/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' As in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. Remember, you can make the exit codes more informative by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A trivial way to list the contents of HPSS is to submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#PBS -l walltime=1:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_ls&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, listing about 400,000 files takes about an hour. Adjust the walltime accordingly to be on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register, where it can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#PBS -l walltime=1:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/gpc/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
gpc-f104n084-$ /scinet/gpc/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_files&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize the recall, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_files_optimized&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_directories&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_directories&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''].&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in much the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N file_management_script&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with the use of 'cd' commands to non-existing directories before the 'rm' command; the results may be unpredictable.&lt;br /&gt;
* Avoid the standalone wildcard character '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', so as to limit unintentional deletions.&lt;br /&gt;
* Avoid relative paths, even the env variable $ARCHIVE. It is better to explicitly expand the full paths in your scripts.&lt;br /&gt;
* Avoid recursive/looped deletion instructions on $SCRATCH contents from the archive job scripts. Even for $ARCHIVE contents, it may be better to do deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
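The wildcard recommendation above can be tried out safely in a throwaway directory. This is a plain-bash sketch (no HPSS involved) showing that a pattern-bound wildcard such as '*.tmp' leaves other files untouched:&lt;br /&gt;

```shell
#!/bin/bash
# Demonstrate pattern-bound deletion in a throwaway directory.
tmpdir=$(mktemp -d)
touch "$tmpdir/run1.tmp" "$tmpdir/run2.tmp" "$tmpdir/results.dat"

# Bound wildcard: only the .tmp files match; results.dat survives.
rm "$tmpdir"/*.tmp

remaining=$(ls "$tmpdir")
echo "$remaining"

rm -r "$tmpdir"
```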
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N deletion_script&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that you're restricted to a 1-hour session.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However, you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS.&lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate around using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an equivalent in HSI - for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xvf - &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, is not noticeably slower than the recursive put with HSI, which stores each file one by one; reading the files back from tape in this format, however, will be many times faster. It also overcomes the current 68GB limit on the size of individual files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing the tarball with ish (in the same script) immediately after it is created, while it still resides in the HPSS disk cache. The result is equivalent to having used htar.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail''. By default a pipeline returns the status of its last command only, which is not what you want here.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | WHY?]]). Be sure to check the total size of the directory tree with 'du' before sending it through the tar+HSI pipeline.&lt;br /&gt;
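As a side note, the effect of ''set -o pipefail'' can be checked locally with a tiny self-contained sketch (no HPSS access needed; the ''false'' stage stands in for a failing tar or hsi):&lt;br /&gt;

```shell
#!/bin/bash
# Without pipefail a pipeline reports the status of its LAST command,
# so a failure in an earlier stage (the "false" here) is masked.
false | true
echo "without pipefail: status=$?"

# With pipefail the pipeline reports the rightmost non-zero status.
set -o pipefail
false | true
echo "with pipefail: status=$?"
```

The first echo prints status=0, the second status=1, which is why the sample scripts on this page enable pipefail before the tar+hsi pipeline.&lt;br /&gt;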
&lt;br /&gt;
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We have compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module and can also be used on any compute or devel node. It makes the previous version of the script run much faster than it would with 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, ISH will know what to do with the newly created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive, and '-Hverify=1' verifies the archive against them.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions done with the '-c on' option you can query the MD5 hash with the hsi command 'hashli'. The value is stored as a UDA (User Defined Attribute) for each file (a feature of HPSS starting with version 7.4).&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N MD5_checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate the checksum on the fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)   # used below to name the checksum file&lt;br /&gt;
&lt;br /&gt;
# Generate the checksum on the fly, using a named pipe so that the file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
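The tee + named-pipe technique above can be exercised locally without HPSS access. The following hedged sketch replaces the ''hsi put''/''hsi get'' stages with plain ''cat'' (all paths here are made up for the demonstration):&lt;br /&gt;

```shell
#!/bin/bash
# Local sketch of the single-read checksum technique: tee splits the
# stream so one copy goes to the "transfer" (cat, standing in for hsi)
# and one copy goes through the FIFO to md5sum.
tmp=$(mktemp -d)
echo "payload" > "$tmp/thefile"

mkfifo "$tmp/NPIPE"
cat "$tmp/thefile" | tee "$tmp/NPIPE" | cat > "$tmp/stored" &
md5sum "$tmp/NPIPE" > "$tmp/thefile.md5"
wait

# Rewrite the FIFO path as '-' so md5sum -c checks stdin, then stream
# the stored copy back through the verification, as the job script does.
sed -i "s+$tmp/NPIPE+-+" "$tmp/thefile.md5"
cat "$tmp/stored" | md5sum -c "$tmp/thefile.md5"
```

On success md5sum reports the stream as OK; a mismatch makes it exit non-zero, which the job script above turns into an error message.&lt;br /&gt;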
&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy - SME - is an Enterprise Cloud Portal adopted by SciNet to allow our users to access HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your contents on HPSS&lt;br /&gt;
* Follow the link below using a web browser and login with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured as a DropBox. To download the Free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9272</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=9272"/>
		<updated>2018-05-03T21:44:54Z</updated>

		<summary type="html">&lt;p&gt;Pinto: /* Job Dependencies */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|align=right&lt;br /&gt;
|align=center|'''Topology Overview'''&lt;br /&gt;
|align=center|'''Submission Queue'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-overview.png|right|x200px]]&lt;br /&gt;
|[[Image:HPSS-queue2.png|right|x200px]]&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|align=center|'''Servers Rack'''&lt;br /&gt;
|align=center|'''TS3500 Library'''&lt;br /&gt;
|-&lt;br /&gt;
|[[Image:HPSS-servers.png|right|x250px]]&lt;br /&gt;
|[[Image:HPSS-TS3500.png|right|x250px]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS] [http://en.wikipedia.org/wiki/High_Performance_Storage_System wikipedia]) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Since this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round. However, upon request, any user may be awarded access to HPSS, up to 2TB per group, so that you may get familiar with the system (just email support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
We're currently running HPSS v 7.3.3 patch 6, and HSI/HTAR version 4.0.1.2&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* HPSS is a 25-year-old collaboration between IBM and the DoE labs in the US, and is used by about 45 facilities in the [http://www.top500.org “Top 500”] HPC list (plus some black sites).&lt;br /&gt;
* Over 2.5 exabytes of combined storage worldwide.&lt;br /&gt;
* The top 3 sites in the world reported (fall 2017) having 360PB, 220PB and 125PB in production (ECMWF, UKMO and BNL)&lt;br /&gt;
* Environment Canada also adopted HPSS in 2017, to store Nav Canada data as well as to serve as its own archive; it currently has 2 x 100PB of capacity installed.&lt;br /&gt;
* The SciNet HPSS system has been providing nearline capacity for important research data in Canada since early 2011, already at 10PB levels in 2018&lt;br /&gt;
* Very reliable, data redundancy and data insurance built-in (dual copies of everything are kept on tapes at SciNet)&lt;br /&gt;
* Data on cache and tapes can be geo-distributed for further resilience and HA.&lt;br /&gt;
* Highly scalable; current performance at SciNet - after a modest upgrade in 2017 - Ingest: ~150 TB/day, Recall: ~45 TB/day (aggregated).&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process&lt;br /&gt;
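A minimal local sketch of the first two guidelines (the directory name here is hypothetical): check the total size with ''du'' first, then aggregate the small files into a single tarball before sending it to HPSS:&lt;br /&gt;

```shell
#!/bin/bash
# Create a throwaway directory of small files for the demonstration.
mkdir -p smallfiles
for i in 1 2 3; do echo "data $i" > smallfiles/file$i.dat; done

# 1) Check the total size before archiving (keep tarballs <= 500GB).
du -sh smallfiles/

# 2) Group the small files into one tarball, then list its contents.
tar -cf smallfiles.tar smallfiles/
tar -tf smallfiles.tar
```

The resulting smallfiles.tar is what you would hand to hsi or store directly with htar, instead of the individual small files.&lt;br /&gt;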
&lt;br /&gt;
== '''New to the System?'''  ==&lt;br /&gt;
The first step is to email SciNet support and request an HPSS account (otherwise you will get &amp;quot;Error - authentication/initialization failed&amp;quot; messages and exit code 71). &lt;br /&gt;
&lt;br /&gt;
THIS set of instructions on the wiki is the best and most condensed &amp;quot;manual&amp;quot; we have. It may seem a bit overwhelming at first, because of all the job script templates we make available below (they are there so you don't have to think too much; just copy and paste). But if you approach the index at the top as a &amp;quot;case switch&amp;quot; for what you intend to do, everything falls into place.&lt;br /&gt;
&lt;br /&gt;
Try this sequence:&lt;br /&gt;
&lt;br /&gt;
1) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
(most linux shell commands have an equivalent in HPSS)&lt;br /&gt;
&lt;br /&gt;
2) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_tarball_create archive a small test directory using HTAR]&lt;br /&gt;
&lt;br /&gt;
2a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
3) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_offload archive a file using hsi]&lt;br /&gt;
&lt;br /&gt;
3a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
4) [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories archive a small test directory using HSI]&lt;br /&gt;
&lt;br /&gt;
4a) use step 1) to see what happened&lt;br /&gt;
&lt;br /&gt;
5) now try the other cases and so on. In a couple of hours you'll be in pretty good shape.&lt;br /&gt;
&lt;br /&gt;
== '''Bridge between BGQ and HPSS''' ==&lt;br /&gt;
&lt;br /&gt;
At this time BGQ users will have to migrate data to Niagara scratch prior to transferring it to HPSS. We are looking for ways to improve this workflow.&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Slurm|NIA queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archivelong' or the 'archiveshort' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
*  Users are limited to only 2 long jobs and 2 short jobs at the same time, and 10 jobs total on the queue.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall; remaining submissions will be placed on hold in the meantime. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with squeue, specifying the archive partition:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -p archiveshort&lt;br /&gt;
&lt;br /&gt;
OR&lt;br /&gt;
squeue -p archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through an Interactive HSI session'''  ==&lt;br /&gt;
* You may want to acquire an interactive shell, start an HSI session and navigate the archive namespace. Keep in mind, you're restricted to 1H.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pinto@nia-login07:~$ salloc -p archiveshort -t 1:00:00&lt;br /&gt;
salloc: Granted job allocation 50918&lt;br /&gt;
salloc: Waiting for resource configuration&lt;br /&gt;
salloc: Nodes hpss-archive02-ib are ready for job&lt;br /&gt;
hpss-archive02-ib:~$&lt;br /&gt;
&lt;br /&gt;
hpss-archive02-ib:~$ hsi    (DON'T FORGET TO START HSI)&lt;br /&gt;
[pinto@hpss-archive02-ib ~]$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                * &lt;br /&gt;
*            INFO: THIS IS THE NEW 7.5.1 HPSS SYSTEM!            *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
****************************************************************** &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; ls&lt;br /&gt;
&lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; cd &amp;lt;some directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Deleting_with_an_interactive_HSI_session take a look around HPSS using an interactive HSI session]&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of HPSS should be scripted into jobs and submitted to the ''archivelong'' or the ''archiveshort'' queue. See the generic example below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
 &lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
&lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls before starting. The sbatch flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--dependency=&amp;lt;type:JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency (lookup [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hpss-archive02-ib:~$ sbatch $(sbatch data-recall.sh | awk '{print &amp;quot;--dependency=afterany:&amp;quot;$4}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
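Note that the field awk must extract depends on what sbatch prints; with the default Slurm output line (''Submitted batch job NNNN'') the job ID is the fourth field. A quick local check of that step (no scheduler needed; the job ID below is made up):&lt;br /&gt;

```shell
#!/bin/bash
# Simulate the default sbatch output and extract the job ID into a
# --dependency flag, as the one-liner above does.
echo "Submitted batch job 50918" | awk '{print "--dependency=afterany:"$4}'
```

This prints ''--dependency=afterany:50918'', which is then passed as the first argument to the second sbatch call.&lt;br /&gt;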
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
''' [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;Keep your tarballs to size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | &amp;lt;font color=red&amp;gt;WHY?&amp;lt;/font&amp;gt;]])'''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into an archive file that conforms to the POSIX TAR specification. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, achieving a high rate of performance. HTAR does not do gzip compression, but it does have a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI.&lt;br /&gt;
* Files with pathnames longer than 100 characters will be skipped, so as to conform with the TAR specification (POSIX 1003.1 USTAR). Note that HTAR will erroneously indicate success, while producing exit code 70. For now, you can check for this type of error with &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not prompt before overwriting. Be careful not to unintentionally overwrite a previous htar destination file in HPSS; the same can happen when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the archive file called ''proj1.tar'' in HPSS into the ''project1/src'' directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cd  project1/src&lt;br /&gt;
    htar -xpmf proj1.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To ensure that both the htar and the .idx files have read permissions to other members in your group use the umask option&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -Humask=0137 ....&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
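To see what mask 0137 does, a small local sketch: a file created under umask 0137 ends up with mode 640 (owner read/write, group read, no access for others), which is what keeps the .tar and .idx files readable by your group:&lt;br /&gt;

```shell
#!/bin/bash
# Under umask 0137 the default creation mode 0666 loses the masked
# bits and new files come out as 0640.
(umask 0137; touch demo-umask.txt)
stat -c '%a' demo-umask.txt
```

This prints 640, the mode the .tar and .idx files receive in HPSS when htar is run with -Humask=0137.&lt;br /&gt;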
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_create_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
 &lt;br /&gt;
hsi ls $DEST &amp;amp;&amp;gt; /dev/null&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ $status == 0 ]; then   &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -Humask=0137 -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash -l&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J htar_list_tarball_in_hpss&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_extract_tarball_from_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/recalled-from-hpss&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is likely the primary client with which most users will interact with HPSS. It provides an FTP-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition, it provides a number of shell-like commands that are useful for examining and manipulating the contents of HPSS. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores or replaces a file from GPFSpath into HPSSpath, if the GPFS version is new or has been updated&lt;br /&gt;
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* There are three aspects of HSI that you should keep in mind, as they can cause some confusion when you are first learning how to use it:&lt;br /&gt;
** HSI does not currently support renaming directory paths on-the-fly during transfers, so the syntax for cput/cget may not work as one would expect in some scenarios, requiring workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and the HPSSpath, and it must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order in which files are referred to in HSI syntax differs from FTP. In HSI the general format is always the same, GPFS first, HPSS second, for both cput and cget:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using a here-document, such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and keep the contents of HPSS organized; the default HSI directory is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues with renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However, syntax forms such as the ones below will fail, since they rename the directory paths:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process: do an &amp;quot;lcd&amp;quot; in GPFS first, then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, then transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI does not always work as expected when you go outside our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_reference_manual_2/introduction.html HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' in case of multiple operations in the same hsi session, HSI returns the highest-numbered exit code. You may use '/scinet/gpc/bin/exit2msg $status' to translate those codes into intelligible messages.&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#SBATCH -t 72:00:00&lt;br /&gt;
#SBATCH -p archivelong &lt;br /&gt;
#SBATCH -N 1&lt;br /&gt;
#SBATCH -J offload&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code of each hsi session independently. Remember, you can make exit codes more intelligible by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/niagara/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
The simplest way to list the contents of HPSS is to submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#PBS -l walltime=1:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_ls&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. As a reference, about 400,000 files can be listed in about an hour. Adjust the walltime accordingly, erring on the safe side.''&lt;br /&gt;
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS: the inventory shell [[ISH]]. The example below creates an index of all the files in a user's portion of the namespace. The index is placed in the directory /home/$(whoami)/.ish_register, which can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#PBS -l walltime=1:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/gpc/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
gpc-f104n084-$ /scinet/gpc/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or the built-in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_files&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate cgets) allows HSI to optimize the recall, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_files_optimized&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and subdirectories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_directories&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_directories&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N file_management_script&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful with 'cd' commands to non-existing directories before an 'rm' command; the results may be unpredictable.&lt;br /&gt;
* Avoid using the stand-alone wildcard character '''*'''. Whenever possible, bind it to a common pattern, such as '*.tmp', to limit unintentional deletions.&lt;br /&gt;
* Avoid relative paths, and even the env variable $ARCHIVE. It is better to explicitly expand full paths in your scripts.&lt;br /&gt;
* Avoid recursive/looped deletion instructions on $SCRATCH contents from archive job scripts. Even for $ARCHIVE contents, it may be better to run deletions as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
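The cautions above can be sketched as a small pre-deletion guard in plain bash; the path below is a hypothetical example, not part of our tooling, and you would adapt the pattern before placing any 'rm' commands after the check:&lt;br /&gt;

```shell
# Hypothetical pre-deletion guard: refuse anything that is not an explicit,
# absolute path under /archive, so an unset or mistyped variable cannot
# expand into a dangerous wildcard deletion.
target=/archive/s/scinet/pinto/obsolete   # example path only

case $target in
  /archive/?*) echo ok to delete: $target ;;
  *)           echo refusing: not a full /archive path
               exit 1 ;;
esac
```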
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N deletion_script&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session, and proceeding with your deletions that way. Keep in mind that you are restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
* After using the ''qsub -q archive -I'' command you'll get a standard shell prompt on an archive execution node (hpss-archive02), as you would on any compute node. However you will need to run '''HSI''' or '''HTAR''' to access resources on HPSS. &lt;br /&gt;
&lt;br /&gt;
* HSI will give you a prompt very similar to a standard shell, where you can navigate using commands such as 'ls', 'cd', 'pwd', etc. NOTE: not every bash command has an HSI equivalent - for instance, you cannot 'vi' or 'cat'.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive02-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | hsi put - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of files stored with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The end result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail''. (The default is to return the status of the last command in the pipeline, which is not what you want.)&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with [[Why not tarballs too large |&amp;lt;font color=red&amp;gt;tarballs of size 500GB or less&amp;lt;/font&amp;gt;]], whether ingested by htar or hsi ([[Why not tarballs too large | WHY?]]). Be sure to check the total amount of data in the directory tree with 'du' before sending it to the tar+HSI pipeline.&lt;br /&gt;
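The effect of ''set -o pipefail'' can be seen with a two-command pipeline whose first stage fails; this is a plain-bash sketch, independent of HPSS:&lt;br /&gt;

```shell
# Without pipefail, the pipeline reports the status of its last command only.
false | true
echo default status: $?      # prints: default status: 0

# With pipefail, a failure in any stage propagates to the pipeline status.
set -o pipefail
false | true
echo pipefail status: $?     # prints: pipefail status: 1
```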
&lt;br /&gt;
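A size pre-check along the lines of the 'du' recommendation above could be sketched as follows; the function name and the temporary stand-in directory are hypothetical, and you would point it at your real directory tree:&lt;br /&gt;

```shell
# check_size DIR LIMIT_GB: succeeds when DIR occupies at most LIMIT_GB gigabytes
check_size() {
  gb=$(du -s --block-size=1G $1 | cut -f1)
  [ $gb -le $2 ]
}

tmp=$(mktemp -d)             # stand-in for a real directory tree on GPFS
check_size $tmp 500 && echo ok to archive || echo split it into smaller tarballs
rm -rf $tmp
```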
=== Multi-threaded gzip'ed compression with pigz ===&lt;br /&gt;
We have compiled a multi-threaded implementation of gzip called pigz (http://zlib.net/pigz/). It is now part of the &amp;quot;extras&amp;quot; module, and can also be used on any compute or devel node. This makes the previous version of the script much quicker to execute than if you were to use 'tar -czf'. In addition, by piggy-backing ISH onto the end of the script, it will know what to do with the just-created mydir.tar.gz compressed tarball.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_compressed_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
module load extras&lt;br /&gt;
&lt;br /&gt;
# to put (cput will fail)&lt;br /&gt;
tar -cf - $SCRATCH/mydir | pigz | hsi put - : $ARCHIVE/mydir.tar.gz&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+PIGZ+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
&lt;br /&gt;
==== HTAR CRC checksums ====&lt;br /&gt;
The '-Hcrc' option specifies that HTAR should generate CRC checksums when creating the archive; combined with '-Hverify=1', the archive is verified after creation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -Humask=0137 -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Current HSI version - Checksum built-in ====&lt;br /&gt;
&lt;br /&gt;
MD5 is the standard hashing algorithm for the HSI build at SciNet. For hsi ingestions with the '-c on' option you should be able to query the md5 hash with the hsi command 'lshash'. The value is stored as a UDA (User Defined Attribute) for each file, a feature of HPSS starting with version 7.4.&lt;br /&gt;
&lt;br /&gt;
[http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/checksum-feature.html More usage details here]&lt;br /&gt;
&lt;br /&gt;
The checksum algorithm is very CPU-intensive. Although the checksum code is compiled with a high level of compiler optimization, transfer rates can be significantly reduced when checksum creation or verification is in effect. The amount of degradation in transfer rates depends on several factors, such as  processor speed, network transfer speed, and speed of the local filesystem (GPFS).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N MD5_checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly (-c on)&lt;br /&gt;
hsi -q put -c on $thefile : $storedfile&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi lshash $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# get the file back with checksum&lt;br /&gt;
hsi get -c on $storedfile&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Prior to HSI version 4.0.1.1 ====&lt;br /&gt;
&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
fname=$(basename $thefile)&lt;br /&gt;
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE | tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
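The named-pipe pattern used above can be exercised locally with generic commands, using ''cat'' as a stand-in for the ''hsi put''/''get'' calls; all paths in this sketch are illustrative, not real HPSS paths:&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch of the named-pipe checksum pattern, with 'cat' standing in
# for the hsi put/get calls; all paths here are illustrative.
workdir=$(mktemp -d)
echo "some data" > "$workdir/thefile"

mkfifo "$workdir/NPIPE"
# Read the source file only once: tee feeds both the "transfer" and md5sum
cat "$workdir/thefile" | tee "$workdir/NPIPE" | cat > "$workdir/stored" &
pid=$!
md5sum "$workdir/NPIPE" > "$workdir/thefile.md5"
rm -f "$workdir/NPIPE"

# Collect the exit status of the backgrounded "transfer" pipeline
wait $pid
status=$?
echo "transfer exit status: $status"

# Rewrite the recorded filename so md5sum -c reads from stdin,
# then verify the "retrieved" copy against the recorded checksum
sed -i "s+$workdir/NPIPE+-+" "$workdir/thefile.md5"
cat "$workdir/stored" | md5sum -c "$workdir/thefile.md5"
```

The key point the real scripts rely on is that ''wait $pid'' recovers the exit status of the backgrounded pipeline, so a failed transfer is not silently ignored while the checksum is being computed.&lt;br /&gt;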
&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using Globus''' ==&lt;br /&gt;
* &amp;lt;font color=red&amp;gt; Please note that Globus access to HPSS is disabled until further notice, due to lack of version compatibility.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* You may now transfer data between SciNet's HPSS and an external source&lt;br /&gt;
* Follow the link below &lt;br /&gt;
  https://globus.computecanada.ca&lt;br /&gt;
: Enter your Compute Canada username and password.&lt;br /&gt;
* In the 'File Transfer' tab, enter ''''Compute Canada HPSS'''' as one of the Endpoints. To authenticate this endpoint, enter your SciNet username and password.&lt;br /&gt;
* You may read more about Compute Canada's Globus Portal here:&lt;br /&gt;
  https://docs.computecanada.ca/wiki/Globus&lt;br /&gt;
&lt;br /&gt;
== '''Access to HPSS using SME''' ==&lt;br /&gt;
* Storage Made Easy (SME) is an enterprise cloud portal adopted by SciNet to give our users access to HPSS&lt;br /&gt;
* Best suited for light transfers to/from your personal computer and for navigating your content on HPSS&lt;br /&gt;
* Follow the link below using a web browser and log in with your SciNet UserID and password. Under File Manager you will find the &amp;quot;'''SciNet HPSS'''&amp;quot; folder.&lt;br /&gt;
  https://sme.scinet.utoronto.ca&lt;br /&gt;
* SME can be configured to behave like Dropbox. To download the free Cloud File Manager native to your OS (Windows, Mac, Linux, mobile), follow the link below:&lt;br /&gt;
  https://www.storagemadeeasy.com/clients_and_tools/&lt;br /&gt;
Once you have downloaded and installed the Cloud Manager App, fill in the following information:&lt;br /&gt;
  Server location&lt;br /&gt;
  https://sme.scinet.utoronto.ca/api&lt;br /&gt;
* You may learn more about SME capabilities and features here:&lt;br /&gt;
  https://www.storagemadeeasy.com/ownFileserver/&lt;br /&gt;
  https://www.storagemadeeasy.com/pricing/#features  (Enterprise)&lt;br /&gt;
  https://storagemadeeasy.com/faq/&lt;br /&gt;
&lt;br /&gt;
== '''User provided Content/Suggestions''' ==&lt;br /&gt;
=== '''[[HPSS-by-pomes|Packing up large data sets and putting them on HPSS]]''' ===&lt;br /&gt;
(Pomés group recommendations)&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Pinto</name></author>
	</entry>
</feed>