<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>https://oldwiki.scinet.utoronto.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Knecht</id>
	<title>oldwiki.scinet.utoronto.ca - User contributions [en-gb]</title>
	<link rel="self" type="application/atom+xml" href="https://oldwiki.scinet.utoronto.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Knecht"/>
	<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php/Special:Contributions/Knecht"/>
	<updated>2026-05-22T07:07:19Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.35.12</generator>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=4806</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=4806"/>
		<updated>2012-05-28T19:42:48Z</updated>

		<summary type="html">&lt;p&gt;Knecht: Changes to exit status in examples.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Because this system is intended for large data storage, it is accessible only to groups who have been awarded storage space at SciNet beyond 5TB in the yearly RAC resource allocation round.&lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like functionality which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar-formatted archives directly in HPSS. It also creates a separate index file (.idx) that can be accessed and browsed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by about 40 facilities in the [http://www.top500.org “Top 500”] HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~24 TB/day, Recall: ~12 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rationale_SNUG.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~200MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* Optimal performance for aggregated transfers and allocation on tapes is obtained with tarballs of size around 100GB.&lt;br /&gt;
* We strongly urge that you use the sample scripts we are providing as the basis for your job submissions.&lt;br /&gt;
* Make sure to check the application's exit code and returned logs for errors after any data transfer or tarball creation process.&lt;br /&gt;
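The ~200MB guideline above can be checked before archiving. The sketch below is only an illustration and not a SciNet-provided tool: the function name list_small_files is hypothetical, and it assumes GNU find, whose -size test rounds file sizes up to whole MiB units, so the cutoff is approximate.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper (not part of HPSS, HSI or HTAR): list regular files
# under a directory that fall below a size cutoff, as candidates for
# grouping into a tarball before they are sent to tape.
# Usage: list_small_files DIR [CUTOFF_MIB]   (cutoff defaults to 200)
list_small_files() {
    local dir="$1"
    local cutoff_mib="${2:-200}"
    # GNU find rounds sizes up to whole MiB, so "-size -200M" selects
    # files of at most 199 MiB; good enough for a rough guideline.
    find "$dir" -type f -size "-${cutoff_mib}M" | sort
}
```

Running list_small_files on a directory under $SCRATCH prints one path per line; a long list suggests that the directory is better stored as a single tarball than file by file.&lt;br /&gt;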
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
&lt;br /&gt;
* Job submissions should be done to the 'archive' queue&lt;br /&gt;
* Short jobs are limited to 1H walltime by default. Long jobs (&amp;gt; 1H) are limited to 72H walltime.&lt;br /&gt;
*  Users are limited to only 1 long job and 1 short job at the same time.&lt;br /&gt;
* There can only be 5 long jobs running at any given time overall. Remaining submissions will be placed on hold for the time being. So far we have not seen a need for an overall limit on short jobs.&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archiveshort&lt;br /&gt;
OR&lt;br /&gt;
showq -w class=archivelong&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the HPSS should be scripted into jobs and submitted to the ''archive'' queue. See generic example below.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create_tarball_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
echo &amp;quot;Creating a htar of finished-job1/ directory tree into HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -cpf $ARCHIVE/finished-job1.tar finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Note:''' Always trap the execution of your jobs for abnormal terminations, and be sure to return the exit code&lt;br /&gt;
&lt;br /&gt;
=== Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Typically data will be recalled to /scratch when it is needed for analysis. Job dependencies can be constructed so that analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the archive recalling job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency (see the [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_data_recall data-recall.sh samples]):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
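The one-liner above can also be split into two steps, which makes it easier to notice when the recall submission itself fails. This is a minimal sketch, assuming that qsub prints a job identifier of the form NUMBER.SERVER; the helper name depend_flag is hypothetical.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: turn the identifier printed by qsub
# (assumed form: NUMBER.SERVER) into a dependency flag for a later qsub.
depend_flag() {
    local jobid="${1%%.*}"    # keep only the text before the first dot
    if [ -z "$jobid" ]; then
        echo "depend_flag: empty job identifier" >/dev/stderr
        return 1
    fi
    echo "-W depend=afterok:${jobid}"
}

# Intended use on a login node (commented out, since it submits real jobs):
#   recall_id=$(qsub data-recall.sh) || exit 1
#   qsub $(depend_flag "$recall_id") job-to-work-on-recalled-data.sh
```

The command substitution in the second qsub is left unquoted on purpose, so that the flag and its value are passed as two separate arguments.&lt;br /&gt;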
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~200MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into a single archive. It uses a sophisticated multithreaded buffering scheme to write files directly from GPFS into HPSS, which gives it a high rate of performance, and the archive file it creates conforms to the POSIX TAR specification. HTAR does not do gzip compression; however, it already has a built-in checksum algorithm.&lt;br /&gt;
&lt;br /&gt;
'''Caution'''&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an HTAR archive. If you attempt to start a transfer with any files larger than 68GB, the whole HTAR session will fail, and you'll get a notification listing all those files so that you can transfer them with HSI.&lt;br /&gt;
* Files whose pathnames are too long (greater than 100 characters) will be skipped, so as to conform with the TAR protocol [[(POSIX 1003.1 USTAR)]]. Note that HTAR will erroneously indicate success in this case, but will produce exit code 70. For now, you can check for this type of error by &amp;quot;grep Warning my.output&amp;quot; after the job has completed.&lt;br /&gt;
* Unlike cput/cget in HSI, (h)tar does not &amp;quot;prompt before overwrite&amp;quot; by default. Be careful not to unintentionally overwrite a previous htar destination file in HPSS. A similar situation can arise when extracting material back into GPFS and overwriting the originals. Be sure to double-check the logic in your scripts.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the GPFS active filesystems.&lt;br /&gt;
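The size and pathname limits listed above can be checked before a job is submitted. The sketch below is not an HTAR feature: htar_preflight is a hypothetical helper, the default of 68000000000 bytes is an assumed reading of &amp;quot;68 GB&amp;quot;, and the byte-based -size test assumes GNU find. Run it from the same directory and with the same relative path that will be given to htar, so that the pathname lengths match.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical pre-flight check (not an HTAR feature): report members that
# the cautions above say HTAR will reject or skip.
# Usage: htar_preflight DIR [MAX_BYTES] [MAX_PATH_CHARS]
htar_preflight() {
    local dir="$1"
    local max_bytes="${2:-68000000000}"   # assumed reading of "68 GB"
    local max_path="${3:-100}"            # limit quoted in the cautions
    # Files larger than the size limit (GNU find: "c" means bytes).
    find "$dir" -type f -size "+${max_bytes}c" | sed 's/^/TOO LARGE: /'
    # Pathnames longer than the limit, measured as find prints them.
    find "$dir" | awk -v max="$max_path" 'length($0) > max { print "PATH TOO LONG: " $0 }'
}
```

An empty result means that neither limit was hit; it does not replace checking the HTAR exit code and log after the transfer.&lt;br /&gt;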
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, and preserve mask attributes (-p), enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cpf $ARCHIVE/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cpf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files under the ''project1/src'' directory from the archive file called ''proj1.tar'' in HPSS into the current directory in GPFS, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xpmf proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
==== Sample tarball create ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create_tarball_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be /archive/$(id -gn)/$(whoami)/&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
# htar WILL overwrite an existing file with the same name so check beforehand.&lt;br /&gt;
if [ -f $DEST ];&lt;br /&gt;
then    &lt;br /&gt;
    echo &amp;quot;File $DEST already exists. Nothing has been done&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
cd $SCRATCH/workarea/ &lt;br /&gt;
htar -cpf $DEST finished-job1/ &lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Note:''' If you attempt to start a transfer with any files larger than 68GB the whole HTAR session will fail, and you'll get a notification listing all those files, so that you can transfer them with HSI. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
----------------------------------------&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file1 (86567185745 bytes)&lt;br /&gt;
INFO: File too large for htar to handle: finished-job1/file2 (71857244579 bytes)&lt;br /&gt;
ERROR: 2 oversize member files found - please correct and retry&lt;br /&gt;
ERROR: [FATAL] error(s) generating filename list &lt;br /&gt;
HTAR: HTAR FAILED&lt;br /&gt;
###WARNING  htar returned non-zero exit status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball list ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_list_tarball_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
DEST=$ARCHIVE/finished-job1.tar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
htar -tvf $DEST&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'LISTING SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample tarball extract ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_extract_tarball_from_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
cd $SCRATCH/recalled-from-hpss || exit 1&lt;br /&gt;
htar -xpmf $ARCHIVE/finished-job1.tar&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI may be the primary client with which some users will interact with HPSS. It provides an ftp-like interface for archiving and retrieving tarballs or [https://support.scinet.utoronto.ca/wiki/index.php/HPSS#Sample_transferring_directories directory trees]. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally saves a GPFSpath file to HPSSpath, or replaces the existing HPSS copy, if the GPFS version is new or has been updated
 cput [options] GPFSpath [: HPSSpath]&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to GPFS only if a GPFS version does not already exist. &lt;br /&gt;
 cget [options] [GPFSpath :] HPSSpath&lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands to GPFS&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*There are 3 distinctions about HSI that you should keep in mind, and that can generate a bit of confusion when you're first learning how to use it:&lt;br /&gt;
** HSI doesn't currently support renaming directory paths on-the-fly during transfers, therefore the syntax for cput/cget may not work as one would expect in some scenarios, requiring some workarounds.&lt;br /&gt;
** HSI has an operator &amp;quot;:&amp;quot; which separates the GPFSpath and HPSSpath, and must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
** The order for referring to files in HSI syntax is different from FTP. In HSI the general format is always the same for both cput and cget, GPFS first and HPSS second:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     GPFSfile : HPSSfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, when using HSI to store the tarball file from GPFS into HPSS, then recall it to GPFS, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
    cget tarball-recalled : tarball-in-HPSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put tarball-in-GPFS tarball-in-HPSS &lt;br /&gt;
    get tarball-in-HPSS tarball-recalled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;mkdir LargeFilesDir; cd LargeFilesDir; cput tarball-in-GPFS : tarball-in-HPSS&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More complex sequences can be performed using an excerpt such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      mkdir LargeFilesDir&lt;br /&gt;
      cd LargeFilesDir&lt;br /&gt;
      cput tarball-in-GPFS : tarball-in-HPSS&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir2/&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent, but we recommend that you always use full paths and organize the contents of HPSS, where the default HSI directory placement is $ARCHIVE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput tarball&lt;br /&gt;
    hsi cput tarball : tarball&lt;br /&gt;
    hsi cput $SCRATCH/tarball : $ARCHIVE/tarball&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* There are no known issues renaming files on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi cput $SCRATCH/tarball1 : $ARCHIVE/tarball2&lt;br /&gt;
    hsi cget $SCRATCH/tarball3 : $ARCHIVE/tarball2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* However the syntax forms such as the ones below will fail, since they rename the directory paths.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir2    (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cput -Ruph $SCRATCH/LargeFilesDir/* : $ARCHIVE/LargeFilesDir2  (FAILS)&lt;br /&gt;
OR&lt;br /&gt;
   hsi cget -Ruph $SCRATCH/LargeFilesDir : $ARCHIVE/LargeFilesDir     (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is the following two-step process, where you first do an &amp;quot;lcd&amp;quot; in GPFS and then recursively transfer the whole directory (-R), keeping the same name. You may use the '-u' option to resume a previously disrupted session, '-p' to preserve timestamps, and '-h' to keep the links.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH&lt;br /&gt;
      cget -Ruph LargeFilesDir&lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another workaround is to do an &amp;quot;lcd&amp;quot; into the GPFSpath first and a &amp;quot;cd&amp;quot; into the HPSSpath, but transfer the files individually with the '*' wildcard character. This option lets you change the directory name:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
      lcd $SCRATCH/LargeFilesDir&lt;br /&gt;
      mkdir $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cd $ARCHIVE/LargeFilesDir2&lt;br /&gt;
      cput -Ruph *  &lt;br /&gt;
    end&lt;br /&gt;
    EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Documentation === &lt;br /&gt;
Complete documentation on HSI is available from the Gleicher Enterprises links below. You may peruse those links and come up with alternative syntax forms. You may even already be familiar with HPSS/HSI from other HPC facilities, which may or may not have procedures similar to ours. HSI doesn't always work as expected when you go outside of our recommended syntax, so '''we strongly urge that you use the sample scripts we are providing as the basis''' for your job submissions.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes] &lt;br /&gt;
'''Note:''' HSI returns the highest-numbered exit code, in case of multiple operations in the same hsi session. You may use '/scinet/gpc/bin/exit2msg $status' to translate those codes into intelligible messages&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage Scripts===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls,ish), and ''getting'' data back onto GPFS for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data offload''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF2&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput $SCRATCH/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF2&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Note:''' as in the above example, we recommend that you capture the (highest-numbered) exit code for each hsi session independently. And remember, you may improve your exit code verbosity by adding the excerpt below to your scripts:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data list''' ====&lt;br /&gt;
A very trivial way to list the contents of HPSS would be to just submit the HSI 'ls' command.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#PBS -l walltime=1:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_ls&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cd put-away&lt;br /&gt;
ls -R&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Warning: if you have a lot of files, the ls command will take a long time to complete. For instance, about 400,000 files can be listed in about an hour. Adjust the walltime accordingly, and be on the safe side.''&lt;br /&gt;
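The rate quoted in the warning above gives a rough way to pick a walltime. This is a small arithmetic sketch: the function name ls_walltime_hours and the default safety factor of 2 are assumptions made for illustration, while the figure of 400,000 files per hour is the one from the warning.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: estimate whole hours of walltime for an HSI 'ls -R',
# using the rate quoted in the warning and a safety margin.
# Usage: ls_walltime_hours NUMBER_OF_FILES [SAFETY_FACTOR]
ls_walltime_hours() {
    local nfiles="$1"
    local safety="${2:-2}"          # assumed margin, adjust to taste
    local rate=400000               # files listed per hour (from the warning)
    # Integer ceiling of nfiles * safety / rate, with a floor of one hour.
    local hours=$(( (nfiles * safety + rate - 1) / rate ))
    if [ "$hours" -lt 1 ]; then
        hours=1
    fi
    echo "$hours"
}
```

For example, ls_walltime_hours 1000000 prints 5, which would correspond to a request of walltime=5:00:00 in the job script.&lt;br /&gt;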
&lt;br /&gt;
However, we provide a much more useful and convenient way to explore the contents of HPSS with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the directory /home/$(whoami)/.ish_register that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
#PBS -l walltime=1:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
INDEX_DIR=$HOME/.ish_register&lt;br /&gt;
if ! [ -e &amp;quot;$INDEX_DIR&amp;quot; ]; then&lt;br /&gt;
  mkdir -p $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=&amp;quot;$INDEX_DIR&amp;quot;&lt;br /&gt;
/scinet/gpc/bin/ish hindex&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
''Note: the above warning on collecting the listing for many files applies here too.''&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
gpc-f104n084-$ /scinet/gpc/bin/ish ~/.ish_register/hpss.igz &lt;br /&gt;
[ish]hpss.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
==== Sample '''data recall''' ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_files&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Jan-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget $SCRATCH/recalled-from-hpss/Feb-2010-jobs.tar.gz : $ARCHIVE/put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We should emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do optimization, as in the following example:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_files_optimized&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
mkdir -p $SCRATCH/recalled-from-hpss&lt;br /&gt;
&lt;br /&gt;
# individual tarballs previously organized in HPSS inside the put-away-on-2010/ folder&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled-from-hpss/&lt;br /&gt;
cd $ARCHIVE/put-away-on-2010/&lt;br /&gt;
cget Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Sample '''transferring directories''' ====&lt;br /&gt;
Remember, it's not possible to rename directories on-the-fly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi cget -Ruph $SCRATCH/LargeFiles-recalled : $ARCHIVE/LargeFiles    (FAILS)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One workaround is to transfer the whole directory (and sub-directories) recursively:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_directories&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/recalled&lt;br /&gt;
cd $ARCHIVE/&lt;br /&gt;
cget -Ruph LargeFiles&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another workaround is to transfer files and subdirectories individually with the &amp;quot;*&amp;quot; wildcard character:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall_directories&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
mkdir -p $SCRATCH/LargeFiles-recalled&lt;br /&gt;
&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
lcd $SCRATCH/LargeFiles-recalled&lt;br /&gt;
cd $ARCHIVE/LargeFiles&lt;br /&gt;
cget -Ruph *&lt;br /&gt;
end&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''[[ISH|ISH]]''' ==&lt;br /&gt;
=== [[ISH|Documentation and Usage]] ===&lt;br /&gt;
 &lt;br /&gt;
== '''File and directory management''' ==&lt;br /&gt;
=== Moving/renaming ===&lt;br /&gt;
* You may use 'mv' or 'cp' in the same way as their Linux counterparts.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N file_management_script&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
echo &amp;quot;HPSS file and directory management&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    mkdir $ARCHIVE/2011&lt;br /&gt;
    mv $ARCHIVE/oldjobs $ARCHIVE/2011&lt;br /&gt;
    cp -r $ARCHIVE/almostfinished/*done $ARCHIVE/2011&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Deletions ===&lt;br /&gt;
==== Recommendations ====&lt;br /&gt;
* Be careful when using 'cd' to change into a directory that may not exist before issuing 'rm'. If the 'cd' fails, the 'rm' may act on a different directory than intended, and the results may be unpredictable.&lt;br /&gt;
* Avoid using the stand-alone wildcard character '''*'''. If a wildcard is necessary, bind it to a common pattern whenever possible, such as '*.tmp', to limit unintended deletions.&lt;br /&gt;
* Avoid relative paths, and even the environment variable $ARCHIVE. It is better to write out the full paths explicitly in your scripts.&lt;br /&gt;
* Avoid recursive or looped deletion instructions on $SCRATCH contents from within archive job scripts. Even for $ARCHIVE contents, it may be better to do the deletion as an independent job submission, after you have verified that the original ingestion into HPSS finished without any issues.&lt;br /&gt;
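The recommendations above can be turned into a small pre-flight check that a job script runs before it hands any path to 'rm' inside HSI. This is a minimal sketch only: the function name is_safe_rm_target is hypothetical and not part of HSI, and the assumption that every valid target lives under /archive is taken from the example paths on this page.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper (not part of HSI): returns 0 only when a deletion
# target follows the recommendations above, and 1 otherwise.
is_safe_rm_target() {
    local target=$1

    # Reject unexpanded variables such as $ARCHIVE.
    case "$target" in
        *'$'*) return 1 ;;
    esac

    # Require a full path; /archive as the HPSS root is an assumption.
    case "$target" in
        /archive/*) ;;
        *) return 1 ;;
    esac

    # Reject an empty last component or a stand-alone '*'.
    case "${target##*/}" in
        ""|"*") return 1 ;;
    esac

    return 0
}

for t in '/archive/s/scinet/pinto/*.tmp' '/archive/s/scinet/pinto/*' 'obsolete'; do
    if is_safe_rm_target "$t"; then
        echo "OK:      $t"
    else
        echo "REFUSED: $t"
    fi
done
```

Only the first of the three sample targets is accepted. The second ends in a stand-alone wildcard and the third is a relative path.&lt;br /&gt;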
&lt;br /&gt;
==== Typical example ====&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N deletion_script&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
echo &amp;quot;Deletion of an outdated directory tree in HPSS&amp;quot;&lt;br /&gt;
&lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that the initial directory in HPSS ($ARCHIVE) has the path explicitly expanded&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi -v &amp;lt;&amp;lt;EOF1&lt;br /&gt;
    rm /archive/s/scinet/pinto/*.tmp&lt;br /&gt;
    rm -R /archive/s/scinet/pinto/obsolete&lt;br /&gt;
end&lt;br /&gt;
EOF1&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Deleting with an interactive HSI session ====&lt;br /&gt;
* You may feel more comfortable acquiring an interactive shell, starting an HSI session and proceeding with your deletions that way. Keep in mind that the session is restricted to 1 hour.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f103n084-$ qsub -q archive -I&lt;br /&gt;
qsub: waiting for job 11611291.gpc-sched to start&lt;br /&gt;
qsub: job 11611291.gpc-sched ready&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
Begin PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
Job ID:		11611291.gpc-sched&lt;br /&gt;
Username:	pinto&lt;br /&gt;
Group:		scinet&lt;br /&gt;
Nodes:		gpc-archive01&lt;br /&gt;
End PBS Prologue Mon May 28 13:15:28 EDT 2012 1338225328&lt;br /&gt;
----------------------------------------&lt;br /&gt;
hpss-archive01-$ hsi&lt;br /&gt;
******************************************************************&lt;br /&gt;
*     Welcome to HPSS@SciNet - High Perfomance Storage System    *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
*              CHECK THE INTEGRITY OF YOUR TARBALLS              *&lt;br /&gt;
******************************************************************&lt;br /&gt;
Username: pinto  UID: 10010  Acct: 10010(10010) Copies: 2 Firewall: off [hsi.4.0.1 Thu Mar 22 11:44:03 EDT 2012] &lt;br /&gt;
[HSI]/archive/s/scinet/pinto-&amp;gt; rm -R junk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HPSS for the 'Watchmaker' ''' ==&lt;br /&gt;
=== Efficient alternative to htar ===&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N tar_create_tarball_in_hpss_with_hsi_by_piping&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
&lt;br /&gt;
# When using a pipeline like this&lt;br /&gt;
set -o pipefail &lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
tar -c $SCRATCH/mydir | hsi cput - : $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to immediately generate an index&lt;br /&gt;
ish hindex $ARCHIVE/mydir.tar&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'ISH returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#cd $SCRATCH&lt;br /&gt;
#hsi cget - : $ARCHIVE/mydir.tar | tar -xv &lt;br /&gt;
#status=$?&lt;br /&gt;
# if [ ! $status == 0 ]; then&lt;br /&gt;
#   echo 'TAR+HSI+piping returned non-zero code.'&lt;br /&gt;
#   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
#   exit $status&lt;br /&gt;
#else&lt;br /&gt;
#   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
#fi&lt;br /&gt;
&lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
'''Notes:''' &lt;br /&gt;
* Combining commands in this fashion, besides being HPSS-friendly, should not be noticeably slower than a recursive put with HSI, which stores each file one by one. However, reading the files back from tape in this format will be many times faster. It also overcomes the current 68GB limit on the size of stored files that we have with htar.&lt;br /&gt;
* To top things off, we recommend indexing with ish (in the same script) immediately after the tarball creation, while it still resides in the HPSS cache. The result is as if htar had been used.&lt;br /&gt;
* To ensure that an error at any stage of the pipeline shows up in the returned status, use ''set -o pipefail''. (The default is to return the status of the last command in the pipeline, which is not what you want.)&lt;br /&gt;
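The effect of ''set -o pipefail'' can be seen without touching HPSS. In this minimal sketch the commands false and true are placeholders for a failing first stage (tar) and a succeeding last stage (hsi). The function name report is illustrative only.&lt;br /&gt;

```shell
#!/bin/bash
# "false | true" stands in for "tar ... | hsi ...": the first stage fails
# and the last stage succeeds. Both commands are placeholders only.
report() {
    local rc=0
    false | true || rc=$?
    echo "$1: $rc"
}

report "default"       # prints "default: 0", the failure is hidden
set -o pipefail
report "pipefail"      # prints "pipefail: 1", the failure is reported
```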
&lt;br /&gt;
=== Content Verification ===&lt;br /&gt;
The ''-Hcrc'' option specifies that HTAR should generate CRC checksums when creating the archive.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N htar_create_tarball_in_hpss_with_checksum_verification&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
 &lt;br /&gt;
trap &amp;quot;echo 'Job script not completed';exit 129&amp;quot; TERM INT&lt;br /&gt;
# Note that your initial directory in HPSS will be $ARCHIVE&lt;br /&gt;
 &lt;br /&gt;
cd $SCRATCH/workarea&lt;br /&gt;
&lt;br /&gt;
# to put&lt;br /&gt;
htar -cpf $ARCHIVE/finished-job1.tar -Hcrc -Hverify=1 finished-job1/&lt;br /&gt;
&lt;br /&gt;
# to get&lt;br /&gt;
#mkdir $SCRATCH/verification&lt;br /&gt;
#cd $SCRATCH/verification&lt;br /&gt;
#htar -Hcrc -xvpmf $ARCHIVE/finished-job1.tar &lt;br /&gt;
&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
trap - TERM INT&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HTAR returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Sample '''verify checksum''' ===&lt;br /&gt;
This will checksum the contents of the HPSSpath against the original GPFSpath after the transfer has finished.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l walltime=72:00:00&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -m e&lt;br /&gt;
&lt;br /&gt;
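thefile=&amp;lt;GPFSpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;HPSSpath&amp;gt;&lt;br /&gt;
# Base name of the file, used to name the checksum file under /tmp&lt;br /&gt;
fname=$(basename &amp;quot;$thefile&amp;quot;)&lt;br /&gt;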
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm -f  /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'HSI returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
status=$?&lt;br /&gt;
 &lt;br /&gt;
if [ ! $status == 0 ]; then&lt;br /&gt;
   echo 'Checksum verification returned non-zero code.'&lt;br /&gt;
   /scinet/gpc/bin/exit2msg $status&lt;br /&gt;
   exit $status&lt;br /&gt;
else&lt;br /&gt;
   echo 'TRANSFER SUCCESSFUL'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Data Management|BACK TO Data Management]]&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3518</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3518"/>
		<updated>2011-06-27T20:42:56Z</updated>

		<summary type="html">&lt;p&gt;Knecht: /* Other HSI Examples */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like interface which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by 50+ facilities in the “Top 500” HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~12 TB/day, Recall: ~24 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rational.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers and any tarball creation process.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers into and out of HPSS should be scripted into jobs and submitted to the ''archive'' queue. See the HSI example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with HPSS. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in HPSS&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* More complex sequences can be performed using a script such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
A convenient way to explore the contents of HPSS is with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the file /home/$USER/HPSSdm/hsi.igz that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSSdm&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]];then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=$HOME/HPSSdm&lt;br /&gt;
ish hindex&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f104n084-$ ish ~/HPSSdm/hsi.igz &lt;br /&gt;
[ish]hsi.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - Note: a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize the recall.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-hpss&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage is very similar to that of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : hpss_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;hpss_file1&amp;quot; into HPSS, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* unlike with FTP, where the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive of C source programs and header files on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the tar file stored above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Put a subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   cput -R -u LargeFiles&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
* verify checksum&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
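thefile=&amp;lt;localpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;hpsspath&amp;gt;&lt;br /&gt;
# Base name of the file, used to name the checksum file under /tmp&lt;br /&gt;
fname=$(basename &amp;quot;$thefile&amp;quot;)&lt;br /&gt;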
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm     /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ];then&lt;br /&gt;
  echo &amp;quot;File transfer failed&amp;quot;&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ]; then&lt;br /&gt;
  echo '!!! Job Failed !!!'&lt;br /&gt;
  echo 'error=' $sc&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into an archive file that conforms to the POSIX TAR specification. It uses a sophisticated multithreaded buffering scheme to write files from the local filesystem directly into HPSS, thereby achieving a high rate of performance. &lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive (you'll get an error message for the whole operation)&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
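The first two limits can be checked before htar is ever called. This is a minimal sketch only: htar_preflight is a hypothetical helper and not part of HTAR, and it reads the 68 GB limit as 68 GiB, which is the unit that find uses for its G suffix.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of HTAR) for the limits above.
# Usage: htar_preflight DIR [MAX_FILES]
htar_preflight() {
    local dir=$1 max_files=${2:-1000000}
    local count oversized

    count=$(( $(find "$dir" -type f | wc -l) ))
    # Assumption: "68 GB" is read as 68 GiB, the unit used by find -size.
    oversized=$(( $(find "$dir" -type f -size +68G | wc -l) ))

    if [ "$count" -gt "$max_files" ]; then
        echo "too many files: $count (limit $max_files)"
        return 1
    fi
    if [ "$oversized" -gt 0 ]; then
        echo "$oversized file(s) exceed the per-file size limit"
        return 1
    fi
    echo "ok: $count files"
}

demo=$(mktemp -d)
touch "$demo/a" "$demo/b" "$demo/c"
htar_preflight "$demo"        # three small files pass
htar_preflight "$demo" 2 || echo "refused with a limit of 2 files"
```

The optional second argument exists only so that the file-count branch can be exercised on a small directory.&lt;br /&gt;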
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3517</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3517"/>
		<updated>2011-06-27T20:34:56Z</updated>

		<summary type="html">&lt;p&gt;Knecht: /* Other HSI Examples */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Access and transfer of data into and out of HPSS is done under the control of the user, whose interaction is expected to be scripted and submitted as a batch job, using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like interface which can be used to archive and retrieve large files. It is also useful for browsing the contents of HPSS.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar formatted archives directly into HPSS. It also creates a separate index file (.idx) that can be accessed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
== '''Why should I use and trust HPSS?''' ==&lt;br /&gt;
* 10+ years history, used by 50+ facilities in the “Top 500” HPC list&lt;br /&gt;
* very reliable, data redundancy and data insurance built-in.&lt;br /&gt;
* highly scalable, reasonable performance at SciNet - Ingest: ~12 TB/day, Recall: ~24 TB/day (aggregated)&lt;br /&gt;
* HSI/HTAR clients also very reliable and used on several HPSS sites. ISH was written at SciNet.&lt;br /&gt;
* [[Media:HPSS_rational.pdf|HPSS fits well with the Storage Capacity Expansion Plan at SciNet]] (pdf presentation)&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers and any tarball creation process.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is done through the [[Moab|GPC queue system]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers into and out of HPSS should be scripted into jobs and submitted to the ''archive'' queue. See the HSI example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_hpss&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
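&lt;br /&gt;
The one-liner above can be easier to read when the dependency flag is built in a small helper. This is a sketch under assumptions: the function name ''depend_flag'' is hypothetical, and it assumes that qsub prints an identifier of the form ''number.hostname'', which is what the awk command above also relies on.&lt;br /&gt;

```shell
# depend_flag QSUB_OUTPUT
# Turn the identifier printed by qsub (assumed form: number.hostname)
# into the dependency flag for a follow-up job.
depend_flag() {
    jobid=${1%%.*}    # keep the part before the first dot, like awk -F '.' '{print $1}'
    printf '%s\n' "-W depend=afterok:$jobid"
}

# Intended use on a login node (not run here, it needs the scheduler):
#   recall_id=$(qsub data-recall.sh)
#   qsub $(depend_flag "$recall_id") job-to-work-on-recalled-data.sh
```

Keeping the job identifier in a variable also makes it available for later checks.&lt;br /&gt;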
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with HPSS. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents in HPSS. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in HPSS&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of HPSS.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* More complex sequences can be performed using a script such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into HPSS, examining the contents (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
A convenient way to explore the contents of HPSS is with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the file /home/$USER/HPSSdm/hsi.igz that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSSdm&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]];then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=$HOME/HPSSdm&lt;br /&gt;
ish hindex&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f104n084-$ ish ~/HPSSdm/hsi.igz &lt;br /&gt;
[ish]hsi.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - Note: a single ''cget'' of multiple files (rather than several separate gets, as in this example) allows HSI to optimize the recall.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-hpss&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-hpss/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
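&lt;br /&gt;
Since a single ''cget'' of several files gives HSI room to optimize, the command line can be assembled from a list of file names. The sketch below only prints the command, using the ''local_file : hpss_file'' pair syntax. The helper name ''build_cget'' is hypothetical and the directories are taken from the recall example above. The printed line would then be placed in the input fed to hsi.&lt;br /&gt;

```shell
# build_cget LOCAL_DIR HPSS_DIR FILE...
# Print one cget command holding a "local : hpss" pair for every FILE.
build_cget() {
    local_dir=$1
    hpss_dir=$2
    shift 2
    cmd="cget"
    for f in "$@"; do
        cmd="$cmd $local_dir/$f : $hpss_dir/$f"
    done
    printf '%s\n' "$cmd"
}

# Intended use (prints a single line for both tarballs):
#   build_cget /scratch/$USER/recalled-from-hpss put-away-on-2010 Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz
```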
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage is very similar to that of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : hpss_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;hpss_file1&amp;quot; into HPSS, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* With FTP, by contrast, the following syntax would be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
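&lt;br /&gt;
The difference in argument order can be summarized in a few lines of shell. This is an illustrative sketch only: ''ftp_to_hsi'' is a hypothetical helper that rewrites an FTP-style put or get into the HSI form, in which the local file always comes first.&lt;br /&gt;

```shell
# ftp_to_hsi COMMAND ARG1 ARG2
# Rewrite an FTP-style transfer into the HSI "local_file : hpss_file" form.
ftp_to_hsi() {
    case $1 in
        put) printf '%s\n' "put $2 : $3" ;;   # FTP order: put local remote
        get) printf '%s\n' "get $3 : $2" ;;   # FTP order: get remote local
        *)   echo "unsupported command: $1"; return 1 ;;
    esac
}

# Intended use:
#   ftp_to_hsi put file1 hpss_file1       prints: put file1 : hpss_file1
#   ftp_to_hsi get hpss_file1 file1.bak   prints: get file1.bak : hpss_file1
```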
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive of C source programs and header files on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Put a subdirectory ''LargeFiles'' and all its contents recursively. You may use '-u' option to resume a previously disrupted session.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   cput -R -u LargeFiles&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
* verify checksum&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N checksum_verified_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
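thefile=&amp;lt;localpath&amp;gt;&lt;br /&gt;
storedfile=&amp;lt;hpsspath&amp;gt;&lt;br /&gt;
fname=$(basename &amp;quot;$thefile&amp;quot;)   # base name used for the checksum file in /tmp&lt;br /&gt;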
&lt;br /&gt;
# Generate checksum on fly using a named pipe so that file is only read from GPFS once&lt;br /&gt;
mkfifo /tmp/NPIPE&lt;br /&gt;
cat $thefile  | tee /tmp/NPIPE | hsi -q put - : $storedfile &amp;amp;&lt;br /&gt;
pid=$!&lt;br /&gt;
md5sum /tmp/NPIPE |tee /tmp/$fname.md5&lt;br /&gt;
rm     /tmp/NPIPE&lt;br /&gt;
&lt;br /&gt;
# Check the exit code of the HSI process  &lt;br /&gt;
wait $pid&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ];then&lt;br /&gt;
  echo &amp;quot;File transfer failed&amp;quot;&lt;br /&gt;
  exit $sc&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# change filename to stdin in checksum file&lt;br /&gt;
sed -i.1 &amp;quot;s+/tmp/NPIPE+-+&amp;quot; /tmp/$fname.md5&lt;br /&gt;
&lt;br /&gt;
# verify checksum&lt;br /&gt;
hsi -q get - : $storedfile  | md5sum -c  /tmp/$fname.md5&lt;br /&gt;
sc=$?&lt;br /&gt;
if [ $sc != 0 ]; then&lt;br /&gt;
  echo '!!! Job Failed !!!'&lt;br /&gt;
  echo 'error=' $sc&lt;br /&gt;
fi&lt;br /&gt;
exit $sc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility for aggregating a set of files and directories into a single archive file that conforms to the POSIX TAR specification. It uses a sophisticated multithreaded buffering scheme to write files from the local filesystem directly into HPSS, thereby achieving a high rate of performance. &lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive (you'll get an error message for the whole operation)&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
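&lt;br /&gt;
Both limits can be checked before htar is started, so that a long operation does not fail as a whole. The following is a minimal sketch rather than part of HTAR: the function name ''htar_preflight'' is hypothetical, and reading 68 GB as 68 GiB (73014444032 bytes) is an assumption.&lt;br /&gt;

```shell
# htar_preflight DIR [MAX_BYTES] [MAX_FILES]
# Return 0 if DIR respects the HTAR limits quoted above, 1 otherwise.
# Assumed defaults: 68 GiB per file (one reading of "68 GB") and 1 million files.
htar_preflight() {
    dir=$1
    max_bytes=${2:-73014444032}
    max_files=${3:-1000000}
    nfiles=$(find "$dir" -type f | wc -l | tr -d ' ')
    # the "c" suffix makes find compare sizes in bytes, "+N" means larger than N
    nbig=$(find "$dir" -type f -size "+${max_bytes}c" | wc -l | tr -d ' ')
    if [ "$nfiles" -gt "$max_files" ]; then
        echo "too many files for one archive: $nfiles"
        return 1
    fi
    if [ "$nbig" -gt 0 ]; then
        echo "$nbig file(s) larger than $max_bytes bytes"
        return 1
    fi
    return 0
}

# Intended use before creating an archive (the directory name is only an example):
#   if htar_preflight subdirA; then htar -cf subdirA.tar subdirA; fi
```

Oversized files found this way can be transferred individually with HSI instead.&lt;br /&gt;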
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3488</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3488"/>
		<updated>2011-06-24T22:20:41Z</updated>

		<summary type="html">&lt;p&gt;Knecht: ISH corrections&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Transfer of data into and out of the repository will be under the control of the user who will interact with the system using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like interface which can be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar format archives resident in the repository. It also creates a separate index file (.idx) that can be accessed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. Transfer of data into or out of the repository is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers and any tarball creation process.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is through the GPC queue system documented [[Moab|here]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the repository should be scripted into jobs and submitted to the ''archive'' queue. Scripts should use the HSI and/or HTAR commands as in the example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_repo&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the repository. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in the repository&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from the repository to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the repository.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* More complex sequences can be performed using a script such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into the repository, examining the contents of the repository (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
A convenient way to explore the contents of the archive is with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the file /home/$USER/HPSSdm/hsi.igz that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N repository_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSSdm&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]];then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=$HOME/HPSSdm&lt;br /&gt;
ish hindex&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This index can be browsed or searched with ISH on the development nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f104n084-$ ish ~/HPSSdm/hsi.igz &lt;br /&gt;
[ish]hsi.igz&amp;gt; help&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - Note: a single ''cget'' of multiple files (rather than several separate gets, as in this example) allows HSI to optimize the recall.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-repository&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-repository/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-repository/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage is very similar to that of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : repository_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
With FTP, the local filename is specified first on a &amp;quot;put&amp;quot; command, and second on a &amp;quot;get&amp;quot; command. &lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;repository_file1&amp;quot; on the archival system, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : repository_file1&lt;br /&gt;
    get file1.bak : repository_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* With FTP, the following syntax would be used instead:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 repository_file1 &lt;br /&gt;
    get repository_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive of C source programs and header files on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Put a subdirectory ''LargeFiles'' and all its contents recursively. You may use '-u' option to resume a previously disrupted session.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   cput -R -u LargeFiles&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files from the local filesystem directly into the repository, creating a file that conforms to the POSIX TAR specification.&lt;br /&gt;
It does this without having to first create an intermediate file on the local filesystem; instead, it uses a sophisticated multithreaded buffering scheme to write files directly into the repository, thereby achieving a high rate of performance. &lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive (you'll get an error message for the whole operation)&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3487</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3487"/>
		<updated>2011-06-24T22:17:32Z</updated>

		<summary type="html">&lt;p&gt;Knecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Transfer of data into and out of the repository will be under the control of the user who will interact with the system using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like interface which can be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar format archives resident in the repository. It also creates a separate index file (.idx) that can be accessed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. Transfer of data into or out of the repository is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into tarballs with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers and any tarball creation process.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is through the GPC queue system documented [[Moab|here]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the repository should be scripted into jobs and submitted to the ''archive'' queue. Scripts should use the HSI and/or HTAR commands as in the example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_repo&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
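&lt;br /&gt;
The same dependency flag can also be built in two steps, which is easier to read inside a script. The following is a minimal sketch, assuming that qsub prints a job identifier of the form number.hostname (the same assumption the awk one-liner makes). The function name depend_flag and the identifier 12345.some-host are hypothetical and used only for illustration.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: turn the job identifier printed by qsub
# (assumed form: number.hostname) into the dependency flag.
depend_flag() {
  local jobid="${1%%.*}"   # keep everything before the first "."
  printf -- '-W depend=afterok:%s\n' "$jobid"
}

# Dry run with a made-up identifier; in a real session the argument
# would be "$(qsub data-recall.sh)".
depend_flag "12345.some-host"   # prints: -W depend=afterok:12345
```

The printed flag would then be passed to the second qsub, in front of job-to-work-on-recalled-data.sh.&lt;br /&gt;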
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the repository. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in the repository&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from the repository to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the repository.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* More complex sequences can be performed using a script such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/ HSI Introduction]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help hsi help]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into the repository, examining the contents of the repository (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
A convenient way to explore the contents of the archive is with the inventory shell [[ISH]]. This example creates an index of all the files in a user's portion of the namespace. The list is placed in the file /home/$USER/HPSSdm/hsi.igz that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N repository_index&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSSdm&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]];then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export ISHREGISTER=$HOME/HPSSdm&lt;br /&gt;
ish hindex&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This index can be browsed with ISH on the development nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc-f104n084-$ ish $HOME/HPSSdm/hsi.igz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ISH is a powerful tool that is also useful for creating and browsing indices of tar and htar archives, so please look at the [[ISH|documentation]] or built in help.&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - This example should be modified to emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do optimization.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-repository&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-repository/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-repository/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
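&lt;br /&gt;
As a possible variant of the recall script, both files can be requested with one ''cget'', using several ''local_file : repository_file'' pairs on a single command line as described in the HSI vs. FTP section. The following is a dry-run sketch that only builds and prints the command and transfers nothing. The variable names dest, src, pairs and recall_cmd are illustrative and are not part of HSI.&lt;br /&gt;

```shell
#!/bin/bash
# Dry run: builds a single cget command for several files and prints it.
# Nothing is transferred; dest, src and recall_cmd are illustrative names.
dest=/scratch/$USER/recalled-from-repository
src=put-away-on-2010
pairs=""
for f in Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz; do
  # each pair follows the HSI form: local_file : repository_file
  pairs="$pairs $dest/$f : $src/$f"
done
recall_cmd="cget$pairs"
echo "$recall_cmd"
```

The printed line could then replace the separate ''cget'' lines inside the hsi here-document of data-recall.sh.&lt;br /&gt;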
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : repository_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
With FTP, the local filename is specified first on a &amp;quot;put&amp;quot; command, and second on a &amp;quot;get&amp;quot; command. &lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;repository_file1&amp;quot; on the archival system, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : repository_file1&lt;br /&gt;
    get file1.bak : repository_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* With FTP, the following syntax would be used instead:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 repository_file1 &lt;br /&gt;
    get repository_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive of C source programs and header files on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Put a subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   cput -R -u LargeFiles&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help''']&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files from the local filesystem directly into the repository, creating a file that conforms to the POSIX TAR specification.&lt;br /&gt;
It does this without having to first create an intermediate file on the local filesystem; instead, it uses a sophisticated multithreaded buffering scheme to write files directly into the repository, thereby achieving a high rate of performance. &lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive (you'll get an error message for the whole operation)&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
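&lt;br /&gt;
The first two limits can be checked on the local directory before htar is run. The following is a minimal sketch, assuming GNU find is available. The function name htar_precheck is hypothetical, and the 68G size test of find counts in units of 1073741824 bytes, so it only approximates the limit quoted above.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical pre-flight check for a directory that is about to be
# archived with htar (sketch; assumes GNU find for the G size suffix).
htar_precheck() {
  local dir="$1"
  local nfiles nbig
  # arithmetic expansion strips any padding that wc may add
  nfiles=$(( $(find "$dir" -type f | wc -l) ))
  nbig=$(( $(find "$dir" -type f -size +68G | wc -l) ))
  echo "files: $nfiles, larger than 68G: $nbig"
  # fail if either limit from the list above is exceeded
  if [ "$nfiles" -gt 1000000 ] || [ "$nbig" -gt 0 ]; then
    return 1
  fi
  return 0
}
```

Calling htar_precheck with a directory name prints the two counts and returns a non-zero status if either limit is exceeded, so the result can be tested before the htar command is issued.&lt;br /&gt;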
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3461</id>
		<title>HPSS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=HPSS&amp;diff=3461"/>
		<updated>2011-06-24T15:09:52Z</updated>

		<summary type="html">&lt;p&gt;Knecht: typo in example&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''High Performance Storage System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The High Performance Storage System ([http://www.hpss-collaboration.org/index.shtml HPSS]) is a tape-backed hierarchical storage system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user who will interact with the system using one or more of the following utilities:&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi HSI] is a client with an ftp-like interface which can be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/htar HTAR] is a utility that creates tar format archives resident in the repository. It also creates a separate index file (.idx) that can be accessed quickly.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/wiki/index.php/ISH ISH] is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. Transfer of data into or out of the repository is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* Expanded storage capacity is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the repository is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application's exit code and the returned log file for errors after all data transfers.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is through the GPC queue system documented [[Moab|here]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the repository should be scripted into jobs and submitted to the ''archive'' queue. Scripts should use the HSI and/or HTAR commands as in the example below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_repo&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-recall.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-recalled-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the repository. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the file does not already exist in the repository&lt;br /&gt;
|-&lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from the repository to your local storage only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the repository.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a task sequence such as this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions will be ''putting'' data into the repository, examining the contents of the repository (ls), and ''getting'' data back onto one of the active filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
    - This example performs a recursive list of all files in a user's portion of the namespace. The list is placed in a dated file in the directory /home/$USER/repository.ix that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N repository_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/repository.ix&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]];then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  &amp;lt;&amp;lt;EOF &amp;gt; $INDEX_DIR/contents-$TODAY&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data recall'''&lt;br /&gt;
   - This example should be modified to emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do optimization.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-recall.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N recall&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/recalled-from-repository&lt;br /&gt;
hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/recalled-from-repository/Jan-2010-jobs.tar.gz : put-away-on-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/recalled-from-repository/Feb-2010-jobs.tar.gz : put-away-on-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
&lt;br /&gt;
* The syntax for renaming files during transfers with HSI is different from FTP. With HSI, the general format is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : repository_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
With FTP, the local filename is specified first on a &amp;quot;put&amp;quot; command, and second on a &amp;quot;get&amp;quot; command. &lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as &amp;quot;repository_file1&amp;quot; on the archival system, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : repository_file1&lt;br /&gt;
    get file1.bak : repository_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* With FTP, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 repository_file1 &lt;br /&gt;
    get repository_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HSI pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files from the local filesystem directly into the repository, creating a file that conforms to the POSIX TAR specification.&lt;br /&gt;
It does this without having to first create an intermediate file on the local filesystem; instead, it uses a sophisticated multithreaded buffering scheme to write files directly into the repository, thereby achieving a high rate of performance. &lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive.&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/recalled-from-repository&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3432</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3432"/>
		<updated>2011-06-23T19:00:49Z</updated>

		<summary type="html">&lt;p&gt;Knecht: Add always-email and raising of exit status to all qsub examples.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Massive Archive and Retrieval System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or more of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface which can be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the repository. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
* ISH is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* HPSS storage space is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application return code and the log file for errors after every data transfer.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
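&lt;br /&gt;
The size guidelines above can be sketched as a small shell helper. This is only an illustration: the function name ''classify_size'' is hypothetical, and the byte thresholds assume decimal units (1 MB = 10^6 bytes), which the guidelines do not specify.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: classify a file size (in bytes) against the guidelines.
# Assumption: decimal units, so 100 MB = 10^8, 100 GB = 10^11, 1 TB = 10^12 bytes.
classify_size() {
    local bytes="$1"
    if [ "$bytes" -lt 100000000 ]; then
        echo aggregate      # smaller than ~100MB: group with tar or htar
    elif [ "$bytes" -le 100000000000 ]; then
        echo optimal        # up to 100 GB: best transfer performance
    elif [ "$bytes" -le 1000000000000 ]; then
        echo allowed        # up to 1TB: accepted, but not optimal
    else
        echo too-large      # over 1TB: cannot be transferred
    fi
}

# Usage sketch (GNU stat prints the size in bytes):
# classify_size "$(stat -c %s /scratch/$USER/workarea/finished-job1.tar.gz)"
```

A check like this could run at the top of an offload script, before any transfer is attempted.&lt;br /&gt;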
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is through the GPC queue system documented [[Moab|here]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers into and out of the archive should be scripted into jobs and submitted to the ''archive'' queue. The script will use either the HSI or the HTAR commands documented below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_repo&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
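&lt;br /&gt;
The same shortcut can be written as a small function, which may be easier to reuse inside scripts. This is a sketch under assumptions: the name ''depend_flag'' is hypothetical, and the job identifier format (a job number followed by a dot and a server name, such as 12345.gpc-sched) is assumed from the awk command above.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: turn the job identifier printed by qsub
# (assumed format: 12345.gpc-sched) into a dependency flag.
depend_flag() {
    # Strip everything from the first dot onward, keeping the job number.
    printf '%s\n' "-W depend=afterok:${1%%.*}"
}

# Usage sketch (not run here; it needs the queue system):
# restore_id=$(qsub data-restore.sh)
# qsub $(depend_flag "$restore_id") job-to-work-on-restored-data.sh
```

Saving the identifier in a variable first also makes it possible to log it or to attach several analysis jobs to the same recall.&lt;br /&gt;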
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the archive.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a Here Document.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;-EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
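&lt;br /&gt;
The status check repeated in these examples could be factored into a wrapper. The sketch below is an illustration only; the name ''run_checked'' is hypothetical and not part of HSI or the queue system.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical wrapper: run a command, report a failure, and pass the
# exit status of that command back to the caller unchanged.
run_checked() {
    "$@"
    local status=$?
    if [ "$status" -ne 0 ]; then
        echo '!!! TRANSFER FAILED !!!'
    fi
    return "$status"
}

# Usage sketch (single-line HSI command, as shown earlier):
# run_checked /usr/local/bin/hsi -v 'cput example_data.tgz'
# exit $?
```

Returning the original status, rather than a fixed value, keeps the HSI exit code available to the batch system and to dependent jobs.&lt;br /&gt;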
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions with the HPSS archive will be ''putting'' data into the archive, examining the contents of the archive (ls), and ''getting'' data back onto one of the main filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
if [ ! $status == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
exit $status&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
    - This example performs a recursive list of all files in a user's portion of the namespace. The list is placed in a dated file in the directory /home/$USER/HPSS.ix that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSS.ix&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]];then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  &amp;lt;&amp;lt;EOF &amp;gt; $INDEX_DIR/contents-$TODAY&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
   - This example should be modified to emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do staging optimization.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -me&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
status=$?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
exit $status&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
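&lt;br /&gt;
As a sketch of the single-''cget'' approach mentioned above, the helper below builds one command line from several file names using the local_file : HPSS_file pair syntax. The function name ''build_cget'' and the directory names in the usage line are hypothetical.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: build one cget command line that retrieves several
# files, one local_file : HPSS_file pair per file.
# Arguments: local directory, HPSS directory, then one or more file names.
build_cget() {
    local local_dir="$1"
    local hpss_dir="$2"
    shift 2
    local cmd=cget
    local name
    for name in "$@"; do
        cmd="$cmd $local_dir/$name : $hpss_dir/$name"
    done
    printf '%s\n' "$cmd"
}

# Usage sketch (not run here; it needs access to the archive):
# /usr/local/bin/hsi -v "$(build_cget /scratch/$USER/restored-from-MARS forgotten-from-2010 Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz)"
```

File names containing spaces are not handled by this sketch.&lt;br /&gt;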
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage is very similar to that of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
&lt;br /&gt;
* The syntax for renaming local files when storing files to HPSS or retrieving files from HPSS differs from that of FTP. With HSI, the syntax is always&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : HPSS_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
With FTP, the local filename is specified first on a &amp;quot;put&amp;quot; command, and second on a &amp;quot;get&amp;quot; command. &lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as HPSS file &amp;quot;hpss_file1&amp;quot;, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* With FTP, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the ''source.tar'' file stored above and extract all of its files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files from the local file system directly into HPSS, creating a file that conforms to the POSIX TAR specification.&lt;br /&gt;
It does this without having to first create an intermediate file on the local filesystem; instead, it uses a sophisticated multithreaded buffering scheme to write files directly into HPSS, thereby achieving a high rate of performance. &lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive.&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online.&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3431</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3431"/>
		<updated>2011-06-23T16:36:24Z</updated>

		<summary type="html">&lt;p&gt;Knecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Massive Archive and Retrieval System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or more of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface which can be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the repository. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
* ISH is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* HPSS storage space is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application return code and the log file for errors after every data transfer.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is through the GPC queue system documented [[Moab|here]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers into and out of the archive should be scripted into jobs and submitted to the ''archive'' queue. The script will use either the HSI or the HTAR commands documented below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_repo&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
if [ ! $? == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a short cut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the archive.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a Here Document.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;-EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
if [ ! $? == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions with the HPSS archive will be ''putting'' data into the archive, examining the contents of the archive (ls), and ''getting'' data back onto one of the main filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
if [ ! $? == 0 ];then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
    - This example performs a recursive list of all files in a user's portion of the namespace. The list is placed in a dated file in the directory /home/$USER/HPSS.ix that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSS.ix&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]];then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  &amp;lt;&amp;lt;EOF &amp;gt; $INDEX_DIR/contents-$TODAY&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
   - This example should be modified to emphasize that a single ''cget'' of multiple files (rather than several separate gets) allows HSI to do staging optimization.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage is very similar to that of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
&lt;br /&gt;
* The syntax for renaming local files when storing files to HPSS or retrieving files from HPSS differs from that of FTP. With HSI, the syntax is always&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : HPSS_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
With FTP, the local filename is specified first on a &amp;quot;put&amp;quot; command, and second on a &amp;quot;get&amp;quot; command. &lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as HPSS file &amp;quot;hpss_file1&amp;quot;, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* With FTP, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the ''source.tar'' file stored above and extract all of its files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files from the local file system directly into HPSS, creating a file that conforms to the POSIX TAR specification.&lt;br /&gt;
It does this without having to first create an intermediate file on the local filesystem; instead, it uses a sophisticated multithreaded buffering scheme to write files directly into HPSS, thereby achieving a high rate of performance. &lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68 GB cannot be stored in an htar archive.&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online.&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3430</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3430"/>
		<updated>2011-06-23T16:34:08Z</updated>

		<summary type="html">&lt;p&gt;Knecht: Move &amp;quot;typical&amp;quot; HSI examples to HSI section.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Massive Archive and Retrieval System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or more of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface which can be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the repository. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
* ISH is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* HPSS storage space is provided on tape, a medium that is not suited to storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* After every data transfer, check the application return code and the log file for errors.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is through the GPC queue system documented [[Moab|here]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the ''archive'' queue. The script will use either the HSI or HTAR commands documented below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/usr/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_repo&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
# a non-zero exit status from hsi means the transfer did not complete&lt;br /&gt;
if [ $? -ne 0 ]; then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
   exit 1   # fail the job so dependent (afterok) jobs do not start&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
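&lt;br /&gt;
The same dependency can also be built in two steps, which keeps the staging job ID available for logging. This is a minimal sketch, assuming qsub prints an ID of the form number.server, as the awk command above implies. ''depend_flag'' is a hypothetical helper name and ''12345.gpc-sched'' is a made-up job ID used only for illustration:&lt;br /&gt;

```shell
# Hypothetical helper: turn a job ID as printed by qsub into a dependency flag.
depend_flag() {
  # ${1%%.*} removes everything from the first dot onwards
  printf '%s\n' "-W depend=afterok:${1%%.*}"
}

# Made-up job ID, used only for illustration
depend_flag 12345.gpc-sched
```

In a real session the argument would come from the staging submission, for example JOBID=$(qsub data-restore.sh) followed by qsub $(depend_flag $JOBID) job-to-work-on-restored-data.sh.&lt;br /&gt;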
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the archive.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' counterparts of cd and ls; they act on the local filesystem rather than the archive.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a Here Document.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;-EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
if [ $? -ne 0 ]; then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
Complete documentation of HSI is available on the Gleicher Enterprises web site.&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Typical Usage ===&lt;br /&gt;
The most common interactions with the HPSS archive will be ''putting'' data into the archive, examining the contents of the archive (ls), and ''getting'' data back onto one of the main filesystems for inspection or analysis.&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
cput /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
if [ $? -ne 0 ]; then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
   exit 1   # fail the job so dependent (afterok) jobs do not start&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
    - This example performs a recursive list of all files in a user's portion of the namespace. The list is placed in a dated file in the directory /home/$USER/HPSS.ix that can be inspected from the gpc-devel nodes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
&lt;br /&gt;
TODAY=$(date +%Y%m%d)&lt;br /&gt;
INDEX_DIR=/home/$USER/HPSS.ix&lt;br /&gt;
if [[ ! -e $INDEX_DIR ]]; then&lt;br /&gt;
  mkdir $INDEX_DIR&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  &amp;lt;&amp;lt;EOF &amp;gt; $INDEX_DIR/contents-$TODAY&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
   - Note: a single ''cget'' of multiple files (rather than several separate gets) allows HSI to optimize staging; the example below still issues one ''cget'' per file.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
cget /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
cget /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
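&lt;br /&gt;
The single-''cget'' form mentioned in the note above can be sketched as follows. This is only an illustration: ''build_cget'' is a hypothetical shell helper, not part of HSI, and it relies on the man page statement, quoted in the HSI vs. FTP section, that multiple local_file : HPSS_file pairs may be given on one command line. It prints one cget command that covers every file name passed to it:&lt;br /&gt;

```shell
# Hypothetical helper: print one cget command covering several files.
# Usage: build_cget LOCAL_DIR HPSS_DIR FILE...
build_cget() {
  local_dir=$1
  hpss_dir=$2
  shift 2
  cmd="cget"
  for name in "$@"; do
    # each pair follows the "local_file : HPSS_file" syntax
    cmd="$cmd $local_dir/$name : $hpss_dir/$name"
  done
  printf '%s\n' "$cmd"
}

# Single quotes keep $USER literal so that it expands later, inside the here document
build_cget '/scratch/$USER/restored-from-MARS' forgotten-from-2010 Jan-2010-jobs.tar.gz Feb-2010-jobs.tar.gz
```

The printed line could then replace the separate cget lines inside the here document passed to hsi.&lt;br /&gt;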
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information, adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
&lt;br /&gt;
* The syntax for renaming local files when storing files to HPSS or retrieving files from HPSS differs from that of FTP. With HSI, the syntax is always&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : HPSS_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
With FTP, the local filename is specified first on a &amp;quot;put&amp;quot; command, and second on a &amp;quot;get&amp;quot; command. &lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as HPSS file &amp;quot;hpss_file1&amp;quot;, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* With FTP, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands is intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Create a tar archive on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator, which separates the local and HPSS pathnames, must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
&lt;br /&gt;
* Restore the ''source.tar'' file stored above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details, please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files from the local file system directly into HPSS, creating a file that conforms to the POSIX TAR specification.&lt;br /&gt;
It does this without having to first create an intermediate file on the local filesystem; instead, it uses a sophisticated multithreaded buffering scheme to write files directly into HPSS, thereby achieving a high rate of performance. &lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68GB cannot be stored in an htar archive.&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
&lt;br /&gt;
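The file-count limit above can be checked before running htar. This is a minimal sketch, assuming the standard find, wc and tr utilities are available. ''htar_precheck'' is a hypothetical helper, not part of HTAR, and the limit is passed as an argument rather than hard-coded:&lt;br /&gt;

```shell
# Hypothetical pre-check; htar_precheck is not part of HTAR.
# Usage: htar_precheck DIRECTORY MAX_FILES
htar_precheck() {
  # count regular files below DIRECTORY (file names containing newlines would be miscounted)
  count=$(find "$1" -type f | wc -l | tr -d ' ')
  if [ "$count" -gt "$2" ]; then
    echo "too many files: $count"
    return 1
  fi
  echo "ok: $count files"
}

# Example call (not executed here): htar_precheck subdirA 1000000
```

A job script could run this check first and skip the htar call when it returns a non-zero status.&lt;br /&gt;
&lt;br /&gt;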
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details, please see the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online.&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3429</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3429"/>
		<updated>2011-06-23T16:03:04Z</updated>

		<summary type="html">&lt;p&gt;Knecht: batch usage clarifications&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Massive Archive and Retrieval System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or more of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface which can be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the repository. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
* ISH is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* HPSS storage space is provided on tape, a medium that is not suited to storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* After every data transfer, check the application return code and the log file for errors.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is through the GPC queue system documented [[Moab|here]].&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the ''archive'' queue. The script will use either the HSI or HTAR commands documented below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/usr/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_put_file_in_repo&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
# a non-zero exit status from hsi means the transfer did not complete&lt;br /&gt;
if [ $? -ne 0 ]; then&lt;br /&gt;
   echo '!!! TRANSFER FAILED !!!'&lt;br /&gt;
   exit 1   # fail the job so dependent (afterok) jobs do not start&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The status of pending jobs can be monitored with showq specifying the archive queue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Typically, data will be recalled to the /scratch file system when it is needed for analysis. Job dependencies can be used to have analysis jobs wait in the queue for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the archive.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' counterparts of cd and ls; they act on the local filesystem rather than the archive.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a Here Document.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;-EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information, adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
&lt;br /&gt;
* The syntax for renaming local files when storing files to HPSS or retrieving files from HPSS differs from that of FTP. With HSI, the syntax is always&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : HPSS_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
With FTP, the local filename is specified first on a &amp;quot;put&amp;quot; command, and second on a &amp;quot;get&amp;quot; command. &lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as HPSS file &amp;quot;hpss_file1&amp;quot;, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* With FTP, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands is intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Create a tar archive on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator, which separates the local and HPSS pathnames, must be surrounded by whitespace (one or more space characters).&lt;br /&gt;
&lt;br /&gt;
* Restore the ''source.tar'' file stored above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details, please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files from the local file system directly into HPSS, creating a file that conforms to the POSIX TAR specification.&lt;br /&gt;
It does this without having to first create an intermediate file on the local filesystem; instead, it uses a sophisticated multithreaded buffering scheme to write files directly into HPSS, thereby achieving a high rate of performance. &lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68GB cannot be stored in an htar archive.&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details, please see the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online.&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
   - Very painful without interactive browsing&lt;br /&gt;
       -Tentative solution: dump all user files to log file and use that as file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3428</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3428"/>
		<updated>2011-06-23T15:47:06Z</updated>

		<summary type="html">&lt;p&gt;Knecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= '''Massive Archive and Retrieval System''' =&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or more of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface which can be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the repository. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
* ISH is a TUI utility that can perform an inventory of the files and directories in your tarballs.&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
== '''Guidelines''' ==&lt;br /&gt;
* HPSS storage space is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Check the application return code and the log file for errors after every data transfer.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Access Through the Queue System'''  ==&lt;br /&gt;
All access to the archive system is through the queue system. &lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
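status=$?&lt;br /&gt;
# stop the job with the same code if hsi reported a failure&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
    echo &amp;quot;HSI returned non-zero code $status&amp;quot;&lt;br /&gt;
    exit $status&lt;br /&gt;
fi&lt;br /&gt;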
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Recalling Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Job dependencies can be used to make analysis jobs wait for data recalls before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
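&lt;br /&gt;
The same dependency can also be built in two steps, which keeps the job number available for logging. This is only a sketch: the helper names job_number and depend_flag are hypothetical, and it assumes that qsub prints an identifier of the form number.server, as the awk call above implies.&lt;br /&gt;

```shell
# Hypothetical helper: keep only the part of a qsub job identifier
# before the first dot, e.g. "1234567.gpc-sched" becomes "1234567".
# (The identifier format is an assumption taken from the awk call above.)
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Hypothetical helper: build the dependency flag for a job identifier.
depend_flag() {
    printf '%s%s\n' '-W depend=afterok:' "$(job_number "$1")"
}

# Intended use on a login node (not executed here):
#   restore_id=$(qsub data-restore.sh)
#   qsub $(depend_flag "$restore_id") job-to-work-on-restored-data.sh
```

The flag is left unquoted in the final qsub call on purpose, so that the shell splits it into the two arguments qsub expects.&lt;br /&gt;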
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the archive.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a Here Document.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;-EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage is very similar to that of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
&lt;br /&gt;
* The syntax for renaming local files when storing files to HPSS or retrieving files from HPSS is different than FTP. With HSI, the syntax is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : HPSS_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
With FTP, the local filename is specified first on a &amp;quot;put&amp;quot; command, and second on a &amp;quot;get&amp;quot; command. &lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as HPSS file &amp;quot;hpss_file1&amp;quot;, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* With FTP, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files from the local file system directly into HPSS, creating a file that conforms to the POSIX TAR specification.&lt;br /&gt;
It does this without having to first create an intermediate file on the local filesystem; instead, it uses a sophisticated multithreaded buffering scheme to write files directly into HPSS, thereby achieving a high rate of performance. &lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68GB cannot be stored in an htar archive.&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
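&lt;br /&gt;
The file-count limit above can be checked before htar is called. The sketch below is one possible pre-flight check, not part of HTAR itself: the function names count_files and htar_precheck are hypothetical, and the default of 1000000 simply restates the limit listed above.&lt;br /&gt;

```shell
# Hypothetical helper: number of regular files below a directory.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Hypothetical helper: succeed only if the directory holds no more
# regular files than the limit (second argument, default 1000000).
htar_precheck() {
    n=$(count_files "$1")
    max=${2:-1000000}
    if [ "$n" -gt "$max" ]; then
        echo "too many files for one HTAR archive: $n"
        return 1
    fi
    echo "ok: $n files"
}

# Intended use in an archive job (not executed here):
#   htar_precheck /scratch/$USER/workarea/finished-job3 || exit 1
```

A directory that fails the check can be split into several smaller htar archives.&lt;br /&gt;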
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away&lt;br /&gt;
cd put-away&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
   - Very painful without interactive browsing&lt;br /&gt;
       -Tentative solution: dump all user files to log file and use that as file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3419</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3419"/>
		<updated>2011-06-20T17:53:12Z</updated>

		<summary type="html">&lt;p&gt;Knecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''Massive Archive and Retrieval System''' ==&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface which can be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the archive. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Guidelines ==&lt;br /&gt;
* HPSS storage space is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Check the application return code and the log file for errors after every data transfer.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Access Through the Queue System  ==&lt;br /&gt;
All access to the archive system is through the queue system. &lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
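status=$?&lt;br /&gt;
# stop the job with the same code if hsi reported a failure&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
    echo &amp;quot;HSI returned non-zero code $status&amp;quot;&lt;br /&gt;
    exit $status&lt;br /&gt;
fi&lt;br /&gt;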
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Staging Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Job dependencies can be used to make analysis jobs wait for data staging before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the archive.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a Here Document.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;-EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage is very similar to that of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
&lt;br /&gt;
* The syntax for renaming local files when storing files to HPSS or retrieving files from HPSS is different than FTP. With HSI, the syntax is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : HPSS_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
With FTP, the local filename is specified first on a &amp;quot;put&amp;quot; command, and second on a &amp;quot;get&amp;quot; command. &lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as HPSS file &amp;quot;hpss_file1&amp;quot;, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* With FTP, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Other HSI Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files from the local file system directly into HPSS, creating a file that conforms to the POSIX TAR specification.&lt;br /&gt;
It does this without having to first create an intermediate file on the local filesystem; instead, it uses a sophisticated multithreaded buffering scheme to write files directly into HPSS, thereby achieving a high rate of performance. &lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68GB cannot be stored in an htar archive.&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
   - Very painful without interactive browsing&lt;br /&gt;
       -Tentative solution: dump all user files to log file and use that as file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3418</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3418"/>
		<updated>2011-06-20T17:45:34Z</updated>

		<summary type="html">&lt;p&gt;Knecht: more interactive references&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''Massive Archive and Retrieval System''' ==&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface which can be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the archive. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Guidelines ==&lt;br /&gt;
* HPSS storage space is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Check the application return code and the log file for errors after every data transfer.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Access Through the Queue System  ==&lt;br /&gt;
All access to the archive system is through the queue system. &lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
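status=$?&lt;br /&gt;
# stop the job with the same code if hsi reported a failure&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
    echo &amp;quot;HSI returned non-zero code $status&amp;quot;&lt;br /&gt;
    exit $status&lt;br /&gt;
fi&lt;br /&gt;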
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Staging Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Job dependencies can be used to make analysis jobs wait for data staging before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
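&lt;br /&gt;
The dependency flag can also be built with plain shell parameter expansion. This is only a sketch: it assumes that qsub prints an identifier of the form number.server, and the function name depend_flag is hypothetical:&lt;br /&gt;

```shell
# Hypothetical helper: turn the identifier printed by qsub into a
# dependency flag for the follow-up job.
depend_flag() {
    jobid=$1
    # Keep only the part before the first dot, as awk -F '.' '{print $1}' does.
    echo "-W depend=afterok:${jobid%%.*}"
}

# Example call (the script names are the ones used on this page):
# qsub $(depend_flag "$(qsub data-restore.sh)") job-to-work-on-restored-data.sh
```

Keeping the step in a function makes it easy to print the flag and inspect it before the second job is submitted.&lt;br /&gt;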
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the archive.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a Here Document.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;-EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
&lt;br /&gt;
* The syntax for renaming local files when storing files to HPSS or retrieving files from HPSS is different from that of FTP. With HSI, the syntax is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : HPSS_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
With FTP, the local filename is specified first on a &amp;quot;put&amp;quot; command, and second on a &amp;quot;get&amp;quot; command. &lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as HPSS file &amp;quot;hpss_file1&amp;quot;, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* With FTP, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
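&lt;br /&gt;
As an illustration of the local : HPSS pair syntax, the sketch below prints one cput line per local file. The function name make_cput_commands is hypothetical, and feeding its output to hsi on standard input is an assumption based on the here documents used elsewhere on this page:&lt;br /&gt;

```shell
# Hypothetical helper: print one conditional-put command per local file,
# storing each file under its base name in the given HPSS directory.
make_cput_commands() {
    hpss_dir=$1
    shift
    for local_file in "$@"; do
        # The colon must have whitespace on both sides.
        echo "cput -p $local_file : $hpss_dir/$(basename "$local_file")"
    done
}

# Example call (directory and file names are assumptions):
# make_cput_commands put-away-and-forget /scratch/$USER/workarea/*.tar.gz | /usr/local/bin/hsi -v
```

Printing the commands first makes it possible to review them before anything is sent to the archive.&lt;br /&gt;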
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Other Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the source.tar file stored above and extract all of its files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files from the local file system directly into HPSS, creating a file that conforms to the POSIX TAR specification.&lt;br /&gt;
It does this without having to first create an intermediate file on the local filesystem; instead, it uses a sophisticated multithreaded buffering scheme to write files directly into HPSS, thereby achieving a high rate of performance. &lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68GB cannot be stored in an htar archive.&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
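&lt;br /&gt;
One way to make the exit code hard to overlook is to run each transfer through a small wrapper. This is a sketch only, and the function name run_checked is hypothetical:&lt;br /&gt;

```shell
# Hypothetical wrapper: run a command, report whether it succeeded,
# and pass its exit status back to the caller.
run_checked() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "FAILED (exit $status): $*"
    else
        echo "OK: $*"
    fi
    return "$status"
}

# Example call (the archive and directory names are assumptions):
# run_checked /usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/
```

Because the wrapper returns the original status, a job script can stop as soon as a transfer fails, well before any cleanup of the active filesystems.&lt;br /&gt;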
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
** Listing is very painful without interactive browsing&lt;br /&gt;
*** Tentative solution: dump the listing of all user files to the log file and use that as a file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
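&lt;br /&gt;
Once such a listing has been saved, it can be searched like any other text file. A sketch, assuming the location of the job log is known; the function name search_listing and the log path in the example call are hypothetical:&lt;br /&gt;

```shell
# Hypothetical helper: print the lines of a saved listing that contain
# the given text, treating the text literally rather than as a pattern.
search_listing() {
    text=$1
    listing=$2
    grep -F -- "$text" "$listing"
}

# Example call (the log file name is an assumption):
# search_listing finished-job1.tar.gz hpsslogs/hpss_dump.123456
```

grep exits with a non-zero status when nothing matches, so the helper can also be used as a test for whether a file appears in the listing at all.&lt;br /&gt;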
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3417</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3417"/>
		<updated>2011-06-20T17:42:16Z</updated>

		<summary type="html">&lt;p&gt;Knecht: htar notes&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''Massive Archive and Retrieval System''' ==&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface that will be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the archive. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. An interactive session can be requested that will allow a user to list, rearrange or remove files with the HSI client. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Guidelines ==&lt;br /&gt;
* HPSS storage space is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* After every data transfer, check the application return code and the log file for errors.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Access Through the Queue System  ==&lt;br /&gt;
All access to the archive system is through the queue system. &lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
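status=$?&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
    echo &amp;quot;HSI returned non-zero exit code $status&amp;quot;&lt;br /&gt;
    exit $status&lt;br /&gt;
fi&lt;br /&gt;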
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Staging Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Job dependencies can be used to make analysis jobs wait for data staging before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the archive.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a Here Document.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;-EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
&lt;br /&gt;
* The syntax for renaming local files when storing files to HPSS or retrieving files from HPSS is different from that of FTP. With HSI, the syntax is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : HPSS_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
With FTP, the local filename is specified first on a &amp;quot;put&amp;quot; command, and second on a &amp;quot;get&amp;quot; command. &lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as HPSS file &amp;quot;hpss_file1&amp;quot;, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* With FTP, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Other Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the source.tar file stored above and extract all of its files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files from the local file system directly into HPSS, creating a file that conforms to the POSIX TAR specification.&lt;br /&gt;
It does this without having to first create an intermediate file on the local filesystem; instead, it uses a sophisticated multithreaded buffering scheme to write files directly into HPSS, thereby achieving a high rate of performance. &lt;br /&gt;
&lt;br /&gt;
=== '''CAUTION''' ===&lt;br /&gt;
* Files larger than 68GB cannot be stored in an htar archive.&lt;br /&gt;
* HTAR archives cannot contain more than 1 million files.&lt;br /&gt;
* Check the HTAR exit code and log file before removing any files from the active filesystems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== HTAR Usage ===&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
** Listing is very painful without interactive browsing&lt;br /&gt;
*** Tentative solution: dump the listing of all user files to the log file and use that as a file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3416</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3416"/>
		<updated>2011-06-20T17:35:15Z</updated>

		<summary type="html">&lt;p&gt;Knecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''Massive Archive and Retrieval System''' ==&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface that will be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the archive. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. An interactive session can be requested that will allow a user to list, rearrange or remove files with the HSI client. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Guidelines ==&lt;br /&gt;
* HPSS storage space is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* After every data transfer, check the application return code and the log file for errors.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Access Through the Queue System  ==&lt;br /&gt;
All access to the archive system is through the queue system. &lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
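status=$?&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
    echo &amp;quot;HSI returned non-zero exit code $status&amp;quot;&lt;br /&gt;
    exit $status&lt;br /&gt;
fi&lt;br /&gt;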
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Staging Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Job dependencies can be used to make analysis jobs wait for data staging before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the archive.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a Here Document.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;-EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information, adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
&lt;br /&gt;
* The syntax for renaming local files when storing files to HPSS or retrieving files from HPSS differs from that of FTP. With HSI, the syntax is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : HPSS_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
With FTP, the local filename is specified first on a &amp;quot;put&amp;quot; command, and second on a &amp;quot;get&amp;quot; command. &lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as HPSS file &amp;quot;hpss_file1&amp;quot;, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* With FTP, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands is intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Other Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the source.tar file stored above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''HTAR''' ==&lt;br /&gt;
''' Please aggregate small files (&amp;lt;~100MB) into tarballs or htar files. '''&lt;br /&gt;
&lt;br /&gt;
HTAR is a utility that is used for aggregating a set of files from the local file system directly into HPSS, creating a file that conforms to the POSIX TAR specification.&lt;br /&gt;
It does this without having to first create an intermediate file on the local filesystem; instead, it uses a sophisticated multithreaded buffering scheme to write files directly into HPSS, thereby achieving a high rate of performance. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
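&lt;br /&gt;
* To check whether an htar transfer succeeded from within a script, its exit status can be tested. This is a sketch only: it reuses the ''subdirA'' example above, assumes that htar returns a non-zero exit status on failure, and the message text is illustrative:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
    status=$?&lt;br /&gt;
    if [ $status -ne 0 ]; then&lt;br /&gt;
        echo &amp;quot;htar returned non-zero code $status&amp;quot;&lt;br /&gt;
        exit $status&lt;br /&gt;
    fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;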
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
[[GPC_Quickstart|Submitting_A_Batch_Job]]&lt;br /&gt;
* gpc-archive01 is part of the gpc queuing system under torque/moab&lt;br /&gt;
* Currently it is set up to share the node with up to 12 jobs at one time&lt;br /&gt;
* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
** Very painful without interactive browsing&lt;br /&gt;
*** Tentative solution: dump a listing of all user files to the log file and use that as a file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3415</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3415"/>
		<updated>2011-06-20T17:29:04Z</updated>

		<summary type="html">&lt;p&gt;Knecht: /* Examples */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''Massive Archive and Retrieval System''' ==&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface that will be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the archive. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. An interactive session can be requested that will allow a user to list, rearrange or remove files with the HSI client. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Guidelines ==&lt;br /&gt;
* HPSS storage space is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application return code and the log file for errors after every data transfer.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Access Through the Queue System  ==&lt;br /&gt;
All access to the archive system is through the queue system. &lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
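status=$?&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
   # assumed completion of the truncated test: report the failure and stop&lt;br /&gt;
   echo &amp;quot;HSI returned non-zero code $status&amp;quot;&lt;br /&gt;
   exit $status&lt;br /&gt;
fi&lt;br /&gt;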
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Staging Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Job dependencies can be used to make analysis jobs wait for data staging before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client for interacting with the archive system. It provides an ftp-like interface for archiving and retrieving files, as well as a number of shell-like commands for examining and manipulating the contents of the archive. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the archive.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' counterparts of cd and ls: they act on the local filesystem rather than on the archive.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a Here Document.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;-EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information, adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
&lt;br /&gt;
* The syntax for renaming local files when storing files to HPSS or retrieving files from HPSS differs from that of FTP. With HSI, the syntax is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : HPSS_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
With FTP, the local filename is specified first on a &amp;quot;put&amp;quot; command, and second on a &amp;quot;get&amp;quot; command. &lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as HPSS file &amp;quot;hpss_file1&amp;quot;, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* With FTP, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands is intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Other Examples === &lt;br /&gt;
&lt;br /&gt;
* Creating tar archive on the fly by piping stdout:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the source.tar file stored above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''Using HTAR''' ==&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
[[GPC_Quickstart|Submitting_A_Batch_Job]]&lt;br /&gt;
* gpc-archive01 is part of the gpc queuing system under torque/moab&lt;br /&gt;
* Currently it is set up to share the node with up to 12 jobs at one time&lt;br /&gt;
* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
** Very painful without interactive browsing&lt;br /&gt;
*** Tentative solution: dump a listing of all user files to the log file and use that as a file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3414</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3414"/>
		<updated>2011-06-20T17:27:38Z</updated>

		<summary type="html">&lt;p&gt;Knecht: cut interactive batch access section&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''Massive Archive and Retrieval System''' ==&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface that will be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the archive. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. An interactive session can be requested that will allow a user to list, rearrange or remove files with the HSI client. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Guidelines ==&lt;br /&gt;
* HPSS storage space is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application return code and the log file for errors after every data transfer.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Access Through the Queue System  ==&lt;br /&gt;
All access to the archive system is through the queue system. &lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
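status=$?&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
   # assumed completion of the truncated test: report the failure and stop&lt;br /&gt;
   echo &amp;quot;HSI returned non-zero code $status&amp;quot;&lt;br /&gt;
   exit $status&lt;br /&gt;
fi&lt;br /&gt;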
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Staging Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Job dependencies can be used to make analysis jobs wait for data staging before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client for interacting with the archive system. It provides an ftp-like interface for archiving and retrieving files, as well as a number of shell-like commands for examining and manipulating the contents of the archive. The most commonly used commands are:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the archive.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' counterparts of cd and ls: they act on the local filesystem rather than on the archive.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a Here Document.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;-EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage are very similar to those of FTP. Please note the following information, adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
&lt;br /&gt;
* The syntax for renaming local files when storing files to HPSS or retrieving files from HPSS differs from that of FTP. With HSI, the syntax is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : HPSS_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
With FTP, the local filename is specified first on a &amp;quot;put&amp;quot; command, and second on a &amp;quot;get&amp;quot; command. &lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as HPSS file &amp;quot;hpss_file1&amp;quot;, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* With FTP, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
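&lt;br /&gt;
Because several &amp;quot;local_file : HPSS_file&amp;quot; pairs may appear on one command line, the text of such a command can be assembled mechanically. The following is only a sketch: build_put_cmd is a hypothetical helper, not part of HSI, and the /scratch paths are placeholders. It prints the command text and does not call hsi.&lt;br /&gt;

```shell
#!/bin/bash
# build_put_cmd is a hypothetical helper: it prints one HSI "put" command
# containing a "local_file : HPSS_file" pair for every path given.
build_put_cmd() {
    local cmd="put"
    local f
    for f in "$@"; do
        # keep only the file name on the HPSS side of the pair
        cmd="$cmd $f : $(basename "$f")"
    done
    echo "$cmd"
}

# Placeholder paths, for illustration only
put_line=$(build_put_cmd /scratch/user/run1.tar /scratch/user/run2.tar)
echo "$put_line"
```

The printed line could then be handed to hsi as a single quoted argument, in the same way as the one-line commands shown earlier.&lt;br /&gt;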
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Examples === &lt;br /&gt;
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''Using HTAR''' ==&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
[[GPC_Quickstart|Submitting_A_Batch_Job]]&lt;br /&gt;
* gpc-archive01 is part of the gpc queuing system under torque/moab&lt;br /&gt;
* Currently it is set up to share the node among up to 12 jobs at one time&lt;br /&gt;
* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
** Listing is very painful without interactive browsing.&lt;br /&gt;
*** Tentative solution: dump the listing of all user files to the log file and use that as a file index.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3413</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3413"/>
		<updated>2011-06-20T17:27:08Z</updated>

		<summary type="html">&lt;p&gt;Knecht: purge interactive examples&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''Massive Archive and Retrieval System''' ==&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface that will be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar-format archives directly in the archive. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. An interactive session can be requested that will allow a user to list, rearrange or remove files with the HSI client. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Guidelines ==&lt;br /&gt;
* HPSS storage space is provided on tape, a medium that is not suited to storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application return code and the log file for errors after every data transfer.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
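&lt;br /&gt;
One way to spot candidates for grouping before an offload is to list the files that fall below the ~100MB guideline. The following is a minimal sketch, assuming GNU find is available; list_small_files and its default limit of 100000000 bytes are illustrative choices, not a SciNet tool.&lt;br /&gt;

```shell
#!/bin/bash
# list_small_files is a hypothetical helper: it prints the regular files
# under directory $1 that are smaller than $2 bytes.
# The default limit of 100000000 bytes is an assumed reading of "~100MB".
list_small_files() {
    local dir="$1"
    local limit="${2:-100000000}"
    # with the "c" suffix find compares exact byte counts, with no rounding
    find "$dir" -type f -size "-${limit}c"
}
```

Calling list_small_files with a work directory as its only argument prints one path per line; those files could then be bundled with tar or htar rather than transferred one by one.&lt;br /&gt;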
&lt;br /&gt;
== Access Through the Queue System  ==&lt;br /&gt;
All access to the archive system is through the queue system. &lt;br /&gt;
=== Interactive Access ===&lt;br /&gt;
To get an interactive session and use the HSI client, use the -I option of qsub.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub -q archive -I&lt;br /&gt;
hpss-archive01:~ $ hsi ls&lt;br /&gt;
******************************************************************&lt;br /&gt;
*   Welcome to the Massive Archive and Restore System @ SciNet   *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
******************************************************************&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
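# stop the job with the same status if the transfer failed&lt;br /&gt;
status=$?&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
    echo &amp;quot;HSI returned non-zero code $status&amp;quot;&lt;br /&gt;
    exit $status&lt;br /&gt;
fi&lt;br /&gt;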
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Staging Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Job dependencies can be used to make analysis jobs wait for data staging before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut that submits the staging job and passes its job number to the dependent job in one command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
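&lt;br /&gt;
To see what the command substitution in the shortcut produces, the awk step can be run on its own. In this sketch the identifier 123456.gpc-sched is a made-up value standing in for the output of qsub; no job is submitted.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical job identifier, standing in for what qsub would print
job_id="123456.gpc-sched"

# Keep the part before the first "." and prepend the qsub dependency flag
dep_flag=$(echo "$job_id" | awk -F '.' '{print "-W depend=afterok:"$1}')
echo "$dep_flag"
```

The resulting text is what the outer qsub in the shortcut receives ahead of the name of the analysis script.&lt;br /&gt;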
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the archive.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a Here Document.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;-EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage is very similar to that of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
&lt;br /&gt;
* The syntax for renaming local files when storing files to HPSS or retrieving files from HPSS differs from that of FTP. With HSI, the syntax is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : HPSS_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
With FTP, the local filename is specified first on a &amp;quot;put&amp;quot; command, and second on a &amp;quot;get&amp;quot; command. &lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as HPSS file &amp;quot;hpss_file1&amp;quot;, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* With FTP, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Examples === &lt;br /&gt;
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''Using HTAR''' ==&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
[[GPC_Quickstart|Submitting_A_Batch_Job]]&lt;br /&gt;
* gpc-archive01 is part of the gpc queuing system under torque/moab&lt;br /&gt;
* Currently it is set up to share the node among up to 12 jobs at one time&lt;br /&gt;
* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
** Listing is very painful without interactive browsing.&lt;br /&gt;
*** Tentative solution: dump the listing of all user files to the log file and use that as a file index.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3412</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3412"/>
		<updated>2011-06-20T17:25:48Z</updated>

		<summary type="html">&lt;p&gt;Knecht: /* HSI vs. FTP */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''Massive Archive and Retrieval System''' ==&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface that will be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar-format archives directly in the archive. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. An interactive session can be requested that will allow a user to list, rearrange or remove files with the HSI client. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Guidelines ==&lt;br /&gt;
* HPSS storage space is provided on tape, a medium that is not suited to storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application return code and the log file for errors after every data transfer.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Access Through the Queue System  ==&lt;br /&gt;
All access to the archive system is through the queue system. &lt;br /&gt;
=== Interactive Access ===&lt;br /&gt;
To get an interactive session and use the HSI client, use the -I option of qsub.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub -q archive -I&lt;br /&gt;
hpss-archive01:~ $ hsi ls&lt;br /&gt;
******************************************************************&lt;br /&gt;
*   Welcome to the Massive Archive and Restore System @ SciNet   *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
******************************************************************&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
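# stop the job with the same status if the transfer failed&lt;br /&gt;
status=$?&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
    echo &amp;quot;HSI returned non-zero code $status&amp;quot;&lt;br /&gt;
    exit $status&lt;br /&gt;
fi&lt;br /&gt;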
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Staging Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Job dependencies can be used to make analysis jobs wait for data staging before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut that submits the staging job and passes its job number to the dependent job in one command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the archive.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a Here Document.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;-EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
HSI syntax and usage is very similar to that of FTP. Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
&lt;br /&gt;
* The syntax for renaming local files when storing files to HPSS or retrieving files from HPSS differs from that of FTP. With HSI, the syntax is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : HPSS_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
With FTP, the local filename is specified first on a &amp;quot;put&amp;quot; command, and second on a &amp;quot;get&amp;quot; command. &lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as HPSS file &amp;quot;hpss_file1&amp;quot;, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* With FTP, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Examples === &lt;br /&gt;
&lt;br /&gt;
* Put the subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session (as rsync would do).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;prompt; mput -R -u LargeFiles&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively descend into the ''Source'' directory and move all files that end in &amp;quot;.h&amp;quot; into a sibling directory (i.e., a directory at the same level in the tree as &amp;quot;Source&amp;quot;) named &amp;quot;Include&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd Source&lt;br /&gt;
    [HSI] mv *.h ../Include&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Delete all files beginning with &amp;quot;m&amp;quot; and ending with 9101 (note that this is an interactive request, not a one-liner request, so the wildcard path does not need quotes to preserve it):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] delete m*9101&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively delete all files beginning with H and ending with a digit, and ask for verification before deleting each such file.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] mdel H*[0-9]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, save your local files that begin with the letter &amp;quot;c&amp;quot; (let the UN*X shell resolve the wild-card path pattern in terms of your local files by not enclosing it in quotes):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi mput c*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, get all files in the subdirectory ''subdirA'' which begin with the letters &amp;quot;b&amp;quot; or &amp;quot;c&amp;quot; (surrounding the wildcard path in single quotes prevents shells on UNIX systems from processing the wild card pattern):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get 'subdirA/[bc]*'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details, please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''Using HTAR''' ==&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
[[GPC_Quickstart|Submitting_A_Batch_Job]]&lt;br /&gt;
* gpc-archive01 is part of the gpc queuing system under torque/moab&lt;br /&gt;
* Currently it is set up so that up to 12 jobs share the node at one time&lt;br /&gt;
* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
   - Very painful without interactive browsing&lt;br /&gt;
       -Tentative solution: dump all user files to log file and use that as file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3411</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3411"/>
		<updated>2011-06-20T17:24:49Z</updated>

		<summary type="html">&lt;p&gt;Knecht: copy in ftp vs. hsi notes&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''Massive Archive and Retrieval System''' ==&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface that will be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the archive. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. An interactive session can be requested that will allow a user to list, rearrange or remove files with the HSI client. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Guidelines ==&lt;br /&gt;
* HPSS storage space is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~100 MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1 TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application return code and the log file for errors after every data transfer.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
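&lt;br /&gt;
One way to spot files that fall under the small-file guideline is to list them with find before deciding what to bundle. This is a minimal sketch, assuming GNU find (the M size suffix is not POSIX); list_small_files and the temporary demo directory are hypothetical names used only for illustration.&lt;br /&gt;

```shell
# List regular files under a directory that are candidates for bundling
# with tar or htar. GNU find rounds sizes up to whole MiB, so "-size -100M"
# matches files of at most 99 MiB.
# list_small_files is a hypothetical helper name.
list_small_files() {
    find "$1" -type f -size -100M
}

# Demo on a throwaway directory containing one empty file.
demo_dir=$(mktemp -d)
touch "$demo_dir/small.dat"
list_small_files "$demo_dir"
rm -r "$demo_dir"
```

In practice the argument would be a work area on /scratch or /project, and the resulting list could guide which directories to archive with htar.&lt;br /&gt;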
&lt;br /&gt;
== Access Through the Queue System  ==&lt;br /&gt;
All access to the archive system is through the queue system. &lt;br /&gt;
=== Interactive Access ===&lt;br /&gt;
To get an interactive session and use the HSI client, use the -I option of qsub.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub -q archive -I&lt;br /&gt;
hpss-archive01:~ $ hsi ls&lt;br /&gt;
******************************************************************&lt;br /&gt;
*   Welcome to the Massive Archive and Restore System @ SciNet   *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
******************************************************************&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
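status=$?&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
    echo &amp;quot;HSI returned non-zero exit status $status&amp;quot;&lt;br /&gt;
    exit $status&lt;br /&gt;
fi&lt;br /&gt;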
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Staging Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Job dependencies can be used to make analysis jobs wait for data staging before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
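&lt;br /&gt;
The one-liner above can be unpacked into a small helper, which may be easier to read in longer scripts. This is a sketch under stated assumptions: depend_flag is a hypothetical name, the sample job ID is made up, and qsub is assumed to print IDs of the form number.server, as the awk call above implies.&lt;br /&gt;

```shell
# Build the qsub dependency flag from the job ID that qsub prints.
# Assumes the ID looks like "number.server"; the sample ID below is made up.
depend_flag() {
    # Keep only the part before the first dot, as the awk call in the shortcut does.
    job_number=$(printf '%s\n' "$1" | awk -F '.' '{print $1}')
    printf '%s%s\n' '-W depend=afterok:' "$job_number"
}

depend_flag 1234567.gpc-sched
```

The printed flag would then be passed to the qsub call that submits the analysis job.&lt;br /&gt;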
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the archive.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a Here Document.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;-EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI vs. FTP ===&lt;br /&gt;
Please note the following information adapted from the HSI man page:&lt;br /&gt;
&lt;br /&gt;
HSI supports several of the commonly used FTP commands, including &amp;quot;dir&amp;quot;,&amp;quot;get&amp;quot;,&amp;quot;ls&amp;quot;,&amp;quot;mdelete&amp;quot;,&amp;quot;mget&amp;quot;,&amp;quot;put&amp;quot;,&amp;quot;mput&amp;quot; and &amp;quot;prompt&amp;quot;, with the following differences:&lt;br /&gt;
&lt;br /&gt;
* The &amp;quot;dir&amp;quot; command is an alias for &amp;quot;ls&amp;quot; in HSI. The &amp;quot;ls&amp;quot; command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree&lt;br /&gt;
* The &amp;quot;put&amp;quot; and &amp;quot;get&amp;quot; family of commands support recursion&lt;br /&gt;
* There are &amp;quot;conditional put&amp;quot; and &amp;quot;conditional get&amp;quot; commands (cput, cget)&lt;br /&gt;
&lt;br /&gt;
* The syntax for renaming local files when storing files to HPSS or retrieving files from HPSS is different than FTP. With HSI, the syntax is always &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
     &amp;quot;local_file : HPSS_file&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and multiple such pairs may be specified on a single command line.&lt;br /&gt;
With FTP, the local filename is specified first on a &amp;quot;put&amp;quot; command, and second on a &amp;quot;get&amp;quot; command. &lt;br /&gt;
&lt;br /&gt;
For example, when using HSI to store the local file &amp;quot;file1&amp;quot; as HPSS file &amp;quot;hpss_file1&amp;quot;, then retrieve it back to the local filesystem as &amp;quot;file1.bak&amp;quot;, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 : hpss_file1&lt;br /&gt;
    get file1.bak : hpss_file1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* With FTP, the following commands could be used:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    put file1 hpss_file1 &lt;br /&gt;
    get hpss_file1 file1.bak&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The &amp;quot;m&amp;quot; prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The &amp;quot;m&amp;quot; series of commands are intended to provide a measure of compatibility for FTP users.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Examples === &lt;br /&gt;
&lt;br /&gt;
* Put the subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session (as rsync would do).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;prompt; mput -R -u LargeFiles&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively descend into the ''Source'' directory and move all files that end in &amp;quot;.h&amp;quot; into a sibling directory (i.e., a directory at the same level in the tree as &amp;quot;Source&amp;quot;) named &amp;quot;Include&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd Source&lt;br /&gt;
    [HSI] mv *.h ../Include&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Delete all files beginning with &amp;quot;m&amp;quot; and ending with 9101 (note that this is an interactive request, not a one-liner request, so the wildcard path does not need quotes to preserve it):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] delete m*9101&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively delete all files beginning with H and ending with a digit, and ask for verification before deleting each such file.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] mdel H*[0-9]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, save your local files that begin with the letter &amp;quot;c&amp;quot; (let the UN*X shell resolve the wild-card path pattern in terms of your local files by not enclosing it in quotes):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi mput c*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, get all files in the subdirectory ''subdirA'' which begin with the letters &amp;quot;b&amp;quot; or &amp;quot;c&amp;quot; (surrounding the wildcard path in single quotes prevents shells on UNIX systems from processing the wild card pattern):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get 'subdirA/[bc]*'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details, please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''Using HTAR''' ==&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
[[GPC_Quickstart|Submitting_A_Batch_Job]]&lt;br /&gt;
* gpc-archive01 is part of the gpc queuing system under torque/moab&lt;br /&gt;
* Currently it is set up so that up to 12 jobs share the node at one time&lt;br /&gt;
* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
   - Very painful without interactive browsing&lt;br /&gt;
       -Tentative solution: dump all user files to log file and use that as file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3410</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3410"/>
		<updated>2011-06-20T17:17:04Z</updated>

		<summary type="html">&lt;p&gt;Knecht: staging updated&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''Massive Archive and Retrieval System''' ==&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface that will be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the archive. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. An interactive session can be requested that will allow a user to list, rearrange or remove files with the HSI client. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Guidelines ==&lt;br /&gt;
* HPSS storage space is provided on tape -- a medium that is not suited to storing small files. Files smaller than ~100 MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1 TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application return code and the log file for errors after every data transfer.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Access Through the Queue System  ==&lt;br /&gt;
All access to the archive system is through the queue system. &lt;br /&gt;
=== Interactive Access ===&lt;br /&gt;
To get an interactive session and use the HSI client, use the -I option of qsub.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub -q archive -I&lt;br /&gt;
hpss-archive01:~ $ hsi ls&lt;br /&gt;
******************************************************************&lt;br /&gt;
*   Welcome to the Massive Archive and Restore System @ SciNet   *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
******************************************************************&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
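status=$?&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
    echo &amp;quot;HSI returned non-zero exit status $status&amp;quot;&lt;br /&gt;
    exit $status&lt;br /&gt;
fi&lt;br /&gt;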
&amp;lt;/pre&amp;gt;&lt;br /&gt;
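&lt;br /&gt;
The Guidelines ask for the return code to be checked after every transfer. When a job runs several transfers, one possible approach is a small wrapper that reports any failure. The sketch below is an illustration only: run_checked is a hypothetical helper (not part of HSI or HTAR), and the true and false commands stand in for real hsi or htar invocations.&lt;br /&gt;

```shell
# Run a command, report a non-zero exit status, and pass that status
# back to the caller. run_checked is a hypothetical helper name.
run_checked() {
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "command failed with exit status $status: $*"
    fi
    return "$status"
}

# Stand-ins for real transfers: "true" succeeds, "false" fails.
run_checked true
run_checked false || echo "would stop the job here"
```

Because the wrapper returns the original status, a job script can exit with it, which is what lets a dependent job submitted with afterok stay held when a transfer fails.&lt;br /&gt;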
&lt;br /&gt;
=== Staging Data for Analysis ===&lt;br /&gt;
&lt;br /&gt;
Job dependencies can be used to make analysis jobs wait for data staging before starting. The qsub flag is&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-W depend=afterok:&amp;lt;JOBID&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.&lt;br /&gt;
&lt;br /&gt;
Here is a shortcut for generating the dependency:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the archive.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a Here Document.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;-EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Examples === &lt;br /&gt;
&lt;br /&gt;
* Put the subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously interrupted session (as rsync would do).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;prompt; mput -R -u LargeFiles&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively descend into the ''Source'' directory and move all files which end in &amp;quot;.h&amp;quot; into a sibling directory (ie, a directory at the same level in the tree as &amp;quot;Source&amp;quot;) named &amp;quot;Include&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd Source&lt;br /&gt;
    [HSI] mv *.h ../Include&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Delete all files beginning with &amp;quot;m&amp;quot; and ending with 9101 (note that this is an interactive request, not a one-liner request, so the wildcard path does not need quotes to preserve it):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] delete m*9101&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively delete all files beginning with H and ending with a digit, and ask for verification before deleting each such file.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] mdel H*[0-9]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, save your local files that begin with the letter &amp;quot;c&amp;quot; (let the UN*X shell resolve the wild-card path pattern in terms of your local files by not enclosing it in quotes):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi mput c*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, get all files in the subdirectory ''subdirA'' which begin with the letters &amp;quot;b&amp;quot; or &amp;quot;c&amp;quot; (surrounding the wildcard path in single quotes prevents shells on UNIX systems from processing the wild card pattern):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get 'subdirA/[bc]*'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
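&lt;br /&gt;
The streaming pattern used in the last two examples can be tried locally before involving HPSS. In the sketch below a second tar process stands in for hsi, so it only shows how a tar stream travels through a pipe; the file names and the temporary directory are illustrative assumptions.&lt;br /&gt;

```shell
# Throwaway directory with two empty source files (names are illustrative)
work=$(mktemp -d)
touch "$work/a.c" "$work/a.h"

# Write the archive to stdout and read it back from stdin.
# The reading tar is only a local stand-in for "hsi put - : source.tar".
listing=$(cd "$work" || exit 1; tar cf - *.[ch] | tar tf -)

echo "$listing"
rm -r "$work"
```

Nothing is written to disk between the two processes, which is the same property the hsi pipelines above rely on.&lt;br /&gt;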
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''Using HTAR''' ==&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
[[GPC_Quickstart|Submitting_A_Batch_Job]]&lt;br /&gt;
* gpc-archive01 is part of the gpc queuing system under torque/moab&lt;br /&gt;
* Currently it is set up to share the node among up to 12 jobs at a time&lt;br /&gt;
* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
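&lt;br /&gt;
Before submitting an offload job like the one above, it can be worth confirming that the tarballs it refers to actually exist. The following is only a sketch: check_inputs is a hypothetical helper, not part of HSI or HTAR, and the file names are placeholders.&lt;br /&gt;

```shell
# Hypothetical helper: report every listed file that does not exist.
# Returns 0 when all files are present and 1 otherwise.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Demonstration on a throwaway directory (paths are illustrative only)
demo_dir=$(mktemp -d)
touch "$demo_dir/finished-job1.tar.gz"

status=0
report=$(check_inputs "$demo_dir/finished-job1.tar.gz" "$demo_dir/finished-job2.tar.gz") || status=$?

echo "$report"
echo "exit status: $status"
rm -r "$demo_dir"
```

In a real job script the non-zero status could be used to stop before hsi is called at all.&lt;br /&gt;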
* sample '''data list'''&lt;br /&gt;
** Very painful without interactive browsing&lt;br /&gt;
*** Tentative solution: dump a listing of all user files to the log file and use that as a file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3409</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3409"/>
		<updated>2011-06-20T17:08:54Z</updated>

		<summary type="html">&lt;p&gt;Knecht: /* Using HSI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''Massive Archive and Retrieval System''' ==&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface that will be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the archive. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. An interactive session can be requested that will allow a user to list, rearrange or remove files with the HSI client. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Guidelines ==&lt;br /&gt;
* HPSS storage space is provided on tape, a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application return code and check the log file for errors after all data transfers.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
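&lt;br /&gt;
As a rough aid for the small-file guideline above, the sketch below lists files under a directory that fall below roughly 100 MB and are therefore candidates for grouping with tar or htar. The helper name list_small_files is hypothetical, and the M suffix of the -size test assumes GNU find or a compatible implementation.&lt;br /&gt;

```shell
# Hypothetical helper: print files under a directory that are smaller than
# about 100 MB. Assumes a find whose -size test accepts the M suffix.
list_small_files() {
    find "$1" -type f -size -100M
}

# Demonstration on a throwaway directory holding one empty file
demo_dir=$(mktemp -d)
touch "$demo_dir/tiny.dat"

small=$(list_small_files "$demo_dir")
echo "$small"
rm -r "$demo_dir"
```

Pointing the helper at a work area under /scratch gives a quick list of files that should go into an archive rather than be transferred one by one.&lt;br /&gt;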
&lt;br /&gt;
== Access Through the Queue System  ==&lt;br /&gt;
All access to the archive system is through the queue system. &lt;br /&gt;
=== Interactive Access ===&lt;br /&gt;
To get an interactive session and use the HSI client, use the -I option of qsub.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub -q archive -I&lt;br /&gt;
hpss-archive01:~ $ hsi ls&lt;br /&gt;
******************************************************************&lt;br /&gt;
*   Welcome to the Massive Archive and Restore System @ SciNet   *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
******************************************************************&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
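status=$?&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
  echo &amp;quot;HSI returned non-zero exit code $status&amp;quot;&lt;br /&gt;
  exit $status&lt;br /&gt;
fi&lt;br /&gt;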
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the archive.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a Here Document.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;-EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/$USER/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Examples === &lt;br /&gt;
&lt;br /&gt;
* Put the subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously interrupted session (as rsync would do).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;prompt; mput -R -u LargeFiles&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively descend into the ''Source'' directory and move all files which end in &amp;quot;.h&amp;quot; into a sibling directory (ie, a directory at the same level in the tree as &amp;quot;Source&amp;quot;) named &amp;quot;Include&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd Source&lt;br /&gt;
    [HSI] mv *.h ../Include&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Delete all files beginning with &amp;quot;m&amp;quot; and ending with 9101 (note that this is an interactive request, not a one-liner request, so the wildcard path does not need quotes to preserve it):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] delete m*9101&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively delete all files beginning with H and ending with a digit, and ask for verification before deleting each such file.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] mdel H*[0-9]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, save your local files that begin with the letter &amp;quot;c&amp;quot; (let the UN*X shell resolve the wild-card path pattern in terms of your local files by not enclosing it in quotes):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi mput c*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, get all files in the subdirectory ''subdirA'' which begin with the letters &amp;quot;b&amp;quot; or &amp;quot;c&amp;quot; (surrounding the wildcard path in single quotes prevents shells on UNIX systems from processing the wild card pattern):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get 'subdirA/[bc]*'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''Using HTAR''' ==&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
[[GPC_Quickstart|Submitting_A_Batch_Job]]&lt;br /&gt;
* gpc-archive01 is part of the gpc queuing system under torque/moab&lt;br /&gt;
* Currently it is set up to share the node among up to 12 jobs at a time&lt;br /&gt;
* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
** Very painful without interactive browsing&lt;br /&gt;
*** Tentative solution: dump a listing of all user files to the log file and use that as a file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''analysis''' (depends on previous data-restore.sh execution)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3408</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3408"/>
		<updated>2011-06-20T17:06:39Z</updated>

		<summary type="html">&lt;p&gt;Knecht: /* Using HSI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''Massive Archive and Retrieval System''' ==&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface that will be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the archive. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. An interactive session can be requested that will allow a user to list, rearrange or remove files with the HSI client. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Guidelines ==&lt;br /&gt;
* HPSS storage space is provided on tape, a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application return code and check the log file for errors after all data transfers.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Access Through the Queue System  ==&lt;br /&gt;
All access to the archive system is through the queue system. &lt;br /&gt;
=== Interactive Access ===&lt;br /&gt;
To get an interactive session and use the HSI client, use the -I option of qsub.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub -q archive -I&lt;br /&gt;
hpss-archive01:~ $ hsi ls&lt;br /&gt;
******************************************************************&lt;br /&gt;
*   Welcome to the Massive Archive and Restore System @ SciNet   *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
******************************************************************&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
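status=$?&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
  echo &amp;quot;HSI returned non-zero exit code $status&amp;quot;&lt;br /&gt;
  exit $status&lt;br /&gt;
fi&lt;br /&gt;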
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd,mkdir,ls,rm,mv&lt;br /&gt;
  | Operate as one would expect on the contents of the archive.&lt;br /&gt;
|-&lt;br /&gt;
  | lcd,lls&lt;br /&gt;
  | ''Local'' commands.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a Here Document.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;-EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/knecht/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Examples === &lt;br /&gt;
&lt;br /&gt;
* Put the subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously interrupted session (as rsync would do).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;prompt; mput -R -u LargeFiles&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively descend into the ''Source'' directory and move all files which end in &amp;quot;.h&amp;quot; into a sibling directory (ie, a directory at the same level in the tree as &amp;quot;Source&amp;quot;) named &amp;quot;Include&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd Source&lt;br /&gt;
    [HSI] mv *.h ../Include&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Delete all files beginning with &amp;quot;m&amp;quot; and ending with 9101 (note that this is an interactive request, not a one-liner request, so the wildcard path does not need quotes to preserve it):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] delete m*9101&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively delete all files beginning with H and ending with a digit, and ask for verification before deleting each such file.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] mdel H*[0-9]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, save your local files that begin with the letter &amp;quot;c&amp;quot; (let the UN*X shell resolve the wild-card path pattern in terms of your local files by not enclosing it in quotes):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi mput c*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, get all files in the subdirectory ''subdirA'' which begin with the letters &amp;quot;b&amp;quot; or &amp;quot;c&amp;quot; (surrounding the wildcard path in single quotes prevents shells on UNIX systems from processing the wild card pattern):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get 'subdirA/[bc]*'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
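The equivalence above can be sketched in shell. This is only an illustration: the function name ''hpss_put_command'' and the group and user values are hypothetical, and the function merely prints the explicit form of the command rather than calling hsi.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# hpss_put_command GROUP USER LOCALFILE (hypothetical helper)
# Prints the explicit form of "put LOCALFILE", assuming the default
# HSI directory placement /archive/GROUP/USER/ described above.
hpss_put_command() {
    local group=$1 user=$2 localfile=$3
    local name
    # Only the file name, not the local directory, is used on the HPSS side
    name=$(basename "$localfile")
    # The ":" operator must be surrounded by whitespace
    echo "put $localfile : /archive/$group/$user/$name"
}

# Illustrative values only; substitute your own group and user
hpss_put_command mygroup myuser source.tar
```
Running it prints ''put source.tar : /archive/mygroup/myuser/source.tar'', with the &amp;quot;:&amp;quot; operator surrounded by whitespace as required.&lt;br /&gt;
&lt;br /&gt;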
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''Using HTAR''' ==&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
[[GPC_Quickstart|Submitting_A_Batch_Job]]&lt;br /&gt;
* gpc-archive01 is part of the gpc queuing system under torque/moab&lt;br /&gt;
* Currently it is set up to share the node with up to 12 jobs at one time&lt;br /&gt;
* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
** Very painful without interactive browsing&lt;br /&gt;
*** Tentative solution: dump all user files to a log file and use that as a file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
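Once such a listing has been saved, a file can be looked up with standard tools instead of an interactive session. The following is a minimal sketch, assuming the job log is a plain text file; the helper name ''find_in_index'' and the sample entries are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# find_in_index LOGFILE PATTERN (hypothetical helper)
# Prints every line of the saved listing that contains PATTERN.
find_in_index() {
    grep -- "$2" "$1"
}

# Made-up entries standing in for the log written by data-list.sh
index=$(mktemp)
printf '%s\n' "finished-job1.tar.gz" "finished-job2.tar.gz" "finished-job3.tar" > "$index"

# Look up an archive by part of its name
find_in_index "$index" "job3"

rm -f "$index"
```
With a real log, the first argument would be the file written under ''hpsslogs/'' by the job above.&lt;br /&gt;
&lt;br /&gt;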
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''analysis''' (depends on previous data-restore.sh execution)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3383</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3383"/>
		<updated>2011-06-10T21:37:13Z</updated>

		<summary type="html">&lt;p&gt;Knecht: /* Using HSI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''Massive Archive and Retrieval System''' ==&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface that will be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the archive. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. An interactive session can be requested that will allow a user to list, rearrange or remove files with the HSI client. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Guidelines ==&lt;br /&gt;
* HPSS storage space is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application return code and check the log file for errors after all data transfers.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
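The size guidelines above can be turned into a quick pre-transfer check. This is a sketch under stated assumptions: the thresholds come from the guidelines, binary units are assumed, and the function name ''classify_size'' and its messages are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Thresholds taken from the guidelines above (binary units are an assumption)
SMALL_LIMIT=$((100 * 1024 * 1024))             # ~100MB
OPTIMAL_LIMIT=$((100 * 1024 * 1024 * 1024))    # 100GB
MAX_LIMIT=$((1024 * 1024 * 1024 * 1024))       # 1TB

# classify_size BYTES (hypothetical helper)
# Prints a recommendation for a file of the given size in bytes.
classify_size() {
    local bytes=$1
    if [ "$bytes" -lt "$SMALL_LIMIT" ]; then
        echo "bundle with tar or htar"
    elif [ "$bytes" -le "$OPTIMAL_LIMIT" ]; then
        echo "transfer as is"
    elif [ "$bytes" -le "$MAX_LIMIT" ]; then
        echo "allowed, but consider splitting"
    else
        echo "too large, must be split"
    fi
}

classify_size 5000000        # a 5MB file
classify_size 2000000000     # a 2GB file
```
The two sample calls print ''bundle with tar or htar'' and ''transfer as is'' respectively.&lt;br /&gt;
&lt;br /&gt;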
== Access Through the Queue System  ==&lt;br /&gt;
All access to the archive system is through the queue system. &lt;br /&gt;
=== Interactive Access ===&lt;br /&gt;
To get an interactive session and use the HSI client, use the -I option of qsub.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub -q archive -I&lt;br /&gt;
hpss-archive01:~ $ hsi ls&lt;br /&gt;
******************************************************************&lt;br /&gt;
*   Welcome to the Massive Archive and Restore System @ SciNet   *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
******************************************************************&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
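status=$?&lt;br /&gt;
# assumed minimal handling: report the failure and propagate the exit status&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
    echo &amp;quot;HSI returned non-zero code: $status&amp;quot;&lt;br /&gt;
    exit $status&lt;br /&gt;
fi&lt;br /&gt;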
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd&lt;br /&gt;
  | change directory&lt;br /&gt;
|-&lt;br /&gt;
  | mkdir&lt;br /&gt;
  | make directory&lt;br /&gt;
|- &lt;br /&gt;
  | ls&lt;br /&gt;
  | list&lt;br /&gt;
|-&lt;br /&gt;
  | rm&lt;br /&gt;
  | remove&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples; cput example_data.tgz&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More complex operations can be performed using a Here Document.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
hsi &amp;lt;&amp;lt;-EOF&lt;br /&gt;
  mkdir -p examples/201106&lt;br /&gt;
  cd examples&lt;br /&gt;
  mv example_data.tgz 201106/&lt;br /&gt;
  lcd /scratch/knecht/examples/&lt;br /&gt;
  cput -R -u * &lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Examples === &lt;br /&gt;
&lt;br /&gt;
* Put a subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session (as rsync would do).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;prompt; mput -R -u LargeFiles&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively descend into the ''Source'' directory and move all files which end in &amp;quot;.h&amp;quot; into a sibling directory (i.e., a directory at the same level in the tree as &amp;quot;Source&amp;quot;) named &amp;quot;Include&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd Source&lt;br /&gt;
    [HSI] mv *.h ../Include&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Delete all files beginning with &amp;quot;m&amp;quot; and ending with 9101 (note that this is an interactive request, not a one-liner request, so the wildcard path does not need quotes to preserve it):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] delete m*9101&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively delete all files beginning with H and ending with a digit, and ask for verification before deleting each such file.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] mdel H*[0-9]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, save your local files that begin with the letter &amp;quot;c&amp;quot; (let the UN*X shell resolve the wild-card path pattern in terms of your local files by not enclosing it in quotes):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi mput c*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, get all files in the subdirectory ''subdirA'' which begin with the letters &amp;quot;b&amp;quot; or &amp;quot;c&amp;quot; (surrounding the wildcard path in single quotes prevents shells on UNIX systems from processing the wild card pattern):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get 'subdirA/[bc]*'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''Using HTAR''' ==&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
[[GPC_Quickstart|Submitting_A_Batch_Job]]&lt;br /&gt;
* gpc-archive01 is part of the gpc queuing system under torque/moab&lt;br /&gt;
* Currently it is set up to share the node with up to 12 jobs at one time&lt;br /&gt;
* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
** Very painful without interactive browsing&lt;br /&gt;
*** Tentative solution: dump all user files to a log file and use that as a file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''analysis''' (depends on previous data-restore.sh execution)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3382</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3382"/>
		<updated>2011-06-10T19:38:32Z</updated>

		<summary type="html">&lt;p&gt;Knecht: /* Using HSI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''Massive Archive and Retrieval System''' ==&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface that will be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the archive. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. An interactive session can be requested that will allow a user to list, rearrange or remove files with the HSI client. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Guidelines ==&lt;br /&gt;
* HPSS storage space is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application return code and check the log file for errors after all data transfers.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
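The advice about return codes can be sketched as a small wrapper. This is a minimal illustration rather than a SciNet-provided tool: the function name ''run_checked'' is hypothetical, and ''true'' and ''false'' stand in for real transfer commands.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# run_checked COMMAND [ARGS...] (hypothetical helper)
# Runs the command, reports whether it succeeded, and returns its exit status.
run_checked() {
    "$@"
    local status=$?
    if [ "$status" -ne 0 ]; then
        echo "FAILED (exit $status): $*"
    else
        echo "OK: $*"
    fi
    return "$status"
}

# Stand-in commands; in a real job the argument would be an hsi or htar
# invocation, for example: run_checked /usr/local/bin/htar -cf job.tar jobdir
run_checked true
run_checked false || echo "keep the original files and inspect the log"
```
The second call shows the failure path: the wrapper reports the non-zero exit status, and the caller decides what to do next.&lt;br /&gt;
&lt;br /&gt;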
== Access Through the Queue System  ==&lt;br /&gt;
All access to the archive system is through the queue system. &lt;br /&gt;
=== Interactive Access ===&lt;br /&gt;
To get an interactive session and use the HSI client, use the -I option of qsub.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub -q archive -I&lt;br /&gt;
hpss-archive01:~ $ hsi ls&lt;br /&gt;
******************************************************************&lt;br /&gt;
*   Welcome to the Massive Archive and Restore System @ SciNet   *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
******************************************************************&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
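status=$?&lt;br /&gt;
# assumed minimal handling: report the failure and propagate the exit status&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
    echo &amp;quot;HSI returned non-zero code: $status&amp;quot;&lt;br /&gt;
    exit $status&lt;br /&gt;
fi&lt;br /&gt;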
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | cd&lt;br /&gt;
  | change directory&lt;br /&gt;
|-&lt;br /&gt;
  | mkdir&lt;br /&gt;
  | make directory&lt;br /&gt;
|- &lt;br /&gt;
  | ls&lt;br /&gt;
  | list&lt;br /&gt;
|-&lt;br /&gt;
  | rm&lt;br /&gt;
  | remove&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Simple commands can be executed on a single line:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   hsi &amp;quot;mkdir examples; cd examples&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Examples === &lt;br /&gt;
&lt;br /&gt;
* Put a subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session (as rsync would do).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;prompt; mput -R -u LargeFiles&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively descend into the ''Source'' directory and move all files which end in &amp;quot;.h&amp;quot; into a sibling directory (i.e., a directory at the same level in the tree as &amp;quot;Source&amp;quot;) named &amp;quot;Include&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd Source&lt;br /&gt;
    [HSI] mv *.h ../Include&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Delete all files beginning with &amp;quot;m&amp;quot; and ending with 9101 (note that this is an interactive request, not a one-liner request, so the wildcard path does not need quotes to preserve it):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] delete m*9101&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively delete all files beginning with H and ending with a digit, and ask for verification before deleting each such file.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] mdel H*[0-9]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, save your local files that begin with the letter &amp;quot;c&amp;quot; (let the UN*X shell resolve the wild-card path pattern in terms of your local files by not enclosing it in quotes):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi mput c*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, get all files in the subdirectory ''subdirA'' which begin with the letters &amp;quot;b&amp;quot; or &amp;quot;c&amp;quot; (surrounding the wildcard path in single quotes prevents shells on UNIX systems from processing the wild card pattern):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get 'subdirA/[bc]*'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''Using HTAR''' ==&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online.&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
[[GPC_Quickstart|Submitting_A_Batch_Job]]&lt;br /&gt;
* gpc-archive01 is part of the GPC queuing system under Torque/Moab&lt;br /&gt;
* Currently it is set up to share the node among up to 12 jobs at one time&lt;br /&gt;
* Default parameters: -l nodes=1:ppn=1,walltime=48:00:00&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
** Very painful without interactive browsing&lt;br /&gt;
*** Tentative solution: dump all user files to a log file and use that as a file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
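&lt;br /&gt;
A minimal sketch of how the log written by data-list.sh might serve as a file index, assuming the job output was saved under hpsslogs/ as the #PBS -o line requests. The helper name find_in_index is hypothetical and only standard grep is used.&lt;br /&gt;

```shell
#!/bin/bash
# find_in_index: hypothetical helper, not part of HSI or HTAR.
# Usage: find_in_index LOGFILE PATTERN
# Prints the lines of a saved recursive listing that contain PATTERN.
find_in_index() {
    local logfile=$1 pattern=$2
    # -F treats PATTERN as a fixed string, so dots in file names match literally.
    # grep exits 0 on a match, 1 on no match, 2 if the log cannot be read.
    grep -F -- "$pattern" "$logfile"
}
```

For example, find_in_index hpsslogs/hpss_dump.12345 finished-job1 would search one saved listing, where the log file name (job ID 12345) is made up for illustration.&lt;br /&gt;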
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''analysis''' (depends on previous data-restore.sh execution)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3377</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3377"/>
		<updated>2011-06-08T21:25:59Z</updated>

		<summary type="html">&lt;p&gt;Knecht: Change heading levels.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''Massive Archive and Retrieval System''' ==&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the &lt;br /&gt;
following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface that will be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the archive. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. An interactive session can be requested that will allow a user to list, rearrange or remove files with the HSI client. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Guidelines ==&lt;br /&gt;
* HPSS storage space is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application return code and the log file for errors after every data transfer.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
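&lt;br /&gt;
A minimal sketch of the small-file grouping advice above, assuming GNU find and GNU tar are available. The helper name bundle_small_files and the 100M threshold passed to find are illustrative choices, not part of HSI or HTAR.&lt;br /&gt;

```shell
#!/bin/bash
# bundle_small_files: hypothetical helper, not part of HSI or HTAR.
# Usage: bundle_small_files DIR OUT.tar
# Packs every regular file under DIR that is smaller than roughly 100 MB
# into OUT.tar. OUT.tar should live outside DIR so it is not picked up itself.
bundle_small_files() {
    local dir=$1 out=$2
    # find rounds sizes up to whole megabyte units before comparing,
    # so "-size -100M" selects files below about 100 MB.
    # -print0 with "tar --null -T -" keeps names with spaces intact.
    find "$dir" -type f -size -100M -print0 | tar --null -T - -cf "$out"
}
```

The resulting tar file could then be transferred with hsi put like any other large file.&lt;br /&gt;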
&lt;br /&gt;
== Access Through the Queue System  ==&lt;br /&gt;
All access to the archive system is through the queue system. &lt;br /&gt;
=== Interactive Access ===&lt;br /&gt;
To get an interactive session and use the HSI client, use the -I option of qsub.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub -q archive -I&lt;br /&gt;
hpss-archive01:~ $ hsi ls&lt;br /&gt;
******************************************************************&lt;br /&gt;
*   Welcome to the Massive Archive and Restore System @ SciNet   *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
******************************************************************&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scripted File Transfers ===&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
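status=$?&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
    echo &amp;quot;HSI returned non-zero code $status&amp;quot;&lt;br /&gt;
    exit $status&lt;br /&gt;
fi&lt;br /&gt;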
&amp;lt;/pre&amp;gt;&lt;br /&gt;
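&lt;br /&gt;
The exit-status check in the script above can also be written as a reusable wrapper. This is a sketch only: run_checked is a hypothetical helper name, and it assumes the transfer command signals failure through a non-zero exit status.&lt;br /&gt;

```shell
#!/bin/bash
# run_checked: hypothetical helper, not part of HSI or HTAR.
# Runs the given command, reports whether it succeeded, and returns the
# command's own exit status so the calling job script can stop on failure.
run_checked() {
    "$@"
    local status=$?
    if [ "$status" -ne 0 ]; then
        echo "FAILED (exit $status): $*"
    else
        echo "OK: $*"
    fi
    return "$status"
}
```

In a job script this might be used as: run_checked /usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/ || exit 1&lt;br /&gt;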
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== '''Using HSI''' ==&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | ls&lt;br /&gt;
  | list&lt;br /&gt;
|-&lt;br /&gt;
  | rm&lt;br /&gt;
  | remove&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== HSI Documentation === &lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
=== Examples === &lt;br /&gt;
&lt;br /&gt;
* Interactively put a subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session (as rsync would do).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] prompt&lt;br /&gt;
    [HSI] mput -R -u LargeFiles&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Same as above, but from a shell&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;prompt; mput -R -u LargeFiles&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively descend into the ''Source'' directory and move all files which end in &amp;quot;.h&amp;quot; into a sibling directory (ie, a directory at the same level in the tree as &amp;quot;Source&amp;quot;) named &amp;quot;Include&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd Source&lt;br /&gt;
    [HSI] mv *.h ../Include&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Delete all files beginning with &amp;quot;m&amp;quot; and ending with 9101 (note that this is an interactive request, not a one-liner request, so the wildcard path does not need quotes to preserve it):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] delete m*9101&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively delete all files beginning with H and ending with a digit, and ask for verification before deleting each such file.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] mdel H*[0-9]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, save your local files that begin with the letter &amp;quot;c&amp;quot; (let the UN*X shell resolve the wild-card path pattern in terms of your local files by not enclosing it in quotes):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi mput c*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, get all files in the subdirectory ''subdirA'' which begin with the letters &amp;quot;b&amp;quot; or &amp;quot;c&amp;quot; (surrounding the wildcard path in single quotes prevents shells on UNIX systems from processing the wild card pattern):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get 'subdirA/[bc]*'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the ''source.tar'' file saved above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details, see the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''' and the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
== '''Using HTAR''' ==&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &quot;blue.pacific.llnl.gov&quot;, creating the tar file in the user’s remote FTP home directory, enter (an extra HTAR capability for transfers to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online.&lt;br /&gt;
&lt;br /&gt;
== '''More detailed examples''' ==&lt;br /&gt;
[[GPC_Quickstart|Submitting_A_Batch_Job]]&lt;br /&gt;
* gpc-archive01 is part of the GPC queuing system under Torque/Moab&lt;br /&gt;
* Currently it is set up to share the node among up to 12 jobs at one time&lt;br /&gt;
* Default parameters: -l nodes=1:ppn=1,walltime=48:00:00&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
** Very painful without interactive browsing&lt;br /&gt;
*** Tentative solution: dump all user files to a log file and use that as a file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''analysis''' (depends on previous data-restore.sh execution)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3376</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3376"/>
		<updated>2011-06-08T21:15:38Z</updated>

		<summary type="html">&lt;p&gt;Knecht: /* Using HSI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== '''Massive Archive and Retrieval System''' ===&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the &lt;br /&gt;
following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface that will be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the archive. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. An interactive session can be requested that will allow a user to list, rearrange or remove files with the HSI client. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Guidelines ===&lt;br /&gt;
* HPSS storage space is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application return code and the log file for errors after every data transfer.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Access Through the Queue System  ===&lt;br /&gt;
All access to the archive system is through the queue system. &lt;br /&gt;
== Interactive Access ==&lt;br /&gt;
To get an interactive session and use the HSI client, use the -I option of qsub.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub -q archive -I&lt;br /&gt;
hpss-archive01:~ $ hsi ls&lt;br /&gt;
******************************************************************&lt;br /&gt;
*   Welcome to the Massive Archive and Restore System @ SciNet   *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
******************************************************************&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Scripted File Transfers ==&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
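status=$?&lt;br /&gt;
if [ $status -ne 0 ]; then&lt;br /&gt;
    echo &amp;quot;HSI returned non-zero code $status&amp;quot;&lt;br /&gt;
    exit $status&lt;br /&gt;
fi&lt;br /&gt;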
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== '''Using HSI''' ===&lt;br /&gt;
&lt;br /&gt;
HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
  | cput &lt;br /&gt;
  | Conditionally stores a file only if the HPSS file does not exist &lt;br /&gt;
|- &lt;br /&gt;
  | cget &lt;br /&gt;
  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist. &lt;br /&gt;
|-&lt;br /&gt;
  | ls&lt;br /&gt;
  | list&lt;br /&gt;
|-&lt;br /&gt;
  | rm&lt;br /&gt;
  | remove&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== HSI Documentation == &lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]&lt;br /&gt;
* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]&lt;br /&gt;
&lt;br /&gt;
= Examples = &lt;br /&gt;
&lt;br /&gt;
* Interactively put a subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session (as rsync would do).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] prompt&lt;br /&gt;
    [HSI] mput -R -u LargeFiles&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Same as above, but from a shell&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;prompt; mput -R -u LargeFiles&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively descend into the ''Source'' directory and move all files which end in &amp;quot;.h&amp;quot; into a sibling directory (ie, a directory at the same level in the tree as &amp;quot;Source&amp;quot;) named &amp;quot;Include&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd Source&lt;br /&gt;
    [HSI] mv *.h ../Include&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Delete all files beginning with &amp;quot;m&amp;quot; and ending with 9101 (note that this is an interactive request, not a one-liner request, so the wildcard path does not need quotes to preserve it):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] delete m*9101&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively delete all files beginning with H and ending with a digit, and ask for verification before deleting each such file.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] mdel H*[0-9]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, save your local files that begin with the letter &amp;quot;c&amp;quot; (let the UN*X shell resolve the wild-card path pattern in terms of your local files by not enclosing it in quotes):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi mput c*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, get all files in the subdirectory ''subdirA'' which begin with the letters &amp;quot;b&amp;quot; or &amp;quot;c&amp;quot; (surrounding the wildcard path in single quotes prevents shells on UNIX systems from processing the wild card pattern):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get 'subdirA/[bc]*'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Retrieve the ''source.tar'' file saved above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details, see the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''' and the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
=== '''Using HTAR''' ===&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &quot;blue.pacific.llnl.gov&quot;, creating the tar file in the user’s remote FTP home directory, enter (an extra HTAR capability for transfers to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online.&lt;br /&gt;
&lt;br /&gt;
=== '''More detailed examples''' ===&lt;br /&gt;
[[GPC_Quickstart|Submitting_A_Batch_Job]]&lt;br /&gt;
* gpc-archive01 is part of the GPC queuing system under Torque/Moab&lt;br /&gt;
* Currently it is set up to share the node among up to 12 jobs at one time&lt;br /&gt;
* Default parameters: -l nodes=1:ppn=1,walltime=48:00:00&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
** Very painful without interactive browsing&lt;br /&gt;
*** Tentative solution: dump all user files to a log file and use that as a file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''analysis''' (depends on previous data-restore.sh execution)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3375</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3375"/>
		<updated>2011-06-08T18:59:05Z</updated>

		<summary type="html">&lt;p&gt;Knecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== '''Massive Archive and Retrieval System''' ===&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface that will be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the archive. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. An interactive session can be requested that will allow a user to list, rearrange or remove files with the HSI client. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Guidelines ===&lt;br /&gt;
* HPSS storage space is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application return code and the log file for errors after every data transfer.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
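&lt;br /&gt;
The return-code check recommended above could be wrapped in a small shell helper. This is a minimal sketch only: the function name ''run_checked'' is hypothetical and not part of HSI or HTAR, and it assumes the transfer command reports failure through a non-zero exit status.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper (not part of HSI/HTAR): run a command, print a
# one-line OK/FAILED summary to the job log, and hand the command's
# exit status back to the caller.
run_checked() {
    "$@"
    local status=$?
    if [ "$status" -ne 0 ]; then
        echo "FAILED (exit $status): $*"
    else
        echo "OK: $*"
    fi
    return "$status"
}
```

In a batch script it might be called as ''run_checked /usr/local/bin/htar -cf finished-job3.tar finished-job3/'', so that the job log shows an OK or FAILED line for each transfer.&lt;br /&gt;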
&lt;br /&gt;
=== Access Through the Queue System  ===&lt;br /&gt;
All access to the archive system is through the queue system. &lt;br /&gt;
== Interactive Access ==&lt;br /&gt;
To get an interactive session and use the HSI client, use the -I option of qsub.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub -q archive -I&lt;br /&gt;
hpss-archive01:~ $ hsi ls&lt;br /&gt;
******************************************************************&lt;br /&gt;
*   Welcome to the Massive Archive and Restore System @ SciNet   *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
******************************************************************&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Scripted File Transfers ==&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
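# assumed handling: report the failure and stop the job&lt;br /&gt;
if [ $? -ne 0 ]; then&lt;br /&gt;
    echo 'HSI returned a non-zero exit status; check the log for errors'&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;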
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== '''Using HSI''' ===&lt;br /&gt;
&lt;br /&gt;
* Link to docs at Gleicher Ent.&lt;br /&gt;
&lt;br /&gt;
* Interactively put a subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session (as rsync would do).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] prompt&lt;br /&gt;
    [HSI] mput -R -u LargeFiles&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Same as above, but from a shell&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;prompt; mput -R -u LargeFiles&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively descend into the ''Source'' directory and move all files which end in &amp;quot;.h&amp;quot; into a sibling directory (ie, a directory at the same level in the tree as &amp;quot;Source&amp;quot;) named &amp;quot;Include&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd Source&lt;br /&gt;
    [HSI] mv *.h ../Include&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Delete all files beginning with &amp;quot;m&amp;quot; and ending with 9101 (note that this is an interactive request, not a one-liner request, so the wildcard path does not need quotes to preserve it):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] delete m*9101&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively delete all files beginning with H and ending with a digit, and ask for verification before deleting each such file.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] mdel H*[0-9]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, save your local files that begin with the letter &amp;quot;c&amp;quot; (let the UN*X shell resolve the wild-card path pattern in terms of your local files by not enclosing it in quotes):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi mput c*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, get all files in the subdirectory ''subdirA'' which begin with the letters &amp;quot;b&amp;quot; or &amp;quot;c&amp;quot; (surrounding the wildcard path in single quotes prevents shells on UNIX systems from processing the wild card pattern):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get 'subdirA/[bc]*'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
=== '''Using HTAR''' ===&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
=== '''More detailed examples''' ===&lt;br /&gt;
[[GPC_Quickstart|Submitting_A_Batch_Job]]&lt;br /&gt;
* gpc-archive01 is part of the gpc queuing system under torque/moab&lt;br /&gt;
* Currently it is set up to share the node with up to 12 jobs at one time&lt;br /&gt;
* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
** Very painful without interactive browsing&lt;br /&gt;
*** Tentative solution: dump all user files to a log file and use that as a file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''analysis''' (depends on previous data-restore.sh execution)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3374</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3374"/>
		<updated>2011-06-08T18:54:44Z</updated>

		<summary type="html">&lt;p&gt;Knecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== '''Massive Archive and Retrieval System''' ===&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the following utilities:&lt;br /&gt;
* HSI is a client with an ftp-like interface that will be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.&lt;br /&gt;
* HTAR is a utility that creates tar format archives resident in the archive. It also creates a separate index file that can be accessed quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
User access will be controlled by the job scheduling system of the GPC. An interactive session can be requested that will allow a user to list, rearrange or remove files with the HSI client. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Guidelines ===&lt;br /&gt;
* HPSS storage space is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.&lt;br /&gt;
* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes &amp;lt;= 100 GB.&lt;br /&gt;
* Make sure to check the application return code and the log file for errors after every data transfer.&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Access Through the Queue System  ===&lt;br /&gt;
All access to the archive system is through the queue system. To get an interactive session and use the HSI client, use the -I option of qsub.&lt;br /&gt;
== Interactive Access ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub -q archive -I&lt;br /&gt;
hpss-archive01:~ $ hsi ls&lt;br /&gt;
******************************************************************&lt;br /&gt;
*   Welcome to the Massive Archive and Restore System @ SciNet   *&lt;br /&gt;
*                                                                *&lt;br /&gt;
*        Contact Information: support@scinet.utoronto.ca         *&lt;br /&gt;
*  NOTE: do not transfer SMALL FILES with HSI. Use HTAR instead  *&lt;br /&gt;
******************************************************************&lt;br /&gt;
[HSI]/archive/group/user-&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Scripted File Transfers ==&lt;br /&gt;
File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/env bash&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hsi_file_transfer&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
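EOF&lt;br /&gt;
# assumed handling: report the failure and stop the job&lt;br /&gt;
if [ $? -ne 0 ]; then&lt;br /&gt;
    echo 'HSI returned a non-zero exit status; check the log for errors'&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;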
&lt;br /&gt;
&lt;br /&gt;
=== '''Using HSI''' ===&lt;br /&gt;
&lt;br /&gt;
* Link to docs at Gleicher Ent.&lt;br /&gt;
&lt;br /&gt;
* Interactively put a subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session (as rsync would do).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] prompt&lt;br /&gt;
    [HSI] mput -R -u LargeFiles&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Same as above, but from a shell&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;prompt; mput -R -u LargeFiles&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively descend into the ''Source'' directory and move all files which end in &amp;quot;.h&amp;quot; into a sibling directory (ie, a directory at the same level in the tree as &amp;quot;Source&amp;quot;) named &amp;quot;Include&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd Source&lt;br /&gt;
    [HSI] mv *.h ../Include&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Delete all files beginning with &amp;quot;m&amp;quot; and ending with 9101 (note that this is an interactive request, not a one-liner request, so the wildcard path does not need quotes to preserve it):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] delete m*9101&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively delete all files beginning with H and ending with a digit, and ask for verification before deleting each such file.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] mdel H*[0-9]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, save your local files that begin with the letter &amp;quot;c&amp;quot; (let the UN*X shell resolve the wild-card path pattern in terms of your local files by not enclosing it in quotes):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi mput c*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, get all files in the subdirectory ''subdirA'' which begin with the letters &amp;quot;b&amp;quot; or &amp;quot;c&amp;quot; (surrounding the wildcard path in single quotes prevents shells on UNIX systems from processing the wild card pattern):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get 'subdirA/[bc]*'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
=== '''Using HTAR''' ===&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write the ''subdirA'' directory to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
=== '''More detailed examples''' ===&lt;br /&gt;
[[GPC_Quickstart|Submitting_A_Batch_Job]]&lt;br /&gt;
* gpc-archive01 is part of the gpc queuing system under torque/moab&lt;br /&gt;
* Currently it is set up to share the node with up to 12 jobs at one time&lt;br /&gt;
* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
** Very painful without interactive browsing&lt;br /&gt;
*** Tentative solution: dump all user files to a log file and use that as a file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''analysis''' (depends on previous data-restore.sh execution)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3366</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3366"/>
		<updated>2011-06-07T19:49:50Z</updated>

		<summary type="html">&lt;p&gt;Knecht: /* Using the batch queue */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== '''Massive Archive and Retrieval System''' ===&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;'''NOTE: PLEASE TAKE THE TIME TO AT LEAST READ THIS PAGE IN FULL.'''&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''MARS''' deployment at SciNet is an effort to offer a more efficient way to offload/archive data from the most active file systems (scratch and project) than our current TSM-HSM solution, still without having to deal directly with the tape library or &amp;quot;tape commands&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
The system is a combination of the underlying hardware infrastructure, 3 software components (''HPSS'', ''HSI'' and ''HTAR''), plus some environment customization.&lt;br /&gt;
&lt;br /&gt;
* '''HPSS''': the main software component, best described as a very scalable engine running on a &amp;quot;blackbox&amp;quot; made of disks and tapes, to support the Archive and Retrieve operations. [http://www.hpss-collaboration.org/index.shtml High Performance Storage System - HPSS] is the result of over a decade of collaboration among five Department of Energy laboratories and IBM, with significant contributions by universities and other laboratories worldwide. For now the best way for SciNet users to [https://support.scinet.utoronto.ca/wiki/index.php/HPSS_compared_to_HSM-TSM understand HPSS] may be to compare it with our existing HSM-TSM implementation.&lt;br /&gt;
&lt;br /&gt;
* '''HSI''': it may be best understood as a supercharged ftp like interface, specially designed by [http://www.mgleicher.us/GEL/hsi/ Gleicher Enterprises] to act as a front-end for HPSS, gathering some of the best features you would encounter on a shell, rsync and GridFTP (and a few more). It enables users to transfer whole directory trees from /project and /scratch, therefore freeing up space. HSI is most suitable when those directory trees do not contain too many small files to start with, or when you already have a series of tarballs. &lt;br /&gt;
&lt;br /&gt;
* '''HTAR''': similarly, htar is sort of a &amp;quot;super-tar&amp;quot; application, also specially designed by [http://www.mgleicher.us/GEL/htar/ Gleicher Enterprises] to interact with HPSS, allowing users to build and automatically transfer tarballs to HPSS on the fly. HTAR is most suitable to aggregate whole directory trees. When HTAR creates the TAR file, it also builds an index file, with a &amp;quot;.idx&amp;quot; suffix added, which is stored in the same directory as the TAR file.&lt;br /&gt;
&lt;br /&gt;
=== '''General guide lines''' ===&lt;br /&gt;
* IN/OUT transfers to HPSS using HSI are limited to a maximum of about '''4 files/second'''. Therefore do not attempt to transfer directories with too many small files inside. Not only will it take &amp;quot;forever&amp;quot;, it will also induce a lot of wear and tear on the library's robot mechanism, as well as on the tapes themselves in case of recalls. Instead use HTAR, so the files are aggregated while being sent to HPSS.&lt;br /&gt;
* The maximum size that an individual file can have inside an HTAR archive is '''68GB'''. Please be sure to identify and &amp;quot;fish out&amp;quot; those files that are larger than 68GB from the directories and transfer them with HSI.&lt;br /&gt;
* The maximum size of a tar file that HPSS will take is '''1TB'''. Please do not generate tarballs that large.&lt;br /&gt;
* The maximum number of files in an HTAR archive is '''1 million'''. Please break up your HTAR segments as required.&lt;br /&gt;
* These guidelines may be strict, but as long as they are followed the system will perform reasonably well.&lt;br /&gt;
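&lt;br /&gt;
The size and count limits above can be checked before an archive is built. The following is a minimal sketch, assuming GNU find is available; the helper names ''oversized_files'' and ''count_files'' are hypothetical and not part of HSI or HTAR.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical pre-flight helpers; names are illustrative only.

# List regular files under directory $1 that are larger than size $2.
# $2 is a GNU find size such as 68G (find counts G in units of 1024^3 bytes).
# Files listed here are candidates for an individual HSI transfer.
oversized_files() {
    find "$1" -type f -size "+$2"
}

# Count regular files under directory $1, for comparison with the
# 1 million files-per-archive limit.
count_files() {
    find "$1" -type f | wc -l
}
```

For example, ''oversized_files /scratch/$USER/workarea 68G'' would list the files to move with HSI instead, and ''count_files'' on the same directory gives a number to compare with the 1 million limit.&lt;br /&gt;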
&lt;br /&gt;
=== '''Performance considerations''' ===&lt;br /&gt;
* Files are kept on disk-cache for as long as possible, so as to avoid tape operations during recalls.&lt;br /&gt;
* Average transfer rates with '''HSI'''&lt;br /&gt;
  No small files, average &amp;gt; 10MB/file: &lt;br /&gt;
  * write: 100-130MB/s&lt;br /&gt;
  * read:  450-600MB/s (IF no recall from tapes required)&lt;br /&gt;
  &lt;br /&gt;
* &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;'''NOTE: do not use HSI with small files (&amp;lt; 1MB/file). It would take over 1 week to transfer 1 TB. If we find that you are abusing the system we'll suspend your privileges'''&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Average transfer rates with '''HTAR''' &lt;br /&gt;
  Average file size &amp;gt; 1MB&lt;br /&gt;
  * write: 120MB/s&lt;br /&gt;
  * read:  480MB/s (IF no recall from tapes required) &lt;br /&gt;
&lt;br /&gt;
  Not too many small files, average &amp;gt; 100KB/file: &lt;br /&gt;
  * write: 64MB/s&lt;br /&gt;
  * read:  170MB/s (IF no recall from tapes required)&lt;br /&gt;
&lt;br /&gt;
* Average transfer rates from '''tapes''', if stage is required (add to the above estimates)&lt;br /&gt;
  * read: 80-100MB/s per tape drive. &lt;br /&gt;
  * maximum of 4 drives may be used per hsi/htar session&lt;br /&gt;
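&lt;br /&gt;
The figures above can be turned into a rough lower bound on HSI transfer time. The sketch below assumes that a transfer takes at least as long as the slower of two limits: the limit of about 4 files/second, and the sustained rate in MB/s (100 is used as a default write rate, taken from the low end of the range above). The function name and the model are illustrative assumptions, not a description of how HSI schedules transfers.&lt;br /&gt;

```shell
# Rough lower bound, in seconds, for moving FILES files totalling SIZE_MB megabytes.
# Arguments: number of files, total size in MB, rate in MB/s (default 100).
hsi_estimate_seconds() {
    files=$1
    size_mb=$2
    rate=${3:-100}
    by_count=$(( (files + 3) / 4 ))               # about 4 files per second, rounded up
    by_size=$(( (size_mb + rate - 1) / rate ))    # sustained throughput, rounded up
    if [ "$by_count" -gt "$by_size" ]; then
        echo "$by_count"
    else
        echo "$by_size"
    fi
}
```

With these assumptions, hsi_estimate_seconds 2000000 1000000 (1TB held in two million files of 0.5MB each) prints 500000, close to six days, while hsi_estimate_seconds 100 1000000 (the same 1TB in 100 files of 10GB) prints 10000, under three hours.&lt;br /&gt;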
&lt;br /&gt;
=== '''Quick Reference''' ===&lt;br /&gt;
&lt;br /&gt;
* Users must request authorization to access MARS@SciNet. To run HSI or HTAR with HPSS please login to the '''gpc-archive01''' node.&lt;br /&gt;
* To browse the contents of your HPSS archive just type '''hsi''' on a shell to get the ''hsi prompt''. Then use simple commands such as '''ls''', '''pwd''', '''cd''' to navigate your way around. You may also use [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''help'''] from the hsi prompt.&lt;br /&gt;
* Files are organized inside HPSS in the same fashion as in /project. Users in the same group have read permissions to each other's archives.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== '''Using HSI''' ===&lt;br /&gt;
&lt;br /&gt;
* Interactively put a subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session (as rsync would do).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] prompt&lt;br /&gt;
    [HSI] mput -R -u LargeFiles&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Same as above, but from a shell&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;prompt; mput -R -u LargeFiles&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively descend into the ''Source'' directory and move all files which end in &amp;quot;.h&amp;quot; into a sibling directory (ie, a directory at the same level in the tree as &amp;quot;Source&amp;quot;) named &amp;quot;Include&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd Source&lt;br /&gt;
    [HSI] mv *.h ../Include&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Delete all files beginning with &amp;quot;m&amp;quot; and ending with 9101 (note that this is an interactive request, not a one-liner request, so the wildcard path does not need quotes to preserve it):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] delete m*9101&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively delete all files beginning with H and ending with a digit, and ask for verification before deleting each such file.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] mdel H*[0-9]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, save your local files that begin with the letter &amp;quot;c&amp;quot; (let the UN*X shell resolve the wild-card path pattern in terms of your local files by not enclosing it in quotes):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi mput c*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, get all files in the subdirectory ''subdirA'' which begin with the letters &amp;quot;b&amp;quot; or &amp;quot;c&amp;quot; (surrounding the wildcard path in single quotes prevents shells on UNIX systems from processing the wild card pattern):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get 'subdirA/[bc]*'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
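&lt;br /&gt;
The difference between the quoted and unquoted wildcard forms above can be checked locally without contacting HPSS. This is a small sketch using a hypothetical helper, count_args, and a throwaway directory; it only shows what the local shell passes to a command, not how HSI then interprets the pattern.&lt;br /&gt;

```shell
# Print how many arguments a command would receive.
count_args() {
    echo "$#"
}

# Work in a throwaway directory holding two files that match c*.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch ca.dat cb.dat

count_args c*      # unquoted: the local shell expands the pattern to both files
count_args 'c*'    # quoted: the pattern is passed through as one literal argument
```

The first call reports 2 arguments and the second reports 1, which is why local patterns for mput are left unquoted while patterns meant for HPSS paths are quoted.&lt;br /&gt;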
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
=== '''Using HTAR''' ===&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
=== '''Using the batch queue''' ===&lt;br /&gt;
[[GPC_Quickstart|Submitting_A_Batch_Job]]&lt;br /&gt;
* gpc-archive01 is part of the gpc queuing system under torque/moab&lt;br /&gt;
* Currently it is set up to share the node with up to 12 jobs at one time&lt;br /&gt;
* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
   - Very painful without interactive browsing&lt;br /&gt;
       -Tentative solution: dump all user files to log file and use that as file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
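&lt;br /&gt;
The sample scripts above do not inspect whether each transfer succeeded. One possible way to record that in the job log is a small wrapper such as the sketch below. The name run_step is hypothetical, and the approach assumes that hsi and htar return a non-zero exit status on failure, which should be confirmed against their man pages.&lt;br /&gt;

```shell
# Run a command, report its outcome under a label, and pass its exit status on.
# Usage: run_step LABEL COMMAND [ARGS...]
run_step() {
    label=$1
    shift
    if "$@"; then
        echo "$label completed"
    else
        status=$?
        echo "$label FAILED with exit status $status"
        return "$status"
    fi
}
```

Inside a batch script this could be used as run_step restore /usr/local/bin/htar -xf finished-job3.tar, so that a failed step is visible in the hpsslogs output and its status can be used to stop the script.&lt;br /&gt;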
&lt;br /&gt;
* sample '''analysis''' (depends on previous data-restore.sh execution)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3365</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3365"/>
		<updated>2011-06-07T19:13:02Z</updated>

		<summary type="html">&lt;p&gt;Knecht: /* Massive Archive and Retrieval System */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== '''Massive Archive and Retrieval System''' ===&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;'''NOTE: PLEASE TAKE THE TIME TO AT LEAST READ THIS PAGE IN FULL.'''&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''MARS''' deployment at SciNet is an effort to offer a more efficient way to offload/archive data from the most active file systems (scratch and project) than our current TSM-HSM solution, still without having to deal directly with the tape library or &amp;quot;tape commands&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
The system is a combination of the underlying hardware infrastructure, 3 software components, ''HPSS'', ''HSI'' and ''HTAR'', plus some environment customization. &lt;br /&gt;
&lt;br /&gt;
* '''HPSS''': the main software component, best described as a very scalable engine running on a &amp;quot;blackbox&amp;quot; made of disks and tapes, to support the Archive and Retrieve operations. [http://www.hpss-collaboration.org/index.shtml High Performance Storage System - HPSS] is the result of over a decade of collaboration among five Department of Energy laboratories and IBM, with significant contributions by universities and other laboratories worldwide. For now the best way for SciNet users to [https://support.scinet.utoronto.ca/wiki/index.php/HPSS_compared_to_HSM-TSM understand HPSS] may be to compare it with our existing HSM-TSM implementation.&lt;br /&gt;
&lt;br /&gt;
* '''HSI''': it may be best understood as a supercharged ftp-like interface, specially designed by [http://www.mgleicher.us/GEL/hsi/ Gleicher Enterprises] to act as a front-end for HPSS, gathering some of the best features you would encounter on a shell, rsync and GridFTP (and a few more). It enables users to transfer whole directory trees from /project and /scratch, therefore freeing up space. HSI is most suitable when those directory trees do not contain too many small files to start with, or when you already have a series of tarballs. &lt;br /&gt;
&lt;br /&gt;
* '''HTAR''': similarly, htar is sort of a &amp;quot;super-tar&amp;quot; application, also specially designed by [http://www.mgleicher.us/GEL/htar/ Gleicher Enterprises] to interact with HPSS, allowing users to build and automatically transfer tarballs to HPSS on the fly. HTAR is most suitable to aggregate whole directory trees. When HTAR creates the TAR file, it also builds an index file, with a &amp;quot;.idx&amp;quot; suffix added, which is stored in the same directory as the TAR file.&lt;br /&gt;
&lt;br /&gt;
=== '''General guide lines''' ===&lt;br /&gt;
* IN/OUT transfers to HPSS using HSI are limited to a maximum of about '''4 files/second'''. Therefore do not attempt to transfer directories with too many small files inside. Not only will it take &amp;quot;forever&amp;quot;, it will also induce a lot of wear and tear on the library's robot mechanism, as well as on the tapes themselves in case of recalls. Instead use HTAR, so the files are aggregated while being sent to HPSS.&lt;br /&gt;
* The maximum size that an individual file can have inside an HTAR archive is '''68GB'''. Please be sure to identify and &amp;quot;fish out&amp;quot; any files larger than 68GB from the directories and transfer them with HSI.&lt;br /&gt;
* The maximum size of a tar file that HPSS will take is '''1TB'''. Please do not generate tarballs that large.&lt;br /&gt;
* The maximum number of files in an HTAR archive is '''1 million'''. Please break up your HTAR segments as required.&lt;br /&gt;
* These guidelines may be strict, but as long as they are followed the system will perform reasonably well.&lt;br /&gt;
&lt;br /&gt;
=== '''Performance considerations''' ===&lt;br /&gt;
* Files are kept on disk-cache for as long as possible, so as to avoid tape operations during recalls.&lt;br /&gt;
* Average transfer rates with '''HSI'''&lt;br /&gt;
  No small files, average &amp;gt; 10MB/file: &lt;br /&gt;
  * write: 100-130MB/s&lt;br /&gt;
  * read:  450-600MB/s (IF no recall from tapes required)&lt;br /&gt;
  &lt;br /&gt;
* &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;'''NOTE: do not use HSI with small files (&amp;lt; 1MB/file). It would take over 1 week to transfer 1 TB. If we find that you are abusing the system we'll suspend your privileges'''&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Average transfer rates with '''HTAR''' &lt;br /&gt;
  Average file size &amp;gt; 1MB&lt;br /&gt;
  * write: 120MB/s&lt;br /&gt;
  * read:  480MB/s (IF no recall from tapes required) &lt;br /&gt;
&lt;br /&gt;
  Not too many small files, average &amp;gt; 100KB/file: &lt;br /&gt;
  * write: 64MB/s&lt;br /&gt;
  * read:  170MB/s (IF no recall from tapes required)&lt;br /&gt;
&lt;br /&gt;
* Average transfer rates from '''tapes''', if stage is required (add to the above estimates)&lt;br /&gt;
  * read: 80-100MB/s per tape drive. &lt;br /&gt;
  * maximum of 4 drives may be used per hsi/htar session&lt;br /&gt;
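&lt;br /&gt;
When a recall has to stage data from tape, the extra time can be approximated from the per-drive rate and the drive limit above. The sketch below is only an estimate under stated assumptions: the function name is hypothetical, the default rate of 80MB/s is the low end of the quoted range, and using more than one drive assumes the data is spread over several tapes that can be read in parallel, which may not be the case for a given archive.&lt;br /&gt;

```shell
# Rough time, in seconds, to stage SIZE_MB megabytes from tape, rounded up.
# Arguments: size in MB, per-drive rate in MB/s (default 80), drives in use (default 1).
stage_seconds() {
    size_mb=$1
    rate=${2:-80}
    drives=${3:-1}
    if [ "$drives" -gt 4 ]; then
        drives=4    # a session may use at most 4 drives
    fi
    total=$(( rate * drives ))
    echo $(( (size_mb + total - 1) / total ))
}
```

For instance, stage_seconds 8000 prints 100 for an 8GB recall on one drive, and stage_seconds 8000 80 4 prints 25 for the same data read from four drives at once. This time is added to the disk-cache read estimates above.&lt;br /&gt;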
&lt;br /&gt;
=== '''Quick Reference''' ===&lt;br /&gt;
&lt;br /&gt;
* Users must request authorization to access MARS@SciNet. To run HSI or HTAR with HPSS please login to the '''gpc-archive01''' node.&lt;br /&gt;
* To browse the contents of your HPSS archive just type '''hsi''' on a shell to get the ''hsi prompt''. Then use simple commands such as '''ls''', '''pwd''', '''cd''' to navigate your way around. You may also use [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''help'''] from the hsi prompt.&lt;br /&gt;
* Files are organized inside HPSS in the same fashion as in /project. Users in the same group have read permissions to each other's archives.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== '''Using HSI''' ===&lt;br /&gt;
&lt;br /&gt;
* Interactively put a subdirectory ''LargeFiles'' and all its contents recursively. You may use the '-u' option to resume a previously disrupted session (as rsync would do).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] prompt&lt;br /&gt;
    [HSI] mput -R -u LargeFiles&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Same as above, but from a shell&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;quot;prompt; mput -R -u LargeFiles&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively descend into the ''Source'' directory and move all files which end in &amp;quot;.h&amp;quot; into a sibling directory (ie, a directory at the same level in the tree as &amp;quot;Source&amp;quot;) named &amp;quot;Include&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd Source&lt;br /&gt;
    [HSI] mv *.h ../Include&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Delete all files beginning with &amp;quot;m&amp;quot; and ending with 9101 (note that this is an interactive request, not a one-liner request, so the wildcard path does not need quotes to preserve it):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] delete m*9101&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively delete all files beginning with H and ending with a digit, and ask for verification before deleting each such file.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] mdel H*[0-9]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, save your local files that begin with the letter &amp;quot;c&amp;quot; (let the UN*X shell resolve the wild-card path pattern in terms of your local files by not enclosing it in quotes):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi mput c*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* From a shell, get all files in the subdirectory ''subdirA'' which begin with the letters &amp;quot;b&amp;quot; or &amp;quot;c&amp;quot; (surrounding the wildcard path in single quotes prevents shells on UNIX systems from processing the wild card pattern):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get 'subdirA/[bc]*'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online, or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.&lt;br /&gt;
&lt;br /&gt;
=== '''Using HTAR''' ===&lt;br /&gt;
&lt;br /&gt;
* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf subdirA.tar subdirA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online&lt;br /&gt;
&lt;br /&gt;
=== '''Using the batch queue''' ===&lt;br /&gt;
* gpc-archive01 is part of the gpc queuing system under torque/moab&lt;br /&gt;
* Currently it is set up to share the node with up to 12 jobs at one time&lt;br /&gt;
* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-offload.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
# individual tarballs already exist&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# create a tarball on-the-fly of the finished-job3 directory&lt;br /&gt;
/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
   - Very painful without interactive browsing&lt;br /&gt;
       -Tentative solution: dump all user files to log file and use that as file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-list.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/$USER/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cd /scratch/$USER/restored-from-MARS&lt;br /&gt;
/usr/local/bin/htar -xf finished-job3.tar&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''analysis''' (depends on previous data-restore.sh execution)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3222</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3222"/>
		<updated>2011-05-26T20:25:40Z</updated>

		<summary type="html">&lt;p&gt;Knecht: /* Using the batch queue */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== '''Massive Archive and Restore System''' ===&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in May/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The '''MARS''' deployment at SciNet is an effort to offer a more efficient way to offload/archive data from the most active file systems (scratch and project) than our current TSM-HSM solution, still without having to deal directly with the tape library or &amp;quot;tape commands&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
The system is a combination of the underlying hardware infrastructure, 3 software components, ''HPSS'', ''HSI'' and ''HTAR'', plus some environment customization. &lt;br /&gt;
&lt;br /&gt;
* '''HPSS''': the main component, best described as a very scalable &amp;quot;blackbox&amp;quot; running in the background to support the Archive and Restore operations. [http://www.hpss-collaboration.org/index.shtml High Performance Storage System - HPSS] is the result of over a decade of collaboration among five Department of Energy laboratories and IBM, with significant contributions by universities and other laboratories worldwide. For now the best way for SciNet users to [https://support.scinet.utoronto.ca/wiki/index.php/HPSS_compared_to_HSM-TSM understand HPSS] may be to compare it with our existing HSM-TSM implementation.&lt;br /&gt;
&lt;br /&gt;
* '''HSI''': it may be best understood as a supercharged ftp interface, specially designed by [http://www.mgleicher.us/GEL/hsi/ Gleicher Enterprises] to act as a front-end for HPSS, gathering some of the best features you would encounter on a shell, rsync and GridFTP (and a few more). It enables users to transfer whole directory trees from /project and /scratch, therefore freeing up space. HSI is most suitable when those directory trees do not contain too many small files to start with, or when you already have a series of tarballs. &lt;br /&gt;
&lt;br /&gt;
* '''HTAR''': similarly, htar is sort of a &amp;quot;super-tar&amp;quot; application, also specially designed by [http://www.mgleicher.us/GEL/htar/ Gleicher Enterprises] to interact with HPSS, allowing users to build and automatically transfer tarballs to HPSS on the fly. HTAR is most suitable to aggregate whole directory trees. When HTAR creates the TAR file, it also builds an index file, with a &amp;quot;.idx&amp;quot; suffix added, which is stored in the same directory as the TAR file.&lt;br /&gt;
&lt;br /&gt;
=== '''General guide lines''' ===&lt;br /&gt;
* IN/OUT transfers to HPSS using HSI are limited to a maximum of about '''4 files/second'''. Therefore do not attempt to transfer directories with too many small files inside. Not only will it take &amp;quot;forever&amp;quot;, it will also induce a lot of wear and tear on the library's robot mechanism, as well as on the tapes themselves in case of recalls. Instead use HTAR, so the files are aggregated while being sent to HPSS.&lt;br /&gt;
* The maximum size that an individual file can have inside an HTAR archive is '''68GB'''. Please be sure to identify and &amp;quot;fish out&amp;quot; any files larger than 68GB from the directories and transfer them with HSI.&lt;br /&gt;
* The maximum size of a tar file that HPSS will take is '''1TB'''. Please do not generate tarballs that large.&lt;br /&gt;
* The maximum number of files in an HTAR archive is '''1 million'''. Please break up your HTAR segments as required.&lt;br /&gt;
* These guidelines may be strict, but as long as they are followed the system will perform reasonably well.&lt;br /&gt;
&lt;br /&gt;
=== '''Performance considerations''' ===&lt;br /&gt;
* Files are kept on disk-cache for as long as possible, so as to avoid tape operations during recalls.&lt;br /&gt;
* Average transfer rates with '''HSI'''&lt;br /&gt;
  No small files, average &amp;gt; 1MB/file: &lt;br /&gt;
  * write: 100-130MB/s&lt;br /&gt;
  * read:  450-600MB/s (IF no recall from tapes required)&lt;br /&gt;
  &lt;br /&gt;
* &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;'''NOTE: do not use HSI with small files (&amp;lt; 1MB/file). It would take over 1 week to transfer 1 TB. If we find that you are abusing the system we'll suspend your privileges'''&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Average transfer rates with '''HTAR''' &lt;br /&gt;
  Average file size &amp;gt; 1MB&lt;br /&gt;
  * write: 120MB/s&lt;br /&gt;
  * read:  480MB/s (IF no recall from tapes required) &lt;br /&gt;
&lt;br /&gt;
  Not too many small files, average &amp;gt; 100KB/file: &lt;br /&gt;
  * write: 64MB/s&lt;br /&gt;
  * read:  170MB/s (IF no recall from tapes required)&lt;br /&gt;
&lt;br /&gt;
* Average transfer rates from '''tapes''', if stage is required (add to the above estimates)&lt;br /&gt;
  * read: 80MB/s per tape drive. &lt;br /&gt;
  * maximum of 4 drives may be used per hsi/htar session&lt;br /&gt;
&lt;br /&gt;
=== '''Quick Reference''' ===&lt;br /&gt;
&lt;br /&gt;
* To use HSI or HTAR and access HPSS please login to the '''gpc-archive01''' node.&lt;br /&gt;
* AUTHENTICATION: done automatically&lt;br /&gt;
* To browse the contents of your HPSS archive just type '''hsi''' on a shell to get the hsi prompt. Then use simple commands such as '''ls''', '''pwd''', '''cd''' to navigate your way around. You may also use [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''help'''] from the hsi prompt.&lt;br /&gt;
* Files are organized inside HPSS in the same fashion as in /project. Users in the same group have read permissions to each other's archives.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* There is also provision to have files migrated to a &amp;quot;mirrored set of tapes&amp;quot; (the 2nd set may be kept off-site). You'll have to request and justify the need. The default is a &amp;quot;single tape set&amp;quot;. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive-dual-copy/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== '''Using HSI''' ===&lt;br /&gt;
&lt;br /&gt;
* Interactively put a subdirectory subdirb and all its contents recursively. You may use the '-u' option to resume a previously interrupted session (as rsync would do).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] prompt&lt;br /&gt;
    [HSI] mput -R -u subdirb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively descend into the &amp;quot;Source&amp;quot; directory and move all files which end in &amp;quot;.h&amp;quot; into a sibling directory (ie, a directory at the same level in the tree as &amp;quot;Source&amp;quot;) named &amp;quot;Include&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd Source&lt;br /&gt;
    [HSI] mv *.h ../Include&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Delete all files beginning with &amp;quot;m&amp;quot; and ending with 9101 (note that this is an interactive request, not a one-liner request, so the wildcard path does not need quotes to preserve it):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] delete m*9101&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively delete all files beginning with H and ending with a digit, and ask for verification before deleting each such file.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] mdel H*[0-9]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save your local files that begin with the letter &amp;quot;c&amp;quot; (let the UN*X shell resolve the wild-card path pattern in terms of your local files by not enclosing it in quotes):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put c*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Get all files in the subdirectory subdira which begin with the letters &amp;quot;b&amp;quot; or &amp;quot;c&amp;quot; (surrounding the wildcard path in single quotes prevents shells on UNIX systems from processing the wild card pattern):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get 'subdira/[bc]*'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Using the &amp;quot;mirrored set of tapes&amp;quot; provision (you'll need authorization and a directory placement in /archive-dual-copy)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar : /archive-dual-copy/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
OR&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd /archive-dual-copy/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;&lt;br /&gt;
    [HSI] put source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the [http://www.mgleicher.us/GEL/hsi/ HSI Introduction] or the [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page] online&lt;br /&gt;
&lt;br /&gt;
=== '''Using HTAR''' ===&lt;br /&gt;
&lt;br /&gt;
* To write the file1 and file2 files to a new archive called &amp;quot;files.tar&amp;quot; in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the file1 and file2 files to a new archive called &amp;quot;files.tar&amp;quot; on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the project1/src directory in the Archive file called proj1.tar, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the out.tar archive file within the HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Using the &amp;quot;mirrored set of tapes&amp;quot; provision (you'll need authorization and a directory placement in /archive-dual-copy) &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf /archive-dual-copy/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the [http://www.mgleicher.us/GEL/htar/ HTAR - Introduction] or the [http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page] online&lt;br /&gt;
&lt;br /&gt;
=== '''Using the batch queue''' ===&lt;br /&gt;
* gpc-archive01 is part of the gpc queuing system under torque/moab&lt;br /&gt;
* Currently it is set up to share the node with up to 12 jobs at one time&lt;br /&gt;
* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data offload'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N offload&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir put-away-and-forget&lt;br /&gt;
cd put-away-and-forget&lt;br /&gt;
put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz&lt;br /&gt;
put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample '''data list'''&lt;br /&gt;
   - Very painful without interactive browsing&lt;br /&gt;
       - Tentative solution: dump a listing of all user files to a log file and use that as a file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample '''data restore'''&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# This script is named: data-restore.sh&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N restore&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/knecht/restored-from-MARS&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/knecht/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz&lt;br /&gt;
get /scratch/knecht/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* sample analysis&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') job-to-work-on-restored-data.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3214</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3214"/>
		<updated>2011-05-26T18:53:43Z</updated>

		<summary type="html">&lt;p&gt;Knecht: /* Using the batch queue */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== '''Massive Archive and Restore System''' ===&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in May/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The '''MARS''' deployment at SciNet is an effort to offer a more efficient way to offload/archive data from the most active file systems (scratch and project) than our current TSM-HSM solution, still without having to deal directly with the tape library or &amp;quot;tape commands&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
The system is a combination of the underlying hardware infrastructure, 3 software components (''HPSS'', ''HSI'' and ''HTAR''), plus some environment customization. &lt;br /&gt;
&lt;br /&gt;
* '''HPSS''': the main component, best described as a very scalable &amp;quot;blackbox&amp;quot; running in the background to support the Archive and Restore operations. [http://www.hpss-collaboration.org/index.shtml High Performance Storage System - HPSS] is the result of over a decade of collaboration among five Department of Energy laboratories and IBM, with significant contributions by universities and other laboratories worldwide. For now the best way for SciNet users to [https://support.scinet.utoronto.ca/wiki/index.php/HPSS_compared_to_HSM-TSM understand HPSS] may be to compare it with our existing HSM-TSM implementation.&lt;br /&gt;
&lt;br /&gt;
* '''HSI''': it may be best understood as a supercharged ftp interface, specially designed by [http://www.mgleicher.us/GEL/hsi/ Gleicher Enterprises] to act as a front-end for HPSS, gathering some of the best features you would encounter on a shell, rsync and GridFTP (and a few more). It enables users to transfer whole directory trees from /project and /scratch, therefore freeing up space. HSI is most suitable when those directory trees do not contain too many small files to start with, or when you already have a series of tarballs. &lt;br /&gt;
&lt;br /&gt;
* '''HTAR''': similarly, htar is sort of a &amp;quot;super-tar&amp;quot; application, also specially designed by [http://www.mgleicher.us/GEL/htar/ Gleicher Enterprises] to interact with HPSS, allowing users to build and automatically transfer tarballs to HPSS on the fly. HTAR is most suitable to aggregate whole directory trees. When HTAR creates the TAR file, it also builds an index file, with a &amp;quot;.idx&amp;quot; suffix added, which is stored in the same directory as the TAR file.&lt;br /&gt;
&lt;br /&gt;
=== '''General guide lines''' ===&lt;br /&gt;
* IN/OUT transfers to HPSS using HSI are bound to a maximum of about '''4 files/second'''. Therefore do not attempt to transfer directories with too many small files inside. Not only will it take &amp;quot;forever&amp;quot;, it will also induce a lot of wear and tear on the library's robot mechanism, as well as on the tapes themselves in case of recalls. Instead use HTAR, so the files are aggregated while being sent to HPSS.&lt;br /&gt;
* The maximum size that an individual file can have inside an HTAR is '''68GB'''. Please be sure to identify and &amp;quot;fish out&amp;quot; those files that are larger than 68GB from the directories and transfer them with HSI.&lt;br /&gt;
* The maximum size of a tar file that HPSS will take is '''1TB'''. Please do not generate tarballs that large.&lt;br /&gt;
* The maximum number of files in an htar is '''1 million'''. Please break up your htar segments as required.&lt;br /&gt;
* These guidelines may be strict, but as long as they are followed the system will perform reasonably well.&lt;br /&gt;
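A minimal sketch of how a directory could be checked against the limits above before running HTAR. The helper name check_htar_limits is hypothetical (it is not part of HSI or HTAR), and the size test assumes GNU find, whose G suffix counts in units of 2^30 bytes, so the 68GB threshold is only approximate.&lt;br /&gt;

```shell
# check_htar_limits DIR
# Hypothetical helper, not part of HSI/HTAR.
# Lists files too large for an HTAR member and counts all files under DIR.
check_htar_limits() {
    dir=$1
    echo "files larger than 68GB (transfer these with HSI):"
    # GNU find: +68G matches files above 68 units of 2^30 bytes (assumption).
    find "$dir" -type f -size +68G
    # Total number of regular files, to compare against the 1 million limit.
    file_count=$(find "$dir" -type f | wc -l | tr -d ' ')
    echo "total files: $file_count"
}
```

For example, running check_htar_limits on a work area before archiving it shows which files need to be moved separately with HSI and whether the tree should be split into several htar segments.&lt;br /&gt;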
&lt;br /&gt;
=== '''Performance considerations''' ===&lt;br /&gt;
* Files are kept on disk-cache for as long as possible, so as to avoid tape operations during recalls.&lt;br /&gt;
* Average transfer rates with '''HSI'''&lt;br /&gt;
  No small files, average &amp;gt; 1MB/file: &lt;br /&gt;
  * write: 100-130MB/s&lt;br /&gt;
  * read:  450-600MB/s (IF no recall from tapes required)&lt;br /&gt;
  &lt;br /&gt;
* &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;'''NOTE: do not use HSI with small files (&amp;lt; 1MB/file). It would take over 1 week to transfer 1 TB. If we find that you are abusing the system we'll suspend your privileges'''&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Average transfer rates with '''HTAR''' &lt;br /&gt;
  Average file size &amp;gt; 1MB&lt;br /&gt;
  * write: 120MB/s&lt;br /&gt;
  * read:  480MB/s (IF no recall from tapes required) &lt;br /&gt;
&lt;br /&gt;
  Not too many small files, average &amp;gt; 100KB/file: &lt;br /&gt;
  * write: 64MB/s&lt;br /&gt;
  * read:  170MB/s (IF no recall from tapes required)&lt;br /&gt;
&lt;br /&gt;
* Average transfer rates from '''tapes''', if stage is required (add to the above estimates)&lt;br /&gt;
  * read: 80MB/s per tape drive. &lt;br /&gt;
  * maximum of 4 drives may be used per hsi/htar session&lt;br /&gt;
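To illustrate why small files are so costly with HSI, the following sketch estimates a transfer time from the file count alone, using the bound of about 4 files/second stated in the guidelines. The helper name estimate_hsi_seconds and the 250KB average file size are assumptions chosen for the example, and decimal units are used (1 TB = 10^12 bytes).&lt;br /&gt;

```shell
# estimate_hsi_seconds TOTAL_BYTES AVG_FILE_BYTES [FILES_PER_SECOND]
# Hypothetical helper: assumes the transfer is limited by the number of
# files handled per second rather than by bandwidth.
estimate_hsi_seconds() {
    total_bytes=$1
    avg_file_bytes=$2
    files_per_second=${3:-4}
    file_count=$(( total_bytes / avg_file_bytes ))
    echo $(( file_count / files_per_second ))
}

# 1 TB made of files averaging 250KB each (assumed figures).
seconds=$(estimate_hsi_seconds 1000000000000 250000)
echo "approx. $(( seconds / 86400 )) days"
```

With these assumed figures there are 4 million files, which at 4 files/second is about 1 million seconds, or roughly 11 days. The same data aggregated with HTAR would instead be limited by the write rates listed above.&lt;br /&gt;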
&lt;br /&gt;
=== '''Quick Reference''' ===&lt;br /&gt;
&lt;br /&gt;
* To use HSI or HTAR and access HPSS please login to the '''gpc-archive01''' node.&lt;br /&gt;
* AUTHENTICATION: done automatically&lt;br /&gt;
* To browse the contents of your HPSS archive just type '''hsi''' on a shell to get the hsi prompt. Then use simple commands such as '''ls''', '''pwd''', '''cd''' to navigate your way around. You may also use [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''help'''] from the hsi prompt.&lt;br /&gt;
* Files are organized inside HPSS in the same fashion as in /project. Users in the same group have read permissions to each other's archives.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* There is also provision to have files migrated to a &amp;quot;mirrored set of tapes&amp;quot; (the 2nd set may be kept off-site). You'll have to request and justify the need. The default is a &amp;quot;single tape set&amp;quot;. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive-dual-copy/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== '''Using HSI''' ===&lt;br /&gt;
&lt;br /&gt;
* Interactively put a subdirectory subdirb and all its contents recursively. You may use the '-u' option to resume a previously interrupted session (as rsync would do).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] prompt&lt;br /&gt;
    [HSI] mput -R -u subdirb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively descend into the &amp;quot;Source&amp;quot; directory and move all files which end in &amp;quot;.h&amp;quot; into a sibling directory (ie, a directory at the same level in the tree as &amp;quot;Source&amp;quot;) named &amp;quot;Include&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd Source&lt;br /&gt;
    [HSI] mv *.h ../Include&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Delete all files beginning with &amp;quot;m&amp;quot; and ending with 9101 (note that this is an interactive request, not a one-liner request, so the wildcard path does not need quotes to preserve it):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] delete m*9101&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively delete all files beginning with H and ending with a digit, and ask for verification before deleting each such file.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] mdel H*[0-9]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save your local files that begin with the letter &amp;quot;c&amp;quot; (let the UN*X shell resolve the wild-card path pattern in terms of your local files by not enclosing it in quotes):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put c*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Get all files in the subdirectory subdira which begin with the letters &amp;quot;b&amp;quot; or &amp;quot;c&amp;quot; (surrounding the wildcard path in single quotes prevents shells on UNIX systems from processing the wild card pattern):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get 'subdira/[bc]*'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
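The difference between the two one-liners above can be seen without HSI at all. The sketch below uses echo as a stand-in for hsi and a throwaway directory with made-up file names, so it only demonstrates what the local shell does to a quoted versus an unquoted pattern.&lt;br /&gt;

```shell
# Hypothetical scratch directory and file names, for demonstration only.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch calc.c core.h notes.txt

# Unquoted: the local shell expands the pattern before the command runs,
# which is what you want for "put" (local file names).
unquoted=$(echo c*)

# Quoted: the pattern reaches the command unchanged, which is what you
# want for "get" (the pattern must be matched against HPSS names).
quoted=$(echo 'c*')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

The first line printed lists calc.c and core.h, while the second prints the literal pattern c*, which is what hsi would receive and resolve on the HPSS side.&lt;br /&gt;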
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source kept above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Using the &amp;quot;mirrored set of tapes&amp;quot; provision (you'll need authorization and a directory placement in /archive-dual-copy)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar : /archive-dual-copy/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
OR&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd /archive-dual-copy/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;&lt;br /&gt;
    [HSI] put source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the [http://www.mgleicher.us/GEL/hsi/ HSI Introduction] or the [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page] online&lt;br /&gt;
&lt;br /&gt;
=== '''Using HTAR''' ===&lt;br /&gt;
&lt;br /&gt;
* To write the file1 and file2 files to a new archive called &amp;quot;files.tar&amp;quot; in the default HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
OR&lt;br /&gt;
    htar -cf /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the file1 and file2 files to a new archive called &amp;quot;files.tar&amp;quot; on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the project1/src directory in the Archive file called proj1.tar, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the out.tar archive file within the HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Using the &amp;quot;mirrored set of tapes&amp;quot; provision (you'll need authorization and a directory placement in /archive-dual-copy) &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf /archive-dual-copy/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For more details please check the [http://www.mgleicher.us/GEL/htar/ HTAR - Introduction] or the [http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page] online&lt;br /&gt;
&lt;br /&gt;
=== '''Using the batch queue''' ===&lt;br /&gt;
* gpc-archive01 is part of the gpc queuing system under torque/moab&lt;br /&gt;
* Currently it is set up to share the node with up to 12 jobs at one time&lt;br /&gt;
* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
showq -w class=archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample data import&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N ingest&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir HItest&lt;br /&gt;
cd HItest&lt;br /&gt;
put  /scratch/knecht/workarea/SwapTests/A/DBRelease-13.2.1.tar.gz         : DBRelease-13.2.1.tar.gz&lt;br /&gt;
put  /scratch/knecht/workarea/SwapTests/A/RDO.215791._003065.pool.root.1  : RDO.215791._003065.pool.root.1&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample data list&lt;br /&gt;
   - Very painful without interactive browsing&lt;br /&gt;
       - Tentative solution: dump a listing of all user files to a log file and use that as a file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample stage-in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N stage&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/knecht/workarea/SwapTests/stagetest&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/knecht/workarea/SwapTests/stagetest/RDO.215791._003065.pool.root.1 : HItest/RDO.215791._003065.pool.root.1  &lt;br /&gt;
get /scratch/knecht/workarea/SwapTests/stagetest/DBRelease-13.2.1.tar.gz        : HItest/DBRelease-13.2.1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample analysis&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04#&amp;gt; qsub $(qsub stage.job | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') hiReco.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3206</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3206"/>
		<updated>2011-05-26T16:36:51Z</updated>

		<summary type="html">&lt;p&gt;Knecht: /* Using the batch queue */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== '''Massive Archive and Restore System''' ===&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in May/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The '''MARS''' deployment at SciNet is an effort to offer a more efficient way to offload/archive data from the most active file systems (scratch and project) than our current TSM-HSM solution, still without having to deal directly with the tape library or &amp;quot;tape commands&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
The system is a combination of the underlying hardware infrastructure, 3 software components (''HPSS'', ''HSI'' and ''HTAR''), plus some environment customization. &lt;br /&gt;
&lt;br /&gt;
* '''HPSS''': the main component, best described as a very scalable &amp;quot;blackbox&amp;quot; running in the background to support the Archive and Restore operations. [http://www.hpss-collaboration.org/index.shtml High Performance Storage System - HPSS] is the result of over a decade of collaboration among five Department of Energy laboratories and IBM, with significant contributions by universities and other laboratories worldwide. For now the best way for SciNet users to [https://support.scinet.utoronto.ca/wiki/index.php/HPSS_compared_to_HSM-TSM understand HPSS] may be to compare it with our existing HSM-TSM implementation.&lt;br /&gt;
&lt;br /&gt;
* '''HSI''': it may be best understood as a supercharged ftp interface, specially designed by [http://www.mgleicher.us/GEL/hsi/ Gleicher Enterprises] to act as a front-end for HPSS, gathering some of the best features you would encounter on a shell, rsync and GridFTP (and a few more). It enables users to transfer whole directory trees from /project and /scratch, therefore freeing up space. HSI is most suitable when those directory trees do not contain too many small files to start with, or when you already have a series of tarballs. &lt;br /&gt;
&lt;br /&gt;
* '''HTAR''': similarly, htar is sort of a &amp;quot;super-tar&amp;quot; application, also specially designed by [http://www.mgleicher.us/GEL/htar/ Gleicher Enterprises] to interact with HPSS, allowing users to build and automatically transfer tarballs to HPSS on the fly. HTAR is most suitable to aggregate whole directory trees. When HTAR creates the TAR file, it also builds an index file, with a &amp;quot;.idx&amp;quot; suffix added, which is stored in the same directory as the TAR file.&lt;br /&gt;
&lt;br /&gt;
=== '''General guide lines''' ===&lt;br /&gt;
* IN/OUT transfers to HPSS using HSI are bound to a maximum of about '''4 files/second'''. Therefore do not attempt to transfer directories with too many small files inside. Not only will it take &amp;quot;forever&amp;quot;, it will also induce a lot of wear and tear on the library's robot mechanism, as well as on the tapes themselves in case of recalls. Instead use HTAR, so the files are aggregated while being sent to HPSS.&lt;br /&gt;
* The maximum size that an individual file can have inside an HTAR is '''68GB'''. Please be sure to identify and &amp;quot;fish out&amp;quot; those files that are larger than 68GB from the directories and transfer them with HSI.&lt;br /&gt;
* The maximum size of a tar file that HPSS will take is '''1TB'''. Please do not generate tarballs that large.&lt;br /&gt;
* The maximum number of files in an htar is '''1 million'''. Please break up your htar segments as required.&lt;br /&gt;
* These guidelines may be strict, but as long as they are followed the system will perform reasonably well.&lt;br /&gt;
&lt;br /&gt;
=== '''Performance considerations''' ===&lt;br /&gt;
* Files are kept on disk-cache for as long as possible, so as to avoid tape operations during recalls.&lt;br /&gt;
* Average transfer rates with '''HSI'''&lt;br /&gt;
  No small files, average &amp;gt; 1MB/file: &lt;br /&gt;
  * write: 100-130MB/s&lt;br /&gt;
  * read:  450-600MB/s (IF no recall from tapes required)&lt;br /&gt;
  &lt;br /&gt;
* &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;'''NOTE: do not use HSI with small files (&amp;lt; 1MB/file). It would take over 1 week to transfer 1 TB. If we find that you are abusing the system we'll suspend your privileges'''&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Average transfer rates with '''HTAR''' &lt;br /&gt;
  Average file size &amp;gt; 1MB&lt;br /&gt;
  * write: 120MB/s&lt;br /&gt;
  * read:  480MB/s (IF no recall from tapes required) &lt;br /&gt;
&lt;br /&gt;
  Not too many small files, average &amp;gt; 100KB/file: &lt;br /&gt;
  * write: 64MB/s&lt;br /&gt;
  * read:  170MB/s (IF no recall from tapes required)&lt;br /&gt;
&lt;br /&gt;
* Average transfer rates from '''tapes''', if stage is required (add to the above estimates)&lt;br /&gt;
  * read: 80MB/s per tape drive. &lt;br /&gt;
  * maximum of 4 drives may be used per hsi/htar session&lt;br /&gt;
&lt;br /&gt;
=== '''Quick Reference''' ===&lt;br /&gt;
&lt;br /&gt;
* To use HSI or HTAR and access HPSS please login to the '''gpc-archive01''' node.&lt;br /&gt;
* AUTHENTICATION: done automatically&lt;br /&gt;
* To browse the contents of your HPSS archive just type '''hsi''' on a shell to get the hsi prompt. Then use simple commands such as '''ls''', '''pwd''', '''cd''' to navigate your way around. You may also use [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''help'''] from the hsi prompt.&lt;br /&gt;
* Files are organized inside HPSS in the same fashion as in /project. Users in the same group have read permissions to each other's archives.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* There is also provision to have files migrated to a &amp;quot;mirrored set of tapes&amp;quot; Class of Service (COS). You'll have to request and justify the need. The default is a &amp;quot;single tape&amp;quot; COS. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive-dual-copy/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== '''Using HSI''' ===&lt;br /&gt;
&lt;br /&gt;
* Interactively put a subdirectory subdirb and all its contents recursively. You may use the '-u' option to resume a previously interrupted session (as rsync would do).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] prompt&lt;br /&gt;
    [HSI] mput -R -u subdirb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively descend into the &amp;quot;Source&amp;quot; directory and move all files which end in &amp;quot;.h&amp;quot; into a sibling directory (ie, a directory at the same level in the tree as &amp;quot;Source&amp;quot;) named &amp;quot;Include&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd Source&lt;br /&gt;
    [HSI] mv *.h ../Include&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Delete all files beginning with &amp;quot;m&amp;quot; and ending with 9101 (note that this is an interactive request, not a one-liner request, so the wildcard path does not need quotes to preserve it):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] delete m*9101&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively delete all files beginning with H and ending with a digit, and ask for verification before deleting each such file.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] mdel H*[0-9]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save your local files that begin with the letter &amp;quot;c&amp;quot; (let the UN*X shell resolve the wild-card path pattern in terms of your local files by not enclosing it in quotes):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put c*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Get all files in the subdirectory subdira which begin with the letters &amp;quot;b&amp;quot; or &amp;quot;c&amp;quot; (surrounding the wildcard path in single quotes prevents shells on UNIX systems from processing the wild card pattern):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get 'subdira/[bc]*'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source.tar saved above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent (the default HSI directory placement is /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Using the copy to mirror tapes provision (you'll need authorization and a directory placement in /archive-dual-copy)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar : /archive-dual-copy/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
OR&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd /archive-dual-copy/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;&lt;br /&gt;
    [HSI] put source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the [http://www.mgleicher.us/GEL/hsi/ HSI Introduction] or the [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page] online&lt;br /&gt;
&lt;br /&gt;
=== '''Using HTAR''' ===&lt;br /&gt;
&lt;br /&gt;
* To write the file1 and file2 files to a new archive called &amp;quot;files.tar&amp;quot; in the current HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the file1 and file2 files to a new archive called &amp;quot;files.tar&amp;quot; on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the project1/src directory in the Archive file called proj1.tar, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the out.tar archive file within the HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For more details please check the [http://www.mgleicher.us/GEL/htar/ HTAR - Introduction] or the [http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page] online&lt;br /&gt;
&lt;br /&gt;
=== '''Using the batch queue''' ===&lt;br /&gt;
* gpc-archive01 is part of the gpc queuing system under torque/moab&lt;br /&gt;
* Currently it is set up to share the node with up to 12 jobs at one time&lt;br /&gt;
* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample data import&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N ingest&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o logs/&lt;br /&gt;
&lt;br /&gt;
# module load dcap&lt;br /&gt;
# module load &lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir HItest&lt;br /&gt;
cd HItest&lt;br /&gt;
put  /scratch/knecht/workarea/SwapTests/A/DBRelease-13.2.1.tar.gz         : DBRelease-13.2.1.tar.gz&lt;br /&gt;
put  /scratch/knecht/workarea/SwapTests/A/RDO.215791._003065.pool.root.1  : RDO.215791._003065.pool.root.1&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample data list&lt;br /&gt;
   - Very painful without interactive browsing&lt;br /&gt;
       -Tentative solution: dump all user files to log file and use that as file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample stage-in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N stage&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
# module load dcap&lt;br /&gt;
# module load &lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/knecht/workarea/SwapTests/stagetest&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/knecht/workarea/SwapTests/stagetest/RDO.215791._003065.pool.root.1 : HItest/RDO.215791._003065.pool.root.1  &lt;br /&gt;
get /scratch/knecht/workarea/SwapTests/stagetest/DBRelease-13.2.1.tar.gz        : HItest/DBRelease-13.2.1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
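The sample scripts above end with date, so the exit status of the job is that of date rather than of hsi. When a later job is chained with a dependency such as afterok, which starts the dependent job only if the first one exits with status 0, it may help to record the status of the transfer and exit with it. A minimal sketch, assuming hsi returns a non-zero status when a transfer fails; run_transfer is a hypothetical helper, not part of HSI:&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: run a transfer command, report its exit status,
# and return that status so the job script can exit with it.
run_transfer() {
    "$@"
    local status=$?
    if [ "$status" -ne 0 ]; then
        echo "transfer FAILED with status $status: $*"
    else
        echo "transfer OK: $*"
    fi
    return "$status"
}

# Intended use inside a job script (assumption: hsi exits non-zero on failure):
#   run_transfer /usr/local/bin/hsi -v get local_path : HItest/file
#   status=$?
#   date
#   exit "$status"
```

With this pattern a failed stage-in makes the job itself fail, so a dependent analysis job is not started on incomplete data.&lt;br /&gt;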
* sample analysis&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04#&amp;gt; qsub $(qsub stage.job | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') hiReco.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3204</id>
		<title>MARS</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=MARS&amp;diff=3204"/>
		<updated>2011-05-26T16:30:41Z</updated>

		<summary type="html">&lt;p&gt;Knecht: /* Using the batch queue */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== '''Massive Archive and Restore System''' ===&lt;br /&gt;
&lt;br /&gt;
(Pilot usage phase to start in May/2011 with a select group of users. Deployment and configuration are still a work in progress)&lt;br /&gt;
&lt;br /&gt;
The '''MARS''' deployment at SciNet is an effort to offer a more efficient way to offload/archive data from the most active file systems (scratch and project) than our current TSM-HSM solution, still without having to deal directly with the tape library or &amp;quot;tape commands&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
The system is a combination of the underlying hardware infrastructure, 3 software components, ''HPSS'', ''HSI'' and ''HTAR'', plus some environment customization. &lt;br /&gt;
&lt;br /&gt;
* '''HPSS''': the main component, best described as a very scalable &amp;quot;blackbox&amp;quot; running in the background to support the Archive and Restore operations. [http://www.hpss-collaboration.org/index.shtml High Performance Storage System - HPSS] is the result of over a decade of collaboration among five Department of Energy laboratories and IBM, with significant contributions by universities and other laboratories worldwide. For now the best way for SciNet users to [https://support.scinet.utoronto.ca/wiki/index.php/HPSS_compared_to_HSM-TSM understand HPSS] may be to compare it with our existing HSM-TSM implementation.&lt;br /&gt;
&lt;br /&gt;
* '''HSI''': it may be best understood as a supercharged ftp interface, specially designed by [http://www.mgleicher.us/GEL/hsi/ Gleicher Enterprises] to act as a front-end for HPSS, gathering some of the best features you would encounter on a shell, rsync and GridFTP (and a few more). It enables users to transfer whole directory trees from /project and /scratch, therefore freeing up space. HSI is most suitable when those directory trees do not contain too many small files to start with, or when you already have a series of tarballs. &lt;br /&gt;
&lt;br /&gt;
* '''HTAR''': similarly, htar is sort of a &amp;quot;super-tar&amp;quot; application, also specially designed by [http://www.mgleicher.us/GEL/htar/ Gleicher Enterprises] to interact with HPSS, allowing users to build and automatically transfer tarballs to HPSS on the fly. HTAR is most suitable to aggregate whole directory trees. When HTAR creates the TAR file, it also builds an index file, with a &amp;quot;.idx&amp;quot; suffix added, which is stored in the same directory as the TAR file.&lt;br /&gt;
&lt;br /&gt;
=== '''General guidelines''' ===&lt;br /&gt;
* Transfers into and out of HPSS using HSI are limited to a maximum of about '''4 files/second'''. Therefore do not attempt to transfer directories with too many small files inside. Not only will it take &amp;quot;forever&amp;quot;, it will also induce a lot of wear and tear on the library's robot mechanism, as well as on the tapes themselves in case of recalls. Instead use HTAR, so the files are aggregated while being sent to HPSS.&lt;br /&gt;
* The maximum size that an individual file can have inside an HTAR is '''68GB'''. Please be sure to identify and &amp;quot;fish out&amp;quot; those files that are larger than 68GB from the directories and transfer them with  HSI&lt;br /&gt;
* The maximum size of a tar file that HPSS will take is '''1TB'''. Please do not generate tarballs that large.&lt;br /&gt;
* The maximum number of files in an HTAR archive is '''1 million'''. Please break up your HTAR segments as required.&lt;br /&gt;
* These guidelines may be strict, but as long as they are followed the system will perform reasonably well.&lt;br /&gt;
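The limits above can be checked on a directory tree before an archive job is submitted. A minimal sketch, assuming GNU find; the function names are hypothetical, and find reads 68G as 68 GiB, which may differ slightly from the 68GB limit quoted above:&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical pre-flight checks for a directory tree before archiving.

# List regular files larger than LIMIT (default 68G, the per-file HTAR
# limit quoted above). GNU find: -size +N matches files larger than N units.
oversized_files() {
    local dir=$1 limit=${2:-68G}
    find "$dir" -type f -size "+$limit"
}

# Count regular files, to compare against the 1 million files per HTAR limit.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}
```

Files reported by oversized_files would be the ones to transfer individually with HSI, while count_files indicates whether the tree needs to be split into several HTAR archives.&lt;br /&gt;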
&lt;br /&gt;
=== '''Performance considerations''' ===&lt;br /&gt;
* Files are kept on disk-cache for as long as possible, so as to avoid tape operations during recalls.&lt;br /&gt;
* Average transfer rates with '''HSI'''&lt;br /&gt;
  No small files, average &amp;gt; 1MB/file: &lt;br /&gt;
  * write: 100-130MB/s&lt;br /&gt;
  * read:  450-600MB/s (IF no recall from tapes required)&lt;br /&gt;
  &lt;br /&gt;
* &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;'''NOTE: do not use HSI with small files (&amp;lt; 1MB/file). It would take over 1 week to transfer 1 TB. If we find that you are abusing the system we'll suspend your privileges'''&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Average transfer rates with '''HTAR''' &lt;br /&gt;
  Average file size &amp;gt; 1MB&lt;br /&gt;
  * write: 120MB/s&lt;br /&gt;
  * read:  480MB/s (IF no recall from tapes required) &lt;br /&gt;
&lt;br /&gt;
  Not too many small files, average &amp;gt; 100KB/file: &lt;br /&gt;
  * write: 64MB/s&lt;br /&gt;
  * read:  170MB/s (IF no recall from tapes required)&lt;br /&gt;
&lt;br /&gt;
* Average transfer rates from '''tapes''', if stage is required (add to the above estimates)&lt;br /&gt;
  * read: 80MB/s per tape drive. &lt;br /&gt;
  * maximum of 4 drives may be used per hsi/htar session&lt;br /&gt;
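The figures above allow a rough estimate of how long a transfer may take. A minimal sketch in integer shell arithmetic; the function names are hypothetical, and the only inputs are the rates quoted above (about 4 files/second for HSI, and a sustained rate in MB/s):&lt;br /&gt;

```shell
#!/bin/bash
# Lower bound, in seconds, imposed by the limit of about 4 files/second
# on HSI transfers: ceil(files / 4).
hsi_file_seconds() {
    echo $(( ($1 + 3) / 4 ))
}

# Seconds needed to move SIZE_MB megabytes at RATE_MB_PER_S, rounded up.
bulk_seconds() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

hsi_file_seconds 1000000     # one million files through HSI
bulk_seconds 1000000 100     # 1 TB, taken as 1000000 MB, at 100 MB/s
```

The two calls print 250000 and 10000: one million files cannot pass through HSI in less than about three days, whereas 1 TB of large files written at 100MB/s needs under three hours.&lt;br /&gt;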
&lt;br /&gt;
=== '''Quick Reference''' ===&lt;br /&gt;
&lt;br /&gt;
* To use HSI or HTAR and access HPSS please log in to the '''gpc-archive01''' node.&lt;br /&gt;
* AUTHENTICATION: done automatically&lt;br /&gt;
* To browse the contents of your HPSS archive just type '''hsi''' on a shell to get the hsi prompt. Then use simple commands such as '''ls''', '''pwd''', '''cd''' to navigate your way around. You may also use [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''help'''] from the hsi prompt.&lt;br /&gt;
* Files are organized inside HPSS in the same fashion as in /project. Users in the same group have read permissions to each other's archives.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* There is also provision to have files migrated to a &amp;quot;mirrored set of tapes&amp;quot; Class of Service (COS). You'll have to request and justify the need. The default is a &amp;quot;single tape&amp;quot; COS. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[HSI]/archive-dual-copy/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* '''Pilot users:''' &amp;lt;span style=&amp;quot;color:#CC0000&amp;quot;&amp;gt;DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== '''Using HSI''' ===&lt;br /&gt;
&lt;br /&gt;
* Interactively put a subdirectory subdirb and all its contents recursively. You may use the '-u' option to resume a previously disrupted session (as rsync would do).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] prompt&lt;br /&gt;
    [HSI] mput -R -u subdirb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively descend into the &amp;quot;Source&amp;quot; directory and move all files which end in &amp;quot;.h&amp;quot; into a sibling directory (i.e., a directory at the same level in the tree as &amp;quot;Source&amp;quot;) named &amp;quot;Include&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] cd Source&lt;br /&gt;
    [HSI] mv *.h ../Include&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Delete all files beginning with &amp;quot;m&amp;quot; and ending with 9101 (note that this is an interactive request, not a one-liner request, so the wildcard path does not need quotes to preserve it):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] delete m*9101&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Interactively delete all files beginning with H and ending with a digit, and ask for verification before deleting each such file.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi &amp;lt;RETURN&amp;gt;&lt;br /&gt;
    [HSI] mdel H*[0-9]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save your local files that begin with the letter &amp;quot;c&amp;quot; (let the UN*X shell resolve the wild-card path pattern in terms of your local files by not enclosing it in quotes):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put c*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Get all files in the subdirectory subdira which begin with the letters &amp;quot;b&amp;quot; or &amp;quot;c&amp;quot; (surrounding the wildcard path in single quotes prevents shells on UNIX systems from processing the wild card pattern):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get 'subdira/[bc]*'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Save a &amp;quot;tar file&amp;quot; of C source programs and header files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   tar cf - *.[ch] | hsi put - : source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note: the &amp;quot;:&amp;quot; operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)&lt;br /&gt;
&lt;br /&gt;
* Restore the tar file source.tar saved above and extract all files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi get - : source.tar | tar xf -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The commands below are equivalent:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar&lt;br /&gt;
    hsi put source.tar : /archive/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Using the copy to mirror tapes provision (you'll need authorization and a directory placement)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    hsi put source.tar : /archive-dual-copy/&amp;lt;group&amp;gt;/&amp;lt;user&amp;gt;/source.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For more details please check the [http://www.mgleicher.us/GEL/hsi/ HSI Introduction] or the [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page] online&lt;br /&gt;
&lt;br /&gt;
=== '''Using HTAR''' ===&lt;br /&gt;
&lt;br /&gt;
* To write the file1 and file2 files to a new archive called &amp;quot;files.tar&amp;quot; in the current HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*  To write the file1 and file2 files to a new archive called &amp;quot;files.tar&amp;quot; on a remote FTP server called &amp;quot;blue.pacific.llnl.gov&amp;quot;, creating the tar file in the user’s remote FTP home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To extract all files from the project1/src directory in the Archive file called proj1.tar, and use the time of extraction as the modification time, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -xm -f proj1.tar project1/src&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* To display the names of the files in the out.tar archive file within the HPSS home directory, enter:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    htar -vtf out.tar&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For more details please check the [http://www.mgleicher.us/GEL/htar/ HTAR - Introduction] or the [http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page] online&lt;br /&gt;
&lt;br /&gt;
=== '''Using the batch queue''' ===&lt;br /&gt;
* gpc-archive01 is part of the gpc queuing system under torque/moab&lt;br /&gt;
* Currently it is set up to share the node with up to 12 jobs at one time&lt;br /&gt;
* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
qsub -I -q archive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample data import&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N ingest&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o logs/&lt;br /&gt;
&lt;br /&gt;
# module load dcap&lt;br /&gt;
# module load &lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
mkdir HItest&lt;br /&gt;
cd HItest&lt;br /&gt;
put  /scratch/knecht/workarea/SwapTests/A/DBRelease-13.2.1.tar.gz         : DBRelease-13.2.1.tar.gz&lt;br /&gt;
put  /scratch/knecht/workarea/SwapTests/A/RDO.215791._003065.pool.root.1  : RDO.215791._003065.pool.root.1&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample data list&lt;br /&gt;
   - Very painful without interactive browsing&lt;br /&gt;
       -Tentative solution: dump all user files to log file and use that as file index&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N hpss_dump&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
echo ===========&lt;br /&gt;
echo&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt;EOF&lt;br /&gt;
ls -lUR&lt;br /&gt;
EOF&lt;br /&gt;
echo&lt;br /&gt;
echo ===========&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample stage-in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#PBS -q archive&lt;br /&gt;
#PBS -N stage&lt;br /&gt;
#PBS -j oe&lt;br /&gt;
#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID&lt;br /&gt;
&lt;br /&gt;
# module load dcap&lt;br /&gt;
# module load &lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
mkdir -p /scratch/knecht/workarea/SwapTests/stagetest&lt;br /&gt;
&lt;br /&gt;
/usr/local/bin/hsi  -v &amp;lt;&amp;lt; EOF&lt;br /&gt;
get /scratch/knecht/workarea/SwapTests/stagetest/RDO.215791._003065.pool.root.1 : HItest/RDO.215791._003065.pool.root.1&lt;br /&gt;
get /scratch/knecht/workarea/SwapTests/stagetest/DBRelease-13.2.1.tar.gz        : HItest/DBRelease-13.2.1.tar.gz&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
date&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* sample analysis&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gpc04#&amp;gt; qsub $(qsub stage.job | awk -F '.' '{print &amp;quot;-W depend=afterok:&amp;quot;$1}') hiReco.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Software_and_Libraries&amp;diff=1208</id>
		<title>Software and Libraries</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Software_and_Libraries&amp;diff=1208"/>
		<updated>2010-06-10T21:40:58Z</updated>

		<summary type="html">&lt;p&gt;Knecht: /* GPC Software */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;All the software listed on this page is accessed using a [[GPC_Quickstart#Environment_Variables | modules]] system.  This means that much of the software is not &lt;br /&gt;
accessible by default but has to be loaded using the module command. The&lt;br /&gt;
reason is that&lt;br /&gt;
* it allows us to easily keep multiple versions of software for different users on the system;&lt;br /&gt;
* it allows users to easily switch between versions.&lt;br /&gt;
The module system works similarly on the GPC and the TCS.&lt;br /&gt;
&lt;br /&gt;
To use particular software, just load the module (the last column in the following table) as follows.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load [module-name]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
It is recommended to load frequently used modules in the file .bashrc in your home directory.&lt;br /&gt;
&lt;br /&gt;
Modules that load libraries define environment variables pointing to the location of library files and&lt;br /&gt;
include files, for use in Makefiles. These environment variables follow the naming convention&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
SCINET_[module-name]_{LIB,INC,BASE}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
So to compile and link against the library, you will have to add &amp;lt;tt&amp;gt;-I$SCINET_[module-name]_INC&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;-L$SCINET_[module-name]_LIB&amp;lt;/tt&amp;gt;, respectively, in addition to the usual &amp;lt;tt&amp;gt;-l[libname]&amp;lt;/tt&amp;gt;.&lt;br /&gt;
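As an illustration of the naming convention, the sketch below assembles a compile-and-link command for the gsl module. It assumes that the module name is upper-cased in the variable names (SCINET_GSL_INC, SCINET_GSL_LIB); the paths are placeholders and gsl_build_cmd is a hypothetical helper that only prints the command:&lt;br /&gt;

```shell
#!/bin/bash
# Placeholder values so the sketch runs anywhere; on the cluster,
# "module load gsl" is assumed to define these variables instead.
: "${SCINET_GSL_INC:=/path/to/gsl/include}"
: "${SCINET_GSL_LIB:=/path/to/gsl/lib}"

# Print (not run) the compile-and-link command for one C source file.
gsl_build_cmd() {
    local src=$1
    echo "gcc -I$SCINET_GSL_INC $src -L$SCINET_GSL_LIB -lgsl -lgslcblas -lm -o ${src%.c}"
}

gsl_build_cmd demo.c
```

The same flags can be placed in the CFLAGS and LDFLAGS of a Makefile.&lt;br /&gt;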
&lt;br /&gt;
For a full list of available software packages that may be accessed with the modules system, use the command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module avail&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To see the list of available versions of a specific software package, use the command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module avail [module-name]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For a list of the currently loaded modules in your shell, use the command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module list&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For a description of a particular module, use the command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module help [module-name]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Other software and libraries ==&lt;br /&gt;
&lt;br /&gt;
If you want to use a piece of software or a library that is not on the list, you can in principle install it yourself in your &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; directory.&lt;br /&gt;
Note however that building libraries and software from source often uses a lot of files. To avoid running out of disk space, building software is therefore best done from &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;, from which&lt;br /&gt;
you can copy/install only the libraries, header files and binaries to your &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
If you suspect that a particular piece of software or a library would be of use to other users of SciNet as well, contact us, and we will consider adding it to the system.&lt;br /&gt;
&lt;br /&gt;
== GPC Software ==&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
! Software  &lt;br /&gt;
! Version&lt;br /&gt;
! Comments&lt;br /&gt;
! Command/Library&lt;br /&gt;
! Module Name&lt;br /&gt;
|-  style='background: lightgray'&lt;br /&gt;
|Intel Compiler&lt;br /&gt;
| 11.1 &lt;br /&gt;
|&lt;br /&gt;
| &amp;lt;tt&amp;gt;icpc,icc,ifort, includes MKL library&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;intel&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-  style='background: lightgray'&lt;br /&gt;
| GCC Compiler&lt;br /&gt;
| 4.4.0&lt;br /&gt;
|&lt;br /&gt;
| &amp;lt;tt&amp;gt;gcc,g++,gfortran&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gcc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-  style='background: lightgray'&lt;br /&gt;
| IntelMPI&lt;br /&gt;
| 3.2.2&lt;br /&gt;
|&lt;br /&gt;
| &amp;lt;tt&amp;gt;mpicc,mpiCC,mpif77,mpif90&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;intelmpi&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: lightgray'&lt;br /&gt;
| OpenMPI&lt;br /&gt;
| 1.4.1&lt;br /&gt;
|&lt;br /&gt;
| &amp;lt;tt&amp;gt;mpicc,mpiCC,mpif77,mpif90&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;openmpi&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: white'&lt;br /&gt;
| Emacs&lt;br /&gt;
| 23.1&lt;br /&gt;
| New version of popular text editor&lt;br /&gt;
| &amp;lt;tt&amp;gt;emacs&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;emacs&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: white'&lt;br /&gt;
| Git&lt;br /&gt;
| 1.6.3&lt;br /&gt;
| Revision control system&lt;br /&gt;
| &amp;lt;tt&amp;gt;git,gitk&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;git&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: white'&lt;br /&gt;
| Valgrind&lt;br /&gt;
| 3.4.1&lt;br /&gt;
| Memory checking utility&lt;br /&gt;
| &amp;lt;tt&amp;gt;valgrind,cachegrind&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;valgrind&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: lightgray'&lt;br /&gt;
| grace&lt;br /&gt;
| 5.22.1&lt;br /&gt;
| Plotting utility&lt;br /&gt;
| &amp;lt;tt&amp;gt;xmgrace&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;graphics&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: lightgray'&lt;br /&gt;
| Gnuplot&lt;br /&gt;
| 4.2.6&lt;br /&gt;
| Plotting utility&lt;br /&gt;
| &amp;lt;tt&amp;gt;gnuplot&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;graphics&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: lightgray'&lt;br /&gt;
| VMD&lt;br /&gt;
| 1.8.6&lt;br /&gt;
| Visualization and analysis utility&lt;br /&gt;
| &amp;lt;tt&amp;gt;vmd&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;vmd&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: lightgray'&lt;br /&gt;
| ferret&lt;br /&gt;
| 6.4&lt;br /&gt;
| Plotting utility&lt;br /&gt;
| &amp;lt;tt&amp;gt;ferret&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ferret&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: lightgray'&lt;br /&gt;
| ncl/ncarg&lt;br /&gt;
| 5.1.1&lt;br /&gt;
| NCARG graphics and ncl utilities&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncl&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: lightgray'&lt;br /&gt;
| ROOT&lt;br /&gt;
| 5.26.00&lt;br /&gt;
| ROOT Analysis Framework from CERN&lt;br /&gt;
| &amp;lt;tt&amp;gt;root&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ROOT&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: lightgray'&lt;br /&gt;
| ParaView&lt;br /&gt;
| 3.6.1&lt;br /&gt;
| Scientific visualization, server only&lt;br /&gt;
| &amp;lt;tt&amp;gt;pvserver,pvbatch,pvpython&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;visualization&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: white'&lt;br /&gt;
| NetCDF&lt;br /&gt;
| 4.0.1&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncdump,ncgen,libnetcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;netcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: white'&lt;br /&gt;
| ncview&lt;br /&gt;
| 1.93g&lt;br /&gt;
| Visualization for NetCDF files&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncview&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;graphics/ncview&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: white'&lt;br /&gt;
| NCO&lt;br /&gt;
| 3.9.9&lt;br /&gt;
| NCO utilities to manipulate netCDF files&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncap, ncap2, ncatted, etc.&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;nco&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: white'&lt;br /&gt;
| udunits&lt;br /&gt;
| 2.1.11&lt;br /&gt;
| unit conversion utilities&lt;br /&gt;
| &amp;lt;tt&amp;gt;libudunits2&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;udunits&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: white'&lt;br /&gt;
| HDF4&lt;br /&gt;
| 4.2r4&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h4fc,hdiff,...,libdf,libsz&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf4&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: white'&lt;br /&gt;
| HDF5&lt;br /&gt;
| 1.8.4-v18&lt;br /&gt;
| Scientific data storage and retrieval, parallel I/O&lt;br /&gt;
| &amp;lt;tt&amp;gt;h5ls, h5diff, ..., libhdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf5&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: lightgray'&lt;br /&gt;
| [[gamess|GAMESS (US)]]&lt;br /&gt;
| January 12, 2009 R3&lt;br /&gt;
| General Atomic and Molecular Electronic Structure System&lt;br /&gt;
| &amp;lt;tt&amp;gt;rungms&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gamess&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: lightgray'&lt;br /&gt;
| [http://blast.ncbi.nlm.nih.gov BLAST]&lt;br /&gt;
| 2.2.23+&lt;br /&gt;
| Basic Local Alignment Search Tool&lt;br /&gt;
| &amp;lt;tt&amp;gt;blastn,blastp,blastx,psiblast,tblastn...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;blast&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: lightgray'&lt;br /&gt;
| [[amber|AMBER 10]]&lt;br /&gt;
| Amber 10 + Amber Tools 1.3&lt;br /&gt;
| Amber Molecular Dynamics Package&lt;br /&gt;
| &amp;lt;tt&amp;gt;sander, sander.MPI&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;amber10&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: lightgray'&lt;br /&gt;
| [http://www.gdal.org/ GDAL]&lt;br /&gt;
| 1.7.1&lt;br /&gt;
| Geospatial Data Abstraction Library&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdal_contour,gdal_rasterize,gdal_grid, libgdal&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gdal&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: lightgray'&lt;br /&gt;
| [http://www.mcs.anl.gov/petsc/petsc-as/  PETSc ]&lt;br /&gt;
| &lt;br /&gt;
| Portable, Extensible Toolkit for Scientific Computation (PETSc)&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpetsc, etc.. &amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;petsc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: lightgray'&lt;br /&gt;
| Matlab/IDL/Commercial software&lt;br /&gt;
|&lt;br /&gt;
| Little to none available. See [[FAQ#How_can_I_run_Matlab_.2F_IDL_.2F_my_favourite_commercial_software_on_SciNet.3F | the FAQ]].&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|- style='background: white'&lt;br /&gt;
| gsl&lt;br /&gt;
| 1.13&lt;br /&gt;
| GNU Scientific Library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libgsl, libgslcblas&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gsl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: white'&lt;br /&gt;
| fftw&lt;br /&gt;
| 3.2.2&lt;br /&gt;
| FFTW fast Fourier transform library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libfftw3&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;fftw&amp;lt;/tt&amp;gt;&lt;br /&gt;
|- style='background: white'&lt;br /&gt;
| extras&lt;br /&gt;
|  &lt;br /&gt;
| Full set of X11 libraries and others not installed on compute nodes&lt;br /&gt;
| &amp;lt;tt&amp;gt;bc, dmidecode, gv, iostat, lsof, tkdiff, zip, libXaw,...,libjpeg&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;extras&amp;lt;/tt&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== TCS Software ==&lt;br /&gt;
The software listed below uses the same module system described [[ #GPC_Software | above for the GPC]].&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
! Software  &lt;br /&gt;
! Version&lt;br /&gt;
! Comments&lt;br /&gt;
! Command/Library&lt;br /&gt;
! Module Name&lt;br /&gt;
|-&lt;br /&gt;
| antlr&lt;br /&gt;
| 2.7.7&lt;br /&gt;
| ANother Tool for Language Recognition&lt;br /&gt;
| &amp;lt;tt&amp;gt;antlr, antlr-config&amp;lt;br&amp;gt;libantlr, antlr.jar, antlr.py&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;antlr&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| gsl&lt;br /&gt;
| 1.13&lt;br /&gt;
| GNU Scientific Library&lt;br /&gt;
| &amp;lt;tt&amp;gt;libgsl, libgslcblas&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;gsl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| HDF4&lt;br /&gt;
| 4.2.5&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;h4fc, hdiff, ..., libdf, libsz&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;hdf4&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| mpe 2&lt;br /&gt;
| 1.0.6&lt;br /&gt;
| Performance Visualization for Parallel Programs   &lt;br /&gt;
| &amp;lt;tt&amp;gt;libmpe&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;mpe&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NetCDF + ncview&lt;br /&gt;
| 4.0.1&lt;br /&gt;
| Scientific data storage and retrieval&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncdump, ncgen, libnetcdf, ncview&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;netcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NCL&lt;br /&gt;
| 5.1.1&lt;br /&gt;
| NCAR Command Language&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncl, libncl, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncl&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| NCO&lt;br /&gt;
| 3.9.6&lt;br /&gt;
| NCO utilities to manipulate netCDF files&lt;br /&gt;
| &amp;lt;tt&amp;gt;ncap, ncap2, ncatted, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;nco&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| parallel netCDF&lt;br /&gt;
| 1.1.1&lt;br /&gt;
| Scientific data storage and retrieval using MPI-IO&lt;br /&gt;
| &amp;lt;tt&amp;gt;libpnetcdf.a&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;parallel-netcdf&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| scalasca&lt;br /&gt;
| 1.2&lt;br /&gt;
| SCalable performance Analysis of LArge SCale Applications&lt;br /&gt;
| &amp;lt;tt&amp;gt;scalasca, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;scalasca&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| upc&lt;br /&gt;
| 1.2&lt;br /&gt;
| Unified Parallel C&lt;br /&gt;
| &amp;lt;tt&amp;gt;xlupc&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;upc&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| extras&lt;br /&gt;
|&lt;br /&gt;
| Adds paths to a fuller set of libraries to your user environment&amp;lt;br&amp;gt; compile with &amp;lt;tt&amp;gt;-I$SCINET_EXTRAS_INC&amp;lt;/tt&amp;gt;&amp;lt;br&amp;gt; link with &amp;lt;tt&amp;gt;-L$SCINET_EXTRAS_LIB&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;libfftw, libfftw_mpi, libfftw3, libhdf5, liblapack, ...&amp;lt;/tt&amp;gt;&lt;br /&gt;
| &amp;lt;tt&amp;gt;extras&amp;lt;/tt&amp;gt;&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Knecht</name></author>
	</entry>
</feed>