Difference between revisions of "MARS"

Latest revision as of 19:37, 31 August 2018

WARNING: SciNet is in the process of replacing this wiki with a new documentation site. For current information, please go to https://docs.scinet.utoronto.ca

MARS is no more. Follow this link

@@ Line 1: / Line 1: @@
-== '''Massive Archive and Retrieval System''' ==
+{| style="border-spacing: 8px; width:100%"
+| valign="top" style="cellpadding:1em; padding:1em; border:2px solid; background-color:#f6f674; border-radius:5px"|
-(Pilot usage phase to start in Jun/2011 with a select group of users. Deployment and configuration are still a work in progress)
+'''WARNING: SciNet is in the process of replacing this wiki with a new documentation site. For current information, please go to [https://docs.scinet.utoronto.ca https://docs.scinet.utoronto.ca]'''
-The SciNet Massive Archive and Retrieval System (MARS) is a tape-backed hierarchical storage management system that will provide a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data will be returned to the active GPFS filesystem when it is needed.
-Migration of data into and out of the repository will be under the control of the user, who will interact with the system using one or both of the
-following utilities:
-* HSI is a client with an ftp-like interface that will be used to archive and retrieve large files. It is also useful for browsing the contents of the repository.
-* HTAR is a utility that creates tar format archives resident in the archive. It also creates a separate index file that can be accessed quickly.
-User access will be controlled by the job scheduling system of the GPC. An interactive session can be requested that will allow a user to list, rearrange or remove files with the HSI client. Transfer of data into or out of the archive is expected to be scripted and submitted as a batch job.
-== Guidelines ==
-* HPSS storage space is provided on tape -- a medium that is not suited for storing small files. Files smaller than ~100MB should be grouped into archive files with tar or htar.
-* The maximum size of a file that can be transferred into the archive is 1TB. However, optimal performance is obtained with file sizes <= 100 GB.
-* After every data transfer, check the application return code and check the log file for errors.
-* '''Pilot users:''' <span style="color:#CC0000">DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project</span>
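The size guidelines above can be turned into a quick check before a transfer is scripted. The following is a minimal sketch and not part of the original page: the function name `classify_size` and its labels are hypothetical, and the thresholds are assumed to be binary units (1 MB = 1024*1024 bytes), since the page does not say which units it means.

```shell
#!/bin/bash
# Hypothetical helper: classify a file size (in bytes) against the MARS guidelines.
# Assumption: MB/GB/TB are binary units; the original page does not specify.
classify_size() {
    local bytes=$1
    local mb=$((1024 * 1024))
    local gb=$((1024 * mb))
    local tb=$((1024 * gb))
    if [ "$bytes" -gt "$tb" ]; then
        echo "too-large"      # exceeds the 1TB transfer limit
    elif [ "$bytes" -gt $((100 * gb)) ]; then
        echo "allowed-slow"   # allowed, but above the optimal <= 100 GB range
    elif [ "$bytes" -lt $((100 * mb)) ]; then
        echo "bundle"         # small file: group with tar or htar first
    else
        echo "ok"
    fi
}
```

On a system with GNU coreutils, the size could be supplied with something like `classify_size "$(stat -c %s somefile.tar)"`, where `somefile.tar` is a placeholder name.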
-== Access Through the Queue System  ==
-All access to the archive system is through the queue system.
-=== Scripted File Transfers ===
-File transfers in and out of the archive should be scripted into jobs and submitted to the archive queue.
-<pre>
-#!/bin/env bash
-#PBS -q archive
-#PBS -N hsi_file_transfer
-#PBS -j oe
-#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID
-/usr/local/bin/hsi  -v <<EOF
-cput -p /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz
-EOF
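-status=$?
-# Stop with the HSI exit code if the transfer failed
-if [ $status -ne 0 ]; then
-    echo "HSI returned non-zero code: $status"
-    exit $status
-fi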
-</pre>
-=== Staging Data for Analysis ===
-Job dependencies can be used to make analysis jobs wait for data staging before starting. The qsub flag is
-<pre>
--W depend=afterok:<JOBID>
-</pre>
-where JOBID is the job number of the staging job that must finish successfully before the analysis job can start.
-Here is a shortcut for generating the dependency:
-<pre>
-gpc04 $ qsub $(qsub data-restore.sh | awk -F '.' '{print "-W depend=afterok:"$1}') job-to-work-on-restored-data.sh
-</pre>
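The same dependency can be built in two explicit steps, which may be easier to read and to reuse in scripts. This is a sketch under assumptions: the function name `depend_flag` is hypothetical, and it assumes that qsub prints the job identifier as a number followed by a dot and a server name, as the awk one-liner above implies.

```shell
#!/bin/bash
# Hypothetical helper: turn qsub output such as "12345.some-server"
# into the matching dependency flag.
depend_flag() {
    local jobid=${1%%.*}     # keep everything before the first dot
    echo "-W depend=afterok:${jobid}"
}

# Intended usage (left commented out because it needs the queue system):
#   stage_id=$(qsub data-restore.sh)
#   qsub $(depend_flag "$stage_id") job-to-work-on-restored-data.sh
```

Keeping the staging job identifier in a variable also makes it available for later checks on that job.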
-== '''Using HSI''' ==
-HSI is the primary client with which a user will interact with the archive system. It provides an ftp-like interface for archiving and retrieving files. In addition it provides a number of shell-like commands that are useful for examining and manipulating the contents of the archive. The most commonly used commands will be:
-{|border="1" cellpadding="10" cellspacing="0"
-|-
-  | cput
-  | Conditionally stores a file only if the HPSS file does not exist
-|-
-  | cget
-  | Conditionally retrieves a copy of a file from HPSS to your local file space on the host system only if a local copy does not already exist.
-|-
-  | cd,mkdir,ls,rm,mv
-  | Operate as one would expect on the contents of the archive.
-|-
-  | lcd,lls
-  | ''Local'' commands.
 |}
-Simple commands can be executed on a single line.
+MARS is no more. [https://support.scinet.utoronto.ca/wiki/index.php/HPSS Follow this link]
-<pre>
-   hsi "mkdir examples; cd examples; cput example_data.tgz"
-</pre>
-More complex operations can be performed using a Here Document.
-<pre>
-hsi <<-EOF
-  mkdir -p examples/201106
-  cd examples
-  mv example_data.tgz 201106/
-  lcd /scratch/$USER/examples/
-  cput -R -u *
-EOF
-</pre>
-=== HSI vs. FTP ===
-HSI syntax and usage is very similar to that of FTP. Please note the following information adapted from the HSI man page:
-HSI supports several of the commonly used FTP commands, including "dir","get","ls","mdelete","mget","put","mput" and "prompt", with the following differences:
-* The "dir" command is an alias for "ls" in HSI. The "ls" command supports an extensive set of options for displaying files, including wildcard pattern-matching, and the ability to recursively list a directory tree
-* The "put" and "get" family of commands support recursion
-* There are "conditional put" and "conditional get" commands (cput, cget)
-* The syntax for renaming local files when storing files to HPSS or retrieving files from HPSS is different than FTP. With HSI, the syntax is always
-<pre>
-     "local_file : HPSS_file"
-</pre>
-and multiple such pairs may be specified on a single command line.
-With FTP, the local filename is specified first on a "put" command, and second on a "get" command.
-For example, when using HSI to store the local file "file1" as HPSS file "hpss_file1", then retrieve it back to the local filesystem as "file1.bak", the following commands could be used:
-<pre>
-    put file1 : hpss_file1
-    get file1.bak : hpss_file1
-</pre>
-* With FTP, the following commands could be used:
-<pre>
-    put file1 hpss_file1
-    get hpss_file1 file1.bak
-</pre>
-* The "m" prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The "m" series of commands are intended to provide a measure of compatibility for FTP users.
-=== HSI Documentation ===
-* [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html man hsi]
-* [http://www.mgleicher.us/GEL/hsi/hsi-exit-codes.html exit codes]
-=== Other Examples ===
-* Creating tar archive on the fly by piping stdout:
-<pre>
-   tar cf - *.[ch] | hsi put - : source.tar
-</pre>
-Note: the ":" operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)
-* Restore the tar file source kept above and extract all files:
-<pre>
-    hsi get - : source.tar | tar xf -
-</pre>
-* The commands below are equivalent (the default HSI directory placement is /archive/<group>/<user>/):
-<pre>
-    hsi put source.tar
-    hsi put source.tar : /archive/<group>/<user>/source.tar
-</pre>
-* For more details please check the '''[http://www.mgleicher.us/GEL/hsi/ HSI Introduction]''', the '''[http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page]''' online or the [https://support.scinet.utoronto.ca/wiki/index.php/HSI_help '''hsi help'''] from the hsi prompt.
-== '''Using HTAR''' ==
-* To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' in the default HPSS home directory, enter:
-<pre>
-    htar -cf files.tar file1 file2
-OR
-    htar -cf /archive/<group>/<user>/files.tar file1 file2
-</pre>
-* To write a ''subdirA'' to a new archive called ''subdirA.tar'' in the default HPSS home directory, enter:
-<pre>
-    htar -cf subdirA.tar subdirA
-</pre>
-*  To write the ''file1'' and ''file2'' files to a new archive called ''files.tar'' on a remote FTP server called "blue.pacific.llnl.gov", creating the tar file in the user’s remote FTP home directory, enter (bonus HTAR functionality to sites outside SciNet):
-<pre>
-    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2
-</pre>
-* To extract all files from the ''project1/src'' directory in the Archive file called ''proj1.tar'', and use the time of extraction as the modification time, enter:
-<pre>
-    htar -xm -f proj1.tar project1/src
-</pre>
-* To display the names of the files in the ''out.tar'' archive file within the HPSS home directory, enter (the out.tar.idx file will be queried):
-<pre>
-    htar -vtf out.tar
-</pre>
-For more details please check the '''[http://www.mgleicher.us/GEL/htar/ HTAR - Introduction]''' or the '''[http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page]''' online
-== '''More detailed examples''' ==
-[[GPC_Quickstart|Submitting_A_Batch_Job]]
-* gpc-archive01 is part of the gpc queuing system under torque/moab
-* Currently it is set up to share the node among up to 12 jobs at one time
-* default parameters ( -l nodes=1:ppn=1,walltime=48:00:00)
-<pre>
-showq -w class=archive
-qsub -I -q archive
-</pre>
-* sample '''data offload'''
-<pre>
-#!/bin/bash
-# This script is named: data-offload.sh
-#PBS -q archive
-#PBS -N offload
-#PBS -j oe
-#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID
-date
-# individual tarballs already exist
-/usr/local/bin/hsi  -v <<EOF
-mkdir put-away-and-forget
-cd put-away-and-forget
-put /scratch/$USER/workarea/finished-job1.tar.gz : finished-job1.tar.gz
-put /scratch/$USER/workarea/finished-job2.tar.gz : finished-job2.tar.gz
-EOF
-# create a tarball on-the-fly of the finished-job3 directory
-/usr/local/bin/htar -cf finished-job3.tar /scratch/$USER/workarea/finished-job3/
-date
-</pre>
-* sample '''data list'''
-** Very painful without interactive browsing
-*** Tentative solution: dump a listing of all user files to the log file and use that as a file index
-<pre>
-#!/bin/bash
-# This script is named: data-list.sh
-#PBS -q archive
-#PBS -N hpss_dump
-#PBS -j oe
-#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID
-date
-echo ===========
-echo
-/usr/local/bin/hsi  -v <<EOF
-ls -lUR
-EOF
-echo
-echo ===========
-date
-</pre>
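Once the listing has been written to the job log, that log can be searched like any text file. The sketch below is an illustration only: the function name `search_index` is hypothetical, and it makes no assumption about the column layout of the listing, matching whole lines that contain a fixed string.

```shell
#!/bin/bash
# Hypothetical helper: search a saved listing (the job log written by
# data-list.sh) for a file name. Matches fixed strings, not regular expressions.
search_index() {
    local pattern=$1 logfile=$2
    grep -F -- "$pattern" "$logfile"
}
```

For example, `search_index Jan-2010 hpsslogs/<logfile>` would print every line of the listing that mentions `Jan-2010`, where `<logfile>` stands for whichever log the listing job produced.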
-* sample '''data restore'''
-<pre>
-#!/bin/bash
-# This script is named: data-restore.sh
-#PBS -q archive
-#PBS -N restore
-#PBS -j oe
-#PBS -o hpsslogs/$PBS_JOBNAME.$PBS_JOBID
-date
-mkdir -p /scratch/$USER/restored-from-MARS
-/usr/local/bin/hsi  -v << EOF
-get /scratch/$USER/restored-from-MARS/Jan-2010-jobs.tar.gz : forgotten-from-2010/Jan-2010-jobs.tar.gz
-get /scratch/$USER/restored-from-MARS/Feb-2010-jobs.tar.gz : forgotten-from-2010/Feb-2010-jobs.tar.gz
-EOF
-cd /scratch/$USER/restored-from-MARS
-/usr/local/bin/htar -xf finished-job3.tar
-date
-</pre>