Difference between revisions of "MARS"

From oldwiki.scinet.utoronto.ca
=== '''Massive Archive and Restore System''' ===
{| style="border-spacing: 8px; width:100%"
| valign="top" style="cellpadding:1em; padding:1em; border:2px solid; background-color:#f6f674; border-radius:5px"|
'''WARNING: SciNet is in the process of replacing this wiki with a new documentation site. For current information, please go to [https://docs.scinet.utoronto.ca https://docs.scinet.utoronto.ca]'''
|}
  
(The pilot usage phase starts in May 2011 with a select group of users. Deployment and configuration are still a work in progress.)
MARS is no more. [https://support.scinet.utoronto.ca/wiki/index.php/HPSS Follow this link]
 
 
The '''MARS''' deployment at SciNet is an effort to offer a more efficient way to offload/archive data from the most active file systems (scratch and project) than our current TSM-HSM solution, without users having to deal directly with the tape library or "tape commands".
 
 
 
The system combines the underlying hardware infrastructure, three software components (HPSS, HSI and HTAR), and some environment customization.
 
 
 
* '''HPSS''': the main component, best described as a very scalable "black box" engine running in the background to support the archive and restore operations. [http://www.hpss-collaboration.org/index.shtml High Performance Storage System - HPSS] is the result of over a decade of collaboration among five Department of Energy laboratories and IBM, with significant contributions by universities and other laboratories worldwide. For now, the best way for SciNet users to [https://support.scinet.utoronto.ca/wiki/index.php/HPSS_compared_to_HSM-TSM understand HPSS] may be to compare it with our existing HSM-TSM implementation.
 
 
 
* '''HSI''': best understood as a supercharged FTP interface, specially designed by [http://www.mgleicher.us/GEL/hsi/ Gleicher Enterprises] as a front end for HPSS, combining some of the best features of a shell, rsync and GridFTP (and a few more). It enables users to transfer whole directory trees from /project and /scratch, thereby freeing up space. HSI is most suitable when those directory trees do not contain too many small files, or when you already have a series of tarballs.
 
 
 
* '''HTAR''': similarly, HTAR is a kind of "super-tar" application, also specially designed by [http://www.mgleicher.us/GEL/htar/ Gleicher Enterprises] to interact with HPSS, allowing users to build tarballs and transfer them to HPSS on the fly. HTAR is most suitable for aggregating whole directory trees. When HTAR creates the tar file, it also builds an index file, named by adding an ".idx" suffix, which is stored in the same directory as the tar file.
 
 
 
=== '''Quick Reference''' ===
 
 
 
* To use HSI or HTAR and access HPSS, please log in to the '''archive01''' node.
 
* AUTHENTICATION: done automatically by the '''keytab''' file inside the '''.private''' directory in your home directory. Please do not edit or delete it.
 
* Files are organized inside HPSS in the same fashion as in /project. Users in the same group have read permissions to each other's archives.
 
<pre>
 
/archive/<group>/<user>
 
</pre>
 
* There is also provision to have files migrated to a "mirrored set of tapes" Class of Service (COS). You will have to request it and justify the need. The default is a "single tape" COS.
 
<pre>
 
/archive-dual-copy/<group>/<user>
 
</pre>
 
* To view the contents of your HPSS archive, just type '''hsi''' in a shell to get an hsi prompt. Then use simple commands such as '''ls''', '''pwd''' and '''cd''' to navigate your way around. You may also use '''help'''.
 
* '''Pilot users:''' <span style="color:#CC0000">DURING THE TESTING PHASE DO NOT DELETE THE ORIGINAL FILES FROM /scratch OR /project</span>
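The archive layout above can be derived from your user and group names. The sketch below is illustrative only: the function name archive_path is hypothetical, and it assumes that the group component of the path is your primary Unix group as reported by id -gn, which may not hold if you belong to several groups.

```shell
#!/bin/sh
# Sketch: print the HPSS archive directory for a user, following the
# /archive/<group>/<user> layout described above.
# Assumption: <group> is the user's primary Unix group (id -gn).
# Pass "dual" as the third argument for the /archive-dual-copy tree,
# which is only usable once that Class of Service has been granted.

archive_path() {
    user="${1:-$(id -un)}"             # default: current user
    group="${2:-$(id -gn "$user")}"    # default: that user's primary group
    if [ "${3:-single}" = "dual" ]; then
        root="/archive-dual-copy"
    else
        root="/archive"
    fi
    printf '%s/%s/%s\n' "$root" "$group" "$user"
}

# Example: print the default (single-tape) archive path for the current user.
archive_path
```

The printed path could then be used with the '''cd''' command at the hsi prompt.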
 
 
 
=== '''Using HSI''' ===
 
<pre>
 
# Save a "tar file" of C source programs and header files:
 
    tar cf - *.[ch] | hsi put - : source.tar
 
 
 
Note: the ":" operator which separates the local and HPSS pathnames must be surrounded by whitespace (one or more space characters)
 
 
 
# Restore the tar file source kept above and extract all files:
 
    hsi get - : source.tar | tar xf -
 
 
 
# Get all files in the subdirectory subdira which begin with the letters "b" or "c" (surrounding the wildcard path in single quotes prevents shells on UNIX systems from processing the wild card pattern):
 
    hsi get 'subdira/[bc]*'
 
 
 
# Save your local files that begin with the letter "c" (let the UN*X shell resolve the wildcard path pattern in terms of your local files by not enclosing it in quotes):
 
    hsi put c*
 
 
 
# Delete all files beginning with "m" and ending with 9101 (note that this is an interactive request, not a one-liner request, so the wildcard path does not need quotes to preserve it):
 
    hsi <RETURN>
 
    ? delete m*9101
 
 
 
# Interactively delete all files beginning with H and ending with a digit, and ask for verification before deleting each such file.
 
    hsi <RETURN>
 
    ? mdel H*[0-9]
 
 
 
# Interactively descend into the "Source" directory and move all files that end in ".h" into a sibling directory (i.e., a directory at the same level in the tree as "Source") named "Include":
 
    hsi <RETURN>
 
    ? cd Source
 
    ? mv *.h ../Include
 
 
 
# Interactively put a subdirectory subdirb and all its contents recursively. You may use the '-u' option to resume a previously disrupted session (as rsync would do).
 
    hsi <RETURN>
 
    ? prompt
 
    ? mput -R -u subdirb
 
</pre>
 
* For more details please check the [http://www.mgleicher.us/GEL/hsi/ HSI Introduction] or the [http://www.mgleicher.us/GEL/hsi/hsi_man_page.html HSI Man Page] online
 
 
 
=== '''Using HTAR''' ===
 
<pre>
 
# To write the file1 and file2 files to a new archive called "files.tar" in the current HPSS home directory, enter:
 
 
 
    htar -cf files.tar file1 file2
 
 
 
# To write the file1 and file2 files to a new archive called "files.tar" on a remote FTP server called "blue.pacific.llnl.gov", creating the tar file in the user's remote FTP home directory, enter:
 
 
 
    htar -cf files.tar -F blue.pacific.llnl.gov file1 file2
 
 
 
# To extract all files from the project1/src directory in the archive file called proj1.tar, and use the time of extraction as the modification time, enter:
 
 
 
    htar -xm -f proj1.tar project1/src
 
 
 
# To display the names of the files in the out.tar archive file within the HPSS home directory, enter:
 
 
 
    htar -vtf out.tar
 
</pre>
 
For more details please check the [http://www.mgleicher.us/GEL/htar/ HTAR - Introduction] or the [http://www.mgleicher.us/GEL/htar/htar_man_page.html HTAR Man Page] online
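Because of the HTAR limits listed under "Performance/Limits considerations" (about 68 GB per member file and 1 million files per archive), it can help to sort a directory tree into lists before archiving it. The following is a minimal sketch, not an official SciNet tool: the function prepare_htar_lists and the output file names are hypothetical, 68 GB is read conservatively as 68,000,000,000 bytes, and file names are assumed not to contain newlines.

```shell
#!/bin/sh
# Sketch: split a tree into (a) files too large for htar, to be sent with hsi,
# and (b) chunked lists of the remaining files, one list per htar archive.
# Assumptions: 68 GB = 68,000,000,000 bytes; no newlines in file names;
# the output directory must lie OUTSIDE the source tree.

MAX_MEMBER_BYTES=68000000000   # per-file limit inside an htar archive
MAX_FILES=1000000              # maximum number of files per htar archive

prepare_htar_lists() {
    src="$1"    # directory tree to archive
    out="$2"    # directory where the lists are written
    mkdir -p "$out" || return 1

    # Files larger than the member limit: transfer these individually with hsi.
    find "$src" -type f -size +"${MAX_MEMBER_BYTES}c" > "$out/for_hsi.txt"

    # All other files, split into lists of at most MAX_FILES entries each
    # (htar_chunk_aa, htar_chunk_ab, ...).
    find "$src" -type f ! -size +"${MAX_MEMBER_BYTES}c" > "$out/for_htar.txt"
    split -l "$MAX_FILES" "$out/for_htar.txt" "$out/htar_chunk_"
}
```

A call might look like prepare_htar_lists "$SCRATCH/run42" "$HOME/htar_lists" (both paths are placeholders). Files listed in for_hsi.txt can then be transferred one by one with hsi put, and each htar_chunk_* list corresponds to one htar archive; check the HTAR man page for how to hand a list of member names to htar rather than assuming a particular option.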
 
 
 
=== '''Performance/Limits considerations''' ===
 
* Transfers into and out of HPSS using HSI are limited to a maximum of about '''4 files/second'''. Therefore, do not attempt to transfer directories containing many small files with HSI; use HTAR instead, so the files are aggregated while being sent to HPSS.
 
* The maximum size of an individual file inside an HTAR archive is '''68GB'''. Please move any larger files out of the directories and transfer them separately with HSI.
 
* The maximum size of a tar file that HPSS will take is '''1TB'''. Please do not generate tarballs that large.
 
* The maximum number of files in an htar archive is '''1 million'''. Please break up your htar archives as required.
 
* Average transfer rates with '''HSI''' (no small files, average > 1 MB/file):

** write: 100-130 MB/s

** read: 450-600 MB/s (if no staging from tape is required)

* Average transfer rates with '''HTAR''' (not too many small files, average > 100 KB/file, aggregation included):

** write: 30-40 MB/s

** read: 100-110 MB/s (if no staging from tape is required)

* Average transfer rates from '''tapes''', if staging is required (add this time to the estimates above):

** read: 80 MB/s per tape drive

** a maximum of 4 drives may be used per hsi/htar session
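To decide between HSI and HTAR, or simply to plan how long an archive job may take, the figures above can be turned into rough estimates. The shell functions below are a back-of-the-envelope sketch: they use the low end of each quoted range, model HSI as limited by whichever is slower of bandwidth or the 4 files/second rate, and assume tape staging spreads evenly across the drives in use. These modelling choices and the function names are assumptions, not measured behaviour of HPSS.

```shell
#!/bin/sh
# Rough, conservative estimates (whole seconds, rounded up) based on the
# averages quoted above. Real throughput depends on load and file mix.

HSI_WRITE_MBPS=100        # HSI write: 100-130 MB/s (low end)
HSI_FILES_PER_SEC=4       # HSI: about 4 files/second
HTAR_WRITE_MBPS=30        # HTAR write: 30-40 MB/s (low end)
TAPE_MBPS_PER_DRIVE=80    # tape staging: 80 MB/s per drive
MAX_DRIVES=4              # at most 4 drives per hsi/htar session

# Usage: hsi_write_seconds <total_MB> <number_of_files>
# Assumption: HSI time is the larger of the bandwidth and per-file bounds.
hsi_write_seconds() {
    by_size=$(( ($1 + HSI_WRITE_MBPS - 1) / HSI_WRITE_MBPS ))
    by_count=$(( ($2 + HSI_FILES_PER_SEC - 1) / HSI_FILES_PER_SEC ))
    if [ "$by_size" -gt "$by_count" ]; then echo "$by_size"; else echo "$by_count"; fi
}

# Usage: htar_write_seconds <total_MB>   (aggregation already included)
htar_write_seconds() {
    echo $(( ($1 + HTAR_WRITE_MBPS - 1) / HTAR_WRITE_MBPS ))
}

# Usage: stage_seconds <total_MB> <drives>
# Extra time if data must first be staged from tape; drives clamped to 1..4.
stage_seconds() {
    drives=$2
    if [ "$drives" -lt 1 ]; then drives=1; fi
    if [ "$drives" -gt "$MAX_DRIVES" ]; then drives=$MAX_DRIVES; fi
    rate=$(( TAPE_MBPS_PER_DRIVE * drives ))
    echo $(( ($1 + rate - 1) / rate ))
}
```

For example, 100,000 files totalling 50,000 MB give hsi_write_seconds 50000 100000 = 25000 (about 7 hours, dominated by the per-file rate), while htar_write_seconds 50000 gives 1667 (under half an hour), which is why HTAR is recommended for many small files.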
 

Latest revision as of 19:37, 31 August 2018
