HSM

Hierarchical Storage Management (HSM)

Implementing a hierarchical storage management (HSM) scheme was a project started in July 2010 with a select group of users. HSM is being phased out as an archiving system at SciNet in the second semester of 2011. Users with material in HSM will be given the opportunity and the time to migrate it over to HPSS, if they so wish.

You can read more about HSM basic concepts, definitions, etc. in the Appendix section.

Deployment at SciNet

What we are offering users is a way to offload/archive data from the most active file systems (scratch and project) without necessarily having to deal directly with the tape library or "tape commands". We devised a 2-step migration process and deployed a 15 TB disk-based cache mounted as /repository, accessible from datamover2. HSM is performed by dedicated IBM software made up of a number of HSM daemons. These daemons constantly monitor the usage of /repository and, depending on a predefined set of policies, data may be automatically or manually migrated to the Tivoli Storage Manager (TSM) server and kept on our library of LTO-4 tapes.

In step 1, users relocate data as required from /scratch or /project to /repository in a number of ways, such as copy, move, tar or rsync. In step 2, /repository is constantly being purged in the background by the HSM daemons. What is left behind is the directory tree with the HSM stub files and the metadata associated with them (about 1-2% of the original data). In this scenario, users also have the option in step 2 to manually migrate or recall files between /repository and the "tape system" with simple commands such as 'dsmmigrate' or 'dsmrecall'.
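
As an illustration of the two steps, the sketch below relocates a hypothetical tar-ball into a group area in /repository and then migrates it by hand rather than waiting for the automatic cycle (all paths and file names are placeholders):

# step 1: relocate the data into your group area in /repository
rsync -av /scratch/[user]/results1.tar.gz /repository/[group]/[user]/

# step 2 (optional): migrate it to tape right away instead of waiting for the automatic migration
dsmmigrate /repository/[group]/[user]/results1.tar.gz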

Inside /repository, data is segregated on a per-group basis, just as in /project. Within groups, users and group supervisors can structure material any way they prefer, but please follow the recommendations below.

Please be sure to contact us to schedule your transfers IN or OUT of repository, first to avoid conflict with other users, and also to allow us time to reserve and allocate tapes for your group.

How to migrate/recall data

Automatic

We have currently set up the disk cache component of /repository with high and low thresholds of 10% and 2%, respectively. That means the file system is monitored at regular intervals to determine whether the 10% usage mark has been reached or surpassed. In that case, data is automatically migrated to tapes, oldest (or largest) first, and purged from /repository until the file system is down to 2% usage, if possible (metadata is not migrated). For now, at SciNet we migrate every file in /repository to tapes.
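
If you want to check how full the disk cache currently is before relocating data, the dsmdf command described in the Appendix (or a plain df, run on datamover2 where /repository is mounted) reports the usage of the HSM-managed file system:

dsmdf -Detail /repository
df -h /repository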

To recall a file automatically, all you have to do is access it. There are countless combinations of 'cat', 'more', 'vi/vim', 'find', 'grep', 'head', 'tail', etc. that could trigger thousands of files to be recalled from tape. You may also copy the file (or directory) from /repository to another location. Please be patient: the file will have to be pulled back from tape, and this will take some time, longer if it happens to be at the end of a tape.
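
For example, simply copying a migrated tar-ball back to /scratch is enough to trigger an automatic recall (the names below are placeholders):

mkdir -p /scratch/[user]/restore
cp /repository/[group]/[user]/myproject1.tar.gz /scratch/[user]/restore/    # blocks until the file has been recalled from tape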

Selective (aka manual)

This is used to override the internal priority of HSM (oldest/largest first) or to migrate files/directories "immediately". If you relocated material to the repository with the intention of having it migrated to tapes, you may as well just run dsmmigrate as soon as the rsync to the repository has finished.

Note: files won't be migrated until they have "aged" for at least 5 minutes, that is, until at least 5 minutes have passed since their last access/modification time.

dsmmigrate [path to FILE]
or
dsmmigrate -R -D /repository/[group]/[user]/[directory]
or
dsmmigrate /repository/scinet/pinto/blahblahblah.tar.Z
or
{
 cd /repository/scinet/pinto/
 dsmmigrate blahblahblah.tar.Z
}

where:
    R: recursive
    D: details

To selectively recall data, just type:

dsmrecall [path to FILE]
or
dsmrecall -R -D /repository/[group]/[user]/[directory]
or
dsmrecall /repository/scinet/pinto/blahblahblah.tar.Z
or
{
 cd /repository/scinet/pinto/
 dsmrecall blahblahblah.tar.Z
}

Recommendations

  • Those involved should spend some time designing the structure inside their area in the repository ahead of time, since you may be merging data from project and/or scratch (or even home). It is possible to reorganize the file system structure after migration, change the name and ownership of directories and stubs, and still recall files under the new path and ownership: HSM keeps the metadata closely tied to the inode attributes at the file system level, without having to replicate those changes through tape recall & migration operations. But please don't abuse this flexibility; if possible, keep your initial layout somewhat fixed over time.
  • Data may be copied/moved/rsync'ed/tar'ed into /repository faster than it can be purged, so you may sporadically observe 80-90% disk usage. Do not initiate a relocation of more than a 10 TB chunk at once, even if /repository is at 1% usage to start with, so that the system has time to process your data and still allow other users to migrate/recall some material before the file system reaches 100% full.
  • Users should stage and bundle files into tar-balls of at least 10 GB before relocation, and keep a listing of those files somewhere; in fact, you may use the 'tar' command to create the tar-ball directly in /repository on the fly. See the examples below:
tar -czvf /repository/[group]/[user]/myproject1.tar.gz /project/[group]/[user]/project1/ > /project/[group]/[user]/myproject1-repository-listing.txt

or

tar -czvf /repository/[group]/[user]/myscratch1.tar.gz /scratch/[user]/scratchdata1/ > /home/[user]/myscratch1-repository-listing.txt
  • Keep the listing of the files that are in each tar-ball on a partition other than the HSM repository, so that you can quickly decide which tar-ball you need to recall (note the redirection in the examples above). While the tar stub will always exist on the HSM disk, you will not be able to run 'tar --list' on the stub without recalling the full tar file back from tape to the disk cache (see the sketch after this list).
  • The important thing is to avoid relocating many thousands (or millions) of small files. It is very demanding on the system to constantly scan/reconcile all these files across the file system, tapes, metadata and database. A good rule of thumb is an average file size greater than 100 MB in /repository.
  • Deep directory nesting in general also increases the time required to traverse a file system and thus should be avoided where possible.
  • We have found that the search for new candidates for automatic migration takes much longer once the repository is already full of files/stubs, and this cycle can take some 6 to 12 hours to kick in at SciNet. Hence, do not wait: proceed with the selective migration of your own files/directories as soon as possible.
  • Avoid working with or manipulating files inside /repository if you are not using tar-balls. Just copy the material out of there and back into /scratch or /project. That will trigger the automatic recall, and once that operation is finished the recalled files will be released and re-scheduled for purging again.
  • Disaster Recovery: as with any disk-based storage, the repository is just a cache and is not immune to failures. We do not do regular backups of its contents, but it is possible to do a full recovery of the directory tree and stubs in case of a catastrophic loss of the repository. For that, it is important that all files have been completely migrated to tapes beforehand. That puts the onus on users to ensure this migration is indeed finished (using selective migration, as in the sketch after this list) before the originals are deleted from /project or /scratch.
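
The sketch below ties together two of the points above: verifying with dsmls that a relocated tar-ball has actually been migrated (file state 'm') before the originals are deleted, and recalling the tar-ball later so its contents can be listed and extracted back into /scratch (all names are placeholders):

# confirm the tar-ball has been migrated to tape (file state 'm') before deleting the originals
dsmls /repository/[group]/[user]/myproject1.tar.gz

# later on: pull the tar-ball back from tape, list it, and extract it outside /repository
dsmrecall /repository/[group]/[user]/myproject1.tar.gz
tar -tzvf /repository/[group]/[user]/myproject1.tar.gz
mkdir -p /scratch/[user]/restore
tar -xzvf /repository/[group]/[user]/myproject1.tar.gz -C /scratch/[user]/restore/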

Performance

Unlike /project or /scratch, /repository is only a 2-tier disk RAID, so don't expect transfer rates much higher than 60 MB/s in an rsync session, for example. In other words, a 10 TB offload operation will typically take 2 days to complete if it is made up of large files. On the other hand, we have conducted experiments in which we migrated only 1 TB of small files, 1 million of them, and that took nearly a day! This is a situation that should be avoided: performance is as much a function of the number of files as of the amount of data. Please pack them into tar-balls.
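
As a rough back-of-the-envelope check of that figure, assuming a sustained 60 MB/s: 10 TB is about 10,000,000 MB, and 10,000,000 MB / 60 MB/s is roughly 167,000 s, i.e. just under 2 days of continuous transfer.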

As for the "ideal tar-ball size", experiments have shown that an isolated 10GB tar-ball typically takes 10-15 minutes to be pulled back, considering all tape operations involved. That seems like a reasonable amount of time to wait for a group of files kept off-line for an extended period of time. Also consider that pulling back an individual tiny file could still take as long as 5-8 minutes. So, it's pretty clear that you get the best pay for the buck by tar'ing your material, and you won't tie up the tape system for too long. As for the upper limit, you can probably bundle files in 100-500GB tar-balls, provided that you're OK with waiting a couple of hours for them to be recalled at a later date; at least from SciNet's perceptive, it would be a very efficient migration.

Appendix (HSM)

Basic Concepts

Hierarchical Storage Management (HSM) is a data storage technique which automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as hard disk drive arrays, are more expensive (per byte stored) than slower devices, such as optical discs and magnetic tape drives. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise's data on slower devices, and then copy data to faster disk drives when needed. In effect, HSM turns the fast disk drives into caches for the slower mass storage devices. The HSM system monitors the way data is used and makes best guesses as to which data can safely be moved to slower devices and which data should stay on the fast devices.

In a typical HSM scenario, data files which are frequently used are stored on disk drives, but are eventually migrated to tape if they are not used for a certain period of time, typically a few months. If a user does reuse a file which is on tape, it is automatically moved back to disk storage. The advantage is that the total amount of stored data can be much larger than the capacity of the disk storage available, but since only rarely-used files are on tape, most users will not notice any slowdown.

The HSM client provides both automatic and selective migration. Once file migration begins, the HSM client sends a copy of your file to storage volumes on disk devices or devices that support removable media, such as tape, and replaces the original file with a stub file on the HSM-managed file system (aka the repository at SciNet).

Definitions

Repository commonly refers to a location for long-term storage, often for safety or preservation. It has a disk-cache component and a tape component.

Migration, in the context of HSM, refers to the set of actions that move files from the front-end disk-based cache to a back-end tape library system (often invisible or inaccessible to users).

Relocation, in the context of SciNet, refers to the use of unix commands such as copy, move, tar or rsync to get data into the repository.

The stub file is a small replacement file that makes it appear as though the original file is in the repository. It contains the metadata required to locate and recall a migrated file and to respond to certain UNIX commands without recalling the file.

Automatic migration periodically monitors space usage and automatically migrates eligible files according to the options and settings that have been selected. The HSM client provides two types of automatic migration: threshold migration and demand migration.

Threshold migration maintains a specific level of free space on the repository file system. When disk usage reaches the high threshold percentage, eligible files are migrated to tapes automatically. When space usage drops to the low threshold set for the file system, file migration stops.

Demand migration responds to an out-of-space condition on the repository file system. It starts automatically if the file system runs out of space (usually triggered at 90% usage). As files are migrated (oldest/largest first), space becomes available on the file system, and the process or event that caused the out-of-space condition can resume.

Selective migration is typically a user-issued HSM command that migrates specific files from the repository at will, either in anticipation of the automatic migration or independently of the system-wide eligibility criteria. For example, if you know that you will not be using a particular group of files for an extended time, you can migrate them to free additional space on the repository.

Reclamation is the process of reclaiming unused space on a tape (it applies to virtual tapes as well). Over time, as files/directories get deleted or updated on the repository, a process expires the old data, creating gaps of unused storage on the tapes. Since tapes are sequential media, typical tape-handling software can only write data to the end of the tape, so these gaps of "empty space" cannot be reused directly. Reclamation entails periodically, and in a rolling fashion, copying the active data from such "Swiss cheese"-like tapes onto unused tapes in compacted form, and recycling the former.

Optimal environment: HSM should be used in an environment where old and large files which need to be preserved are not used regularly. Files that are needed frequently should not be migrated at all; otherwise HSM would act as a very active cache, migrating files and recalling them shortly after. This is not advisable, in particular because of the stress it imposes on the tape system. The disk cache component of the repository needs to be large enough to hold all regularly used files.

Common HSM commands

Some traditional unix/linux commands, such as 'ls' or 'rm', will work with the stub file as if it were the real file. For others, such as 'du' or 'df', you are better off using the HSM equivalent, which will give you more meaningful information in the context of HSM. These equivalents only work inside /repository. Some of them can be executed only by root, such as 'dsmrm', in which case you will be notified.

dsmls to check status of files; used in the directory where you expect to have migrated files

r: resident (the file is on repository only)

m: migrated (only the stub of the file is on repository)

p: premigrated (the file is on repository and on tape)

Usage: dsmls [-Noheader] [-Recursive] [-Help] [file specs|-FIlelist=file]

Example:

gpc-logindm02-$ dsmls -R a3
IBM Tivoli Storage Manager
Command Line Space Management Client Interface
  Client Version 6, Release 1, Level 0.0  
  Client date/time: 07/27/2010 12:06:36
(c) Copyright by IBM Corporation and other(s) 1990, 2009. All Rights Reserved.

      Actual     Resident     Resident  File   File
        Size         Size     Blk (KB)  State  Name
       <dir>         8192            8   -      a3/

/repository/scinet/pinto/a3:
 34008432640            0            0   m      32G-1
 34008432640  34008432640            0   r      32G-2
 34008432640  34008432640            0   p      32G-3
           0            0            0   r      dsmerror.log

dsmdu disk usage on the original files/directory

Usage: dsmdu [-Allfiles] [-Summary] [-Help] [directory names]

dsmdf disk free on the HSM file system.

Usage: dsmdf [-Help] [-Detail] [file systems]
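
For example, to see how much space a directory would occupy if all of its files were resident, and how much free space is left on the HSM-managed file system (paths are placeholders):

dsmdu -Summary /repository/[group]/[user]
dsmdf /repository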

dsmmigrate

Usage: dsmmigrate [-Recursive] [-Premigrate] [-Detail] [-Help] filespecs|-FIlelist=file 

dsmrecall

Usage: dsmrecall [-Recursive] [-Detail] [-Help] file specs|-FIlelist=file
   or  dsmrecall [-Detail] -offset=XXXX[kmgKMG] -size=XXXX[kmgKMG] file specs 
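
As an illustration of the second form (a partial recall), and assuming the -offset/-size syntax shown in the usage line above, recalling roughly the first 100 MB of a hypothetical large file might look like:

dsmrecall -Detail -offset=0k -size=100m /repository/[group]/[user]/bigfile.dat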


To get an idea of what HSM is doing on datamover2 at any given time:


[pinto@gpc-logindm02 ~]$ ps -def | grep dsm | grep -v mmfs

root      2455 15190  0 16:26 ?        00:00:00 dsmmonitord
root      2456  2455  2 16:26 ?        00:05:38 dsmautomig -2 system::/repository
pinto    10997 10637 30 16:40 pts/3    01:14:20 dsmmigrate -R -D pinto
root     12857     1  0 16:15 ?        00:00:00 dsmrecalld
root     13013 12857  0 16:15 ?        00:00:01 dsmrecalld
root     13015 12857  0 16:15 ?        00:00:00 dsmrecalld
root     15190     1  0 16:15 ?        00:00:00 dsmmonitord
root     16936     1  3 16:15 ?        00:10:44 dsmscoutd
root     17217     1 13 16:16 ?        00:36:49 dsmrootd
root     18732  2456  4 17:51 ?        00:07:19 dsmautomig -2 system::/repository
root     18737  2456  0 17:51 ?        00:00:26 dsmautomig -2 system::/repository
pinto    24533 10363  0 20:48 pts/2    00:00:00 grep dsm
root     25090     1  0 06:42 ?        00:00:08 dsmwatchd nodetach
root     30840 13013  0 17:15 ?        00:00:02 dsmrecalld

In the above example, dsmmonitord, dsmrecalld, dsmscoutd, dsmrootd and dsmwatchd are the 5 typical HSM daemons, and they are always running. In addition, there are 3 streams of dsmautomig (triggered by threshold migration) and 1 stream of dsmmigrate (a selective migration initiated by user pinto).

BACK TO Data Management