HPSS compared to HSM-TSM


Understanding HPSS by comparing it with HSM-TSM at SciNet

The Basic Concepts and Definitions described for HSM-TSM are equally applicable to HPSS.

For the most part the two systems are very similar in topology and functionality. Both offer a way to offload/archive data from the most active file systems (scratch and project) without having to deal directly with the tape library or "tape commands". The main differences lie in the terminology and in the scalability of the HPSS servers.

Machines used in the setup

  • HSM-TSM at SciNet requires 2 nodes: 1 tapenode (which also runs the TSM and HSM servers) and 1 datamover (running the HSM client). This setup is not easily scalable, and its performance is limited.
  • HPSS requires a minimum of 4 nodes: 1 core, 1 mover (we currently have 2), 1 HSI/HTAR server and 1 gateway (archive01), which functions more like a datamover under HSM, i.e., the node users log in to in order to transfer files. The setup is designed to be scalable, simply by adding more movers, gateways and cache LUNs to the SAN, a larger tape library, more tape drives, and larger or additional DB volumes. Currently, the HSI/HTAR server and the core are configured on the same node.

Both systems have a disk-cache component and a tape component (the tapes being inaccessible to end-users):

  • the cache under HSM-TSM is 15TB, formatted as GPFS, and mounted as /repository on datamover2
  • the cache under HPSS is 233TB, SAN-attached to the 2 movers and in a proprietary format, but it can be accessed as /archive through the HSI prompt on gpc-archive01, provided that you submit your requests via the queue system (see the sketch after this list)
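
As a rough illustration of that access path, here is a minimal sketch of a job script that lists a user's area in the HPSS name space; the queue name, resource directive and path are placeholders rather than SciNet's exact settings:

    #!/bin/bash
    # Hypothetical job script -- the directives below are placeholders
    #PBS -q archive
    #PBS -l walltime=1:00:00
    # List the user's directory in the HPSS name space, exposed as /archive
    hsi "ls -l /archive/$USER"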

Migration between cache and tapes:

  • under HSM-TSM, tapenode is the system directly connected to the TS3310 tape library; its function is to move data back and forth between the cache (/repository) and tapes. This migration can be done automatically (by the HSM daemons running on datamover2) or manually by the users from a shell on datamover2, with the dsmmigrate or dsmrecall commands.
  • under HPSS the tapenode equivalent is known as a mover, and we have 2 of them. They are the nodes connected to the TS3500 tape library. For the most part there is no need for direct user intervention in this migration, which is done automatically between the cache and the tapes and is coordinated by the core node. However, users have the option to migrate and recall files and directories manually. The term migrate is used predominantly in the context of copying files down the hierarchy to tapes. After migration, files may be deleted from the cache via purge operations. Files are copied up the hierarchy via staging operations, usually when accessed after being purged from the top level of the hierarchy (HSI 'migrate', 'stage' and 'purge'; see the examples after this list). At SciNet all cache<=>tape transfers are set to be done automatically.
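
For concreteness, the sketch below contrasts the manual commands on the two systems; the file path is a placeholder, and under HPSS the hsi calls would normally run from within a job script:

    # HSM-TSM, from a shell on datamover2 (path is a placeholder)
    dsmmigrate /repository/$USER/results.tar    # push the file down to tape
    dsmrecall  /repository/$USER/results.tar    # bring it back into the cache

    # HPSS, through HSI on gpc-archive01
    hsi "migrate /archive/$USER/results.tar"    # copy the file down the hierarchy to tape
    hsi "purge   /archive/$USER/results.tar"    # free the cache copy once it is on tape
    hsi "stage   /archive/$USER/results.tar"    # copy it back up into the cache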

Relocation of data from /scratch and /project to the cache, and access to the cache by users

  • HSM-TSM has a system called datamover2 specifically for this relocation purpose. Users can log in to it using ssh and rely on unix commands such as cp, mv, rsync or tar to transfer data from /scratch and /project to the cache (/repository).
  • under HPSS all access to the system is done through the GPC queue system. Job scripts interact with the cache through a node known as the gateway machine, named gpc-archive01. All tools available in HSM-TSM for transferring files to/from /repository on datamover2 (cp, mv, tar, rsync, rcp, sftp, gridFTP, etc.) have an equivalent on gpc-archive01 through the HSI interface. In addition, there is a tar-equivalent application called HTAR (see the sketch after this list).
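
The contrast might look roughly as follows; the paths are placeholders, and the HSI commands would sit inside a job script such as the one sketched earlier:

    # HSM-TSM: log in to datamover2, then copy with ordinary unix tools
    ssh datamover2
    rsync -av /scratch/$USER/run42/ /repository/$USER/run42/

    # HPSS: the equivalent transfer through HSI (run from a job script on gpc-archive01)
    hsi "cd /archive/$USER; put results.tar"    # copy ./results.tar into the HPSS cache
    hsi "cd /archive/$USER; get results.tar"    # copy it back to the local working directory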

Exporting and accessing the cache

  • under HSM-TSM the cache is a GPFS file system, and in theory it could be mounted across all 4000 nodes in the cluster. In practice we only mount it on 2 nodes, datamover2 and tapenode, as /repository. In this scenario there is no real need for remote scp, sftp or rsync, although they are possible: users can access the cache directly with an ssh session to datamover2.
  • under HPSS the cache is not a parallel file system. It is a proprietary file system (hpssfs) and can only be accessed in its native raw format on the 2 movers, the core and the HSI server systems. On the gateway machine we could mount it locally in the traditional fashion as a virtual file system (VFS), but this capability is currently disabled at SciNet. However, the HSI prompt on gpc-archive01 gives access to the name space of the cache as /archive (see the sketch after this list).
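
In practice the difference shows up as sketched below; the paths are placeholders, and the HSI call would normally run from within a job routed to gpc-archive01:

    # HSM-TSM: /repository is a mounted GPFS, so ordinary shell commands work over ssh
    ssh datamover2 "du /repository/$USER"

    # HPSS: the cache is only visible through the HSI name space
    hsi "du /archive/$USER"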

Metadata and database:

  • In TSM-HSM a DB2 database running on tapenode keeps track of all material on tapes. The metadata for the files/stubs in the cache is kept within the cache GPFS itself, in a .SpaceMan directory, and is maintained by datamover2.
  • HPSS has a set of drives (2x 144TB LUNs, mirrored) directly attached to the core node, just to store all information related to name space and ownership inside the cache. The core node also runs a DB2 database holding information on all material on tapes, as well as on the metadata.

Configuration and logistics definitions

  • TSM-HSM uses a set of policies to determine which files/stubs should be migrated, and when. Files can be grouped into tape pools primarily based on their paths on the file systems, but it's nearly impossible to segregate contents in the tape pools based on the individual ownership/groups of the files.
  • The HPSS equivalent to policies is a combination of Classes of Service (COS) and Storage Classes, and the performance of migrations from the original location of the files, through the cache and onto tapes is very dependent on how the cache is physically striped, on the configuration parameters of the Storage Classes, and on the number of small files involved. The equivalent to tape pools in HPSS is known as Families. There is also provision to have files migrated to a "mirrored tape" COS, which is the default at SciNet (see the sketch after this list).
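
Where the site exposes them, the HSI commands below give a user-level view of these settings; whether the lscos command is enabled for regular users here is an assumption on our part:

    # Show the classes of service defined on the system (availability is site-dependent)
    hsi "lscos"
    # Show a file's attributes as recorded in the HPSS name space (path is a placeholder)
    hsi "ls -l /archive/$USER/results.tar"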

Handling small files in the cache

  • Under TSM-HSM, it's impossible to prevent the placement of small files in the cache (/repository is GPFS). All we can do is rely on users' education and outreach, so that they don't transfer all the small files they have on the active file systems (scratch and project) to /repository. As a more efficient approach, we recommend and encourage users to generate tarballs larger than 10GB, and in that way prevent the proliferation of small files inside the cache.
  • Under the proprietary HPSS file system we face the same issue of not being able to prevent the placement of small files in the cache. One effective workaround is the HTAR utility, used to aggregate a set of files from the local file system directly into the HPSS cache, creating a file that conforms to the POSIX TAR specification. When HTAR creates the TAR file it also builds an index file (".idx"), which is stored in the cache along with the associated TAR file (see the sketch after this list). HPSS also supports its own small-file tape aggregation. By aggregating many small files in the cache into a single object, the tape drives are kept moving (streaming) for longer periods of time, which greatly improves tape-drive performance.
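
A minimal sketch of HTAR usage, with placeholder paths, might look like this (run from a job script, as with HSI):

    # Bundle a directory of many small files directly into the HPSS cache;
    # HTAR also writes a matching .idx index file next to the archive
    htar -cvf /archive/$USER/run42.tar run42/

    # List the members without retrieving the whole archive
    htar -tvf /archive/$USER/run42.tar

    # Extract a single member back to the local file system
    htar -xvf /archive/$USER/run42.tar run42/config.dat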

Files and stubs in the cache

  • Under TSM-HSM, when files are migrated to tapes and the original files are purged from the cache, what is left behind is the directory tree with the HSM stub files and the metadata associated with them. It's possible to distinguish between resident and purged files by using the 'dsmls' command.
  • Under HPSS, when a file or directory is migrated and subsequently purged, nothing is left behind in the cache, unlike under HSM, where a zero-byte stub remains. In fact, this is the main difference between TSM-HSM and HPSS. The HPSS Core Server maintains the HPSS name space in system metadata. However, from the HSI prompt we can still query the metadata/DB2 server indirectly with standard unix-like commands such as ls, pwd, du, etc., and get a response from the system as if the files were really there (see the sketch after this list).
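
As a rough sketch with a placeholder path, the check on each system might look like this:

    # HSM-TSM: dsmls reports whether a file is resident in the cache or migrated to tape
    dsmls /repository/$USER/results.tar

    # HPSS: no stub is left in the cache, but the name space still answers queries
    hsi "ls -l /archive/$USER/results.tar"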


BACK TO Data Management