HPSS compared to HSM-TSM

Understanding HPSS by comparing it with HSM-TSM at SciNet

The Basic Concepts and Definitions described for HSM-TSM are equally applicable to HPSS.

For the most part both systems are very similar in topology and functionality. What we want to offer is a way to offload/archive data from the most active file systems (scratch and project) without necessarily having to deal directly with the tape library or "tape commands". The main differences are in the terminology.

Machines used in the setup

  • HSM-TSM at SciNet requires 3 systems: 1 tapenode, 1 datamover and 1 HSM failover node
  • HPSS requires 5 systems: 1 core, 2 movers, 1 HSI server and 1 gateway node (archive01), which functions more like a datamover under HSM, i.e. the node users can log in to in order to transfer files.

Both have a component of disk cache and another of tapes (inaccessible to end-users):

  • the cache under HSM-TSM is 15TB, formatted as gpfs, and mounted as /repository on datamover2
  • the cache under HPSS is 174TB, SAN attached to the 2 movers, in a proprietary format, but it can be accessed as /archive through the HSI prompt on archive01 (see the example after this list).
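
To give a feel for the HSI prompt, a session on archive01 could look like the sketch below; the /archive path and the prompt shown are assumptions based on the description above, not a verified transcript.

  $ ssh archive01
  $ hsi                 # start the HSI client, which connects to the HPSS core
  ? pwd                 # show the current directory in the HPSS name space
  ? ls -l /archive      # list the cache contents (path assumed)
  ? quit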

Migration between cache and tapes

  • under HSM-TSM, tapenode is the system directly connected to the TS310 tape library; it has the function of moving data back and forth between the cache (/repository) and tapes. This migration can be done automatically (by the HSM daemons running on datamover2) or manually by users from a shell on datamover2, with the dsmmigrate or dsmrecall commands (see the sketch after this list). tapenode does not have a failover machine. The failover for datamover2 is login10.
  • under HPSS, the tapenode equivalent is known as a mover. In phase I we have 2 movers, and they are the ones connected to the TS8500 tape library. I'm not sure yet whether they are set up in an HA fashion. There is no direct user intervention in this migration: it is all done automatically between the 131TB SAN cache and the tapes, coordinated by the core node. The term migrate is used predominantly in the context of copying files down the hierarchy to tapes. After migration, files may be deleted from a storage class via purge operations. Files are copied up the hierarchy via staging operations, usually when accessed after being purged from the top level of the hierarchy.
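
For the HSM-TSM side, the manual migration and recall mentioned above amounts to something like the following, run from a shell on datamover2 (file names are hypothetical); under HPSS there is no user-facing equivalent, since the core node coordinates everything.

  $ dsmls /repository/user42/run.tar       # show whether the file is resident, premigrated or migrated
  $ dsmmigrate /repository/user42/run.tar  # push the data down to tape, leaving a stub in the cache
  $ dsmrecall /repository/user42/run.tar   # bring the data back from tape into the cache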

Relocation of data from /scratch and /project to the cache; access to the cache by users

  • HSM-TSM has a system called datamover2 specifically for this relocation purpose. Users can log in to it using ssh, and rely on unix commands such as cp, mv, rsync or tar to transfer data from /scratch and /project to the cache (/repository).
  • under HPSS the system that users can log in to and interact with the cache directly is known as the gateway machine (we call it archive01 at the moment, but it may be called datamover3 in the future). All the tools available in HSM-TSM for transferring files to/from /repository on datamover2 (cp, mv, tar, rsync, scp, sftp, gridFTP, etc) could also be made available on the gateway, in addition to a couple more, such as HSI and HTAR. We may or may not install the gpfs client on the gateway, so that /scratch and /project can be mounted locally as well, allowing direct transfers to/from the cache in the same way it's done under HSM-TSM on datamover2 (see the sketch after this list).
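
A sketch of what the relocation could look like on each system; user names, paths and the availability of the gpfs mounts on the gateway are assumptions.

  # HSM-TSM: from datamover2, plain unix tools against /repository
  $ ssh datamover2
  $ tar -cf /repository/$USER/run42.tar -C /scratch/$USER run42
  $ rsync -av /project/$USER/results/ /repository/$USER/results/

  # HPSS: from the gateway (archive01), HSI moves data into the HPSS cache
  $ ssh archive01
  $ hsi "put /scratch/$USER/run42.tar : run42.tar"   # "local : HPSS" name syntax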

Exporting and accessing the cache

  • under HSM-TSM the cache is a gpfs file system, and in theory it could be mounted across all 4000 nodes in the cluster. In practice we only mount it on 2 nodes, datamover2 and login10, as /repository. In this scenario, although remote scp, sftp or rsync are possible, there is no real need for them: users can access the cache directly with an ssh session to datamover2.
  • under HPSS the cache is not a parallel file system. It's a proprietary file system (hpssfs) and can only be accessed in its native raw format on the 2 movers, the core and the vfs systems. On the vfs system we can mount it locally in the traditional fashion as a virtual file system (VFS). If we wish to mount it locally on any of the other nodes of the cluster we would need to re-export it from the vfs node using a 3rd-party service or application such as NFS, samba or some POSIX-like protocol (see the sketch after this list). During the initial tests we are mounting the cache as /hpss on vfs and exporting it over NFS to the client gateway (archive01), where the cache is also mounted as /hpss for the moment.
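
If we do go the NFS route, the re-export from the vfs node would follow the usual NFS pattern; the host names and export options below are assumptions, not the actual configuration.

  # on the vfs node, where /hpss is mounted natively: add a line to /etc/exports
  /hpss   archive01(rw,sync,no_root_squash)
  # on the gateway (archive01): mount the export in the same place
  $ mount -t nfs vfs:/hpss /hpss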

Metadata and database

  • In TSM-HSM there is a DB2 database to keep track of all material on tapes, and it runs on tapenode. The metadata for files/stubs on the cache is kept within the cache gpfs itself, in a .SpaceMan directory, and it's maintained by datamover2.
  • HPSS has a set of drives (2x 144TB mirrored) directly attached to the core node just to store metadata related to all files/stubs inside the cache. The core node also runs a DB2 holding information on all material on tapes as well as info on the metadata.

Configuration and logistics definitions

  • TSM-HSM uses a set of policies to determine which files/stubs should be migrated, and when. Files can be grouped into tape pools, primarily based on their paths on the file systems, but it's nearly impossible to segregate contents in the tape pools based on the individual ownership/groups of the files.
  • The HPSS equivalent to policies is a combination of Classes of Service (COS) and Storage Classes, and the performance of migrations between the original location of the files, through the cache and onto tapes seems to be very dependent on how the cache is striped based on information from the storage classes. The equivalent to tape pools in HPSS is known as Families (see the note after this list).
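
Assuming the standard HSI client is installed on archive01, the Classes of Service defined on the system can be inspected with the lscos command; this is only a quick way to see what is configured, not a way to change policies.

  $ hsi lscos      # list the Classes of Service configured in HPSS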

Handling small files in the cache

  • Under TSM-HSM, it's impossible to prevent users from placing small files in the cache (/repository is gpfs). All we can do is rely on user education and outreach, so that users won't transfer all the small files they have on the active file systems (scratch and project) to /repository. As a more efficient approach, we recommend and encourage users to generate tar-balls larger than 10GB, and in that way prevent the proliferation of small files inside the cache.
  • Under the proprietary HPSS file system we face the same issue of not being able to prevent the placement of small files in the cache. One effective workaround utility is HTAR, used for aggregating a set of files from the local file system directly into the HPSS cache, creating a file that conforms to the POSIX TAR specification (see the sketch after this list). When HTAR creates the TAR file, it also builds an index file (".idx"), which is stored in the same directory as the TAR file. However, it's still not clear how to prevent users from creating "small htar files" inside the cache. In any case, HPSS also supports its own small-file tape aggregation. By aggregating many small files in the cache into a single object, tape drives are kept moving for longer periods of time (streaming), greatly improving tape drive performance!
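
A sketch of the HTAR workflow described above, run from the gateway; the directory and archive names are hypothetical.

  $ htar -cvf /archive/user42/run42.tar run42/            # aggregate many small files straight into the HPSS cache;
                                                          # an index file run42.tar.idx is created alongside the archive
  $ htar -tvf /archive/user42/run42.tar                   # list the members without retrieving the whole archive
  $ htar -xvf /archive/user42/run42.tar run42/input.dat   # extract a single member back to the local file system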


Files and stubs in cache

  • Under TSM-HSM, when files are migrated to tapes and the original files are purged from the cache, what is left behind is the directory tree with the HSM stub files and the metadata associated with them. It's possible to distinguish between resident and purged files by using the 'dsmls' command.
  • Under HPSS, when a file or directory is migrated and subsequently purged, the cache stays empty, unlike HSM where a zero-byte stub remains. In fact, this is the main difference between TSM-HSM and HPSS. The HPSS Core Server maintains the HPSS name space in system metadata. However, we can still query the metadata/DB2 server indirectly with standard unix commands such as ls, pwd, du, etc, and get a response from the system as if the files were really there. For example, we can do an 'ls' on the VFS mount and use the result to copy the 'listed' files to another location (see the example after this list).
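
To make the difference concrete, here is a hedged example of checking file state on each side (paths hypothetical):

  # HSM-TSM, on datamover2: dsmls shows whether a file is resident or has been migrated to tape
  $ dsmls /repository/user42/run42.tar
  # HPSS, on the vfs node: standard unix tools are answered from the metadata,
  # even after the data itself has been purged from the cache to tape
  $ ls -l /hpss/user42/
  $ du -sh /hpss/user42/run42.tar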