Data Management

From oldwiki.scinet.utoronto.ca
Revision as of 15:18, 17 June 2010 by Cneale (talk | contribs)

SciNet's storage system is based on IBM's GPFS (General Parallel File System). There are two main systems for user data: /home, a small, backed-up space where user home directories are located, and /scratch, a large system for job input and output data. Data on /scratch is not backed up, and data placed there will be deleted after two weeks. SciNet does not provide long-term storage for large data sets.

Home Disk Space

Every SciNet user gets a 10GB directory on /home, which is regularly backed up. /home is visible from the login.scinet nodes and from the development nodes on GPC and the TCS. On the compute nodes of the GPC clusters, however (that is, while jobs are running), /home is mounted read-only, so GPC jobs can read files in /home but cannot write to files there. /home is a good place to put code, input files for runs, and anything else that needs to be kept to reproduce runs. It is not a good place to put many small files: the block size of the file system is 4MB, so you would quickly run out of disk quota, and you would make the backup system very slow.

Scratch Disk Space

Every SciNet user also gets a directory in /scratch. /scratch is visible from the login.scinet nodes, from the development nodes on GPC and the TCS, and from the compute nodes of the clusters, where it is mounted read-write. Jobs therefore normally write their output somewhere in /scratch. There are NO backups of anything on /scratch.

There is a large amount of space available on /scratch, but it is purged routinely so that all users running jobs and generating large outputs have room to store their data temporarily. Computational results that you want to keep beyond the purge period must be copied (using scp) off SciNet entirely, to your local system. SciNet does not routinely provide long-term storage for large data sets.

Scratch Disk Purging Policy

To ensure that there is always significant space available for running jobs, we automatically delete files in /scratch that have not been accessed in 3 months. This policy is subject to revision depending on its effectiveness. More details about the purging process, and how users can check whether their files will be deleted, follow. If you have files scheduled for deletion, you should move them to a more permanent location, such as your departmental server or your /project space (for PIs who have either been allocated disk space by the LRAC or have bought disk space).

On the first of each month, a list of files scheduled for deletion is produced. Those files are automatically deleted on the 15th of the same month unless they have been accessed in the interim. If you have files scheduled for deletion, they are listed in a file in /scratch/todelete/current whose name contains your userid and groupid. For example, user xxyz can check for files scheduled for deletion by issuing the following command on a system that mounts /scratch (e.g. a scinet login node): ls -l /scratch/todelete/current |grep xxyz. In the example below, the name of this file indicates that user xxyz is part of group abc and has 9,560 files scheduled for deletion, which take up 1.0TB of space:

[xxyz@scinet04 ~]$ ls -l /scratch/todelete/current |grep xxyz
-rw-r----- 1 xxyz     root       1733059 Jan 12 11:46 10001___xxyz_______abc__________1.0TB_____9560files

The file itself contains a list of all files scheduled for deletion (in the last column) and can be viewed with standard commands like more or less, e.g. more /scratch/todelete/current/10001___xxyz_______abc__________1.0TB_____9560files


Similarly, you can check on the other users in your group by using the ls command with grep on your group name. For example: ls -l /scratch/todelete/current |grep abc. That lists all users in xxyz's group (abc) who have files to be purged on the 15th. Members of the same group have access to each other's contents.
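
The checks above can be wrapped in a small shell function. This is a minimal sketch, not a SciNet-provided tool: the function name purge_entries and the TODELETE override are hypothetical, and the match on "_name_" assumes the file names keep the underscore-padded layout shown in the example listing.

```shell
# Hypothetical helper: list purge-list files whose name mentions a user or group.
# TODELETE is an illustrative override; it defaults to the directory named above.
purge_entries() {
    # Names look like 10001___xxyz_______abc__________1.0TB_____9560files,
    # so the userid and the groupid are each surrounded by underscores.
    ls "${TODELETE:-/scratch/todelete/current}" | grep -F -- "_$1_"
}
```

Calling purge_entries xxyz lists your own entry, and purge_entries abc lists every member of group abc with files to be purged. No output means nothing matched.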

Project Disk Space

Investigators who have been granted allocations through the LRAC/NRAC Application Process may have been allocated disk space in addition to compute time. For the period of the allocation, they will have disk space on the /project disk system. Space on the project system is not purged, but neither is it backed up. All members of the investigator's group will have access to this system, which is mounted read/write everywhere.

How Much Disk Space Do I Have Left?

The /scinet/gpc/bin/diskUsage command, available on the login nodes and the GPC devel nodes, reports how much disk space is being used by yourself and your group on the home, scratch, and project file systems, and how much remains available. This information is updated hourly.

mmlsquota will show your quotas on the various filesystems. You must use mmlsquota -g <groupname> to check your group quota on /project.

How Much Disk Space Is a Particular Directory Using?

Please do not use the du command to check the disk usage of a particular directory. Instead, we suggest summing the file sizes reported by "ls -lR"; "du" is much slower and can put some load on the file system.

One possible usage, with directory entries filtered out:

 ls -alR directory|awk '!/^d/ && NF==9 {s+=$5} END{print s/1024,"KB"}'

Transferring Data From SciNet

All incoming connections to SciNet go through relatively low-speed connections to the login.scinet gateways. To transfer large amounts of data out of SciNet, by far the fastest way is to log in from a SciNet machine to the "data mover" node, datamover1 (also called gpc-logindm01), and use rsync (best) or scp to copy the data out. This machine has a 10Gb link to the outside world. Other tools for moving data may be made available over time, but they will all involve gpc-logindm01.
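
As an illustration of the rsync route, the sketch below is meant to be used after logging in to datamover1. The function name push_results, the RSYNC override, and the paths in the usage note are hypothetical. The rsync options themselves are standard: -a is archive mode, and --partial keeps partially transferred files so that an interrupted copy can pick up where it left off.

```shell
# Hypothetical helper, intended to be run on datamover1 (gpc-logindm01).
# $1: source directory on SciNet, $2: rsync destination (user@host:path).
# Set RSYNC=echo to print the arguments instead of transferring anything.
push_results() {
    "${RSYNC:-rsync}" -a --partial "$1" "$2"
}
```

For example, push_results /scratch/xxyz/run1/ me@myserver.example.org:/data/run1/ would copy the contents of run1 to a machine outside SciNet; both paths are placeholders.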

Performance

GPFS is a high-performance filesystem which provides rapid reads and writes to large datasets in parallel from many nodes. As a consequence of this design, however, it performs quite poorly at accessing data sets which consist of many small files. For instance, you will find that reading data from one 4MB file is enormously faster than reading it from 100 40KB files. Such small files are also quite wasteful of space, as the block size of the filesystem is 4MB. This is something you should keep in mind when planning your input/output strategy for runs on SciNet.
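
One common way to avoid storing many small files is to bundle them into a single archive. This is a general practice rather than something this page prescribes. A minimal sketch, with a hypothetical function name and placeholder paths:

```shell
# Hypothetical helper: pack a directory of small files into one tar archive.
# $1: directory to bundle, $2: archive file to create.
bundle_small_files() {
    # -C switches to the parent directory so the archive stores relative paths.
    tar -cf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}
```

For example, bundle_small_files /scratch/xxyz/inputs /scratch/xxyz/inputs.tar produces one large file, and tar -xf inputs.tar unpacks it again where the files are needed.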

Note that if you run multi-process jobs, having each process write to a file of its own is not a scalable I/O solution. A directory gets locked by the first process accessing it, so all other processes have to wait for it. Not only has the code just become considerably less parallel, chances are the file system will time out while waiting for your other processes, leading your program to crash mysteriously. Consider using MPI-IO (part of the MPI-2 standard), which allows files to be opened simultaneously by different processes, or using a dedicated process for I/O to which all other processes send their data, and which subsequently writes this data to a single file.

Local Disk

The compute nodes on the GPC do not contain hard drives, so there is no local disk available to use during your computation. You can, however, use part of a compute node's RAM like a local disk, but this reduces how much memory is available for your program. This space is accessed through /dev/shm/ and is currently set to 8GB. Anything written to this location that you want to keep must be copied back to the /scratch filesystem: /dev/shm is wiped after each job and, since it is in memory, will not survive a reboot of the node.
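
The copy-back step can be sketched as a wrapper that runs a command in a temporary directory under /dev/shm and then copies whatever it produced to a destination on /scratch. The function name run_in_ramdisk and the RAMDISK override are hypothetical, and the sketch assumes the output fits within the /dev/shm limit.

```shell
# Hypothetical wrapper: run a command in RAM-backed storage, then save its output.
# $1: destination directory (e.g. somewhere under /scratch); rest: the command.
# RAMDISK is an illustrative override; it defaults to /dev/shm.
run_in_ramdisk() {
    dest="$1"; shift
    work=$(mktemp -d "${RAMDISK:-/dev/shm}/job.XXXXXX") || return 1
    ( cd "$work" && "$@" )      # run the command inside the RAM directory
    status=$?
    # Copy results back first; the RAM directory is removed only if the copy worked.
    mkdir -p "$dest" && cp -r "$work"/. "$dest"/ && rm -rf "$work"
    return $status
}
```

For example, run_in_ramdisk /scratch/xxyz/run1 /home/xxyz/bin/my_program would run the placeholder program my_program with /dev/shm as its working directory and leave its output files in /scratch/xxyz/run1. The command should be given by absolute path, since the wrapper changes directory before running it.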

Moving Data To/From SciNet

For more information on copying large amounts of data to or from SciNet, see our page on Data Transfer.