<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>https://oldwiki.scinet.utoronto.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Kzuberi</id>
	<title>oldwiki.scinet.utoronto.ca - User contributions [en-gb]</title>
	<link rel="self" type="application/atom+xml" href="https://oldwiki.scinet.utoronto.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Kzuberi"/>
	<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php/Special:Contributions/Kzuberi"/>
	<updated>2026-05-24T19:45:26Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.35.12</generator>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Data_Management&amp;diff=5923</id>
		<title>Data Management</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Data_Management&amp;diff=5923"/>
		<updated>2013-04-04T19:19:18Z</updated>

		<summary type="html">&lt;p&gt;Kzuberi: put shell command example in &amp;lt;pre&amp;gt; block to fix wiki formatting breakage&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=='''Storage Space'''==&lt;br /&gt;
SciNet's storage system is based on IBM's [http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] (General Parallel File System).   There are two main systems for user data: &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt;, a small, backed-up space where user home directories are located, and &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;, a large system for input or output data for jobs; data on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; is not backed up. A third storage system, &amp;lt;tt&amp;gt;/project&amp;lt;/tt&amp;gt;, exists only for groups with LRAC/NRAC allocations. Data placed on scratch will be deleted if it has not been accessed or modified in 3 months.  SciNet does not provide long-term storage for large data sets.  &lt;br /&gt;
&lt;br /&gt;
===Overview of the different file systems===&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! {{Hl2}} | file system &lt;br /&gt;
! {{Hl2}} | purpose &lt;br /&gt;
! {{Hl2}} | quota &lt;br /&gt;
! {{Hl2}} | block size &lt;br /&gt;
! {{Hl2}} | backed up&lt;br /&gt;
! {{Hl2}} | purged&lt;br /&gt;
! {{Hl2}} | access &lt;br /&gt;
|- &lt;br /&gt;
| /home&lt;br /&gt;
| development&lt;br /&gt;
| 10 GB&lt;br /&gt;
| 256 KB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
| read-only on compute nodes (r/w on login, devel and datamover1) &lt;br /&gt;
|- &lt;br /&gt;
| /scratch&lt;br /&gt;
| computation&lt;br /&gt;
| 20 TB&lt;br /&gt;
| 4 MB&lt;br /&gt;
| no&lt;br /&gt;
| files unused for &amp;gt; 3 months&lt;br /&gt;
| read/write on all nodes&lt;br /&gt;
|- &lt;br /&gt;
| /project&lt;br /&gt;
| computation&lt;br /&gt;
| by allocation&lt;br /&gt;
| 256 KB&lt;br /&gt;
| yes&lt;br /&gt;
| never&lt;br /&gt;
| read/write on all nodes &lt;br /&gt;
|}&lt;br /&gt;
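Note: space on &amp;lt;tt&amp;gt;/project&amp;lt;/tt&amp;gt; is a subset of &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;, so its contents count against the scratch limits.&lt;br /&gt;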
&lt;br /&gt;
===Home Disk Space===&lt;br /&gt;
&lt;br /&gt;
Every SciNet user gets a 10GB directory on &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt;, located at &amp;lt;tt&amp;gt;/home/G/GROUP/USER&amp;lt;/tt&amp;gt;, which is regularly backed up.   Home is visible from the &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; nodes, and from the development nodes on [[GPC_Quickstart | GPC]] and the [[TCS_Quickstart | TCS]].  However, on the compute nodes of the GPC cluster (that is, wherever jobs are running) &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is mounted '''''read-only'''''; thus GPC jobs can read files in /home but cannot write to files there.   &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is a good place to put code, input files for runs, and anything else that needs to be kept to reproduce runs.  On the other hand, &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is not a good place to put many small files: the block size for the file system is 256KB, so you would quickly run out of disk quota and make the backup system very slow.&lt;br /&gt;
&lt;br /&gt;
If your application absolutely insists on writing material to your home account and you can't find a way to instruct it to write somewhere else, an alternative is to create a link pointing from your account under /home to a location under /scratch.&lt;br /&gt;
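&lt;br /&gt;
As a rough sketch of that workaround, the commands below create a symbolic link with &amp;lt;tt&amp;gt;ln -s&amp;lt;/tt&amp;gt;. The two temporary directories are only stand-ins for your real &amp;lt;tt&amp;gt;/home/G/GROUP/USER&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;/scratch/G/GROUP/USER&amp;lt;/tt&amp;gt; directories, and the name &amp;lt;tt&amp;gt;app_output&amp;lt;/tt&amp;gt; is purely hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
# Stand-ins for the real directories; on SciNet these would be
# /home/G/GROUP/USER and /scratch/G/GROUP/USER (assumption for illustration).
home_dir=$(mktemp -d)
scratch_dir=$(mktemp -d)

# Create the real directory on scratch, where jobs are allowed to write.
mkdir -p "$scratch_dir/app_output"

# Link it into the home directory under the name the application expects.
ln -s "$scratch_dir/app_output" "$home_dir/app_output"

# Anything written through the link lands on scratch, not on home.
echo "result" > "$home_dir/app_output/run.log"
ls -l "$home_dir/app_output"
```
&lt;br /&gt;
Keep in mind that files reached through such a link live on scratch, so they are not backed up and remain subject to purging.&lt;br /&gt;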
&lt;br /&gt;
===Scratch Disk Space===&lt;br /&gt;
&lt;br /&gt;
Every SciNet user also gets a directory in &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; called &amp;lt;tt&amp;gt;/scratch/G/GROUP/USER&amp;lt;/tt&amp;gt;.   Scratch is visible from the &amp;lt;tt&amp;gt;login.scinet&amp;lt;/tt&amp;gt; nodes,  the development nodes on [[GPC_Quickstart | GPC]] and the [[TCS_Quickstart | TCS]], and on the compute nodes of the clusters, mounted as read-write.   Thus jobs would normally write their output somewhere in &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.  There are '''NO''' backups of anything on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
There is a large amount of space available on &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; but it is purged routinely so that all users running jobs and generating large outputs will have room to store their data temporarily.  Computational results which you want to keep beyond the purge period must be copied off SciNet entirely (for example using &amp;lt;tt&amp;gt;scp&amp;lt;/tt&amp;gt;) to your local system.   SciNet does not routinely provide long-term storage for large data sets.&lt;br /&gt;
&lt;br /&gt;
===Scratch Disk Purging Policy===&lt;br /&gt;
&lt;br /&gt;
In order to ensure that there is always significant space available for running jobs '''we automatically delete files in /scratch that have not been accessed or modified for more than 3 months by the actual deletion day on the 15th of each month'''. Note that we recently changed the cut-off reference to the ''MostRecentOf(atime,mtime)''. This policy is subject to revision depending on its effectiveness. More details about the purging process, and how users can check whether their files will be deleted, follow. If you have files scheduled for deletion you should move them to a more permanent location, such as your departmental server or your /project space (for PIs who have either been allocated disk space by the LRAC or have bought disk space).&lt;br /&gt;
&lt;br /&gt;
On the '''first''' of each month, a list of files scheduled for purging is produced, and an email notification is sent to each user on that list. Furthermore, on or about the '''12th''' of each month a second scan produces a more current assessment and another email notification is sent. This way users can double-check that they have indeed taken care of all the files they needed to relocate before the purging deadline. Those files will be automatically deleted on the '''15th''' of the same month unless they have been accessed or relocated in the interim. If you have files scheduled for deletion then they will be listed in a file in /scratch/t/todelete/current, which has your userid and groupid in the filename. For example, if user xxyz wants to check whether they have files scheduled for deletion, they can issue the following command on a system which mounts /scratch (e.g. a SciNet login node): '''ls -l /scratch/t/todelete/current |grep xxyz'''. In the example below, the name of this file indicates that user xxyz is part of group abc and has 9,560 files scheduled for deletion, taking up 1.0TB of space:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 [xxyz@scinet04 ~]$ ls -l /scratch/t/todelete/current |grep xxyz&lt;br /&gt;
 -rw-r----- 1 xxyz     root       1733059 Jan 12 11:46 10001___xxyz_______abc_________1.00T_____9560files&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The file itself contains a list of all files scheduled for deletion (in the last column) and can be viewed with standard commands like more/less/cat - e.g. '''more /scratch/t/todelete/current/10001___xxyz_______abc_________1.00T_____9560files'''&lt;br /&gt;
&lt;br /&gt;
Similarly, you can check on all other users in your group by using the ls command with grep on your group name. For example: '''ls -1 /scratch/t/todelete/current |grep abc'''. That will list all users in the same group as xxyz who have files to be purged on the 15th. Members of the same group have access to each other's contents.&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' Preparing these assessments takes several hours. If you change the access/modification time of a file in the interim, that will not be detected until the next cycle. To get immediate feedback, use the ''''ls -lu'''' command on the file to verify the atime and ''''ls -la'''' for the mtime. If the file's atime or mtime has been updated in the meantime, the file will not be deleted on the purging date on the 15th.&lt;br /&gt;
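&lt;br /&gt;
To look for purge candidates yourself between scans, one option is the standard &amp;lt;tt&amp;gt;find&amp;lt;/tt&amp;gt; command. The following is only a minimal sketch: it assumes GNU find and touch, approximates 3 months as 90 days, and uses a temporary directory with made-up file names in place of your scratch directory. The lists in /scratch/t/todelete/current remain the authoritative source.&lt;br /&gt;
&lt;br /&gt;
```shell
# Temporary stand-in for /scratch/G/GROUP/USER (assumption for illustration).
dir=$(mktemp -d)

# One file last touched 120 days ago (both atime and mtime), one fresh file.
touch -d '120 days ago' "$dir/old_result.dat"
touch "$dir/new_result.dat"

# A file is a purge candidate only if BOTH its atime and its mtime are old,
# i.e. the most recent of the two lies more than 90 days in the past.
candidates=$(find "$dir" -type f -atime +90 -mtime +90)
echo "$candidates"
```
&lt;br /&gt;
Requiring both tests to match mirrors the ''MostRecentOf(atime,mtime)'' rule: a file that was either read or written recently is not listed.&lt;br /&gt;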
&lt;br /&gt;
===Project Disk Space===&lt;br /&gt;
&lt;br /&gt;
Investigators who have been granted allocations through the [http://www.scinet.utoronto.ca/resources/Account_Allocations.htm LRAC/NRAC Application Process] may have been allocated disk space in addition to compute time.   For the period of time that the allocation is granted, they will have disk space on the &amp;lt;tt&amp;gt;/project&amp;lt;/tt&amp;gt; disk system.  Space on project is a subset of scratch, but is not purged and is backed up. All members of the investigator's group will have access to this disk system, which will be mounted read/write everywhere.&lt;br /&gt;
&lt;br /&gt;
===How much Disk Space Do I have left?===&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''/scinet/gpc/bin6/diskUsage'''&amp;lt;/tt&amp;gt; command, available on the login nodes, datamovers and the GPC devel nodes, reports on the home, scratch, and project file systems in a number of ways. For instance, it can show how much disk space is being used by yourself and your group (with the -a option) or how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), and it can generate plots of your usage over time. Please see the usage help below for more details.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-h|-?| [-a] [-u &amp;lt;user&amp;gt;] [-de|-plot]&lt;br /&gt;
       -h|-?: help&lt;br /&gt;
       -a: list usages of all members on the group&lt;br /&gt;
       -u &amp;lt;user&amp;gt;: as another user on your group&lt;br /&gt;
       -de: include delta information&lt;br /&gt;
       -plot: create plots of disk usages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Notes:&lt;br /&gt;
* information on usage and quota is only updated hourly!&lt;br /&gt;
* contents of project count against space and #files limits on scratch&lt;br /&gt;
&lt;br /&gt;
===Performance===&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] is a high-performance filesystem which provides rapid reads and writes to large datasets in parallel from many nodes.  As a consequence of this design, however, '''the file system performs quite ''poorly'' at accessing data sets which consist of many, small files.'''  For instance, you will find that reading data in from one 4MB file is enormously faster than from 100 40KB files.   Such small files are also quite wasteful of space, as the blocksize for the filesystem is 4MB.   This is something you should keep in mind when planning your input/output strategy for runs on SciNet.&lt;br /&gt;
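&lt;br /&gt;
One common way to avoid keeping many small files around is to bundle them into a single archive. The sketch below uses &amp;lt;tt&amp;gt;tar&amp;lt;/tt&amp;gt; for this; the temporary directory and the file names are hypothetical stand-ins for your own data on scratch, and whether archiving suits your workflow is for you to judge.&lt;br /&gt;
&lt;br /&gt;
```shell
# Temporary working area standing in for a directory on /scratch.
work=$(mktemp -d)
mkdir "$work/pieces"

# Create 100 small files, the access pattern the text warns against.
for i in $(seq 1 100); do
  echo "data $i" > "$work/pieces/part_$i.txt"
done

# Bundle them into one archive so the file system handles a single larger file.
tar -cf "$work/pieces.tar" -C "$work" pieces

# Count the archived files to confirm nothing was left out.
count=$(tar -tf "$work/pieces.tar" | grep -c 'part_')
echo "$count files archived"
```
&lt;br /&gt;
Once the archive has been verified, the individual small files can be removed so that they no longer occupy a full block each.&lt;br /&gt;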
&lt;br /&gt;
For instance, if you run multi-process jobs, having each process write to a file of its own is not a scalable I/O solution. A directory gets locked by the first process accessing it, so all other processes have to wait for it. Not only has the code just become considerably less parallel, chances are the file system will time out while waiting for your other processes, leading your program to crash mysteriously. Consider using MPI-IO (part of the MPI-2 standard), which allows files to be opened simultaneously by different processes, or using a dedicated process for I/O to which all other processes send their data, and which subsequently writes this data to a single file.&lt;br /&gt;
&lt;br /&gt;
===Local Disk===&lt;br /&gt;
&lt;br /&gt;
The compute nodes on the GPC '''do not contain hard drives''' so there is no local disk available to use during your computation.  You can however use part of a compute node's RAM like a local disk ('ramdisk'), but this will reduce how much memory is available for your program.  The ramdisk can be accessed using &amp;lt;tt&amp;gt;/dev/shm/&amp;lt;/tt&amp;gt; and is currently set to 8GB.  Anything written to this location that you want to keep must be copied back to the &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; filesystem, as &amp;lt;tt&amp;gt;/dev/shm&amp;lt;/tt&amp;gt; is wiped after each job and, since it is in memory, will not survive a reboot of the node. More on ramdisk usage can be found [[User_Ramdisk | here]].&lt;br /&gt;
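&lt;br /&gt;
The copy-back step could look roughly like the sketch below. The directory and file names are hypothetical, a temporary directory stands in for your scratch directory, and the fallback to a temporary directory exists only so that the sketch also runs on machines without a writable &amp;lt;tt&amp;gt;/dev/shm&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
```shell
# Use /dev/shm when it is writable; otherwise fall back to a temporary
# directory so the sketch still runs on other machines.
base=${TMPDIR:-/tmp}
if [ -w /dev/shm ]; then base=/dev/shm; fi
ramdisk=$(mktemp -d "$base/job.XXXXXX")

# Stand-in for /scratch/G/GROUP/USER (assumption for illustration).
scratch_dir=$(mktemp -d)

# Write intermediate output to the ramdisk during the computation.
echo "intermediate output" > "$ramdisk/result.dat"

# Before the job ends, copy what you want to keep back to scratch ...
cp "$ramdisk/result.dat" "$scratch_dir/"

# ... and free the memory held by the ramdisk files.
rm -rf "$ramdisk"
```
&lt;br /&gt;
In a real job script these steps would surround the actual computation, with the copy placed at the very end.&lt;br /&gt;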
&lt;br /&gt;
Note that the absence of hard drives also means that the nodes cannot swap memory, so be sure that your computation fits within memory.&lt;br /&gt;
&lt;br /&gt;
=='''Data Transfer'''==&lt;br /&gt;
{{:Data_Transfer}}&lt;br /&gt;
&lt;br /&gt;
=='''File/Ownership Management (ACL)'''==&lt;br /&gt;
* By default, at SciNet, users within the same group have read permission to each other's files (not write)&lt;br /&gt;
* You may use an access control list ('''ACL''') to allow your supervisor (or another user within your group) to manage files for you (i.e., create, move, rename, delete), while still retaining your access and permission as the original owner of the files/directories.&lt;br /&gt;
* '''NOTE''': We highly recommend that you never give write permission to other users on the top level of your home directory (/home/G/GROUP/[owner]), since that would seriously compromise your privacy, in addition to disabling ssh key authentication, among other things. If necessary, make specific sub-directories under your home directory so that other users can manipulate/access files from those.&lt;br /&gt;
* If you need to set up permissions across groups [mailto:support@scinet.utoronto.ca contact us] (and the other group's supervisor!).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&lt;br /&gt;
===Using  setfacl/getfacl===&lt;br /&gt;
* To allow [supervisor] to manage files in /project/g/group/[owner] using '''setfacl''' and '''getfacl''' commands, follow the 3-steps below as the [owner] account from a shell:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1) $ /scinet/gpc/bin/setfacl -d -m user:[supervisor]:rwx /project/g/group/[owner]&lt;br /&gt;
   (every *new* file/directory inside [owner] will inherit [supervisor] ownership by default from now on)&lt;br /&gt;
&lt;br /&gt;
2) $ /scinet/gpc/bin/setfacl -d -m user:[owner]:rwx /project/g/group/[owner]&lt;br /&gt;
   (but will also inherit [owner] ownership, ie, ownership of both by default, for files/directories created by [supervisor])&lt;br /&gt;
&lt;br /&gt;
3) $ /scinet/gpc/bin/setfacl -Rm user:[supervisor]:rwx /project/g/group/[owner]&lt;br /&gt;
   (recursively modify all *existing* files/directories inside [owner] to also be rwx by [supervisor])&lt;br /&gt;
&lt;br /&gt;
   $ /scinet/gpc/bin/getfacl /project/g/group/[owner]&lt;br /&gt;
   (to determine the current ACL attributes)&lt;br /&gt;
&lt;br /&gt;
   $ /scinet/gpc/bin/setfacl -b /project/g/group/[owner]&lt;br /&gt;
   (to remove any previously set ACL)&lt;br /&gt;
&lt;br /&gt;
PS: on the datamovers getfacl, setfacl and chacl will be on your path&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For more information on using [http://linux.die.net/man/1/setfacl &amp;lt;tt&amp;gt;setfacl&amp;lt;/tt&amp;gt;] or [http://linux.die.net/man/1/getfacl &amp;lt;tt&amp;gt;getfacl&amp;lt;/tt&amp;gt;] see their man pages.&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Using mmputacl/mmgetacl===&lt;br /&gt;
* You may use GPFS's native '''mmputacl''' and '''mmgetacl''' commands. The advantages are that you can set &amp;quot;control&amp;quot; permission and that [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm1160.html POSIX or NFS v4 style ACL] are supported. You will first need to create a /tmp/supervisor.acl file with the following contents:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user::rwxc&lt;br /&gt;
group::----&lt;br /&gt;
other::----&lt;br /&gt;
mask::rwxc&lt;br /&gt;
user:[owner]:rwxc&lt;br /&gt;
user:[supervisor]:rwxc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then issue the following 2 commands:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1) $ mmputacl -i /tmp/supervisor.acl /project/g/group/[owner]&lt;br /&gt;
2) $ mmputacl -d -i /tmp/supervisor.acl /project/g/group/[owner]&lt;br /&gt;
   (every *new* file/directory inside [owner] will inherit [supervisor] ownership by default as well as &lt;br /&gt;
   [owner] ownership, ie, ownership of both by default, for files/directories created by [supervisor])&lt;br /&gt;
&lt;br /&gt;
   $ mmgetacl /project/g/group/[owner]&lt;br /&gt;
   (to determine the current ACL attributes)&lt;br /&gt;
&lt;br /&gt;
   $ mmdelacl -d /project/g/group/[owner]&lt;br /&gt;
   (to remove any previously set ACL)&lt;br /&gt;
&lt;br /&gt;
   $ mmeditacl /project/g/group/[owner]&lt;br /&gt;
   (to create or change a GPFS access control list)&lt;br /&gt;
   (for this command to work set the EDITOR environment variable: export EDITOR=/usr/bin/vi)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
NOTES:&lt;br /&gt;
* There is no option to recursively add or remove ACL attributes using a gpfs built-in command to existing files. You'll need to use the -i option as above for each file or directory individually. [[Data_Management#bash_script_that_you_may_adapt_to_recursively_add_or_remove_ACL_attributes_using_gpfs_built-in_commands | Here is a sample bash script you may use for that purpose]]&lt;br /&gt;
&lt;br /&gt;
* mmputacl/setfacl will not overwrite the original linux group permissions for a directory when copied to another directory already with ACLs, hence the &amp;quot;#effective:r-x&amp;quot; note you may see from time to time with mmgetacl/getfacl. If you want to give rwx permissions to everyone in your group you should simply rely on the plain unix 'chmod g+rwx' command. You may do that before or after copying the original material to another folder with the ACLs.&lt;br /&gt;
&lt;br /&gt;
For more information on using [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm11120.html &amp;lt;tt&amp;gt;mmputacl&amp;lt;/tt&amp;gt;] or [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm11120.html &amp;lt;tt&amp;gt;mmgetacl&amp;lt;/tt&amp;gt;] see their man pages.&lt;br /&gt;
&lt;br /&gt;
===Appendix (ACL)===&lt;br /&gt;
====bash script that you may adapt to recursively add or remove ACL attributes using gpfs built-in commands====&lt;br /&gt;
Courtesy of Agata Disks (http://csngwinfo.in2p3.fr/mediawiki/index.php/GPFS_ACL)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# USAGE&lt;br /&gt;
#     - on one directory:     ./set_acl.sh dir_name&lt;br /&gt;
#     - on more directories:  ./set_acl.sh 'dir_nam*'&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
# Path of the file that contains the ACL&lt;br /&gt;
ACL_FILE_PATH=/agatadisks/data/acl_file.acl&lt;br /&gt;
&lt;br /&gt;
# Directories on which the ACLs are to be set (may be a quoted glob pattern)&lt;br /&gt;
dirs=$1&lt;br /&gt;
&lt;br /&gt;
# Recursive function that sets ACL to files and directories&lt;br /&gt;
set_acl () {&lt;br /&gt;
  # local, so that recursive calls do not overwrite the caller's variables&lt;br /&gt;
  local curr_dir=$1&lt;br /&gt;
  local args&lt;br /&gt;
  for args in &amp;quot;$curr_dir&amp;quot;/*&lt;br /&gt;
  do&lt;br /&gt;
    if [ -f &amp;quot;$args&amp;quot; ]; then&lt;br /&gt;
      if ! mmputacl -i &amp;quot;$ACL_FILE_PATH&amp;quot; &amp;quot;$args&amp;quot;; then&lt;br /&gt;
        echo &amp;quot;ERROR: ACL not set on $args&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      echo &amp;quot;ACL set on file $args&amp;quot;&lt;br /&gt;
    fi&lt;br /&gt;
    if [ -d &amp;quot;$args&amp;quot; ]; then&lt;br /&gt;
      # Set Default ACL in directory (options precede the file name)&lt;br /&gt;
      if ! mmputacl -d -i &amp;quot;$ACL_FILE_PATH&amp;quot; &amp;quot;$args&amp;quot;; then&lt;br /&gt;
        echo &amp;quot;ERROR: Default ACL not set on $args&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      echo &amp;quot;Default ACL set on directory $args&amp;quot;&lt;br /&gt;
      # Set ACL in directory&lt;br /&gt;
      if ! mmputacl -i &amp;quot;$ACL_FILE_PATH&amp;quot; &amp;quot;$args&amp;quot;; then&lt;br /&gt;
        echo &amp;quot;ERROR: ACL not set on $args&amp;quot;&lt;br /&gt;
        exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      echo &amp;quot;ACL set on directory $args&amp;quot;&lt;br /&gt;
      # Descend into the sub-directory&lt;br /&gt;
      set_acl &amp;quot;$args&amp;quot;&lt;br /&gt;
    fi&lt;br /&gt;
  done&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# $dirs is left unquoted on purpose so that a glob pattern gets expanded&lt;br /&gt;
for dir in $dirs&lt;br /&gt;
do&lt;br /&gt;
  if [ ! -d &amp;quot;$dir&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;ERROR: $dir is not a directory&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
  fi&lt;br /&gt;
  set_acl &amp;quot;$dir&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
exit 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==[[HPSS|'''High Performance Storage System (HPSS)''']]==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==More questions on data management?==&lt;br /&gt;
&lt;br /&gt;
Check out the [[FAQ#Data_on_SciNet_disks|FAQ]].&lt;/div&gt;</summary>
		<author><name>Kzuberi</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=Ssh_keys&amp;diff=4231</id>
		<title>Ssh keys</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=Ssh_keys&amp;diff=4231"/>
		<updated>2011-12-05T18:57:21Z</updated>

		<summary type="html">&lt;p&gt;Kzuberi: trivial typo fix&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Ssh | SSH]] has an alternative to passwords to authenticate your login; you can generate a key file on a trusted machine and tell a remote machine to trust logins from a machine that presents that key.   This can be both convenient and secure, and may be necessary for some tasks (such as connecting directly to compute nodes to use [[Using_Paraview | some visualization packages]]).  Here we describe how to set up keys for logging into SciNet.&lt;br /&gt;
&lt;br /&gt;
==SSH Keys and SciNet==&lt;br /&gt;
&lt;br /&gt;
[[Ssh | SSH]] is a secure protocol for logging into or copying data to/from remote machines.  In addition to using passwords to [http://en.wikipedia.org/wiki/Authentication authenticate] users, one can use cryptographically secure keys to guarantee that a login request is coming from a trusted account on a remote machine, and automatically allow such requests.   Done properly, this is as secure as requiring a password, but can be more convenient, and is necessary for some operations.&lt;br /&gt;
&lt;br /&gt;
On this page, we will assume you are using Linux, Mac OS X, or a similar environment such as [http://www.cygwin.com/ Cygwin] under Windows.  If not, the steps will be the same, but how they are done (for instance, generating keys) may differ; look up the documentation for your ssh package for details.&lt;br /&gt;
&lt;br /&gt;
==Using SSH keys==&lt;br /&gt;
===How SSH keys work===&lt;br /&gt;
&lt;br /&gt;
SSH relies on [http://en.wikipedia.org/wiki/Public-key_cryptography public key cryptography] for its encryption.  These cryptosystems have a private key, which must be kept secret, and a public key, which may be disseminated freely.   In these systems, anyone may use the public key to encode a message; but only the owner of the private key can decode the message.  This can also be used to verify identities; if someone is claiming to be Alice, the owner of some private key, Bob can send Alice a message encoded with Alice's well-known public key.  If the person claiming to be Alice can then tell Bob what the message really was, then that person at the very least has access to Alice's private key.&lt;br /&gt;
&lt;br /&gt;
To use keys for authentication, we:&lt;br /&gt;
* Generate a key pair (Private and Public)&lt;br /&gt;
* Copy the public key to the remote sites we wish to be able to log in to, and mark it as an authorized key on each of those systems&lt;br /&gt;
* Ensure permissions are set properly&lt;br /&gt;
* Test.&lt;br /&gt;
&lt;br /&gt;
===Generating an SSH key pair===&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Note: This describes creating ssh key pairs on '''your''' machine, not on SciNet.  On SciNet, you already have key pairs generated, sitting in &amp;lt;tt&amp;gt;${HOME}/.ssh/&amp;lt;/tt&amp;gt;, and modifying them is likely to cause problems.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The first stage is to create an SSH key pair.   On most systems, this is done using the command&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen&lt;br /&gt;
&lt;br /&gt;
This will prompt you for two pieces of information: where to save the key, and a passphrase for the key.  The passphrase is like a password, but rather than letting you in to some particular account, it allows you to use the key you've generated to log into other systems.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;tt&amp;gt;ssh-keygen&amp;lt;/tt&amp;gt; has a number of options that make keys cryptographically stronger (by increasing the number of bits used in the key) or that select a different encryption system.  The defaults are fine, and we won't discuss other options here.&lt;br /&gt;
&lt;br /&gt;
The default location to save the private key is in &amp;lt;tt&amp;gt;${HOME}/.ssh/id_rsa&amp;lt;/tt&amp;gt; (for an RSA key); unless you have some specific reason for placing it elsewhere, use this option.  The public key will be &amp;lt;tt&amp;gt;id_rsa.pub&amp;lt;/tt&amp;gt; in the same directory.&lt;br /&gt;
&lt;br /&gt;
Your passphrase can be any string, and of any length.   It is best not to make it the same as any of your passwords.&lt;br /&gt;
&lt;br /&gt;
A sample session of generating a key would go like this:&lt;br /&gt;
&lt;br /&gt;
 $ ssh-keygen&lt;br /&gt;
 Generating public/private rsa key pair.&lt;br /&gt;
 Enter file in which to save the key (${HOME}/.ssh/id_rsa): &lt;br /&gt;
 Enter passphrase (empty for no passphrase): &lt;br /&gt;
 Enter same passphrase again: &lt;br /&gt;
 Your identification has been saved in ${HOME}/.ssh/id_rsa.&lt;br /&gt;
 Your public key has been saved in ${HOME}/.ssh/id_rsa.pub.&lt;br /&gt;
 The key fingerprint is:&lt;br /&gt;
 79:8e:36:6a:78:7d:cf:80:94:90:92:0e:74:0b:f1:b7 USERNAME@YOURMACHINE&lt;br /&gt;
&lt;br /&gt;
====Don't Use Passphraseless Keys!====&lt;br /&gt;
&lt;br /&gt;
If you do not specify a passphrase, you will have a completely &amp;quot;exposed&amp;quot; private key.  '''This is a terrible idea.'''   If you then use this key for anything it means that anyone who sits down at your desk, or anyone who borrows or steals your laptop, can login to anywhere you use that key (good guesses could come from just looking at your history) without needing any password, and could do anything they wanted with your account or data.  Don't use passphraseless keys.&lt;br /&gt;
&lt;br /&gt;
We should note that we do, in fact, have one necessary and reasonable exception here -- the keys used within SciNet itself.  The SciNet key used for within-scinet operations (you already have one in your account in &amp;lt;tt&amp;gt;~/.ssh/id_rsa&amp;lt;/tt&amp;gt;) is passphraseless, for two good reasons.  One is that, once you are on one SciNet machine (like the login node), you already have read/write access to all your data; all the nodes mount the same file systems.  So there is little to be gained in protecting the SciNet nodes from each other.   The second is practical; ssh is used to login to compute nodes to start your compute jobs.  You obviously can't be asked to type in a passphrase every time one of your jobs starts; you may not be at your computer at that moment.  So passphraseless keys are ok ''within'' a controlled environment; but don't use them for remote access.&lt;br /&gt;
&lt;br /&gt;
===Copying the Public Key to SciNet (and elsewhere)===&lt;br /&gt;
&lt;br /&gt;
Now that you have this SSH &amp;quot;identity&amp;quot;, you use the public (''not'' the private) key for access to remote machines.  The public key must be put as one line in the file &amp;lt;tt&amp;gt;/home/USERNAME/.ssh/authorized_keys&amp;lt;/tt&amp;gt;.  Do not delete the lines already there, or you may end up with strange problems using SciNet machines.&lt;br /&gt;
&lt;br /&gt;
You can copy your new public key to the SciNet systems with &amp;lt;tt&amp;gt;scp&amp;lt;/tt&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scp /home/LOCAL_USERNAME/.ssh/id_rsa.pub SCINET_USERNAME@login.scinet.utoronto.ca:newkey&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Then log in to SciNet and append the contents of &amp;lt;tt&amp;gt;~/newkey&amp;lt;/tt&amp;gt; to &amp;lt;tt&amp;gt;~/.ssh/authorized_keys&amp;lt;/tt&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat ~/newkey &amp;gt;&amp;gt; ~/.ssh/authorized_keys&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
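&lt;br /&gt;
If you are unsure whether the key was already added, a slightly more careful variant appends it only when that exact line is missing. This is a sketch under stated assumptions: the temporary files stand in for &amp;lt;tt&amp;gt;~/newkey&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;~/.ssh/authorized_keys&amp;lt;/tt&amp;gt;, and the key strings are made up.&lt;br /&gt;
&lt;br /&gt;
```shell
# Temporary stand-ins for ~/newkey and ~/.ssh/authorized_keys on SciNet.
work=$(mktemp -d)
newkey="$work/newkey"
auth="$work/authorized_keys"
echo "ssh-rsa AAAAexistingkey scinet-internal" > "$auth"
echo "ssh-rsa AAAAnewkey USERNAME@YOURMACHINE" > "$newkey"

# Append the key only if that exact line is not present yet; the loop runs
# twice to show that the second pass changes nothing.
for attempt in 1 2; do
  if ! grep -qxF -e "$(cat "$newkey")" "$auth"; then
    cat "$newkey" >> "$auth"
  fi
done

# The original line is kept and the new key appears exactly once.
lines=$(grep -c '' "$auth")
echo "$lines"
```
&lt;br /&gt;
Appending rather than overwriting keeps the lines that are already in the file, which SciNet relies on.&lt;br /&gt;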
&lt;br /&gt;
===&amp;lt;tt&amp;gt;.ssh&amp;lt;/tt&amp;gt; Permissions===&lt;br /&gt;
&lt;br /&gt;
Note that &amp;lt;tt&amp;gt;SSH&amp;lt;/tt&amp;gt; is very fussy about file permissions; your &amp;lt;tt&amp;gt;~/.ssh&amp;lt;/tt&amp;gt; directory must only be accessible by you, and your various key files must not be writable (or in some cases, readable) by anyone else.  Sometimes users accidentally reset file permissions while editing these files, and problems happen.   If you look at the &amp;lt;tt&amp;gt;~/.ssh&amp;lt;/tt&amp;gt; directory itself, it should not be readable at all by anyone else:&lt;br /&gt;
&lt;br /&gt;
 ls -ld ~/.ssh&lt;br /&gt;
 drwx------ 2 USERNAME GROUPNAME 7 Aug  9 15:43 /home/USERNAME/.ssh&lt;br /&gt;
&lt;br /&gt;
and &amp;lt;tt&amp;gt;authorized_keys&amp;lt;/tt&amp;gt; must not be writable by anyone else:&lt;br /&gt;
&lt;br /&gt;
 $ ls -l ~/.ssh/authorized_keys &lt;br /&gt;
 -rw-r--r-- 1 USERNAME GROUPNAME 1213 May 29  2009 /home/USERNAME/.ssh/authorized_keys&lt;br /&gt;
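&lt;br /&gt;
If the permissions have been changed by accident, &amp;lt;tt&amp;gt;chmod&amp;lt;/tt&amp;gt; can restore the modes shown above. A minimal sketch, using a temporary directory as a stand-in for your real &amp;lt;tt&amp;gt;~/.ssh&amp;lt;/tt&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
```shell
# Temporary stand-in for ~/.ssh (assumption for illustration).
ssh_dir=$(mktemp -d)/.ssh
mkdir "$ssh_dir"
touch "$ssh_dir/authorized_keys"

# Only the owner may enter or list the directory.
chmod 700 "$ssh_dir"
# Others may read authorized_keys, but only the owner may write it.
chmod 644 "$ssh_dir/authorized_keys"

ls -ld "$ssh_dir" "$ssh_dir/authorized_keys"
```
&lt;br /&gt;
On your own account the same two &amp;lt;tt&amp;gt;chmod&amp;lt;/tt&amp;gt; commands would be applied to &amp;lt;tt&amp;gt;~/.ssh&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;~/.ssh/authorized_keys&amp;lt;/tt&amp;gt;.&lt;br /&gt;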
&lt;br /&gt;
===Testing Your Key===&lt;br /&gt;
&lt;br /&gt;
Now you should be able to login to the remote system (say, SciNet):&lt;br /&gt;
&lt;br /&gt;
 $ ssh USERNAME@login.scinet.utoronto.ca&lt;br /&gt;
 Enter passphrase for key '/home/USERNAME/.ssh/id_rsa': &lt;br /&gt;
 Last login: Tue Aug 17 11:24:48 2010 from HOMEMACHINE&lt;br /&gt;
 &lt;br /&gt;
 ===================================================&lt;br /&gt;
 &lt;br /&gt;
 This SciNet login node is to be used only as a&lt;br /&gt;
 gateway to the GPC and TCS.&lt;br /&gt;
 &lt;br /&gt;
 [...]&lt;br /&gt;
 scinet04-$&lt;br /&gt;
&lt;br /&gt;
If this doesn't work, you should still be able to log in using your password and investigate the problem. For example, if during a login session you get a message similar to the one below, follow the instruction and delete the offending key on line 3 of &amp;lt;tt&amp;gt;known_hosts&amp;lt;/tt&amp;gt; (in vi, you can jump to that line by pressing ESC and then typing :3). This usually just means that you have logged in to SciNet from this computer in the past and that the stored host key is now obsolete.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh USERNAME@login.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @&lt;br /&gt;
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!&lt;br /&gt;
Someone could be eavesdropping on you right now (man-in-the-middle&lt;br /&gt;
attack)!&lt;br /&gt;
It is also possible that the RSA host key has just been changed.&lt;br /&gt;
The fingerprint for the RSA key sent by the remote host is&lt;br /&gt;
53:f9:60:71:a8:0b:5d:74:83:52:fe:ea:1a:9e:cc:d3.&lt;br /&gt;
Please contact your system administrator.&lt;br /&gt;
Add correct host key in /home/&amp;lt;user&amp;gt;/.ssh/known_hosts to get rid of&lt;br /&gt;
this message.&lt;br /&gt;
Offending key in /home/&amp;lt;user&amp;gt;/.ssh/known_hosts:3&lt;br /&gt;
RSA host key for login.scinet.utoronto.ca has changed and you have requested&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
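&lt;br /&gt;
As an alternative to editing the file by hand, the stale entry can usually be removed with &amp;lt;tt&amp;gt;ssh-keygen -R&amp;lt;/tt&amp;gt;, which deletes all keys stored for the given host name. This is a minimal sketch, assuming your &amp;lt;tt&amp;gt;known_hosts&amp;lt;/tt&amp;gt; file is in its default location; only do this if you have reason to believe the host key change is legitimate:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# remove every stored host key for this host name from ~/.ssh/known_hosts&lt;br /&gt;
$ ssh-keygen -R login.scinet.utoronto.ca&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The next login will then ask you to confirm the new host key.&lt;br /&gt;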
&lt;br /&gt;
* If you get the message below, you may need to log out of your GNOME session and log back in, since &amp;lt;tt&amp;gt;ssh-agent&amp;lt;/tt&amp;gt; needs to be restarted to pick up the new passphrase-protected ssh key.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh USERNAME@login.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Agent admitted failure to sign using the key.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===(Optional) Using &amp;lt;tt&amp;gt;ssh-agent&amp;lt;/tt&amp;gt; to Remember Your Passphrase===&lt;br /&gt;
&lt;br /&gt;
But now you've just replaced having to type a password for login with having to type a passphrase for your key; what have you gained?  &lt;br /&gt;
&lt;br /&gt;
It turns out that there's an automated way to manage ssh &amp;quot;identities&amp;quot;, using the &amp;lt;tt&amp;gt;ssh-agent&amp;lt;/tt&amp;gt; command, which should automatically be running on newer Linux or Mac&amp;amp;nbsp;OS&amp;amp;nbsp;X machines.   You can add keys to this agent for the duration of your login using the &amp;lt;tt&amp;gt;ssh-add&amp;lt;/tt&amp;gt; command:&lt;br /&gt;
&lt;br /&gt;
 $ ssh-add&lt;br /&gt;
 Enter passphrase for /home/USERNAME/.ssh/id_rsa: &lt;br /&gt;
 Identity added: /home/USERNAME/.ssh/id_rsa (/home/USERNAME/.ssh/id_rsa)&lt;br /&gt;
&lt;br /&gt;
and then logins will not require the passphrase, as &amp;lt;tt&amp;gt;ssh-agent&amp;lt;/tt&amp;gt; will provide access to the key.&lt;br /&gt;
&lt;br /&gt;
When you log out of your home computer, the ssh agent will close, and next time you log in, you will have to &amp;lt;tt&amp;gt;ssh-add&amp;lt;/tt&amp;gt; your key.  You can also set a timeout of (say) an hour by using &amp;lt;tt&amp;gt;ssh-add -t 3600&amp;lt;/tt&amp;gt;.  This minimizes the number of times you have to type your passphrase, while still maintaining some degree of key security.&lt;br /&gt;
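&lt;br /&gt;
To check which keys the agent currently holds, to add a key with the one-hour timeout mentioned above, or to remove all keys from the agent again, something like the following should work (a sketch, assuming the default key file name &amp;lt;tt&amp;gt;~/.ssh/id_rsa&amp;lt;/tt&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
 $ ssh-add -l                      # list the keys currently held by the agent&lt;br /&gt;
 $ ssh-add -t 3600 ~/.ssh/id_rsa   # add the key, to be forgotten after 3600 seconds&lt;br /&gt;
 $ ssh-add -D                      # remove all keys from the agent&lt;br /&gt;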
&lt;br /&gt;
&lt;br /&gt;
=== Multiple ssh private keys ===&lt;br /&gt;
In quite a few situations it is preferable to have separate ssh keys dedicated to each service, role or domain:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh-keygen -t rsa -f ~/.ssh/id_rsa.SciNet   -C &amp;quot;Key for SciNet&amp;quot;&lt;br /&gt;
ssh-keygen -t rsa -f ~/.ssh/id_rsa.SHARCNET -C &amp;quot;Key for SHARCNET&amp;quot;&lt;br /&gt;
ssh-keygen -t rsa -f ~/.ssh/id_rsa.DCS      -C &amp;quot;Key for Dept. Of Computer Science&amp;quot;&lt;br /&gt;
ssh-keygen -t rsa -f ~/.ssh/id_rsa.CITA     -C &amp;quot;Key for CITA&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use a different file name for each key. Let's assume that there are two keys, ~/.ssh/id_rsa.SciNet and ~/.ssh/id_rsa.SHARCNET. The simplest way of making sure that both keys are always available is to create a config file for ssh:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
touch ~/.ssh/config&lt;br /&gt;
chmod 600 ~/.ssh/config&lt;br /&gt;
echo &amp;quot;IdentityFile ~/.ssh/id_rsa.SciNet&amp;quot;   &amp;gt;&amp;gt; ~/.ssh/config&lt;br /&gt;
echo &amp;quot;IdentityFile ~/.ssh/id_rsa.SHARCNET&amp;quot; &amp;gt;&amp;gt; ~/.ssh/config&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This makes sure that both keys are offered whenever ssh makes a connection. However, the ssh config file gives you much finer control over keys and other per-connection settings. The recommendation is to select the key based on the host. For example, use a ~/.ssh/config that looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Host SciNet&lt;br /&gt;
  Hostname login.scinet.utoronto.ca&lt;br /&gt;
  IdentityFile ~/.ssh/id_rsa.SciNet&lt;br /&gt;
  User pinto&lt;br /&gt;
&lt;br /&gt;
Host SHARCNET&lt;br /&gt;
  Hostname sharcnet.ca&lt;br /&gt;
  IdentityFile ~/.ssh/id_rsa.SHARCNET&lt;br /&gt;
  User jchong&lt;br /&gt;
  Port 44787&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And just log in with the shortcut:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh SciNet&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
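&lt;br /&gt;
If &amp;lt;tt&amp;gt;ssh-agent&amp;lt;/tt&amp;gt; holds several keys, ssh may still offer all of them to the server, not just the one listed for the host. As a sketch, the &amp;lt;tt&amp;gt;IdentitiesOnly&amp;lt;/tt&amp;gt; option can be added to an entry to restrict ssh to the identity files configured for it. This line is an optional addition to the example above, and the user name is the same placeholder:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Host SciNet&lt;br /&gt;
  Hostname login.scinet.utoronto.ca&lt;br /&gt;
  IdentityFile ~/.ssh/id_rsa.SciNet&lt;br /&gt;
  # only offer configured identity files, even if the agent holds more keys&lt;br /&gt;
  IdentitiesOnly yes&lt;br /&gt;
  User pinto&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;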
&lt;br /&gt;
=== SSH tunnel ===&lt;br /&gt;
A more obscure use of ssh is to create a communication tunnel. As an example, assume you want to access a website running on a remote host (&amp;lt;tt&amp;gt;remotehost&amp;lt;/tt&amp;gt;) from your local machine, but there is a firewall between the two systems blocking every port except incoming ssh.&lt;br /&gt;
&lt;br /&gt;
The basic syntax of the ssh command for such a purpose is: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh -f -N -L localport:localhost:remoteport user@remotehost&lt;br /&gt;
# -f puts ssh in the background&lt;br /&gt;
# -N makes it not execute a remote command&lt;br /&gt;
# -L forwards localport on your machine to remoteport on remotehost&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If the remote website is served on the default port 80, you could do the following:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh -L 8080:localhost:80 tunneluser@remotehost&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
... and point your local browser to http://localhost:8080&lt;br /&gt;
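&lt;br /&gt;
The tunnel can also be checked from the command line, without a browser. A minimal sketch, assuming &amp;lt;tt&amp;gt;curl&amp;lt;/tt&amp;gt; is installed on your local machine (it is not part of ssh):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# -I requests only the HTTP response headers from the forwarded port&lt;br /&gt;
$ curl -I http://localhost:8080&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
If the tunnel is up, the headers returned should be those of the remote web server.&lt;br /&gt;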
&lt;br /&gt;
If you don't want to remember the above sequence of flags all the time, you can add an entry to your ~/.ssh/config:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Host tunnel&lt;br /&gt;
    HostName remotehost&lt;br /&gt;
    IdentityFile ~/.ssh/id_rsa.tunnel&lt;br /&gt;
    LocalForward 8080 127.0.0.1:80&lt;br /&gt;
    User tunneluser&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To open the tunnel just issue the command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -f -N tunnel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Kzuberi</name></author>
	</entry>
</feed>