R Statistical Package

From oldwiki.scinet.utoronto.ca
Revision as of 13:09, 18 September 2012 by Rzon

R is a powerful statistical and plotting package, available on the GPC through the R module. Two R modules are currently installed, 2.13.1 and 2.14.1. The former is the default, but we recommend moving to the newer version, which you load by specifying the version number explicitly:

$ module load intel R/2.14.1

(The intel module is a prerequisite for the R module.)

Many optional packages that add functionality for specific domains are available for R through the Comprehensive R Archive Network (CRAN).

R makes it easy for users to install the packages they need in their own home directories rather than system-wide. Because there are so many optional packages that people could want, we recommend that users who need additional packages install them this way. This is almost certainly the easiest way to deal with the wide range of packages, to keep them up to date, and to ensure that different users' package choices don't conflict.

In general, you can install the packages you need yourself, in your home directory; e.g.,

$ R 
> install.packages("package-name", dependencies = TRUE)

will download and compile the source of the package and its dependencies, and install them in your home directory under ${HOME}/R/x86_64-unknown-linux-gnu-library/2.14/ for R 2.14.1 (the last directory is the R major.minor version, e.g. 2.13 for the default module). You can specify another directory with the lib= option. Then take a look at help(".libPaths") to make sure that R knows where to look for the packages you've compiled.
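
If you would rather install packages from a script or a batch job than at the interactive R prompt, a sketch like the following may help. It assumes Rscript (which ships with R) is on your PATH once the R module is loaded; the helper name install_r_pkgs is hypothetical, and http://cran.r-project.org is just one choice of CRAN mirror. Outside an interactive session R cannot ask you to pick a mirror, which is why repos is given explicitly.

```shell
# Hypothetical helper: install CRAN packages into your personal R library
# without any interactive prompts. Usage: install_r_pkgs pkg1 [pkg2 ...]
install_r_pkgs() {
  local pkgs="" p
  # Build an R character vector body such as "abind", "zoo" from the arguments.
  for p in "$@"; do
    pkgs="${pkgs:+$pkgs, }\"$p\""
  done
  # 1. Create the personal library directory (R only searches it if it exists).
  # 2. Put it first on the library search path for this R session.
  # 3. Install into it; the CRAN mirror URL is an assumption, any mirror works.
  Rscript \
    -e 'dir.create(Sys.getenv("R_LIBS_USER"), recursive = TRUE, showWarnings = FALSE)' \
    -e '.libPaths(c(Sys.getenv("R_LIBS_USER"), .libPaths()))' \
    -e "install.packages(c($pkgs), lib = .libPaths()[1], repos = \"http://cran.r-project.org\", dependencies = TRUE)"
}
```

For example, after module load intel R/2.14.1, running install_r_pkgs abind zoo would install both packages (abind and zoo are only example CRAN package names).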

Installing Rmpi (R with MPI)

The newer R installation on the GPC, 2.14.1, comes with Rmpi already installed, built against OpenMPI. The default R module, however, is still 2.13.1, which does not include Rmpi as a standard package, so you have to install it yourself. The same applies if you want to use IntelMPI instead of OpenMPI.

Installing the Rmpi package can be a bit challenging, because the installation needs additional configure arguments that give the paths to the MPI header files and libraries. These paths differ depending on which MPI implementation you are using.

The MPI implementations on the GPC are loaded with the module command. So the first thing to do is to decide which one to use (openmpi or intelmpi), and to put the corresponding "module load" command in the .bashrc file in your home directory.
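
For example, a user who settles on OpenMPI might add lines like these to ~/.bashrc. The module names follow the commands used on this page; the exact names and versions available should be checked with module avail. The guard around the module command is optional and simply keeps the file from failing in shells where module is not defined.

```shell
# ~/.bashrc fragment (sketch): choose ONE MPI implementation and load it,
# together with the intel compiler module, in every shell you start.
if command -v module >/dev/null 2>&1; then
  module load intel openmpi
  # For IntelMPI instead (module name assumed; check `module avail`):
  # module load intel intelmpi
fi
```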

Because the MPI modules define all the required paths in environment variables (SCINET_MPI_INC and SCINET_MPI_LIB), the following command, entered at the R prompt, seems to work for all OpenMPI versions:

> install.packages("Rmpi",
                   configure.args =
                   c(paste("--with-Rmpi-include=",Sys.getenv("SCINET_MPI_INC"),sep=""),
                     paste("--with-Rmpi-libpath=",Sys.getenv("SCINET_MPI_LIB"),sep=""),
                     "--with-Rmpi-type=OPENMPI"))

For intelmpi, you only need to change OPENMPI to MPICH2 in the last line.
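
The same installation can also be scripted for either MPI implementation. In this sketch the helper name install_rmpi and the CRAN mirror URL are assumptions; it feeds the install.packages call shown above to a non-interactive R session (which is why repos has to be given explicitly) and refuses to run if the MPI module's path variables are not set.

```shell
# Hypothetical helper: build Rmpi against the currently loaded MPI module.
# Usage: install_rmpi OPENMPI   (openmpi module loaded)
#        install_rmpi MPICH2    (intelmpi module loaded)
install_rmpi() {
  local type="${1:-OPENMPI}"
  case "$type" in
    OPENMPI|MPICH2) ;;
    *) echo "install_rmpi: unsupported Rmpi type '$type'" >&2; return 1 ;;
  esac
  # The MPI module is expected to define these; stop early if it is not loaded.
  if [ -z "$SCINET_MPI_INC" ] || [ -z "$SCINET_MPI_LIB" ]; then
    echo "install_rmpi: load an MPI module first (SCINET_MPI_INC/LIB unset)" >&2
    return 1
  fi
  # R reads the paths itself via Sys.getenv(); the shell substitutes only ${type}.
  R --no-save --slave <<EOF
install.packages("Rmpi", repos = "http://cran.r-project.org",
                 configure.args =
                 c(paste("--with-Rmpi-include=", Sys.getenv("SCINET_MPI_INC"), sep = ""),
                   paste("--with-Rmpi-libpath=", Sys.getenv("SCINET_MPI_LIB"), sep = ""),
                   "--with-Rmpi-type=${type}"))
EOF
}
```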

Running Rmpi

To start using R with Rmpi, make sure you have all required modules loaded (e.g. module load intel openmpi R/2.14.1), then launch it with

$ mpirun -np 1 R --no-save

This starts a single master MPI process, together with the infrastructure needed to spawn additional processes from within R.
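
Once the master is running, worker processes are spawned from within R using Rmpi's own functions. The sketch below runs a short Rmpi session non-interactively by feeding the R code to the master on standard input. It assumes your MPI implementation forwards mpirun's standard input to rank 0 (OpenMPI does this by default); the function name rmpi_hello and the default of 7 workers (so that master plus workers fill one 8-core node) are illustrative choices.

```shell
# Hypothetical example: start one master R process under MPI, spawn workers
# with Rmpi, have each report its rank, then shut everything down cleanly.
# Usage: rmpi_hello [number_of_workers]   (default: 7 workers + 1 master)
rmpi_hello() {
  local nworkers="${1:-7}"
  mpirun -np 1 R --no-save --slave <<EOF
library(Rmpi)
mpi.spawn.Rslaves(nslaves = ${nworkers})
# Evaluated on every worker; rank and size refer to the default communicator 1,
# which after spawning contains the master and all workers.
print(mpi.remote.exec(paste("I am", mpi.comm.rank(), "of", mpi.comm.size())))
mpi.close.Rslaves()
mpi.quit()
EOF
}
```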

Running serial R jobs

As with all serial jobs, if your R computations do not use multiple cores, you should bundle them so that all 8 cores of a node are doing work. Examples of this can be found on the User_Serial page.
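
A minimal sketch of such a bundle for R might look like the job script below. It assumes eight independent scripts named run1.R to run8.R (hypothetical names) and #PBS directives of the kind used for GPC batch jobs; the exact resource line is an assumption, so compare it with the User_Serial page. Each R process runs in the background, and wait keeps the job alive until all of them have finished.

```shell
#!/bin/bash
# Hypothetical GPC job script (directives assumed; see User_Serial):
#PBS -l nodes=1:ppn=8,walltime=1:00:00
#PBS -N serial-R-bundle

# Run each R script given as an argument in the background, writing its
# output to a matching .out file, then wait until all of them have finished.
run_r_bundle() {
  local script
  for script in "$@"; do
    R --no-save --slave < "$script" > "${script%.R}.out" 2>&1 &
  done
  wait
}

# Only do the real work inside a batch job, where PBS sets PBS_O_WORKDIR.
if [ -n "$PBS_O_WORKDIR" ]; then
  cd "$PBS_O_WORKDIR" || exit 1
  module load intel R/2.14.1
  run_r_bundle run{1..8}.R
fi
```

Without the trailing wait, the script would reach its end right after starting the background processes and the job could finish before the R computations do.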