R Statistical Package

From oldwiki.scinet.utoronto.ca
Revision as of 15:28, 28 August 2014 by Rzon

Running R on the GPC

R is powerful statistical and plotting software, available on the GPC through the R module. There are currently five R modules installed: 2.13.1, 2.14.1, 2.15.1, 3.0.0 and 3.0.1. While the oldest, 2.13.1, is the default, we recommend making the transition to the newest version, which you load by specifying the version number explicitly:

$ module load intel R/3.0.1

(The intel module is a prerequisite for the R module.) If you will be using Rmpi, you will need to load the openmpi module as well.

Many optional packages are available for R which add functionality for specific domains; they are available through the Comprehensive R Archive Network (CRAN).

R provides an easy way for users to install the libraries they need in their home directories rather than having them installed system-wide. Because there are so many optional packages that R users could want, we recommend that users who need additional packages proceed this way. This is almost certainly the easiest way to deal with the wide range of packages, ensure they're up to date, and ensure that users' package choices don't conflict.

In general, you can install the packages that you need yourself in your home directory; e.g.,

$ R 
> install.packages("package-name", dependencies = TRUE)

will download the source for the packages you need and compile it into your home directory, under ${HOME}/R/x86_64-unknown-linux-gnu-library/<version>/, where <version> is the major.minor version of the R module you loaded (e.g. 3.0 for R/3.0.1). You can specify another directory with the lib= option. Then take a look at help(".libPaths") to make sure that R knows where to look for the packages you've compiled.

Note that during the installation you may get warnings that the packages cannot be installed in, e.g., /scinet/gpc/Applications/R/3.0.1/lib64/R/bin/. Despite those messages, R should still succeed in installing the package into your home directory.
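As a minimal sketch of the lib= route described above, the following R snippet creates a personal library directory and puts it at the front of R's search path. The helper name use_user_library and the directory ~/R/my-library are hypothetical choices for illustration, not anything provided by R or by the GPC; the install.packages call is left commented out because it needs network access.

```r
# Create a personal library directory (if needed) and make R search it first.
# 'use_user_library' is a hypothetical helper name, not part of R itself.
use_user_library <- function(path) {
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  .libPaths(c(path, .libPaths()))  # prepend; the existing paths are kept
  invisible(.libPaths())
}

# Assumed location; any writable directory under your home would do.
my_lib <- file.path(Sys.getenv("HOME"), "R", "my-library")
use_user_library(my_lib)

# Install into that directory explicitly, then confirm where R will look:
# install.packages("package-name", lib = my_lib, dependencies = TRUE)
print(.libPaths())
```

The change made by .libPaths() only lasts for the current R session, so a job script that relies on a non-default library directory has to repeat the call (or set the R_LIBS_USER environment variable) before loading the package.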

Installing Rmpi (R with MPI)

The R 2.14.1 installation on the GPC has Rmpi installed by default, built with OpenMPI. The default R module, however, is still 2.13.1, which does not include the Rmpi library as a standard package, so you have to install it yourself. The same is true if you want to use IntelMPI instead of OpenMPI.

Installing the Rmpi package can be a bit challenging, since the installation needs some additional parameters that contain the paths to various header files and libraries. These paths differ depending on which MPI version you are using.

The various MPI versions on the GPC are loaded with the module command. So the first thing to do is to decide which MPI version to use (openmpi or intelmpi), and to type the corresponding "module load" command on the command line (as well as in your job scripts).

Because the MPI modules define all the paths in environment variables, the following command seems to work with all openmpi versions.

> install.packages("Rmpi",
                   configure.args =
                   c(paste("--with-Rmpi-include=",Sys.getenv("SCINET_MPI_INC"),sep=""),
                     paste("--with-Rmpi-libpath=",Sys.getenv("SCINET_MPI_LIB"),sep=""),
                     "--with-Rmpi-type=OPENMPI"))

For intelmpi, you only need to change OPENMPI to MPICH2 in the last line.
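To make the switch between the two MPI flavours less error-prone, the arguments can be built by a small function. This is only a sketch: rmpi_configure_args is a hypothetical helper name, and it assumes, as the text above states, that the loaded MPI module sets SCINET_MPI_INC and SCINET_MPI_LIB.

```r
# Build the configure.args vector for Rmpi from the environment variables
# set by the loaded MPI module. 'rmpi_configure_args' is a hypothetical helper.
rmpi_configure_args <- function(mpi_type = c("OPENMPI", "MPICH2")) {
  mpi_type <- match.arg(mpi_type)
  inc <- Sys.getenv("SCINET_MPI_INC")
  lib <- Sys.getenv("SCINET_MPI_LIB")
  # Fail early if no MPI module has been loaded in this shell.
  if (!nzchar(inc) || !nzchar(lib)) {
    stop("SCINET_MPI_INC/SCINET_MPI_LIB are not set; load an MPI module first")
  }
  c(paste("--with-Rmpi-include=", inc, sep = ""),
    paste("--with-Rmpi-libpath=", lib, sep = ""),
    paste("--with-Rmpi-type=", mpi_type, sep = ""))
}

# Usage (inside R, after 'module load' of the MPI version you want):
# install.packages("Rmpi", configure.args = rmpi_configure_args("OPENMPI"))
# install.packages("Rmpi", configure.args = rmpi_configure_args("MPICH2"))  # intelmpi
```

Stopping when the variables are empty avoids passing blank include and library paths to the Rmpi configure script, which would otherwise fail later with a less obvious message.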

Running Rmpi

To start using R with Rmpi, make sure you have all required modules loaded (e.g. module load intel openmpi R/2.14.1), then launch it with

$ mpirun -np 1 R --no-save

which starts a single master MPI process, but also sets up the infrastructure needed to spawn additional processes.
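Once R has been started this way, additional processes are spawned from within R. The following is a minimal sketch of such a session, wrapped in a hypothetical function rmpi_hello so that nothing runs until it is called; the mpi.* functions are from the Rmpi package, and the worker count of 7 in the usage comment assumes a single 8-core node.

```r
# Minimal Rmpi session: spawn workers, run one command on each, shut down.
# 'rmpi_hello' is a hypothetical wrapper; the mpi.* calls come from Rmpi.
rmpi_hello <- function(nslaves) {
  library(Rmpi)
  mpi.spawn.Rslaves(nslaves = nslaves)
  # Each worker evaluates the expression and reports its rank.
  ranks <- mpi.remote.exec(paste("rank", mpi.comm.rank(), "of", mpi.comm.size()))
  mpi.close.Rslaves()
  ranks
}

# Usage, assuming one 8-core node (1 master + 7 workers):
# print(rmpi_hello(7))
# mpi.quit()   # finalizes MPI and exits R
```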

Running serial R jobs

As with all serial jobs, if your R computations do not use multiple cores, you should bundle them up so that all 8 cores of a node are performing work. Examples of this can be found on the User_Serial page.
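When the independent computations can all be driven from one R script, the bundling can also be done inside R. The sketch below uses mclapply from the parallel package for this; it is an alternative to, not a copy of, the approach on the User_Serial page. It assumes R 2.14.0 or later (where parallel ships with R, so not the default 2.13.1 module), and run_bundle and simulate are hypothetical names standing in for your own code.

```r
library(parallel)

# Run independent serial tasks side by side on one node, one forked
# process per task, with at most 'cores' running at a time.
run_bundle <- function(inputs, worker, cores = 8) {
  mclapply(inputs, worker, mc.cores = cores)
}

# Hypothetical stand-in for a real serial computation.
simulate <- function(seed) {
  set.seed(seed)
  mean(rnorm(1000))
}

# Eight tasks, one per core of a GPC node.
results <- run_bundle(1:8, simulate)
print(unlist(results))
```

This works best when the tasks take roughly the same time, since the node stays allocated until the slowest one finishes.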