R Statistical Package

From oldwiki.scinet.utoronto.ca
Revision as of 16:06, 30 March 2011 by Rzon (talk | contribs)

R is powerful statistical and plotting software available on the GPC in the module R.

Many optional packages are available for R which add functionality for specific domains; they are available through the Comprehensive R Archive Network (CRAN).

R provides an easy way for users to install the libraries they need in their home directories rather than having them installed system-wide. Because there are so many optional packages that people could want, we recommend that users who need additional packages proceed this way. This is almost certainly the easiest way to deal with the wide range of packages, ensure they're up to date, and ensure that users' package choices don't conflict.

In general, you can install the packages that you need yourself in your home directory; e.g.,

$ R 
> install.packages("package-name", dependencies = TRUE)

will download and compile the source for the packages you need in your home directory under ${HOME}/R/x86_64-unknown-linux-gnu-library/2.11/ (you can specify another directory with the lib= option). Then take a look at help(".libPaths") to make sure that R knows where to look for the packages you've compiled.
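As a minimal sketch of using a different library directory, the following creates a personal library, puts it first on R's search path for the current session, and shows how it could be passed to lib=. The directory name ${HOME}/R/library is only an assumption for illustration; any writable directory should work.

```r
# Hypothetical location for a personal library (an assumption, not a site default).
user_lib <- file.path(Sys.getenv("HOME"), "R", "library")
dir.create(user_lib, recursive = TRUE, showWarnings = FALSE)

# Put the personal library first on the search path for this session.
.libPaths(c(user_lib, .libPaths()))
print(.libPaths())

# Install into that directory explicitly ("package-name" is a placeholder).
# install.packages("package-name", lib = user_lib, dependencies = TRUE)
```

The .libPaths() call only affects the running session, so it would need to be repeated (for instance from a startup file) in later sessions.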

Rmpi (R with MPI)

Installing the Rmpi package can be a bit challenging, because the installation needs some additional parameters that give the paths to the MPI header files and libraries. These paths differ depending on which MPI version you are using.

The various MPI versions on the GPC are loaded with the module command. So the first thing to do is to decide which MPI version to use (openmpi or intelmpi), and to put the corresponding "module load" command in the .bashrc file in your home directory.

Because the MPI modules define all the paths in environment variables, the following lines seem to work for all openmpi versions.

install.packages("Rmpi",
                 configure.args =
                 c(paste("--with-Rmpi-include=",Sys.getenv("SCINET_MPI_INC"),sep=""),
                   paste("--with-Rmpi-libpath=",Sys.getenv("SCINET_MPI_LIB"),sep=""),
                   "--with-Rmpi-type=OPENMPI"))

For intelmpi, you only need to change OPENMPI to MPICH2 in the last line.
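Because the install command silently produces empty paths when no MPI module is loaded, it can help to check the environment variables first. The sketch below wraps the same arguments in a small helper; the function name rmpi_configure_args is hypothetical and not part of R or Rmpi.

```r
# Hypothetical helper: build the Rmpi configure arguments from the
# environment variables set by the loaded MPI module.
rmpi_configure_args <- function(type = c("OPENMPI", "MPICH2")) {
  type <- match.arg(type)
  inc <- Sys.getenv("SCINET_MPI_INC")
  lib <- Sys.getenv("SCINET_MPI_LIB")
  # Stop early if the MPI module has not been loaded.
  if (!nzchar(inc) || !nzchar(lib)) {
    stop("SCINET_MPI_INC or SCINET_MPI_LIB is not set; load an MPI module first")
  }
  c(paste("--with-Rmpi-include=", inc, sep = ""),
    paste("--with-Rmpi-libpath=", lib, sep = ""),
    paste("--with-Rmpi-type=", type, sep = ""))
}

# Usage (commented out because it downloads and compiles the package):
# install.packages("Rmpi", configure.args = rmpi_configure_args("MPICH2"))
```

With an openmpi module loaded one would pass "OPENMPI", and with intelmpi "MPICH2", matching the two cases described above.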

To start using R with Rmpi, launch it with

   mpirun -np 1 R --no-save

which starts one master MPI process, but also sets up the infrastructure needed to spawn additional processes.
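Once R is running under mpirun, spawning workers from the master could look roughly like the sketch below. It assumes that Rmpi is installed and that the MPI environment allows spawning the requested number of processes; the wrapper name rmpi_demo is hypothetical.

```r
# Sketch only: requires the Rmpi package and an R session started via mpirun.
rmpi_demo <- function(nslaves = 2) {
  library(Rmpi)
  # Spawn worker R processes from the master.
  mpi.spawn.Rslaves(nslaves = nslaves)
  # Ask every worker for its MPI rank and collect the answers.
  ranks <- mpi.remote.exec(mpi.comm.rank())
  # Shut the workers down again.
  mpi.close.Rslaves()
  ranks
}

# In the R session: print(rmpi_demo(2))
# Afterwards, mpi.quit() finalizes MPI and exits R.
```

Leaving R with mpi.quit() rather than q() lets MPI shut down cleanly.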

When running on ethernet nodes, you can suppress the messages warning that InfiniBand is not being used, as follows.

This works for openmpi:

mpirun --mca btl self,sm,tcp -np 1 R --no-save

For intelmpi, one can use instead:

mpirun -np 1 -env I_MPI_FABRICS shm:tcp R --no-save