Difference between revisions of "R Statistical Package"

Revision as of 13:09, 18 September 2012

R (http://www.r-project.org/) is powerful statistical and plotting software, available on the GPC through the R module. Two versions are currently installed, 2.13.1 and 2.14.1. The former is still the default, but we recommend moving to the newer version, which you load by specifying the version number explicitly:

$ module load intel R/2.14.1

(The intel module is a prerequisite for the R module.)

Many optional packages that add functionality for specific domains are available for R through the Comprehensive R Archive Network (CRAN, http://cran.r-project.org/mirrors.html).

R makes it easy for users to install the packages they need in their home directories rather than having them installed system-wide. Because there are so many optional packages that people might want, we recommend that users who need additional packages proceed this way. It is almost certainly the easiest way to handle the wide range of packages, keep them up to date, and ensure that users' package choices don't conflict.

In general, you can install the packages you need yourself in your home directory; e.g.,

$ R 
> install.packages("package-name", dependencies = TRUE)

will download and compile the source for the packages you need in your home directory under ${HOME}/R/x86_64-unknown-linux-gnu-library/2.11/ (the final component of this path follows the major.minor version of the R you are running; you can specify another directory with the lib= option). Then take a look at help(".libPaths") to make sure that R knows where to look for the packages you've compiled.

Installing Rmpi (R with MPI)

The newer R installation on the GPC, 2.14.1, has Rmpi installed by default using OpenMPI. The default R module is, however, still 2.13.1, which does not have the Rmpi library as a standard package, which means you have to install it yourself. The same is true if you want to use IntelMPI instead of OpenMPI.

Installing the Rmpi package can be a bit challenging, because the installation needs additional parameters that give the paths to various header files and libraries. These paths differ depending on which MPI version you are using.

The various MPI versions on the GPC are loaded with the module command. So the first step is to decide which MPI version to use (openmpi or intelmpi) and to put the corresponding "module load" command in the .bashrc file in your home directory.

Because the MPI modules define all the paths in environment variables, the following command seems to work for installations with all openmpi versions.

> install.packages("Rmpi",
                   configure.args =
                   c(paste("--with-Rmpi-include=",Sys.getenv("SCINET_MPI_INC"),sep=""),
                     paste("--with-Rmpi-libpath=",Sys.getenv("SCINET_MPI_LIB"),sep=""),
                     "--with-Rmpi-type=OPENMPI"))

For intelmpi, you only need to change OPENMPI to MPICH2 in the last line.
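
Since the install.packages call above reads its paths from SCINET_MPI_INC and SCINET_MPI_LIB, it can help to confirm that both are set in your shell before starting R. The following is a minimal sketch, not a SciNet-provided tool: the function name check_mpi_env is made up for illustration, and it assumes only that the MPI module you loaded exports those two variables.

```shell
# Sketch: verify that the MPI module has exported the paths Rmpi needs.
# check_mpi_env is a hypothetical helper name, not part of any module.
check_mpi_env() {
    missing=0
    for name in SCINET_MPI_INC SCINET_MPI_LIB; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set; load an MPI module first" >&2
            missing=1
        else
            echo "$name=$value"
        fi
    done
    # Non-zero exit status if either variable was missing.
    return "$missing"
}
```

After loading your modules (for example, module load intel openmpi R/2.14.1), run check_mpi_env; if it reports a missing variable, the install.packages call would be given empty paths.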

Running Rmpi

To start using R with Rmpi, make sure you have all required modules loaded (e.g. module load intel openmpi R/2.14.1), then launch it with

$ mpirun -np 1 R --no-save

which starts a single master MPI process but also sets up the infrastructure needed to spawn additional processes.

Running serial R jobs

As with all serial jobs, if your R computations do not use multiple cores, you should bundle them so that all 8 cores of a node are performing work. Examples of this can be found on the User_Serial page.
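
As a rough illustration of the bundling idea, the sketch below starts one serial R task per core in the background and then waits for all of them. It is not taken from the User_Serial page: the function name run_bundle, the RSCRIPT override variable, the task_N.log file names, and the convention of passing the task index as the script's first argument are all assumptions made for this example.

```shell
# Sketch: run several serial R tasks concurrently on one node.
# run_bundle, RSCRIPT and the task_N.log names are illustrative assumptions.
run_bundle() {
    script="$1"                   # R script to run (e.g. a hypothetical analysis.R)
    ntasks="${2:-8}"              # concurrent tasks; 8 matches the cores per node
    runner="${RSCRIPT:-Rscript}"  # command used to run the script
    pids=""
    status=0
    i=1
    while [ "$i" -le "$ntasks" ]; do
        # Each task receives its index as an argument and writes its own log.
        "$runner" "$script" "$i" > "task_${i}.log" 2>&1 &
        pids="$pids $!"
        i=$((i + 1))
    done
    # Wait for every task and remember whether any of them failed.
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}
```

Inside a job script you would load the modules first (module load intel R/2.14.1) and then call, for instance, run_bundle analysis.R 8. This simple form suits tasks of similar duration; if run times vary widely, some cores will sit idle while the longest task finishes.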