Gromacs



Download and general information: http://www.gromacs.org

Search the mailing list archives: http://www.gromacs.org/Support/Mailing_Lists/Search

Gromacs 4.6.3 (Single Precision)

Gromacs has been updated to its latest version as of August 7th, 2013. It uses the new intel and openmpi libraries, so the modules it depends on are not the same as for Gromacs 4.5.1. If you use the older libraries, you will get an error stating that libirng.so cannot be found; this file is part of intel/13.1.1. To load all the associated libraries as well as the latest version of Gromacs every time you log in, put the following in your .bashrc file:

<source lang="bash">

  module load intel/13.1.1 openmpi/intel/1.6.4 fftw extras gromacs/4.6.3

</source>
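
To check that the right GROMACS ends up in the environment after logging in, something like the following can be used (a minimal check, assuming the gromacs/4.6.3 module puts its binaries on the PATH):

<source lang="bash">

  module list          # should include intel/13.1.1, openmpi/intel/1.6.4, fftw, extras, gromacs/4.6.3
  which mdrun          # should point into the gromacs/4.6.3 installation
  mdrun -version       # prints the GROMACS version and build information

</source>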

-- mnaqvi 7 August 2013

Note: loading modules within .bashrc files is no longer recommended by SciNet staff because it can lead to unpredictable behaviour, since the .bashrc file is referenced by every bash script which is run. It is instead recommended that module load commands be typed in at the command line, and/or be placed within job submission scripts.
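
For example, the module load line can go directly into the job submission script instead. A minimal sketch (the resource request, job name, and input file names are hypothetical; the modules are those listed above, and mdrun is assumed to be MPI-enabled and on the PATH once the module is loaded):

<source lang="bash">

#!/bin/bash
#PBS -l nodes=1:ppn=8,walltime=01:00:00
#PBS -N gromacs-test

# load the toolchain and GROMACS inside the job script, not in .bashrc
module load intel/13.1.1 openmpi/intel/1.6.4 fftw extras gromacs/4.6.3

cd $PBS_O_WORKDIR
mpirun -np 8 -hostfile $PBS_NODEFILE mdrun -v -s test.tpr -deffnm test

</source>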

-- ejspence (SciNet staff) 19 December 2013

Gromacs 4.5.1 (Single Precision)

Built with the Intel compilers and OpenMPI, using cmake.

In the "build" directory:

  module load gcc intel openmpi extras cmake
  cmake -D GMX_MPI=ON -D CMAKE_INSTALL_PREFIX=/scinet/gpc/Applications/gromacs/4.5.1 -D FFTW3F_INCLUDE_DIR=$SCINET_FFTW_INC \
      -D FFTW3F_LIBRARIES=$SCINET_FFTW_LIB/libfftw3f.a ../gromacs-4.5.1/
  make >& make.out
  make install
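
A quick sanity check of the resulting install can be done along these lines (a sketch; the prefix is the one passed to cmake above):

  ls /scinet/gpc/Applications/gromacs/4.5.1/bin
  /scinet/gpc/Applications/gromacs/4.5.1/bin/mdrun -h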


Created the gromacs/4.5.1 module. Prerequisite modules: intel, openmpi, extras.

<source lang="tcl">

 #%Module -*- tcl -*-
 # gromacs
 proc ModulesHelp { } {
   puts stderr "\tThis module adds gromacs 4.5.1 (single precision) environment variables"
 }
 module-whatis "adds gromacs 4.5.1 (single precision) environment variables"
 # gromacs was compiled with Intel compilers and OpenMPI
 prereq intel 
 prereq openmpi
 prereq extras
 setenv SCINET_GROMACS_HOME /scinet/gpc/Applications/gromacs/4.5.1
 setenv SCINET_GROMACS_BIN /scinet/gpc/Applications/gromacs/4.5.1/bin
 setenv SCINET_MDRUN /scinet/gpc/Applications/gromacs/4.5.1/bin/mdrun

</source>
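
Typical use of the module and its environment variables (a short sketch; grompp and the other GROMACS tools are assumed to be installed in the same bin directory):

<source lang="bash">

  module load intel openmpi extras gromacs/4.5.1
  echo $SCINET_MDRUN               # /scinet/gpc/Applications/gromacs/4.5.1/bin/mdrun
  $SCINET_GROMACS_BIN/grompp -h    # other GROMACS tools live in $SCINET_GROMACS_BIN

</source>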

Here is a sample script for running Gromacs on 4 nodes, on the IB partition of the GPC:

<source lang="bash">

#!/bin/bash
#PBS -l nodes=4:ib:ppn=8,walltime=08:50:00
#PBS -N test

cd $PBS_O_WORKDIR
mpirun -np 32 -hostfile $PBS_NODEFILE $SCINET_MDRUN -v -s test.tpr -deffnm after_test

</source>
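
If the script is saved as, say, gromacs.sh (a hypothetical file name), it is submitted in the usual way:

  qsub gromacs.sh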


-- dgruner 1 October 2010

Peculiarities of running single node GROMACS jobs on SCINET

This is VERY IMPORTANT! Please read the [relevant user tips section] for information that is essential for your single-node (up to 8 core) MPI GROMACS jobs.

-- cneale 14 September 2009

Compiling GROMACS on SciNet

Please refer to the GROMACS compilation page

Submitting GROMACS jobs on SciNet

Please refer to the GROMACS submission page

-- cneale 18 August 2009

GROMACS benchmarks on Scinet

This is a rudimentary list of scaling information.

I have a 50K-atom system running on GPC right now. On 56 cores connected with IB I am getting 55 ns/day. I have set up 50 such simulations, each with 2 proteins in a bilayer, and I'm getting a total of 5.5 µs per day. I am using GROMACS 4.0.5 and a 5 fs timestep, made possible by fixing the bond lengths and all angles involving hydrogen.
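
For reference, that timestep and constraint choice corresponds roughly to .mdp settings like the following (a sketch only; the file name and the remaining run parameters are not from this page):

<source lang="bash">

# append timestep and constraint settings to a run-parameter file (hypothetical name md.mdp)
cat >> md.mdp <<'EOF'
dt                   = 0.005     ; 5 fs timestep
constraints          = h-angles  ; constrain all bonds plus angles involving hydrogen
constraint_algorithm = lincs
EOF

</source>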

I can get about 12 ns/day on 8 cores of the non-IB part of GPC -- also excellent.

As for larger systems, my speedup over saw.sharcnet.ca for a 1e6-atom system is only 1.2x when running on 128 cores in single precision. Although saw.sharcnet.ca is composed of Xeons, they run at 2.83 GHz (https://www.sharcnet.ca/my/systems/show/41), a faster clock speed than the 2.5 GHz of SciNet's newer-generation Intel x86 CPUs. While GROMACS generally does not scale well up to or beyond 128 cores (even for large systems), our benchmarking of this system on saw.sharcnet.ca indicated that it was running at about 65% efficiency. Benchmarking was also done on SciNet for this system, but was not recorded, as we were mostly tinkering with the -npme option to mdrun in an attempt to optimize it. My recollection, though, is that the scaling was similar on SciNet.

-- cneale 19 August 2009

Strong scaling for GROMACS on GPC

Requested, and on our list to complete, but not yet available in a complete chart form.

-- cneale 19 August 2009

Scientific studies being carried out using GROMACS on GPC

Requested, but not yet available

-- cneale 19 August 2009

Hyperthreading with Gromacs

On an 8-core box, I get an 8% to 18% performance increase when using -np 16 and optimizing -npme, as compared to -np 8 and optimizing -npme (using GROMACS 4.0.7). I now regularly overload the number of processes.

Selected examples. System A with 250,000 atoms:

 mdrun -np 8  -npme -1    1.15 ns/day
 mdrun -np 8  -npme  2    1.02 ns/day
 mdrun -np 16 -npme  2    0.99 ns/day
 mdrun -np 16 -npme  4    1.36 ns/day <-- 118 % performance vs 1.15 ns/day
 mdrun -np 15 -npme  3    1.32 ns/day

System B with 35,000 atoms (4 fs timestep):

 mdrun -np 8  -npme -1    22.66 ns/day
 mdrun -np 8  -npme  2    23.06 ns/day
 mdrun -np 16 -npme -1    22.69 ns/day
 mdrun -np 16 -npme  4    24.90 ns/day <-- 108 % performance vs 23.06 ns/day
 mdrun -np 56 -npme 16    14.15 ns/day

Cutoffs and timesteps differ between these runs, but both use PME and explicit water.
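
As a concrete sketch of an overloaded single-node run (using the 4.5.1 module and $SCINET_MDRUN defined earlier on this page; the resource request, input file, and job name are hypothetical):

<source lang="bash">

#!/bin/bash
#PBS -l nodes=1:ppn=8,walltime=08:00:00
#PBS -N overload-test

module load intel openmpi extras gromacs/4.5.1

cd $PBS_O_WORKDIR
# 16 MPI processes on the node's 8 cores (HyperThreading), 4 of them doing PME;
# duplicating the PBS nodefile gives mpirun the 16 slots it expects
cat $PBS_NODEFILE $PBS_NODEFILE > hosts16
mpirun -np 16 -hostfile hosts16 $SCINET_MDRUN -npme 4 -v -s test.tpr -deffnm overloaded

</source>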

And according to GROMACS developer Berk Hess (http://lists.gromacs.org/pipermail/gmx-users/2010-August/053033.html):

"In Gromacs 4.5 there is no difference [between -np and -nt based hyperthreading], since it does not use real thread parallelization. Gromacs 4.5 has a built-in threaded MPI library, but openmpi also has an efficient MPI implementation for shared memory machines. But even with proper thread parallelization I expect the same 15 to 20% performance improvement."