Gromacs
Download and general information: http://www.gromacs.org
Search the mailing list archives: http://www.gromacs.org/Support/Mailing_Lists/Search
Gromacs 4.5.1 (Single Precision)
Built with Intel compilers and OpenMPI
Used cmake
In "build" directory:
module load gcc intel openmpi extras cmake
cmake -D GMX_MPI=ON \
      -D CMAKE_INSTALL_PREFIX=/scinet/gpc/Applications/gromacs/4.5.1 \
      -D FFTW3F_INCLUDE_DIR=$SCINET_FFTW_INC \
      -D FFTW3F_LIBRARIES=$SCINET_FFTW_LIB/libfftw3f.a \
      ../gromacs-4.5.1/
make >& make.out
make install
Created the gromacs/4.5.1 module. Prerequisite modules: intel, openmpi, extras.
#%Module -*- tcl -*-
# gromacs
proc ModulesHelp { } { puts stderr "\tThis module adds gromacs 4.5.1 (single precision) environment variables" }
module-whatis "adds gromacs 4.5.1 (single precision) environment variables"
# gromacs was compiled with Intel compilers and OpenMPI
prereq intel
prereq openmpi
prereq extras
setenv SCINET_GROMACS_HOME /scinet/gpc/Applications/gromacs/4.5.1
setenv SCINET_GROMACS_BIN /scinet/gpc/Applications/gromacs/4.5.1/bin
setenv SCINET_MDRUN /scinet/gpc/Applications/gromacs/4.5.1/bin/mdrun
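As a quick sanity check after loading the module, one might confirm that the three variables set by the module file are present in the environment. The following is a minimal sketch; the function name check_gromacs_env is hypothetical and is not part of the SciNet installation.

```shell
# Hypothetical helper: verify that the variables set by the gromacs/4.5.1
# module file are present in the current environment.
check_gromacs_env() {
    for var in SCINET_GROMACS_HOME SCINET_GROMACS_BIN SCINET_MDRUN; do
        # Indirect lookup of the variable named in $var (POSIX sh compatible)
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "missing: $var"
            return 1
        fi
    done
    echo "ok"
}
```

Assuming the module is named gromacs/4.5.1 as stated above, typical use would be "module load intel openmpi extras gromacs/4.5.1" followed by "check_gromacs_env", which prints ok when all three variables are set and names the first missing one otherwise.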
-- dgruner 1 October 2010
Peculiarities of running single node GROMACS jobs on SciNet
This is VERY IMPORTANT! Please read the [relevant user tips section]; it contains information that is essential for single node (up to 8 core) MPI GROMACS jobs.
-- cneale 14 September 2009
Compiling GROMACS on SciNet
Please refer to the GROMACS compilation page
Submitting GROMACS jobs on SciNet
Please refer to the GROMACS submission page
-- cneale 18 August 2009
GROMACS benchmarks on SciNet
This is a rudimentary list of scaling information.
I have a 50K atom system running on GPC right now. On 56 cores connected with IB I am getting 55 ns/day. I set up 50 such simulations, each with 2 proteins in a bilayer, and I'm getting a total of 5.5 us per day. I am using gromacs 4.0.5 and a 5 fs timestep, made possible by fixing the bond lengths and all angles involving hydrogen.
I can get about 12 ns/day on 8 cores of the non-IB part of GPC -- also excellent.
As for larger systems, my speedup over saw.sharcnet.ca for a 1e6 atom system is only 1.2x when running on 128 cores in single precision. Although saw.sharcnet.ca is composed of Xeons, they run at 2.83 GHz (https://www.sharcnet.ca/my/systems/show/41), a faster clock speed than the 2.5 GHz of the SciNet CPUs, which use Intel's next-generation x86 architecture. While GROMACS generally does not scale well up to or beyond 128 cores (even for large systems), our benchmarking of this system on saw.sharcnet.ca indicated that it was running at about 65% efficiency. Benchmarking was also done on SciNet for this system, but the results were not recorded, as we were mostly tinkering with the -npme option to mdrun in an attempt to optimize it. My recollection, though, is that the scaling was similar on SciNet.
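For reference, a parallel efficiency figure such as the 65% quoted above is commonly computed as the measured speedup divided by the ideal speedup. A minimal sketch follows; the function name scaling_eff is hypothetical, and the numbers in the usage line are purely illustrative, not measurements from saw.sharcnet.ca or GPC (the reference core count behind the 65% figure is not given here).

```shell
# Hypothetical helper: parallel efficiency in percent.
# Arguments: ref_cores ref_perf cores perf   (perf in ns/day)
scaling_eff() {
    awk -v nr="$1" -v pr="$2" -v n="$3" -v pn="$4" \
        'BEGIN { printf "%.0f\n", 100 * (pn / pr) / (n / nr) }'
}

# Illustrative numbers only: 10 ns/day on 8 cores, 104 ns/day on 128 cores
scaling_eff 8 10 128 104
```

An efficiency of 100 would mean that throughput grew in exact proportion to the core count; values well below that suggest the extra cores are spending much of their time on communication.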
-- cneale 19 August 2009
Strong scaling for GROMACS on GPC
Requested, and on our list to complete, but not yet available in a complete chart form.
-- cneale 19 August 2009
Scientific studies being carried out using GROMACS on GPC
Requested, but not yet available
-- cneale 19 August 2009
Hyperthreading with Gromacs
On an 8 core box, I get an 8% to 18% performance increase when using -np 16 and optimizing -npme, as compared to -np 8 and optimizing -npme (using gromacs 4.0.7). I now regularly overload the number of processes (i.e. run 16 processes on 8 cores).
Selected examples:
System A with 250,000 atoms:
mdrun -np 8  -npme -1     1.15 ns/day
mdrun -np 8  -npme 2      1.02 ns/day
mdrun -np 16 -npme 2      0.99 ns/day
mdrun -np 16 -npme 4      1.36 ns/day  <-- 118 % performance vs 1.15 ns/day
mdrun -np 15 -npme 3      1.32 ns/day
System B with 35,000 atoms (4 fs timestep):
mdrun -np 8  -npme -1    22.66 ns/day
mdrun -np 8  -npme 2     23.06 ns/day
mdrun -np 16 -npme -1    22.69 ns/day
mdrun -np 16 -npme 4     24.90 ns/day  <-- 108 % performance vs 23.06 ns/day
mdrun -np 56 -npme 16    14.15 ns/day
Cutoffs and timesteps differ between these runs, but both use PME and explicit water.
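The percentages marked with <-- above are simply the ratio of two throughputs. A small sketch that reproduces them from the ns/day values listed above; the function name rel_perf is hypothetical.

```shell
# Hypothetical helper: performance of a run relative to a baseline, in percent.
# Arguments: run_ns_per_day baseline_ns_per_day
rel_perf() {
    awk -v run="$1" -v base="$2" 'BEGIN { printf "%.0f\n", 100 * run / base }'
}

rel_perf 1.36 1.15     # System A: -np 16 -npme 4 vs -np 8 -npme -1
rel_perf 24.90 23.06   # System B: -np 16 -npme 4 vs -np 8 -npme 2
```

The two calls print 118 and 108, matching the annotated values for System A and System B.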
According to gromacs developer Berk Hess (http://lists.gromacs.org/pipermail/gmx-users/2010-August/053033.html):
"In Gromacs 4.5 there is no difference [between -np and -nt based hyperthreading], since it does not use real thread parallelization. Gromacs 4.5 has a built-in threaded MPI library, but openmpi also has an efficient MPI implementation for shared memory machines. But even with proper thread parallelization I expect the same 15 to 20% performance improvement."