Peculiarities of running single node GROMACS jobs on SCINET

This is VERY IMPORTANT !!! Please read the [relevant user tips section] for information that is essential for your single node (up to 8 core) MPI GROMACS jobs.

-- cneale 14 September 2009

Compiling GROMACS on SciNet

Please refer to the GROMACS compilation page

Submitting GROMACS jobs on SciNet

Please refer to the GROMACS submission page

-- cneale 18 August 2009

GROMACS benchmarks on Scinet

This is a rudimentary list of scaling information.

I have a 50K atom system running performance on GPC right now. On 56 cores connected with IB I am getting 55 ns/day. I set up 50 such simulations, each with 2 proteins in a bilayer and I'm getting a total of 5.5 us per day. I am using gromacs 4.0.5 and a 5 fs timestep by fixing the bond lengths and all angles involving hydrogen.

I can get about 12 ns/day on 8 cores of the non-IB part of GPC -- also excellent.

As for larger systems, My speedup over saw.sharcnet.ca for a 1e6 atom system is only 1.2x running on 128 cores in single precision. Although saw.sharcnet.ca is composed of xeons, they are running at 2.83 GHz (https://www.sharcnet.ca/my/systems/show/41), which is a faster clock speed than the Scinet 2.5 GHz for Intel's next-generation X86-CPU architecture. While GROMACS is generally not excellent for scaling up to or beyond 128 cores (even for large systems), our benchmarking of this system on saw.sharcnet.ca indicated that it was running at about 65% efficiency. Benchmarking was also done on Scinet for this system, but was not recorded as we were mostly tinkering with the -npme option to mdrun in an attempt to optimize it. My recollection, though, is that the scaling was similar on scinet.

-- cneale 19 August 2009

Strong scaling for GROMACS on GPC

Requested, and on our list to complete, but not yet available in a complete chart form.

-- cneale 19 August 2009

Scientific studies being carried out using GROMACS on GPC

Requested, but not yet available

-- cneale 19 August 2009

Hyperthreading with Gromacs

Using -np 16 on an 8 core box, I get an 8% to 18% performance increase when using -np 16 and optimizing -npme as compared to -np 8 and optimizing -npme (using gromacs 4.0.7). I now regularly overload the number of processes.

selected examples: System A with 250,000 atoms:

 mdrun -np 8  -npme -1    1.15 ns/day
 mdrun -np 8  -npme  2    1.02 ns/day
 mdrun -np 16 -npme  2    0.99 ns/day
 mdrun -np 16 -npme  4    1.36 ns/day <-- 118 % performance vs 1.15 ns/day
 mdrun -np 15 -npme  3    1.32 ns/day

System B with 35,000 atoms (4 fs timestep):

 mdrun -np 8  -npme -1    22.66 ns/day
 mdrun -np 8  -npme  2    23.06 ns/day
 mdrun -np 16 -npme -1    22.69 ns/day
 mdrun -np 16 -npme  4    24.90 ns/day <-- 108 % performance vs 23.06 ns/day
 mdrun -np 56 -npme 16    14.15 ns/day

Cutoffs and timesteps differ between these runs, but both use PME and explicit water.

And according to gromacs developer Berk Hess ( http://lists.gromacs.org/pipermail/gmx-users/2010-August/053033.html )

"In Gromacs 4.5 there is no difference [between -np and -nt based hyperthreading], since it does not use real thread parallelization. Gromacs 4.5 has a built-in threaded MPI library, but openmpi also has an efficient MPI implementation for shared memory machines. But even with proper thread parallelization I expect the same 15 to 20% performance improvement."

Gromacs

Contents

Peculiarities of running single node GROMACS jobs on SCINET

Compiling GROMACS on SciNet

Submitting GROMACS jobs on SciNet

GROMACS benchmarks on Scinet

Strong scaling for GROMACS on GPC

Scientific studies being carried out using GROMACS on GPC

Hyperthreading with Gromacs

Navigation menu

Search