NAMD on BGQ
A parameter study was undertaken to test the simulation performance and efficiency of NAMD on the Blue Gene/Q cluster, guided by the NAMD performance tuning documentation. The test system is a 246,000-atom membrane protein simulation (cytochrome c oxidase embedded in a TIP3P-solvated DPPC bilayer) using the CHARMM36 force field for both protein and lipids. The unit cell is rectangular (orthorhombic) with box dimensions 144 x 144 x 117 Angstroms.
Performance Tuning Benchmarks
Efficiency is measured relative to the 512-core simulation run with 16 ranks per node, i.e. it is the ratio of ns/day per core to that of the reference run. All simulations are started from a restart file taken from a pre-equilibrated snapshot. Performance in nanoseconds per day is the geometric mean of the three "Benchmark time" lines printed near the beginning of the simulation's standard output. Any additional configuration options are listed where they are used.
Ranks per node | Cores | NAMD Config Options | ns/day | Efficiency |
16 | 512 | | 2.79 | 1.00 |
16 | 1024 | | 5.05 | 0.91 |
16 | 1024 | twoAwayX (default) | 5.62 | 1.01 |
16 | 2048 | twoAwayX (default) | 10.07 | 0.90 |
16 | 2048 | twoAwayXY | 10.59 | 0.95 |
16 | 4096 | twoAwayX | 14.32 | 0.64 |
16 | 4096 | twoAwayXY (default) | 17.63 | 0.79 |
16 | 4096 | twoAwayXYZ | 16.79 | 0.75 |
16 | 8192 | twoAwayX | 23.52 | 0.53 |
16 | 8192 | twoAwayXY (default) | 25.00 | 0.56 |
16 | 16384 | twoAwayX | 23.67 | 0.27 |
16 | 16384 | twoAwayXY | 28.31 | 0.32 |
16 | 16384 | twoAwayXYZ (default) | 27.98 | 0.31 |
Ranks per node | Cores | NAMD Config Options | ns/day | Efficiency |
16 | 4096 | twoAwayXY, PMEPencils=8, lblUnload=yes | 12.93 | 0.58 |
16 | 4096 | twoAwayXY, PMEPencils=12, lblUnload=yes | 17.27 | 0.77 |
16 | 4096 | twoAwayXY, PMEPencils=16, lblUnload=yes | 16.02 | 0.72 |
16 | 4096 | twoAwayXY, PMEPencils=20, lblUnload=yes | 15.41 | 0.69 |
16 | 4096 | twoAwayXY, PMEPencils=12 | 16.21 | 0.73 |
16 | 4096 | twoAwayXY, PMEPencils=16 | 17.92 | 0.80 |
16 | 4096 | twoAwayXY, PMEPencils=20 | 17.99 | 0.81 |
16 | 4096 | twoAwayXY, PMEPencils=24 | 17.83 | 0.80 |
16 | 4096 | twoAwayXY, PMEPencils=36 | 16.97 | 0.76 |
8 | 4096 | twoAwayXY, PMEPencils=20 | 18.24 | 0.82 |
16 | 4096 | twoAwayXY, PMEPencils=20 | 17.99 | 0.81 |
32 | 4096 | twoAwayXY, PMEPencils=20 | 13.94 | 0.63 |
4 | 512 | | 2.86 | 1.03 |
8 | 512 | | 2.84 | 1.02 |
16 | 512 | | 2.79 | 1.00 |
32 | 512 | | 2.29 | 0.82 |
16 | 512 | ldbUnloadZero=yes | 2.79 | 1.00 |
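The ns/day and Efficiency columns above can be reproduced with a short script. The following is a minimal Python sketch, not the procedure actually used for this study. The regular expression assumes that each "Benchmark time" line in the NAMD log ends its timing with a value followed by `days/ns`; check this against your own output. The function names `ns_per_day` and `efficiency` are hypothetical helpers introduced here for illustration.

```python
import math
import re

# Assumption: NAMD benchmark lines contain "... <value> days/ns ...".
# Verify this pattern against your own log before relying on it.
BENCH_RE = re.compile(r"Benchmark time:.*?([0-9.]+)\s+days/ns")


def ns_per_day(log_text):
    """Geometric mean of the 'Benchmark time' lines, returned in ns/day."""
    days_per_ns = [float(v) for v in BENCH_RE.findall(log_text)]
    if not days_per_ns:
        raise ValueError("no 'Benchmark time' lines found")
    mean_log = sum(math.log(d) for d in days_per_ns) / len(days_per_ns)
    return 1.0 / math.exp(mean_log)


def efficiency(ns_day, cores, ref_ns_day=2.79, ref_cores=512):
    """Per-core throughput relative to the 16 ranks-per-node, 512-core run."""
    return (ns_day / cores) / (ref_ns_day / ref_cores)


if __name__ == "__main__":
    # Values taken from the tables above.
    print(round(efficiency(5.05, 1024), 2))
    print(round(efficiency(17.63, 4096), 2))
```

Using per-core throughput as the basis means an efficiency above 1.00 (as in some of the 512-core and 1024-core rows) simply indicates that a run delivered more ns/day per core than the reference configuration.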
Incorrect Particle-Mesh Ewald Grid
Long-range electrostatics are computed with PME in all simulations above, with the PME grid generated automatically by the pmeGridSpacing 1.0 setting. A poor choice of PME grid size (i.e. a grid dimension with prime factors other than 2, 3, and 5) can cause increasingly large performance degradation as the core count grows, because of the matrix size requirements of the FFT algorithm. The table below shows the kind of degradation to expect, here with a 144x144x111 grid (111 = 3 x 37).
Ranks per node | Cores | NAMD Config Options | ns/day | Efficiency |
16 | 512 | Poor PME Multiple (144x144x111) | 2.70 | 0.97 |
16 | 1024 | Poor PME Multiple (144x144x111) | 5.13 | 0.92 |
16 | 2048 | Poor PME Multiple (144x144x111) | 8.61 | 0.77 |
16 | 4096 | Poor PME Multiple (144x144x111) | 13.93 | 0.62 |
16 | 8192 | Poor PME Multiple (144x144x111) | 17.08 | 0.38 |
16 | 16384 | Poor PME Multiple (144x144x111) | 17.64 | 0.20 |
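Whether a grid dimension is a "poor multiple" can be checked before a run is submitted. The sketch below tests whether a grid size factors entirely into 2, 3, and 5, and finds the next size that does. The helper names `is_fft_friendly` and `next_fft_friendly` are hypothetical and are not part of NAMD; how NAMD itself chooses the grid under pmeGridSpacing is not shown here.

```python
def is_fft_friendly(n, primes=(2, 3, 5)):
    """Return True if n has no prime factors other than those in `primes`."""
    if n < 1:
        raise ValueError("grid size must be a positive integer")
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def next_fft_friendly(n):
    """Smallest grid size >= n whose only prime factors are 2, 3, and 5."""
    while not is_fft_friendly(n):
        n += 1
    return n


if __name__ == "__main__":
    for size in (144, 111):
        print(size, is_fft_friendly(size))
    print(next_fft_friendly(117))
```

For the grid in the table above, 144 = 2^4 x 3^2 passes the check, while 111 = 3 x 37 does not. For a 117 Angstrom box edge at a maximum spacing of 1.0 Angstrom, the smallest size that passes this check is 120.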