NAMD on BG/Q
Following the NAMD performance tuning documentation, a parameter study was carried out to test the simulation performance and scaling efficiency of NAMD on the BG/Q cluster. The test system is a 246,000-atom membrane protein simulation (cytochrome c oxidase embedded in a TIP3P-solvated DPPC bilayer) using the CHARMM36 force field for protein and lipids. The unit cell is rectangular (orthorhombic), with box dimensions 144 x 144 x 117 Angstroms.
Performance Tuning Benchmarks
Efficiency was measured relative to the 512-core simulation at 16 ranks per node: each run's speedup in ns/day over that baseline is divided by the ratio of its core count to 512. All simulations were restarted from an equilibrated system.
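The efficiency column in the table below can be reproduced from the ns/day values. The following is a minimal sketch, assuming the baseline is the 2.79 ns/day measured on 512 cores at 16 ranks per node; the function and constant names are illustrative and are not part of NAMD.

```python
# Baseline taken from the benchmark table: 512 cores, 16 ranks per node.
BASELINE_CORES = 512
BASELINE_NS_PER_DAY = 2.79


def parallel_efficiency(cores, ns_per_day,
                        base_cores=BASELINE_CORES,
                        base_ns_per_day=BASELINE_NS_PER_DAY):
    """Return measured speedup divided by ideal (linear) speedup."""
    speedup = ns_per_day / base_ns_per_day   # how much faster than the baseline
    ideal_speedup = cores / base_cores       # how much faster perfect scaling would be
    return speedup / ideal_speedup


if __name__ == "__main__":
    # A few (cores, ns/day) pairs copied from the table.
    for cores, ns_day in [(1024, 5.05), (4096, 17.63), (16384, 28.31)]:
        print(f"{cores:6d} cores: efficiency {parallel_efficiency(cores, ns_day):.2f}")
```

Values above 1.00 (for example the 4 and 8 ranks-per-node runs on 512 cores) simply mean the run was slightly faster than the chosen baseline.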
Ranks per node | Cores | NAMD Config Options | ns/day | Efficiency |
16 | 512 | | 2.79 | 1.00 |
16 | 1024 | | 5.05 | 0.91 |
16 | 1024 | twoAwayX (default) | 5.62 | 1.01 |
16 | 2048 | twoAwayX (default) | 10.07 | 0.90 |
16 | 2048 | twoAwayXY | 10.59 | 0.95 |
16 | 4096 | twoAwayX | 14.32 | 0.64 |
16 | 4096 | twoAwayXY (default) | 17.63 | 0.79 |
16 | 4096 | twoAwayXYZ | 16.79 | 0.75 |
16 | 8192 | twoAwayX | 23.52 | 0.53 |
16 | 8192 | twoAwayXY (default) | 25.00 | 0.56 |
16 | 16384 | twoAwayX | 23.67 | 0.27 |
16 | 16384 | twoAwayXY | 28.31 | 0.32 |
16 | 16384 | twoAwayXYZ (default) | 27.98 | 0.31 |
16 | 4096 | twoAwayXY, PMEPencils=8, lblUnload=yes | 12.93 | 0.58 |
16 | 4096 | twoAwayXY, PMEPencils=12, lblUnload=yes | 17.27 | 0.77 |
16 | 4096 | twoAwayXY, PMEPencils=16, lblUnload=yes | 16.02 | 0.72 |
16 | 4096 | twoAwayXY, PMEPencils=20, lblUnload=yes | 15.41 | 0.69 |
16 | 4096 | twoAwayXY, PMEPencils=12 | 16.21 | 0.73 |
16 | 4096 | twoAwayXY, PMEPencils=16 | 17.92 | 0.80 |
16 | 4096 | twoAwayXY, PMEPencils=20 | 17.99 | 0.81 |
16 | 4096 | twoAwayXY, PMEPencils=24 | 17.83 | 0.80 |
16 | 4096 | twoAwayXY, PMEPencils=36 | 16.97 | 0.76 |
8 | 4096 | twoAwayXY, PMEPencils=20 | 18.24 | 0.82 |
16 | 4096 | twoAwayXY, PMEPencils=20 | 17.99 | 0.81 |
32 | 4096 | twoAwayXY, PMEPencils=20 | 13.94 | 0.63 |
4 | 512 | | 2.86 | 1.03 |
8 | 512 | | 2.84 | 1.02 |
16 | 512 | | 2.79 | 1.00 |
32 | 512 | | 2.29 | 0.82 |
16 | 512 | ldbUnloadZero=yes | 2.79 | 1.00 |
Incorrect Particle-Mesh Ewald Grid
Long-range electrostatics are computed using PME for all simulations, with the PME grid generated automatically by the pmeGridSpacing 1.0 setting. A poor choice of PME grid dimensions (i.e., sizes that are not products of the small prime factors 2, 3, and 5) can cause a performance degradation that grows with core count.
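A grid dimension can be checked for this property before a run. The sketch below is a standalone helper, not a NAMD feature; the function names are hypothetical. It tests whether a size factors entirely into 2, 3, and 5, and finds the next size that does.

```python
def is_fft_friendly(n):
    """Return True if n is a positive integer whose only prime factors are 2, 3, and 5."""
    if n < 1:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p          # strip out every factor of p
    return n == 1            # anything left over is a larger prime factor


def next_fft_friendly(n):
    """Return the smallest FFT-friendly size that is >= n."""
    while not is_fft_friendly(n):
        n += 1
    return n


if __name__ == "__main__":
    for size in (144, 111, 117):
        print(size, is_fft_friendly(size), next_fft_friendly(size))
```

For the grid used in the table below, 144 passes the check, while 111 = 3 x 37 does not; the next acceptable size at or above 111 is 120.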
Ranks per node | Cores | NAMD Config Options | ns/day | Efficiency |
16 | 512 | Poor PME Multiple (144x144x111) | 2.70 | 0.97 |
16 | 1024 | Poor PME Multiple (144x144x111) | 5.13 | 0.92 |
16 | 2048 | Poor PME Multiple (144x144x111) | 8.61 | 0.77 |
16 | 4096 | Poor PME Multiple (144x144x111) | 13.93 | 0.62 |
16 | 8192 | Poor PME Multiple (144x144x111) | 17.08 | 0.38 |
16 | 16384 | Poor PME Multiple (144x144x111) | 17.64 | 0.20 |
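To quantify how the penalty grows with core count, the poor-grid results can be compared with the default runs from the performance tuning table. This is a rough sketch under two assumptions that the source does not state explicitly: that the rows marked "(default)" (and the unlabelled 512-core row) are the appropriate reference, and that the poor-grid runs differed from them only in the PME grid.

```python
# ns/day for the default decomposition at 16 ranks per node (from the tuning table).
DEFAULT_NS_PER_DAY = {
    512: 2.79, 1024: 5.62, 2048: 10.07,
    4096: 17.63, 8192: 25.00, 16384: 27.98,
}

# ns/day with the 144x144x111 PME grid (from the table above).
POOR_PME_NS_PER_DAY = {
    512: 2.70, 1024: 5.13, 2048: 8.61,
    4096: 13.93, 8192: 17.08, 16384: 17.64,
}


def pme_slowdown(cores):
    """Fraction of default throughput lost with the poor PME grid at this core count."""
    return 1.0 - POOR_PME_NS_PER_DAY[cores] / DEFAULT_NS_PER_DAY[cores]


if __name__ == "__main__":
    for cores in sorted(DEFAULT_NS_PER_DAY):
        print(f"{cores:6d} cores: {100 * pme_slowdown(cores):.1f}% slower")
```

Under these assumptions the loss rises steadily from about 3% on 512 cores to about 37% on 16,384 cores.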