Namd on BGQ

From oldwiki.scinet.utoronto.ca

Revision as of 15:50, 5 November 2012

A parameter study was undertaken to test the simulation performance and parallel efficiency of NAMD on the Blue Gene/Q cluster (BGQ), following the NAMD performance tuning documentation. Determining optimal parameters for a NAMD simulation on this system is complicated by the fact that only certain simulation sizes (512, 1024, etc. cores) have optimal network topologies. The system of study is a 246,000-atom membrane protein simulation (cytochrome c oxidase embedded in a TIP3P-solvated DPPC bilayer) using the CHARMM36 force field for both protein and lipids. The unit cell is rectangular, with box dimensions of 144 x 144 x 117 Å, and the simulation time step was 2 fs.

Performance Tuning Benchmarks

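Efficiency is measured relative to ideal linear scaling from the 512-core simulation at 16 ranks per node (2.79 ns/day); for example, a 1024-core run would need 5.58 ns/day to reach an efficiency of 1.00. All simulations are started from a restart file of a pre-equilibrated snapshot. Performance in nanoseconds per day is the geometric mean of the three "Benchmark time" lines near the beginning of each simulation's standard output. In this section, the patch grid was manually doubled in the X, Y, and/or Z directions with the twoAwayX, twoAwayY, and twoAwayZ options (abbreviated twoAwayX, twoAwayXY, and twoAwayXYZ in the table); "(default)" marks the doubling that NAMD 2.9 chooses on its own at that core count. The default patch doubling in NAMD 2.9 is generally recommended, so the twoAway parameters need not be specified in the configuration file.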
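The ns/day measurement described above can be sketched in Python. This is a minimal sketch, assuming each NAMD log line has the form `Info: Benchmark time: 512 CPUs 0.0608 s/step 0.351852 days/ns 452.039 MB memory` (the field layout and the example values are assumptions for illustration; check them against your own output). The helper names `ns_per_day_from_log` and `ns_per_day_from_s_per_step` are hypothetical. Because ns/day is the reciprocal of days/ns, the geometric mean of the three ns/day values equals the reciprocal of the geometric mean of the three days/ns values.

```python
import math
import re

# Assumed NAMD log format (illustrative values, not from this study):
#   Info: Benchmark time: 512 CPUs 0.0608 s/step 0.351852 days/ns 452.039 MB memory
BENCH_RE = re.compile(r"Benchmark time:.*?\s([0-9.eE+-]+) days/ns")


def ns_per_day_from_log(log_text: str, n_lines: int = 3) -> float:
    """Geometric-mean performance (ns/day) over the first n_lines benchmark lines."""
    days_per_ns = [float(m.group(1)) for m in BENCH_RE.finditer(log_text)][:n_lines]
    if len(days_per_ns) < n_lines:
        raise ValueError(
            f"expected {n_lines} 'Benchmark time' lines, found {len(days_per_ns)}"
        )
    # Geometric mean of days/ns, computed in log space; its reciprocal is the
    # geometric mean of the corresponding ns/day values.
    mean_log = sum(math.log(x) for x in days_per_ns) / n_lines
    return 1.0 / math.exp(mean_log)


def ns_per_day_from_s_per_step(s_per_step: float, timestep_fs: float = 2.0) -> float:
    """Convert wall-clock seconds per step to ns/day for a time step given in fs."""
    steps_per_day = 86400.0 / s_per_step
    return steps_per_day * timestep_fs * 1e-6  # 1 fs = 1e-6 ns
```

The second helper applies the same conversion starting from s/step, using this study's 2 fs time step as the default.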

Ranks/node  Cores  NAMD config options   ns/day  Efficiency
16            512  -                       2.79  1.00
16           1024  -                       5.05  0.91
16           1024  twoAwayX (default)      5.62  1.01
16           2048  twoAwayX (default)     10.07  0.90
16           2048  twoAwayXY              10.59  0.95
16           4096  twoAwayX               14.32  0.64
16           4096  twoAwayXY (default)    17.63  0.79
16           4096  twoAwayXYZ             16.79  0.75
16           8192  twoAwayX               23.52  0.53
16           8192  twoAwayXY (default)    25.00  0.56
16          16384  twoAwayX               23.67  0.27
16          16384  twoAwayXY              28.31  0.32
16          16384  twoAwayXYZ (default)   27.98  0.31
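Every Efficiency entry in the table above is consistent with efficiency = (measured ns/day) / (2.79 ns/day x cores / 512). A short Python check of a few rows, using a hypothetical helper named `efficiency`:

```python
BASELINE_CORES = 512
BASELINE_NS_PER_DAY = 2.79  # 16 ranks per node, 512 cores (first row of the table)


def efficiency(cores: int, ns_per_day: float) -> float:
    """Measured ns/day divided by ideal linear scaling of the 512-core baseline."""
    ideal = BASELINE_NS_PER_DAY * cores / BASELINE_CORES
    return ns_per_day / ideal


# A few rows from the benchmark table: (cores, NAMD options, ns/day)
rows = [
    (1024, "twoAwayX (default)", 5.62),
    (4096, "twoAwayXY (default)", 17.63),
    (16384, "twoAwayXYZ (default)", 27.98),
]

if __name__ == "__main__":
    for cores, opts, perf in rows:
        print(f"{cores:>6}  {opts:<22}{perf:>6.2f}  {efficiency(cores, perf):.2f}")
```

Efficiency slightly above 1.00 (as for the 1024-core twoAwayX run) simply means that run beat linear scaling from the 512-core baseline.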

PME Pencils

A "pencil-based" PME decomposition may be more efficient than the default "slab-based" decomposition. In this study, PME pencil grids were created both with dedicated PME nodes (lblUnload=yes) and with non-dedicated PME nodes. Fine-tuning PMEPencils gave insignificant performance gains: at 16 ranks per node on 4096 cores, the best pencil setting (PMEPencils=20, 17.99 ns/day) was only about 2% faster than the default twoAwayXY run (17.63 ns/day).

Ranks/node  Cores  NAMD config options                      ns/day  Efficiency
16           4096  twoAwayXY, PMEPencils=8, lblUnload=yes    12.93  0.58
16           4096  twoAwayXY, PMEPencils=12, lblUnload=yes   17.27  0.77
16           4096  twoAwayXY, PMEPencils=16, lblUnload=yes   16.02  0.72
16           4096  twoAwayXY, PMEPencils=20, lblUnload=yes   15.41  0.69
16           4096  twoAwayXY, PMEPencils=12                  16.21  0.73
16           4096  twoAwayXY, PMEPencils=16                  17.92  0.80
16           4096  twoAwayXY, PMEPencils=20                  17.99  0.81
16           4096  twoAwayXY, PMEPencils=24                  17.83  0.80
16           4096  twoAwayXY, PMEPencils=36                  16.97  0.76
8            4096  twoAwayXY, PMEPencils=20                  18.24  0.82
16           4096  twoAwayXY, PMEPencils=20                  17.99  0.81
32           4096  twoAwayXY, PMEPencils=20                  13.94  0.63
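The pencil results can be ranked mechanically. Below is a minimal Python sketch (the dictionary layout and the `best_run` helper are hypothetical) that picks the fastest setting at a given ranks-per-node value and compares it with the default twoAwayXY run on 4096 cores at 16 ranks per node (17.63 ns/day). The comparison is restricted to one ranks-per-node value because the 8 and 32 ranks-per-node rows may occupy a different number of nodes.

```python
DEFAULT_NS_PER_DAY = 17.63  # 16 ranks/node, 4096 cores, twoAwayXY (default), no PMEPencils

# (ranks per node, PMEPencils, dedicated PME nodes via lblUnload=yes) -> ns/day
pencil_runs = {
    (16, 8, True): 12.93,
    (16, 12, True): 17.27,
    (16, 16, True): 16.02,
    (16, 20, True): 15.41,
    (16, 12, False): 16.21,
    (16, 16, False): 17.92,
    (16, 20, False): 17.99,
    (16, 24, False): 17.83,
    (16, 36, False): 16.97,
    (8, 20, False): 18.24,
    (32, 20, False): 13.94,
}


def best_run(rpn: int):
    """Fastest pencil configuration at a given ranks-per-node setting."""
    candidates = {k: v for k, v in pencil_runs.items() if k[0] == rpn}
    key = max(candidates, key=candidates.get)
    return key, candidates[key]


if __name__ == "__main__":
    (rpn, pencils, dedicated), perf = best_run(16)
    gain = perf / DEFAULT_NS_PER_DAY - 1.0
    print(f"best at {rpn} ranks/node: PMEPencils={pencils}, dedicated={dedicated}, "
          f"{perf} ns/day ({gain:+.1%} vs default)")
```

In this data none of the dedicated-PME (lblUnload=yes) runs beats the default.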

Ranks-Per-Node Study

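The number of ranks per node, i.e. the number of MPI processes per compute node, is set with the ranks-per-node parameter of the Blue Gene/Q runjob command. In this study, 64 ranks per node could not be used because the memory requirements were too large, and 32 ranks per node also ran out of memory for the 16384-core simulations. The efficiencies below are measured relative to the 16-ranks-per-node results on the same number of nodes. Here the Cores column counts MPI ranks: the 1024-rank run at 32 ranks per node, for example, uses the same 32 nodes as the 512-core run at 16 ranks per node, and values above 1 mean the extra ranks per node increased throughput.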

Ranks/node  Cores  ns/day  Efficiency
32           1024    4.46  1.6
32           2048    8.01  1.43
32           4096   13.74  1.3
32           8192   19.81  1.12
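The ratios in this table can be reproduced if each 32-ranks-per-node run is compared with the fastest 16-ranks-per-node run on the same number of nodes. That baseline choice is inferred from the numbers, not stated on this page. The Python sketch below uses hypothetical names and treats all four rows as 32 ranks per node. This is the only reading under which the 8192-rank row reproduces 1.12 (19.81 / 17.63 on 256 nodes).

```python
CORES_PER_NODE = 16  # physical cores per BG/Q compute node

# Fastest 16-ranks-per-node runs from the Performance Tuning Benchmarks table:
# (cores, ns/day). At 16 ranks per node, cores == MPI ranks.
_best_16rpn_runs = [(512, 2.79), (1024, 5.62), (2048, 10.59), (4096, 17.63)]
best_16rpn = {cores // CORES_PER_NODE: perf for cores, perf in _best_16rpn_runs}

# 32-ranks-per-node runs: (total MPI ranks, ns/day).
# All four rows are treated as 32 ranks per node (an inference, see lead-in).
runs_32rpn = [(1024, 4.46), (2048, 8.01), (4096, 13.74), (8192, 19.81)]


def nodes_used(ranks: int, ranks_per_node: int) -> int:
    """Number of compute nodes occupied by a job."""
    return ranks // ranks_per_node


def relative_to_16rpn(ranks: int, ns_per_day: float, ranks_per_node: int = 32) -> float:
    """Throughput relative to the fastest 16-ranks-per-node run on the same nodes."""
    return ns_per_day / best_16rpn[nodes_used(ranks, ranks_per_node)]


if __name__ == "__main__":
    for ranks, perf in runs_32rpn:
        n = nodes_used(ranks, 32)
        print(f"{ranks:>5} ranks on {n:>3} nodes: {relative_to_16rpn(ranks, perf):.2f}")
```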

Incorrect Particle-Mesh Ewald Grid

Long-range electrostatics are computed with PME in all of the simulations above, with the PME grid dimensions generated automatically from the "pmeGridSpacing 1.0" setting. A poor choice of grid dimensions (i.e. sizes whose prime factors are not limited to 2, 3, and 5) can cause a performance degradation that grows with core count, because the FFT algorithm is most efficient for such sizes. The table below shows the type of degradation one may expect with a 144x144x111 grid, in which none of the dimensions is divisible by 5 and 111 = 3 x 37 contains the large prime factor 37. Compare these results with the corresponding runs in the Performance Tuning Benchmarks above, which use an automatically chosen grid.

Ranks/node  Cores  NAMD config options          ns/day  Efficiency
16            512  Poor PME grid (144x144x111)    2.70  0.97
16           1024  Poor PME grid (144x144x111)    5.13  0.92
16           2048  Poor PME grid (144x144x111)    8.61  0.77
16           4096  Poor PME grid (144x144x111)   13.93  0.62
16           8192  Poor PME grid (144x144x111)   17.08  0.38
16          16384  Poor PME grid (144x144x111)   17.64  0.20
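A minimal Python sketch of the grid-size rule, under the assumption (not confirmed on this page) that the automatic selection picks, for each box dimension, the smallest size with only 2, 3, and 5 as prime factors that gives a spacing no larger than pmeGridSpacing. Under that assumption the 144 x 144 x 117 Å box would get a 144 x 144 x 120 grid, while the poor grid's 111 factors as 3 x 37. The function names are hypothetical.

```python
import math


def prime_factors(n: int) -> list[int]:
    """Prime factorization by trial division, smallest factors first."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def is_235_smooth(n: int) -> bool:
    """True if n has no prime factors other than 2, 3 and 5."""
    return all(f in (2, 3, 5) for f in prime_factors(n))


def grid_size(box_length: float, spacing: float = 1.0) -> int:
    """Smallest 2,3,5-smooth grid size whose spacing is at most `spacing` (assumed rule)."""
    n = math.ceil(box_length / spacing)
    while not is_235_smooth(n):
        n += 1
    return n


if __name__ == "__main__":
    box = (144.0, 144.0, 117.0)  # Angstroms, from the system description
    print("auto grid:", [grid_size(length) for length in box])
    print("111 =", " x ".join(map(str, prime_factors(111))))
```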

Documentation

  1. NAMD 2.9 User Guide
  2. NAMD Performance Tuning Wiki