Difference between revisions of "BGQ"


Revision as of 15:35, 21 August 2012

Blue Gene/Q (BGQ)
[Image: Blue Gene Cabinet.jpeg]
Installed: August 2012
Operating System: RH6.3, CNK (Linux)
Number of Nodes: 2048 (32,768 cores), 512 (8,192 cores)
Interconnect: 5D Torus (jobs), QDR Infiniband (I/O)
RAM/Node: 16 GB
Cores/Node: 16 (64 threads)
Login/Devel Node: bgq01, bgq02
Vendor Compilers: bgxlc, bgxlf
Queue Submission: LoadLeveler

Specifications

BGQ is an extremely powerful and energy-efficient third-generation IBM supercomputer built around a system-on-a-chip compute node that has a 16-core 1.6 GHz Power-based CPU and 16 GB of RAM and runs a very lightweight, Linux-like OS called CNK. The nodes are bundled in groups of 32, and 16 of these groups make up a midplane, with 2 midplanes per rack. The compute nodes are all connected together using a custom 5D torus interconnect. Each midplane has 8 Power7 I/O nodes that run a full Red Hat Linux OS that manages the compute nodes and mounts the GPFS filesystem.
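
The node counts implied by this layout can be checked with a few lines of arithmetic. This is only a sketch: the constant names are made up for illustration, and calling the 32-node group a "nodeboard" is an assumption based on the gen_small_block argument used later on this page.

```python
# Counts taken from the description above; names are illustrative only.
NODES_PER_NODEBOARD = 32       # "bundled in groups of 32"
NODEBOARDS_PER_MIDPLANE = 16   # "16 of these groups make up a midplane"
MIDPLANES_PER_RACK = 2
CORES_PER_NODE = 16

nodes_per_midplane = NODES_PER_NODEBOARD * NODEBOARDS_PER_MIDPLANE
nodes_per_rack = nodes_per_midplane * MIDPLANES_PER_RACK
cores_per_midplane = nodes_per_midplane * CORES_PER_NODE

print(nodes_per_midplane, cores_per_midplane, nodes_per_rack)
```

These values match the 512 nodes (8192 cores) per midplane quoted in the Jobs section.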

Jobs

BGQ job size is typically determined by midplanes (512 nodes or 8192 cores); however, sub-blocks can be used to further subdivide midplanes, with a minimum of one I/O node per block. In SciNet's configuration (8 I/O nodes per midplane), this makes 64 nodes (1024 cores) the smallest job size.
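
As a rough illustration of where the 64-node figure comes from, the sketch below divides a midplane evenly among its I/O nodes. The function name is hypothetical, and an even split of compute nodes per I/O node is assumed.

```python
def smallest_block(nodes_per_midplane, io_nodes_per_midplane, cores_per_node):
    """Return (nodes, cores) of the smallest block with one I/O node.

    Assumes compute nodes are split evenly among the I/O nodes.
    """
    nodes = nodes_per_midplane // io_nodes_per_midplane
    return nodes, nodes * cores_per_node

# SciNet configuration as described above: 512 nodes, 8 I/O nodes, 16 cores/node.
min_nodes, min_cores = smallest_block(512, 8, 16)
print(min_nodes, min_cores)
```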


Compile

/bgsys/drivers/V1R1M1/ppc64/comm/xl/bin/mpich2version
/bgsys/drivers/V1R1M1/ppc64/comm/xl/bin/mpixlc
/bgsys/drivers/V1R1M1/ppc64/comm/xl/bin/mpixf90

Run a Job

When not using LoadLeveler, there is a direct launch program on BGQ called runjob that acts a lot like mpirun/mpiexec. The "block" argument is the predefined group of nodes that are already booted. See the next section on how to create these blocks manually.

runjob --block R00-M0-N03-32 --ranks-per-node=16 --np 512 --cwd=/gpfs/DDNgpfs3/xsnorthrup/osu_bgq --exe=/gpfs/DDNgpfs3/xsnorthrup/osu_bgq/osu_mbw_mr
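
In the example above, the --np value appears to be the total rank count, i.e. the number of nodes in the block times --ranks-per-node (32 x 16 = 512). That reading is inferred from this one example, not from the runjob documentation. Under that assumption, a small hypothetical helper can assemble the command line from the flags already shown:

```python
import shlex

def runjob_command(block, nodes, ranks_per_node, cwd, exe):
    """Build a runjob command line using only the flags shown above.

    Assumes --np is the total rank count (nodes * ranks_per_node).
    """
    np_total = nodes * ranks_per_node
    args = [
        "runjob",
        "--block", block,
        f"--ranks-per-node={ranks_per_node}",
        "--np", str(np_total),
        f"--cwd={cwd}",
        f"--exe={exe}",
    ]
    return shlex.join(args)

workdir = "/gpfs/DDNgpfs3/xsnorthrup/osu_bgq"
cmd = runjob_command("R00-M0-N03-32", 32, 16, workdir, workdir + "/osu_mbw_mr")
print(cmd)
```

The helper only builds a string; it does not launch anything, so it can be tried on any machine before pasting the result on a BGQ login node.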

To see running jobs and the status of available blocks, use:

list_jobs
list_blocks


Setup blocks

To reconfigure the BGQ nodes, use bg_console:

bg_console

There are various options to create block types (section 3.2 in the BGQ admin manual), but the smallest is created using the following command:

gen_small_block <blockid> <midplane> <cnodes> <nodeboard> 
gen_small_block  R00-M0-N03-32 R00-M0 32 N03
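
The block ID in this example looks like it is composed as midplane, nodeboard and compute-node count joined by hyphens. That naming pattern is only an inference from this single example; bg_console may accept other IDs. A hypothetical helper that follows the pattern and prints the matching console command:

```python
def gen_small_block_command(midplane, nodeboard, cnodes):
    """Return (block_id, command) following the naming seen in the example.

    The <midplane>-<nodeboard>-<cnodes> ID format is an assumption.
    """
    block_id = f"{midplane}-{nodeboard}-{cnodes}"
    # Argument order: <blockid> <midplane> <cnodes> <nodeboard>
    command = f"gen_small_block {block_id} {midplane} {cnodes} {nodeboard}"
    return block_id, command

block_id, command = gen_small_block_command("R00-M0", "N03", 32)
print(command)
```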

The block then needs to be booted using:

allocate R00-M0-N03-32

If those resources are already booted into another block, that block must be freed before the new block can be allocated.

select_block R00-M0-N03
free_block

There are many other functions in bg_console:

help all

I/O

GPFS



Documentation

BGQ System Administration Guide

BGQ Application Development