Parallel Debugging with DDT

Allinea DDT

For parallel debugging, SciNet has DDT ("Distributed Debugging Tool") installed on all our clusters. DDT is a powerful, GUI-based commercial debugger by Allinea. It supports the programming languages C, C++, and Fortran, and the parallel programming paradigms MPI, OpenMP, and CUDA. DDT can also be very useful for serial programs. Because its interface is graphical, DDT needs X11 support: pass the '-X' or '-Y' option to your ssh commands so that X11 graphics can find their way back to your screen ("X forwarding").
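
Before starting DDT, it can help to confirm that X forwarding is actually active in your session. The sketch below is only an illustration: the function name check_x_forwarding is made up here and is not part of DDT or of the SciNet modules. It assumes that a non-empty DISPLAY variable means ssh has set up forwarding, which is a necessary but not a sufficient condition for graphics to reach your screen.

```shell
# Hypothetical helper (not part of DDT or the SciNet modules):
# succeeds if DISPLAY is set, which 'ssh -X' or 'ssh -Y' normally arranges.
check_x_forwarding() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "X forwarding looks active (DISPLAY=$DISPLAY)"
        return 0
    fi
    echo "DISPLAY is not set; reconnect with 'ssh -X' or 'ssh -Y'" >&2
    return 1
}
```

A typical use would be to run check_x_forwarding && ddt, so that DDT is only launched when a display is available.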

The most recent ddt module is DDT 6.0.

To use ddt, ssh in with X forwarding enabled, load your usual compiler and MPI modules, compile your code with '-g', and load the ddt module:

   module load extras ddt/6.0

(note: the extras module is not needed on devel nodes)

You can then start ddt with one of the following commands:

   ddt
   ddt <executable compiled with -g flag>
   ddt <executable compiled with -g flag> <arguments>

The first time you run it, DDT will set up configuration files. It puts them in the hidden directory $HOME/.allinea. This is important to know, since you may need to move or remove this directory if you want to use DDT on another machine (e.g. when you switch from the GPC to the TCS).
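
If you do need to reset the configuration, renaming the directory is safer than deleting it. The following is a minimal sketch under stated assumptions: the function name stash_ddt_config and the backup naming scheme are invented for this example, and it assumes DDT simply recreates $HOME/.allinea on its next start, as described above.

```shell
# Hypothetical helper: move an existing DDT configuration directory aside
# so that DDT sets up a fresh one on its next start. Nothing is deleted.
stash_ddt_config() {
    conf="$HOME/.allinea"
    # Backup name with a timestamp, e.g. ~/.allinea.20150101120000.bak
    backup="$HOME/.allinea.$(date +%Y%m%d%H%M%S).bak"
    if [ -e "$conf" ] || [ -L "$conf" ]; then
        mv "$conf" "$backup" && echo "moved $conf to $backup"
    else
        echo "no DDT configuration found at $conf"
    fi
}
```

Run stash_ddt_config on a devel node before starting ddt on the new machine; the old settings stay available in the backup directory.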

Note that most users will debug on the devel nodes of the SciNet clusters (gpc01, gpc02, gpc03, gpc04, gpc05, gpc06, gpc07, gpc08, tcs01, tcs02, arc01, p701), but this is only appropriate if the number of MPI processes and threads is small and the memory usage is not too large. If your debugging requires more resources, you should run it through the queue. On the GPC, an interactive debug session will suit most debugging purposes.

Parallel Debugging in an Interactive Session on the GPC

By requesting a job from the 'debug' queue on the GPC, you can have access to at most 8 nodes, i.e., a total of 64 physical cores (or 128 virtual cores, using Hyperthreading), for your exclusive, interactive use. Starting from a gpc devel node, you would request 1, 2, 4, or 8 node debug sessions as follows:

   qsub -lnodes=1:ppn=8,walltime=2:00:00 -qdebug -X -I
   qsub -lnodes=2:ppn=8,walltime=1:00:00 -qdebug -X -I
   qsub -lnodes=4:ppn=8,walltime=1:00:00 -qdebug -X -I
   qsub -lnodes=8:ppn=8,walltime=0:30:00 -qdebug -X -I

or use the (equivalent) command

   debugjob <numberofnodes>

which is part of the 'extras' module.
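
The relation between the number of nodes and the walltime in the qsub examples above can be sketched as a small shell function. This is not the actual debugjob script, whose contents are not shown here; debug_qsub_cmd is a hypothetical name, and the function only prints the command rather than submitting anything.

```shell
# Hypothetical sketch: print (do not run) the qsub command for an
# interactive debug session on 1, 2, 4, or 8 GPC nodes. The walltimes
# mirror the four qsub examples; this is not the real debugjob script.
debug_qsub_cmd() {
    nodes="$1"
    case "$nodes" in
        1)   walltime="2:00:00" ;;
        2|4) walltime="1:00:00" ;;
        8)   walltime="0:30:00" ;;
        *)   echo "usage: debug_qsub_cmd 1|2|4|8" >&2; return 1 ;;
    esac
    echo "qsub -lnodes=${nodes}:ppn=8,walltime=${walltime} -qdebug -X -I"
}
```

Printing the command first lets you inspect it before pasting it into a shell on a gpc devel node.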

These commands will get you a prompt on a compute node (the 'head' node if you've asked for more than one node). Reload any modules that your application needs (e.g. module load intel openmpi), as well as the 'extras' and 'ddt/6.0' modules.

Note that on compute nodes, $HOME is read-only, so unless your code is on $SCRATCH, you cannot recompile it (with '-g') in the debug session; this should have been done on a devel node.

DDT also has an issue with $HOME being read-only on the compute nodes, because it tries to save your settings to $HOME/.allinea. You can fix this by issuing the following commands once on a gpc devel node:

   mv $HOME/.allinea $SCRATCH/.allinea
   ln -s $SCRATCH/.allinea $HOME/.allinea
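
The two commands above assume that $HOME/.allinea already exists and has not been moved before. A slightly more defensive variant, written as a sketch, might look like the following; relocate_ddt_config is a hypothetical name, and the handling of the "DDT has never been run" case (creating an empty directory on $SCRATCH) is an assumption of this example rather than documented DDT behaviour.

```shell
# Hypothetical helper: keep DDT's settings on $SCRATCH and leave a
# symlink at $HOME/.allinea. Safe to call more than once.
relocate_ddt_config() {
    src="$HOME/.allinea"
    dst="$SCRATCH/.allinea"
    if [ -L "$src" ]; then
        echo "$src is already a symlink; nothing to do"
        return 0
    fi
    if [ -d "$src" ]; then
        if [ -e "$dst" ]; then
            echo "$dst already exists; not overwriting it" >&2
            return 1
        fi
        mv "$src" "$dst" || return 1
    else
        # Assumption: DDT has not been run yet, so start with an empty directory.
        mkdir -p "$dst" || return 1
    fi
    ln -s "$dst" "$src"
}
```

As with the original commands, this has to be run on a devel node, since $HOME is read-only on the compute nodes.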

Usage

Please see this pdf for an overview on how to use DDT. You may also check out the Distributed Debugging Tool User Guide.

Slides of debugging presentations