Infiniband Upgrade

From oldwiki.scinet.utoronto.ca
Revision as of 14:08, 18 April 2012 by Northrup

SciNet has upgraded the GPC so that it is fully Infiniband connected. The new network replaces the Ethernet-connected section of the GPC with 5:1 QDR Infiniband, while the existing 1:1 DDR Infiniband section remains. The GPC now consists of 840 DDR nodes (6720 cores) and 3024 QDR nodes (24192 cores). The two Infiniband sections are connected to each other, but in general operation a job will stay within one section or the other. Both sections (QDR and DDR) are fully compatible in terms of hardware and software, so no recompilation or different MPI flags are required.
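As a quick check of the figures above, the core counts follow from the node counts if each node has 8 cores. The value 8 is inferred here from 6720 / 840 and from the ppn=8 used in the job examples; the function name section_cores is only illustrative.

```shell
# Cores per node, inferred from the figures above (6720 / 840 = 8).
CORES_PER_NODE=8

# Print the core count for a given number of nodes.
section_cores() {
    echo $(( $1 * CORES_PER_NODE ))
}

ddr_cores=$(section_cores 840)
qdr_cores=$(section_cores 3024)

echo "DDR cores:   $ddr_cores"
echo "QDR cores:   $qdr_cores"
echo "Total cores: $(( ddr_cores + qdr_cores ))"
```

The total comes to 30912 cores, which is the "roughly 30000" pool referred to in the fairshare discussion.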

Job Submission

For most users, existing job submission scripts will still work as expected. All jobs now run on Infiniband, so the :ib: parameter previously used to request Infiniband nodes is no longer necessary, although it is still accepted. By default, a job goes to whichever network section best accommodates it: typically smaller jobs go to the QDR section and larger jobs to the DDR section. A user can override this by adding the flag "ddr" or "qdr" to the job resource request.

Two nodes anywhere on the GPC (QDR or DDR)

#PBS -l nodes=2:ppn=8,walltime=1:00:00

Two nodes using DDR

#PBS -l nodes=2:ddr:ppn=8,walltime=1:00:00

Two nodes using QDR

#PBS -l nodes=2:qdr:ppn=8,walltime=1:00:00
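The three request forms above differ only in the optional section flag. For anyone who generates job scripts from a template, a small helper along these lines may be useful. This is a sketch only: pbs_resources is a hypothetical function, not a SciNet tool, and ppn=8 is taken from the examples above.

```shell
# Hypothetical helper: print a "#PBS -l" resource line.
# Arguments: node count, section ("" for either, "ddr" or "qdr"), walltime.
pbs_resources() {
    nodes=$1
    section=$2
    walltime=$3

    case "$section" in
        "")      feature="" ;;            # let the scheduler pick the section
        ddr|qdr) feature=":$section" ;;   # pin the job to one section
        *)       echo "unknown section: $section" >&2; return 1 ;;
    esac

    echo "#PBS -l nodes=${nodes}${feature}:ppn=8,walltime=${walltime}"
}

pbs_resources 2 ""  1:00:00
pbs_resources 2 ddr 1:00:00
pbs_resources 2 qdr 1:00:00
```

Run as is, this prints the same three lines shown in the examples above. Any other section name is rejected with an error rather than being passed on to the scheduler.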

MPI Parameters

No special MPI parameters are required to run a job using Infiniband. Infiniband is used by default, so most users should just use the basic mpirun commands as outlined in GPC_MPI_Versions. For more detailed information, please see the recent Techtalk on IB.
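Putting the pieces together, a minimal job script might look like the sketch below. It assumes a Torque/PBS style environment (the PBS_O_WORKDIR variable and submission with qsub) and a plain "mpirun -np" launch. The file name ib_job.sh, the job name ib_example and the executable ./my_mpi_program are placeholders, and any module loading needed for your MPI version (see GPC_MPI_Versions) is left out.

```shell
# Write an example job script to ib_job.sh (placeholder file name).
cat > ib_job.sh <<'EOF'
#!/bin/bash
# Two QDR nodes, 8 processes per node, one hour of walltime.
#PBS -l nodes=2:qdr:ppn=8,walltime=1:00:00
#PBS -N ib_example

# Start in the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# 2 nodes x 8 cores = 16 MPI processes; no Infiniband-specific flags.
mpirun -np 16 ./my_mpi_program
EOF

# Submit with (assuming qsub is the submission command):
#   qsub ib_job.sh
```

Dropping ":qdr" from the resource line lets the scheduler place the job in either section.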

NRAC Allocations and Fairshare

The NRAC allocations for the current year, which were based on separate Ethernet and Infiniband sections, will carry over; however, each allocation now applies to the full GPC, not just one subsection. So if you were allocated 500 hours on Infiniband, your fairshare allocation will still be 500 hours, just 500 out of 30000 instead of 500 out of 7000. If you received two allocations, one on gigE and one on IB, they will simply be combined. This should benefit all users, as unifying the previously separate sections provides a larger pool of nodes, increasing the probability that your job will run.
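To make the example concrete: the absolute allocation stays at 500, while the pool it is measured against grows from about 7000 to about 30000. The sketch below only works through that arithmetic with the round numbers from the text. It does not reproduce the scheduler's actual fairshare calculation, and share_percent is an illustrative name.

```shell
# Print an allocation as a percentage of a pool, to two decimal places.
share_percent() {
    awk -v alloc="$1" -v pool="$2" 'BEGIN { printf "%.2f\n", 100 * alloc / pool }'
}

echo "500 of 7000  (old Infiniband section): $(share_percent 500 7000)%"
echo "500 of 30000 (full GPC):               $(share_percent 500 30000)%"
```

The percentage is smaller on the full GPC, but the allocation itself is unchanged and jobs can now be placed on any node in the larger pool.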