High Performance Scientific Computing


This wiki page is about part III of SciNet's Scientific Computing course, given in 2012.
Information on part I, given in 2011, can be found on the page Scientific Software Development Course.
Information about part II, given in 2012, can be found on the page Numerical Tools for Physical Scientists (course).
The 2013 Scientific Computing Course page is here.

Syllabus

The third part of SciNet's for-credit Scientific Computing course (Phys 2109 modular course credit / Ast 3100 mini-course credit) will start on February 10, 2012.

Whereas the first part of the course focused on the basics of best practices for writing maintainable, modular scientific programs, and the second part focused on common techniques and algorithms, such as floating-point computation, validation and verification, visualization, ODEs, Monte Carlo methods, linear algebra, and fast Fourier transforms, the third part focuses on high performance computing and parallel programming.

At the end of minicourse III, "High Performance Scientific Computing", students will leave with a basic understanding of distributed- and shared-memory parallel programming paradigms, and will be able to apply these in their own code.

The course will require 4-6 hours each week spent on reading and homework.

Required Software

Each lecture will have a hands-on component; students are strongly encouraged to bring laptops. (Contact us if this will be a problem.) Windows, Mac, or Linux laptops are all fine, but some software will have to be installed *before* the first lecture:

On Windows laptops only, Cygwin (http://www.cygwin.com/) will have to be installed; ensure that the development tools (gcc/g++/gfortran, gdb), git, the X environment (Xorg), and ssh are installed.

Hands-on parts will be done on SciNet's GPC cluster. For those who don't have a SciNet account yet, the instructions can be found at https://wiki.scinet.utoronto.ca/wiki/index.php/Essentials#Accounts .

Course outline

The classes will cover the material as follows; homework will be due by email at noon on the Thursday before the following class.

Lecture 9: Introduction to Parallel Programming and OpenMP

Lecture 10: MPI- part 1

Lecture 11: OpenMP - part 2

Lecture 12: MPI - part 2

Evaluation will be based entirely on the four homework assignments, with equal weighting given to each.

Location and Dates

The location is the same as for part II, i.e.

Fridays 10:00 am - 12:00 noon

Feb 10, 17, Mar 2, 9.

Bahen Centre for Information Technology

40 St. George Street

Room 1230

NOTE: The March 9 lecture has been moved to March 16, and will be held at the SciNet HQ, 256 McCaul Street, second floor. Same time: 10am-noon

Office Hours

The instructors will have office hours on Monday and Wednesday afternoons, 3pm-4pm, starting the week of the first class.

Location: SciNet offices at 256 McCaul, 2nd Floor.

  • Mon, Feb 13, 3pm-4pm
  • Wed, Feb 15, 3pm-4pm
  • Mon, Feb 27, 3pm-4pm
  • Wed, Feb 29, 3pm-4pm
  • Mon, Mar 5, 3pm-4pm
  • Wed, Mar 7, 3pm-4pm
  • Mon, Mar 12, 3pm-4pm
  • Wed, Mar 14, 3pm-4pm
  • Mon, Mar 19, 3pm-4pm
  • Wed, Mar 21, 3pm-4pm

Materials from Lectures

Homeworks

Homework following lecture 1

  1. Make sure you've got a SciNet account!
  2. Read the SciNet User Tutorial (mostly as it pertains to the GPC, but do not skip the data management section).
  3. Read the GPC Quickstart.
  4. Get the first set of code as follows
    • git clone /scinet/course/sc3/hw1
    • cd hw1
    • source setup
    • make
    • make testrun
  5. This set contains the serial daxpy.
  6. Make sure it compiles and runs on the GPC.
  7. Create the OpenMP version as discussed in class (a minimal sketch follows this list).
  8. Run this version for all values of OMP_NUM_THREADS from 1 to 32 on a single node, using a batch script. (Make sure to time the duration of these runs.)
  9. Submit git log, makefile, code, job script(s), and plots of the speedup and efficiency as a function of P.
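
For orientation, here is a minimal OpenMP daxpy sketch. It assumes a simple standalone program with illustrative names and an arbitrary problem size; the actual hw1 source and makefile may be organized differently. With gcc, compile with the -fopenmp flag.

  /* Minimal OpenMP daxpy sketch -- illustrative only; names and sizes
   * are assumptions, not those of the hw1 source. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <omp.h>

  int main(void) {
      const long n = 10000000;               /* assumed problem size */
      const double a = 2.0;
      double *x = malloc(n * sizeof(double));
      double *y = malloc(n * sizeof(double));

      for (long i = 0; i < n; i++) {         /* arbitrary initial data */
          x[i] = 1.0;
          y[i] = 2.0;
      }

      double t0 = omp_get_wtime();
      #pragma omp parallel for               /* iterations are independent */
      for (long i = 0; i < n; i++)
          y[i] = a * x[i] + y[i];
      double t1 = omp_get_wtime();

      printf("daxpy with up to %d threads: %g s\n",
             omp_get_max_threads(), t1 - t0);
      free(x);
      free(y);
      return 0;
  }

For the plots, speedup is usually defined as S(P) = T(1)/T(P) and parallel efficiency as E(P) = S(P)/P, where T(P) is the run time with P threads.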

Due: Thursday Feb 16 at noon. Email the files to rzon@scinet.utoronto.ca and ljdursi@scinet.utoronto.ca.

Homework following lecture 2

  1. Get the second set of code as follows
    • git clone /scinet/course/sc3/hw2
    • cd hw2
    • source setup
    • make
  2. This set contains the serial version of a 1-d thermal diffusion code
  3. Introduce MPI into this code (a sketch of the guardcell exchange appears after this list):
    • add standard MPI calls: init, finalize, comm_size, comm_rank
    • Figure out how many points each PE is responsible for (locpoints ~ totpoints/size)
    • Replace all uses of totpoints with locpoints
    • adjust xleft,xright
    • Figure out neighbors
    • Start at 1, but end at locpoints
    • At end of step, exchange guardcells; use sendrecv (once to send data rightwards and receive from the left, and once to send data leftwards and receive from the right)
    • Get total error (allreduce)
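
The guardcell exchange step could look roughly like the sketch below, assuming the local array temp has guardcells at indices 0 and locpoints+1 and interior points 1..locpoints; the variable and tag names are illustrative, not necessarily those of the hw2 source.

  #include <mpi.h>

  /* Exchange guardcells with the left and right neighbours.  'left' and
   * 'right' are the neighbouring ranks, or MPI_PROC_NULL at the domain
   * boundaries so that the same calls work on every process. */
  void exchange_guardcells(double *temp, int locpoints,
                           int left, int right, MPI_Comm comm)
  {
      const int righttag = 1, lefttag = 2;

      /* send rightmost interior point rightwards; receive left guardcell */
      MPI_Sendrecv(&temp[locpoints], 1, MPI_DOUBLE, right, righttag,
                   &temp[0],         1, MPI_DOUBLE, left,  righttag,
                   comm, MPI_STATUS_IGNORE);

      /* send leftmost interior point leftwards; receive right guardcell */
      MPI_Sendrecv(&temp[1],             1, MPI_DOUBLE, left,  lefttag,
                   &temp[locpoints + 1], 1, MPI_DOUBLE, right, lefttag,
                   comm, MPI_STATUS_IGNORE);
  }

The total error can then be combined across processes with, for example, MPI_Allreduce(&locerr, &toterr, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD).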

Due: Thursday March 1 at noon. Email the files to rzon@scinet.utoronto.ca and ljdursi@scinet.utoronto.ca.

Answers

Homework following lecture 3

Get the code for the third homework assignment as follows

  • git clone /scinet/course/sc3/hw3
  • cd hw3
  • source setup
  • make absorb rngpar dlisttest

This set contains the serial version of a 2-d absorption code:

We consider a 2d model of a substance (for instance, water) being absorbed into a porous, inhomogeneous medium (say the ground). The substance is modeled as a large number of particles, which start at the top surface of the medium (y=0), and move down (y>0) with an x and y dependent mobility, or velocity, β. At the same time, they can be absorbed by the medium with a position-dependent absorption rate α(x,y). This absorption is governed by a random process, such that in a small time interval δt, the probability of being absorbed is equal to α(x,y)δt.


The particular forms used in this exercise are arbitrarily chosen to be:

α(x,y) = α₀ [1 + sin²(2πx/λ)]

β(x,y) = β₀ [2 + sin(2π sin(x/λ))]

(Indeed, there is actually no y dependence.) The range of x is from 0 to 1, while the y direction ranges from zero to, in principle, infinity.

A serial code for this model can be found in absorb.c. It uses a linked list for the particles, and removes those particles that have been absorbed. It also uses a random number generator from the GNU Scientific Library for the absorption rule.
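
As a rough guide, the absorption rule amounts to drawing one uniform random number per particle per step and comparing it with α(x,y)δt. The sketch below shows this with illustrative names and the GSL generator; it is not the actual code from absorb.c.

  #include <math.h>
  #include <gsl/gsl_rng.h>

  /* alpha0, beta0 and lambda are the model parameters from the text. */
  double alpha(double x, double y, double alpha0, double lambda) {
      double s = sin(2.0 * M_PI * x / lambda);   /* no actual y dependence */
      return alpha0 * (1.0 + s * s);
  }

  double beta(double x, double y, double beta0, double lambda) {
      return beta0 * (2.0 + sin(2.0 * M_PI * sin(x / lambda)));
  }

  /* Returns 1 if a particle at (x,y) is absorbed during a small step dt:
   * the probability of absorption in that interval is alpha(x,y)*dt. */
  int absorbed(double x, double y, double dt,
               double alpha0, double lambda, gsl_rng *rng)
  {
      return gsl_rng_uniform(rng) < alpha(x, y, alpha0, lambda) * dt;
  }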

The assignment is to

  • OpenMP-parallelize this code using tasks (a minimal task sketch follows this list). Which part is most likely to give a speedup?
  • This will require the parallelization of the random number generation. Since there is less time for this assignment than usual, an example of this parallelization is included in the file rndpar.c, and you only have to incorporate this into the absorb.c code. You are still expected to parallelize the rest of the code, though.
  • You should also write an alternative OpenMP parallelization using a for loop: collect the non-absorbed particles into an array and loop over that.
  • Perform a scaling analysis (speedup) of both ways of parallelizing. Do this by writing job scripts and using the job scheduler. It may be interesting to play with the 'chunk' parameter in the code.
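
A minimal sketch of the task-based traversal of a particle list follows, with a hypothetical node type and update function; the real absorb.c data structures, the per-thread random number generators from the provided example file, and the 'chunk' parameter are not shown.

  #include <omp.h>

  typedef struct particle {          /* hypothetical list node */
      double x, y;
      struct particle *next;
  } particle;

  void update_particle(particle *p)  /* stand-in for the per-particle work */
  {
      p->y += 0.01;                  /* move, test for absorption, etc. */
  }

  void update_all(particle *head)
  {
      #pragma omp parallel           /* create the team of threads    */
      #pragma omp single             /* one thread walks the list ... */
      for (particle *p = head; p != NULL; p = p->next) {
          #pragma omp task firstprivate(p)   /* ... spawning a task per node */
          update_particle(p);
      }
      /* the implicit barrier at the end of the parallel region
         waits for all outstanding tasks */
  }

The alternative for-loop version replaces this traversal by first gathering pointers to the non-absorbed particles into an array and then using an ordinary #pragma omp parallel for over that array.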

Please submit the code, git log, job scripts, and plots by Thursday March 8, 2012, at 8pm (note the different time), to rzon@scinet.utoronto.ca and ljdursi@scinet.utoronto.ca.

Homework following lecture 4

  • git clone /scinet/course/sc3/hw3
  • cd hw3
  • source setup

Make three versions of diffusion2d:

  • Pure MPI, blocking guardcells (done for you)
  • Pure MPI, nonblocking (a sketch of a nonblocking guardcell exchange follows this list)
  • Hybrid MPI + OpenMP

Timings:

  • Set points to something much larger (10k? 50k?) and reduce the number of iterations to a few dozen.
  • Try to get the best performance you can on 4 nodes (= 32 processors). Is one MPI task per node best, or 2, or more?
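
For the nonblocking version, one possibility is to post the receives and sends for a guardcell exchange up front and complete them with a wait, roughly as in the sketch below; the buffer and variable names are illustrative, and the actual 2-d code will pack rows or columns into such buffers (or use derived datatypes).

  #include <mpi.h>

  /* Nonblocking exchange of n-element guardcell buffers with the left
   * and right neighbours ('left'/'right' may be MPI_PROC_NULL). */
  void exchange_nonblocking(double *sendleft, double *sendright,
                            double *recvleft, double *recvright,
                            int n, int left, int right, MPI_Comm comm)
  {
      MPI_Request reqs[4];

      MPI_Irecv(recvleft,  n, MPI_DOUBLE, left,  1, comm, &reqs[0]);
      MPI_Irecv(recvright, n, MPI_DOUBLE, right, 2, comm, &reqs[1]);
      MPI_Isend(sendright, n, MPI_DOUBLE, right, 1, comm, &reqs[2]);
      MPI_Isend(sendleft,  n, MPI_DOUBLE, left,  2, comm, &reqs[3]);

      /* work on interior points that do not need guardcells could
         overlap with the communication here */

      MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
  }

For the hybrid version, the usual approach is to keep the MPI structure between tasks and add OpenMP (e.g. a parallel for over the local grid) within each MPI task; the timing question above is then how to divide each node's 8 cores (4 nodes = 32 processors) between MPI tasks and OpenMP threads.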

Please submit the code, git log, job scripts, and plots by Thursday March 22, 2012, at 8pm (note the different time), to rzon@scinet.utoronto.ca and ljdursi@scinet.utoronto.ca.

Links

Git

Python

Top500

OpenMP

  • OpenMP (open multi-processing) application programming interface for shared memory programming: http://openmp.org

SciNet

Anything on this wiki, really, but specifically: