High Performance Scientific Computing
This wiki page is about part III of SciNet's Scientific Computing course.
Information on part I can be found on the page Scientific Software Development Course
Information about part II can be found on the page Numerical Tools for Physical Scientists (course)
Syllabus
The third part of SciNet's for-credit Scientific Computing course (Phys 2109 modular course credit / Ast 3100 mini-course credit) will start on February 10 2012.
Whereas the first part of the course focused on the basics of best practices for writing maintainable, modular scientific programs, and the second part focused on common techniques and algorithms (floating point computations, validation and verification, visualization, ODEs, Monte Carlo, linear algebra, and fast Fourier transforms), the third part focuses on high performance computing and parallel programming.
At the end of minicourse III, "High Performance Scientific Computing", students will leave with a basic understanding of the distributed and shared memory parallel programming paradigms, and will be able to apply these in their own code.
The course will require 4-6 hours each week spent on reading and homework.
Required Software
Each lecture will have a hands-on component; students are strongly encouraged to bring laptops. (Contact us if this will be a problem). Windows, Mac, or Linux laptops are all fine, but some software will have to be installed *before* the first lecture:
On Windows laptops only, Cygwin (http://www.cygwin.com/) will have to be installed; ensure that the development tools (gcc/g++/gfortran, gdb), git, the X environment (Xorg), and ssh are installed.
Hands-on parts will be done on SciNet's GPC cluster. For those who don't have a SciNet account yet, the instructions can be found at https://wiki.scinet.utoronto.ca/wiki/index.php/Essentials#Accounts .
Course outline
The classes will cover the material as follows; homework will be due by email at noon on the Thursday before the following class.
Lecture 9: Introduction to Parallel Programming and OpenMP
Lecture 10: MPI - part 1
Lecture 11: OpenMP - part 2
Lecture 12: MPI - part 2
Evaluation will be based entirely on the four homework assignments, with equal weighting given to each.
Location and Dates
The location is the same as for part II, i.e.
Fridays 10:00am - 12:00pm
Feb 10, 17, Mar 2, 9.
Bahen Centre for Information Technology
40 St. George Street
Room 1230
NOTE: The March 9 lecture has been moved to March 16, and will be held at the SciNet HQ, 256 McCaul Street, second floor. Same time: 10am-noon
Office Hours
The instructors will have office hours on Monday and Wednesday afternoons, 3pm-4pm, starting the week of the first class.
Location: SciNet offices at 256 McCaul, 2nd Floor.
- Mon, Feb 13, 3pm-4pm
- Wed, Feb 15, 3pm-4pm
- Mon, Feb 27, 3pm-4pm
- Wed, Feb 29, 3pm-4pm
- Mon, Mar 5, 3pm-4pm
- Wed, Mar 7, 3pm-4pm
- Mon, Mar 12, 3pm-4pm
- Wed, Mar 14, 3pm-4pm
Materials from Lectures
Homeworks
Homework following lecture 1
- Make sure you've got a SciNet account!
- Read the SciNet User Tutorial (mostly as it pertains to the GPC, but do not skip the data management section).
- Read the GPC Quickstart.
- Get the first set of code as follows
- git clone /scinet/course/sc3/hw1
- cd hw1
- source setup
- make
- make testrun
- This set contains the serial daxpy.
- Make sure it compiles and runs on the GPC.
- Create the OpenMP version as just discussed.
- Run this version for all values of OMP_NUM_THREADS from 1 to 32 on a single node, using a batch script. (Make sure to time the duration of these runs.)
- Submit git log, makefile, code, job script(s), and plots of the speedup and efficiency as a function of P.
Due: Thursday Feb 16 at noon. Email the files to rzon@scinet.utoronto.ca and ljdursi@scinet.utoronto.ca.
Homework following lecture 2
- Get the second set of code as follows
- git clone /scinet/course/sc3/hw2
- cd hw2
- source setup
- make
- This set contains the serial version of a 1-d thermal diffusion code
- Introduce MPI into this code:
- add the standard MPI calls: MPI_Init, MPI_Finalize, MPI_Comm_size, MPI_Comm_rank
- Figure out how many points each PE is responsible for (locpoints ~ totpoints/size)
- Replace all occurrences of totpoints with locpoints
- adjust xleft,xright
- Figure out neighbors
- Start loops at 1, but end them at locpoints
- At end of step, exchange guardcells; use sendrecv (once to send data rightwards and receive from the left, and once to send data leftwards and receive from the right)
- Get total error (allreduce)
Due: Thursday March 1 at noon. Email the files to rzon@scinet.utoronto.ca and ljdursi@scinet.utoronto.ca.
Homework following lecture 3
Get the code for the third homework assignment as follows
- git clone /scinet/course/sc3/hw3
- cd hw3
- source setup
- make absorb rngpar dlisttest
This set contains the serial version of a 2-d absorption code:
We consider a 2d model of a substance (for instance, water) being absorbed into a porous, inhomogeneous medium (say the ground). The substance is modeled as a large number of particles, which start at the top surface of the medium (y=0), and move down (y>0) with an x and y dependent mobility, or velocity, β. At the same time, they can be absorbed by the medium with a position-dependent absorption rate α(x,y). This absorption is governed by a random process, such that in a small time interval δt, the probability of being absorbed is equal to α(x,y)δt.
The particular forms used in this exercise are arbitrarily chosen to be:
(Indeed, there is actually no y dependence.) The range of x is from 0 to 1, while the y direction ranges from zero to, in principle, infinity.
A serial code for this model can be found in absorb.c. It uses a linked list for the particles, and removes those particles that have been absorbed. It also uses a random number generator from the GNU Scientific Library for the absorption rule.
The assignment is to
- OpenMP-parallelize this code using tasks. What is the most likely part to give speedup?
- This will require parallelizing the random number generation. Since there is less time for this assignment than usual, an example of this parallelization is included in the file rndpar.c; you only have to incorporate it into the absorb.c code. You are still expected to parallelize the rest of the code, though.
- You should also write an alternative OpenMP-parallelization using a for loop: Collect non-absorbed particles into an array and loop over that.
- Perform a scaling analysis (speedup) of both parallelization approaches. Do this by writing job scripts and using the job scheduler. It may be interesting to experiment with the 'chunk' parameter in the code.
Please submit the code, git log, job scripts, and plots by Thursday March 8, 2012, at 8pm (note the different time), to rzon@scinet.utoronto.ca and ljdursi@scinet.utoronto.ca.
Links
Git
- Git cheat sheet from Git Tower: http://www.git-tower.com/files/cheatsheet/Git_Cheat_Sheet_grey.pdf
Python
- Enthought python distribution: http://www.enthought.com/products/edudownload.php
- Intro to python from software carpentry: http://software-carpentry.org/4_0/python
- Tutorial on matplotlib: http://conference.scipy.org/scipy2011/tutorials.php#jonathan
Top500
- TOP500 Supercomputing Sites: http://top500.org
OpenMP
- OpenMP (open multi-processing) application programming interface for shared memory programming: http://openmp.org
SciNet
Anything on this wiki, really, but specifically: