Latest revision as of 15:20, 24 November 2015

This wiki page concerns the 2013 installment of SciNet's Scientific Computing course. Material from the previous installment can be found on Scientific Software Development Course, Numerical Tools for Physical Scientists (course), and High Performance Scientific Computing

Syllabus

About the course

Whole-term graduate course
Prerequisite: basic C, C++ or Fortran experience.
Will use `C++ light' and Python
Topics include: Scientific computing and programming skills, Parallel programming, and Hybrid programming.

There are three parts to this course:

Scientific Software Development: Jan/Feb 2013
python, C++, git, make, modular programming, debugging
Numerical Tools for Physical Scientists: Feb/Mar 2013
modelling, floating point, Monte Carlo, ODE, linear algebra,fft
High Performance Scientific Computing: Mar/Apr 2013
openmp, mpi and hybrid programming

Each part consists of eight one-hour lectures, two per week.

These can be taken separately by astrophysics graduate students at the University of Toronto as mini-courses, and by physics graduate students at the University of Toronto as modular courses.

The first two parts count towards the SciNet Certificate in Scientific Computing, while the third part can count towards the SciNet HPC Certificate. For more info about the SciNet Certificates, see http://www.scinethpc.ca/2012/12/scinet-hpc-certificate-program.

Location and Times

SciNet HeadQuarters
256 McCaul Street, Toronto, ON
Room 229 (Conference Room)
Tuesdays 11:00 am - 12:00 noon
Thursdays 11:00 am - 12:00 noon

Instructors and office hours

Ramses van Zon - 256 McCaul Street, Rm 228 - Mondays 3-4pm
L. Jonathan Dursi - 256 McCaul Street, Rm 216 - Wednesdays 3-4pm

Grading scheme

Attendence to lectures.

Four home work sets (i.e., one per week), to be returned by email by 9:00 am the next Thursday.

Sign up

Sign up for this graduate course goes through SciNet's course website.
The direct link is https://support.scinet.utoronto.ca/courses/?q=node/99.
If you do not have a SciNet account but wish to register for this course, please email support@scinet.utoronto.ca .
Sign up is closed.

Part 1: Scientific Software Development

Prerequisites

Some programming experience. Some unix prompt experience.

Software that you'll need:

A unix-like environment with the GNU compiler suite (e.g. Cygwin), and Python 2, IPython, Numpy, SciPy and Matplotlib (which you all get if you use the Enthought distribution) installed on your laptop. Links are given at the bottom of this page.

Dates

January 15, 17, 22, 24, 29, and 31, 2013
February 5 and 7, 2013

Topics (with lecture slides and recordings)

Lecture 1: C++ introduction

Slides / Video recording

Lecture 2: More C++, build and version control

Guest lecturer: Michael Nolta (CITA) for the git portion of the lecture.

C++ and Make slides / C++ and Make video recording / Git slides / Homework assigment 1

Recordings of this lecture are missing, but you could view the videos of SciNet's One-Day Scientific C++ Class, in particular the parts on classes, polymorphism, and inheritance.

Lecture 6: ODE, interpolation

Slides / Video recording / Homework assigment 3

Lecture 7: Development tools: debugging and profiling

Slides / Video recording

Lecture 8: Objects in Python, linking C++ and Python

Slides / Video recording

Homework assignments

HW1

Multi-file C++ program to create a data file

We’ve learned programming in basic C++, use of make and Makefiles to build projects, and local use of git for version control. In this first assignment, you’ll use these to make a multi-file C++ program, built with make, which computes and outputs a data file.

Start a git repository, and begin writing a C++ program to

Get an array size and a standard deviation from user input,
Allocate a 2d array (use the code given in lecture 2),
Store a 2d Gaussian with a maximum at the centre of the array and given standard deviation (in units of grid points),
Output that array to a text file,
Free the array, and exit.

The output text file should contain just the data in text format, with a row of the file corresponding to a row of the array and with whitespace between the numbers.
The 2d array creation/freeing routines should be in one file (with an associated header file), the gaussian calculation be in another (ditto), and the output routine be in a third, with the main program calling each of these.
Use a makefile to build your code (add it to the repository).
You can start with everything in one file, with hardcoded values for sizes and standard deviation and a static array, then refactor things into multiple files, adding the other features.
As a test, use the ipython executable that came with your Enthought python distribution to read your data and plot it.
If your data file is named ‘data.txt’, running the following:

$ ipython --pylab
In [1]: data = numpy.genfromtxt('data.txt') 
In [2]: contour(data)

should give a nice contour plot of a 2-dimensional gaussian.

Email in your source code, makefile and the "git log" output of all your commits by email by at 9:00 am Thursday Jan 24th, 2013. Please zip or tar these files together as one attachment, with a file name that includes your name and "HW1".

HW2

Refactor legacy code to a modular project with unit tests

In class, today, we talked about modular programming and testing, and the project we’ll be working on for the next three weeks. This homework will start advancing on that project by working on the “legacy” code given to us by our supervisor (diffuse.cc), with a corresponding python plotting script (plotdata.py), and whipping it into shape before we start adding new physics.

Start a git repository for this project, and add the two files.
Create a Makefile and add it to the repository.
Since we have no tests, run the program with console output redirected to a file:

$ diffuse > original-output.txt

It turns out the code has a bug that can make the output different when the same code is run again, which obviously would not be good for a baseline test. Replace 'float error;' by 'float error=0.0;' to fix this.

Also save the two .npy output files, e.g. to original-data.npy and original-theory.npy. The triplet of files (original-output.txt, original-data.npy and original-theory.npy) serve as a baseline integrated test (add these to repository).
Then write a 'test' target in your makefile that:
- Runs 'diffuse' with output to a new file.
- Compares the file with the baseline test file, and compare the .npy files.

(hint: the unix command diff or cmp can compare files).

First refactoring: Move the global variables into the main routine.
Chorus: Test your modified code, and commit.
Second refactoring: Extract a diffusion operator routine, that gets called from main.
Chorus
Create a .cc/.h module for the diffusion operator.
Chorus
Add two tests for the diffusion operator: for a constant and for a linear input field (rho[i][j]=a*i+b*j). Add these to the test target in the makefile.
Chorus
More refactoring: Extract three more .cc/.h modules:
- for output (should not contain hardcoded filenames)
- computation of the theory
- and for the array allocation stuff.
Chorus
Describe, but don't implement in the .h and .cc, what would be appropriate unit tests for these three modules.

Email in your source code and the git log file of all your commits as a .zip or .tar file by email to rzon@scinethpc.ca and ljdursi@scinethpc.ca by 9:00 am on Thursday January 31, 2013.

HW3

This week, we learned about object oriented programming, which fits nicely within the modular programming idea. In this homework, we are going to use some of it to restructure our code and get it ready to add the tracer particle, the goal of the course project.

The goal will be to have an instance of a Diffusion class, as well as an instance of Tracer, which for now will be a free particle moving as (x(t),y(t)) = (x(0) + vx t, y(0) + vy t), without any coupling yet (we will handle this next week).

To be more specific:

Clean up your code, using the feedback from your HW2 grading, such that the modules are as independent as possible.
If you have not done so yet, add comments to the header files of your modules to explain exactly what each function does (without going into implementation details), what its arguments mean and what it returns (unless it's a void function, of course).
Objectify the main routine, by creating a class Diffusion.
Put this class in its own module (declaration in .h, implementation in .cc). For instance, the declaration could be

// diffusion.h
#ifndef DIFFUSIONH
#define DIFFUSIONH
#include <fstream>
class Diffusion {
  public:
    Diffusion(float x1, float x2, float D, int numPoints);
    void init(float a0, float sigma0); // set initial field
    void timeStep(float dt);           // solve diff. equation over dt
    void toFile(std::ofstream& f);     // write to file (binary,no npyheader)
    void toScreen();                   // report a line to screen
    float getRho(int i, int j);        // get a value of the field
    ~Diffusion();
  private:
    float*** rho;
    ...
};
#endif

(this is not supposed to be prescriptive.)

In the implementation file you'd have things like

// diffusion.cc
#include "diffusion.h"
...
void Diffusion::timeStep(float dt) 
{
   // code for the timeStep ...
}
...

(note the inclusion of the module's header file on the top of the implementation, so the class is declared).

Let int main() have the same functionality as before, but now by defining the parameters of the run, creating an object of this class, setting up file streams, and taking time steps and writing out by using calls to member functions of this object.
Additionally, write a class Tracer which for now implements a free particle in 2d. Something like:

class Tracer {
  public:
    Tracer(float x1, float x2);
    void init(float x0, float y0, float vx, float vy);
    void timeStep(float dt);           // solve diff. equation over dt
    void toFile(std::ofstream& f);     // write to file (binary,no npyheader)
    void toScreen();                   // report a line to screen
    ~Tracer();
  private:
    ...
};

The timeStep implementation can in this case use the infamous forward Euler integration scheme, because it happens to be exact here.

When it comes to output to a npy file, let's view the the data of the tracer particle at one point in time as a 2x2 matrix [[x,y],[vx,vy]], so we can use much of the npy output code that we used for the diffusion field, which was a (numPoints+2)x(numPoints+2) matrix.

This class too should be its own module (Often, "one class, one module" is a good paradigm, though occasionally you'll have closely related classes).
Add some code to int main to have the Tracer particle evolve at the same time as the diffusion field (although the two are completely uncoupled).
Keep using git and make, run the tests that you have regularly to make sure your program still works.

Note that because we've now set up our program in a modular fashion, you can do different parts of this assignment in any order you want. For instance, to wrap your head around object oriented programming, you may like implementing the tracer particle first, so that your diffusion code stays intact. Or you might want to wait with commenting until the end if you think you'll have to change a module for this assignment.

Email in your source code and the git log file of all your commits as a .zip or .tar file by email to rzon@scinethpc.ca and ljdursi@scinethpc.ca by 3:00 pm on Friday February 8, 2013.

HW4

In this homework, we are going to implement the class project of a tracer particle coupled to a diffusion equation. The full specification of the physical problem is here.

Augment the tracer particle to include a force in the x and in the y direction, and a friction coefficient alpha, which at first can be constant.
Implement the so-called leapfrog integration algorithm for the tracer particle

v ← v + f(v) Δt / m

r ← r + v Δt

where v,r, and f are 2d vectors and f(v) is the total, velocity dependence force speficied in the class project, i.e., the sum of the external force F=qE and the friction force -αv.
(Note: the v dependence of f make this strictly not a leapfrog integration, but we'll ignore that here.)

Further augment the tracer class with a member function 'couple' which takes a diffusion field as input, and adjusts the friction constant.
Your implementation of the 'couple' member function will need to interpolate the diffusion field to the current position of the particle. Use this interpolation module.
Rewrite your main routine so that before tracer's time step, one calls the coupling. You may need to modify the Diffusion class a bit to get rho[active] out.
For simplicity, use the same time step for both the diffusion and the tracer particle.
Keep using git and make.

You will hand in your source code, makefiles and the git log file of all your commits by email by 9:00 am on Thursday February 21, 2013. Email the files, preferably zipped or tarred, to rzon@scinethpc.ca and ljdursi@scinethpc.ca.

Part 2: Numerical Tools for Physical Scientists

Prerequisites

Part 1 or solid c++ programming skills, including make and unix/linux prompt experience.

Software that you'll need

A unix-like environment with the GNU compiler suite (e.g. Cygwin), and Python (Enthought) installed on your laptop.

Dates

February 12, 14, 26, and 28, 2013
March 5, 7, 12, and 14, 2013

Topics

Lecture 1: Numerics

Slides / Video recording

Lecture 2: Random numbers

Slides / Video recording / Homework assignment 1

Lecture 3: Numerical integration and ODEs

Slides / Video recording

Lecture 4: Molecular Dynamics

Slides / Video recording / Homework assignment 2

Lecture 5: Linear Algebra part I

Slides (combined with lecture 6)

Lecture 6: Linear Algebra part II and PDEs

Slides (combined with lecture 5) / Video recording / Homework assignment 3

Lecture 7: Fast Fourier Transform

Slides / Video recording / example code

Lecture 8: FFT for real and multidimensional data

Slides / Video recording / Homework assignment 4

Homework Assignments

HW1

This week's homework consists of two assignments.

Assignment 1

Consider the sequence of numbers: 1 followed by 10⁸ values of 10^-8
Should sum to 2
Write code which sums up those values in order. What answer does it get?
Add to program routine which sums up values in reverse order. Does it get correct answer?
How would you get correct answer?
Submit code, Makefile, text file with answers.

Assignment 2

Implement an linear congruential generator with a = 106, c = 1283, m = 6075 that generates random numbers from 0..1
Using that and MT: generate 10,000 pairs (dx, dy) with dx, dy each in -0.1 .. +0.1. Generate histograms of dx and dy (say 200 bins). Does it look okay? What would you expect variation to be?
For 10,000 points: take random walks from (x,y)=(0,0) until exceed radius of 2, then stop. Plot histogram of final angles for the two psuedo random number generators. What do you see?
Submit makefile, code, plots, git log.

Both assignments due on Thursday Feb 28th, 2013, at 9:00 am. Email the files to rzon@scinethpc.ca and ljdursi@scinethpc.ca.

HW2

Assignment 1

Compute numerically (using the GSL):

∫₀³ f(x) dx

(that is the integral of f(x) from x=0 to x=3)

with

f(x) = ln(x) sin(x) e^-x

using three different methods:

Extended Simpsons' rule
Gauss-Legendre quadrature
Monte Carlo sampling

Hint: what is f(0)?

Compare the convergence of these methods by increasing number of function evaluations.

Submit makefile, code, plots, version control log.

Assignment 2

Using an adaptive 4th order Runge-Kutta approach, with a relative accuracy of 1e-4, compute the solution for t = [0,100] of the following set of coupled ODEs (Lorenz oscillator)

dx/dt = σ(y - x)

dy/dt = (ρ-z)x-y

dz/dt = xy - βz

with σ=10; β=8/3; ρ = 28, and with initial conditions

x(0) = 10

y(0) = 20

z(0) = 30

Hint: study the GSL documentation.

Submit makefile, code, plots, version control log.

Both assignments due on Thursday Mar 7th, 2013, at 9:00 am. Email the files to rzon@scinethpc.ca and ljdursi@scinethpc.ca.

HW3

Part 1:

The time-explicit formulation of the 1d diffusion equation looks like this:

$LaTeX: \displaystyle \begin{eqnarray*} q^{n+1} & = & q^n + \frac{D \Delta t}{\Delta x^2} \left ( \begin{matrix} -2 & 1 \\ 1 & -2 & 1 \\ & 1 & -2 & 1 \\ & & & \cdots & \\ & & & 1 & -2 & 1 \\ & & & & 1 & -2 \\ \end{matrix} \right ) q^n \\ & = & \left ( 1 + \frac{D \Delta t}{\Delta x^2} A \right ) q^n \end{eqnarray*}$

what are the eigenvalues of the matrix A? What modes would we expect to be amplified/damped by this operator?

Consider 100 points in the discretization (eg, A is 100x100)
Calculate the eigenvalues and eigenvectors (using D__EV ; which sort of matrix are we using here?)
Plot the modes with the largest and smallest absolute-value of eigenvalues, and explain their physical significance
The numerical method will become unstable with one eigenmode $v$ begins to grow uncontrollably whenever it is present, e.g.

$LaTeX: \displaystyle \frac{D \Delta t}{\Delta x^2} A v = \frac{D \Delta t}{\Delta x^2} \lambda v > v$

.

In a timestepping solution, the only way to avoid this for a given physical set of parameters and grid size is to reduce the timestep, $LaTeX: \Delta t$ . Use the largest absolute value eigenvalue to place a constraint on $LaTeX: \Delta t$ for stability.

Part 2:

Using the above constraint on $\Delta t$, for a 1d grid of size 100 (eg, a 100x100 matrix A), using lapack, evolve this PDE. Plot and explain results.

Have an initial condition of $LaTeX: q(x=0,t=0) = 1$ , and $LaTeX: q(t=0)$ everywhere else being zero (eg, hot plate just turned on at the left)
Take ~100 timesteps and plot the the evolution of $q(x,t)$ at 5 times over that period.
You’ll want to use a matrix multiply to compute the matrix-vector multiply ( http://www.gnu.org/software/gsl/manual/html_node/Level-2-GSL-BLAS-Interface.html). Do multiply in double precision (D__MV). Which should you use?
The GSL has a cblas interface, http://www.gnu.org/software/gsl/manual/html_node/Level-2-GSL-BLAS-Interface.html ; an example of its use can be found here http://www.gnu.org/software/gsl/manual/html_node/GSL-CBLAS-Examples.html

Important things to know about lapack:

If you are using an nxn array, the “leading dimension” of the array is n. (This argument is so that you could work on sub-matrices if you wanted)
Have to make sure the 2d array is contiguous block of memory
You'll (presumably) want to use the C bindings for LAPACK - lapacke. Note that the usual C arrays are row-major.

Here's a simple example of calling a LAPACKE routine; note that how the matrix is described (here with a pointer to the data, a leading dimension, and the number of rows and columns) will vary with different types of matrix:

include <iostream>
include <mkl_lapacke.h>

double **matrix(int n,int m); void free_matrix(double **a);

int main (int argc, const char * argv[]) {

  const int n=5;             // number of rows, columns of the matrix
  const int m = n;           // nrows
  const int leading_dim_A=n; // leading dimension (# of cols for row major);
                             // lets us operate on sub-matrices in principle
  const int leading_dim_b=n; // similarly for b
  double **A;
  double *b;

  b = new double[leading_dim_b];
  A = matrix(n,leading_dim_A);

  for (int i=0; i<n; i++)
      for (int j=0; j<leading_dim_A; j++)
           A[i][j] = 0.;

  // let's do a trivial solve
  // It should be pretty clear that the solution to this system
  // is x = {0,1,2...n-1}

  for (int i=0; i<leading_dim_A; i++) {
       A[i][i] = 2.;
  }

  for (int i=0; i<leading_dim_b; i++) {
       b[i]    = 2*i;
  }

  const char transpose='N';     //solve Ax=b, not A^T x = b
  const int  nrhs = 1;          //  we're only solving 1 right hand side
  int info;

  // Call DGELS; b will be overwritten with the value of x.
  info = LAPACKE_dgels(LAPACK_COL_MAJOR,transpose,m,n,nrhs,
                         &(A[0][0]),leading_dim_A, &(b[0]),leading_dim_b);

  // print results
  for(int i=0;i<n;i++)
  {
     if (i != n/2)
       std::cout << "    " << b[i] << std::endl;
     else
       std::cout << "x = " << b[i] << std::endl;
  }
  return(info);

}

double **matrix(int n,int m) {

  double **a = new double * [n];
  a[0] = new double [n*m];

  for (int i=1; i<n; i++)
        a[i] = &a[0][i*m];

  return a;

}

void free_matrix(double **a) {

  delete[] a[0];
  delete[] a;

} </source>

HW4

Assignment 1

Trigonometric interpolation uses a n point Fourier series to find values at intermediate points. It is one way of downscaling data, and was a motivation for Gauss, to be applied to planetary motion.

The way it works is:

You fourier-transform your data
You add frequecies above the Nyquist frequency (in absolute values), but set all the amplitudes of the new frequencies to zero.
Note that the frequencies are stored such that eg. f_n-1 is a low frequency -1.
The resulting 2n array can be back transformed, and now gives an interpolated signal.

For this assignment, write an application that will read in an image from a binary file into a 2d double precision array (this will require converting from bytes to doubles), and creates an image twice the size in all directions using trigonometric interpolation. Use a real-to-half-complex version of the fftw (note: in 2d, this version of the fftw mixes fourier components with the same physical magnitude of their wave number k, so this will work). You can process the red, green and blue values separately.

Assignment 2

Write an application which reads an image and performs a low pass filter on the image, i.e., any fourier components with magnitudes k larger than n/8 are to be set to zero, after which the fourier inverse is taken and the image is written out to disk again. Use the same fft technique as in the first assignment.

Input image

Use this image of Gauss.

Image format:

Use the following simple PPM format:

First line (ascii): 'P6\n'
Second line, in ascii, 'width height\n'
Third line, in ascii, 'maxcolorvalue\n' (this is typically just 255)
Following that, in binary, are byte-triplets with the red, green and blue values of each pixel.
Note: in C, the 'unsigned char' data type matches the concept of a byte best (for most machines anyway).

In fact, between the first and second line, one can have comment lines that start with '#'.

Part 3: High Performance Scientific Computing

Prerequisites

Part 1 or good c++ programming skills, including make and unix/linux prompt experience.

Software that you'll need

You will need to bring a laptop with a ssh facility. Hands-on parts will be done on SciNet's GPC cluster.

For those who don't have a SciNet account yet, the instructions can be found at http://wiki.scinethpc.ca/wiki/index.php/Essentials#Accounts

Dates

March 19, 21, 26, and 28, 2013
April 2, 4, 9, and 11, 2013

Topics

Lecture 1: Introduction to Parallel Programming

Slides / Video recording

Lecture 2: Parallel Computing Paradigms

Slides / Video recording / homework 1

Lectures 3,4: Shared Memory Programming with OpenMP, part 1,2

Slides / Video recording / homework 2

Lecture 5: Distributed Parallel Programming with MPI, part 1

Slides / Video recording

Lecture 6: Distributed Parallel Programming with MPI, part 2

Slides / Video recording / homework 3

Lecture 7: Distributed Parallel Programming with MPI, part 3

Slides / Video recording

Lecture 8: Hybrid OpenMPI+MPI Programming

Slides / Video recording / homework 4

Homework assignments

HW1

Read the SciNet tutorial (as it pertains to the GPC)
Read the GPC Quick Start.
Get the first set of code:

  $ cd $SCRATCH
  $ git clone /scinet/course/sc3/homework1
  $ cd homework1
  $ source setup
  $ make

</source>

This contains threaded program 'blurppm' and 266 ppm images to be blurred. Usage:

 blurppm INPUTPPM OUTPUTPPM BLURRADIUS NUMBEROFTHREADS

</source>

Simple test:

 $ qsub -l nodes=1:ppn=8,walltime=2:00:00 -I -X -qdebug
 $ cd $SCRATCH/homework1
 $ time blurppm 001.ppm new001.ppm 30 1
 real  0m52.900s
 user  0m52.881s
 sys   0m0.008s
 $ display 001.ppm &
 $ display new001.ppm &

</source> Assignment 1

Time blurppm with a BLURRADIUS ranging from 1 to 41 in steps of 4, and for NUMBEROFTHREADS ranging from 1 to 16. Record the (real) duration of each run.
Plot the duration as a function of NUMBEROFTHREADS, as well as the speed-up and efficiency.
Submit script and plots of the duration, speedup and effiency as a function of NUMBEROFTHREADS.

Assignment 2

Use GNU parallel to run blurppm on all 266 images with a radius of 41.
Investigate different scenarios:

Have GNU parallel run 16 at a time with just 1 thread.
Have GNU parallel run 8 at a time with 2 threads.
Have GNU parallel run 4 at a time with 4 threads.
Have GNU parallel run 2 at a time with 8 threads.
Have GNU parallel run 1 at a time with 16 threads.

Record the total time it takes in each of these scenarios.

Repeat this with a BLURRADIUS of 3.
Submit scripts, timing data and plots.

HW2

In the course materials ( /scinet/course/ppp/nbodyc or nbodyf ) there is the source code for a serial N-body integrator. This, like the molecular dynamics you've seen earlier, calculates long-range forces by particles on all of the other particles.

OpenMP the force calculation, and present timing results for 1,4, and 8 threads compared to the serial version. Note that you can turn off graphic output by removing the "USEPGPLOT = -DPGPLOT" line in Makefile.inc in the top level directory.

Begin by doubling the work by _not_ calculating two forces at once (eg, not making use of f_ji = -f_ij), and simply parallelizing the outer force loop. Then find a way to implement the forces efficiently but also in parallel. Is there any other part of the problem which could usefully be parallelized?

HW3

In the same area ( /scinet/course/ppp/diffusion/diffusion.c ) there is a 1d diffusion problem of a form you may recognize from earlier modules. Your task is to parallelize this with MPI. I'd encourage you to use the graphics output while you're developing this (use -DPGPLOT on the compile line), and then omit the graphics and do timings with a larger totpoints (say 16000) and run timings on 1, 2, 4, and 8 processors. Note that for this simple problem, we don't necessarily expect a huge speedup; it's already pretty fast.

I'd suggest doing this in steps:

include mpi.h, add mpi_init/finalize/comm_size/comm_rank, and compile with mpicc and make sure it runs;
calculate the local number of points from the total number of points and the size (and possibly rank); treat the case where the total number of points isn't divisible by the number of processors however you like, but make sure you're consistent about it.
Once you've calculated the local number of points, you won't need the variable totpoints; no arrays will be declared of that size, no plots will be made of that size, etc. Make those changes, compile, and run on (say) 2 procs.
Now you'll find that you'll need to figure out the local xleft and local xright of the domain; again, once this is done you won't need to know the global variables any more. Make those changes, compile, and run.
Finally, after the "old" boundary condition setting, do the internal boundary conditions, as in our example on class on Tuesday with sending messages around to our neighbour.

HW4

Take the diffusion code of last week’s homework and add OpenMP to the loops that you expect do most of the computational work.
Analyze your code’s scaling on a single GPC node, by timing 64 cases, varying both OMP NUM THREADS and the number of MPI processes from 1 to 8.
Plot the result and explain what you see.
Try this on a pair of GPC nodes as well. Let OMP_NUM_THREADS take values 1,2,4,8, while adjusting the number of mpi processes to 16/OMP_NUM_THREADS.
Plot the timing results and explain what you see.
Email code, makefile, git log, plots, together with the explanations in a file called explain.txt, by Tuesday April 23.

Links

Unix

Cygwin: http://www.cygwin.com
Linux Command Line: A Primer (June 2012) Slides, Files
Intro to unix shell from software carpentry: http://software-carpentry.org/4_0/shell

C/C++

One-Day Scientific C++ Class at SciNet
C++ library reference: http://www.cplusplus.com/reference
C preprocessor: http://www.cprogramming.com/tutorial/cpreprocessor.html
Boost: http://www.boost.org
Boost Python tutorial: http://www.boost.org/doc/libs/1_53_0/libs/python/doc/tutorial/doc/html/index.html

Git

Git: http://git-scm.com
Version Control: Video/ Slides
Git cheat sheet from Git Tower: http://www.git-tower.com/files/cheatsheet/Git_Cheat_Sheet_grey.pdf

Python

Python: http://www.python.org
IPython: http://ipython.org
Matplotlib: http://www.matplotlib.org
Enthought python distribution: http://www.enthought.com/products/edudownload.php

(this gives you numpy, matplotlib and ipython all installed in one fell swoop)

Intro to python from software carpentry: http://software-carpentry.org/4_0/python
Tutorial on matplotlib: http://conference.scipy.org/scipy2011/tutorials.php#jonathan
Npy file format: https://github.com/numpy/numpy/blob/master/doc/neps/npy-format.txt
Boost Python tutorial: http://www.boost.org/doc/libs/1_53_0/libs/python/doc/tutorial/doc/html/index.html

ODEs

Integrators for particle based ODEs (i.e. molecular dynamics): http://www.chem.utoronto.ca/~rzon/simcourse/partmd.pdf.
Focus on 4.1.4 - 4.1.6 for practical aspects.
Numerical algorithm to solve ODEs (General) in Numerical Recipes for C: http://apps.nrbook.com/c/index.html Chapter 16

Interpolation (2D)

Interpolation in Numerical Recipes for C: http://apps.nrbook.com/c/index.html Pages 123-128
Wikipedia pages on Bilinear Interpolation and Bicubic Interpolation are not bad either.

BLAS

LAPACK

http://www.netlib.org/lapack

GSL

GNU Scientific Library: http://www.gnu.org/s/gsl

FFT

FFTW: http://www.fftw.org

Top500

TOP500 Supercomputing Sites: http://top500.org

OpenMP

OpenMP (open multi-processing) application programming interface for shared memory programming: http://openmp.org

GNU parallel

Official citation: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.
Slides of the SciNet TechTalk on Gnu Parallel (14 Nov 2012)
The documentation for GNU parallel can be found at http://www.gnu.org/software/parallel/
Its man page can be found here http://www.gnu.org/software/parallel/man.html
The man page is also available on the GPC when the gnu-parallel module is loaded, with the command $ man parallel. The man page contains options, such as how to make sure the output is not all scrambled, and examples.

SciNet

Anything on this wiki, really, but specifically:

Other Resources

What Every Computer Scientist Should Know About Floating-Point Arithmetic - the classic (and extremely comprehensive) overview of the basics of floating point math. The first few pages, in particular, are very useful.
Random Numbers In Scientific Computing: An Introduction by Katzgraber. A very lucid discussion of pseudo random number generators for science.

@@ Line 53: / Line 53: @@
 '''Software that you'll need:'''
-A unix-like environment with the GNU compiler suite (e.g. Cygwin), and Python (Enthought) installed on your laptop.
+A unix-like environment with the GNU compiler suite (e.g. Cygwin), and Python 2, IPython, Numpy, SciPy and Matplotlib (which you all get if you use the Enthought distribution) installed on your laptop. Links are given at the bottom of this page.
 ==Dates==
@@ Line 62: / Line 62: @@
 ==Topics (with lecture slides and recordings)==
-''Lecture 1''&nbsp;&nbsp; C++ introduction
+===''Lecture 1:'' C++ introduction===
-:::[[Media:Lecture1-2013.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture1-2013/lecture1-2013.mp4 Recording]
+:::[[File:Lecture1-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture1-2013/lecture1-2013.html]]
+:::[[Media:Lecture1-2013.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture1-2013/lecture1-2013.mp4 Video recording]
-''Lecture 2''&nbsp;&nbsp; More C++, build and version control<br>
+===''Lecture 2:'' More C++, build and version control<br>===
+:::[[File:Lecture2-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture2-2013/lecture2-2013.html]]
 :::Guest lecturer: Michael Nolta (CITA) for the git portion of the lecture.
-:::[[Media:Lecture2-2013.pdf|C++ and Make slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture2-2013/lecture2-2013.mp4 C++ and Make Recording] &nbsp;/ &nbsp; [[Media:Git-Nolta.pdf|Git slides]] &nbsp;/ &nbsp; [[#HW1|Homework assigment]]
+:::[[Media:Lecture2-2013.pdf|C++ and Make slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture2-2013/lecture2-2013.mp4 C++ and Make video recording] &nbsp;/ &nbsp; [[Media:Git-Nolta.pdf|Git slides]] &nbsp;/ &nbsp; [[#HW1|Homework assigment 1]]
+===''Lecture 3:'' Python and visualization===
+:::[[File:Lecture3-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture3-2013/lecture3-2013.html]]
+:::[[Media:Lecture3-2013.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture3-2013/lecture3-2013.mp4 Video recording]
-''Lecture 3''&nbsp;&nbsp; Python and visualization <br>
+===''Lecture 4:'' Modular programming, refactoring, testing===
-''Lecture 4''&nbsp;&nbsp; Modular programming, refactoring, testing<br>
+:::[[File:Lecture4-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture4-2013/lecture4-2013.html]]
-''Lecture 5''&nbsp;&nbsp; Object oriented programming<br>
+:::[[Media:Lecture4-2013.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture4-2013/lecture4-2013.mp4 Video recording] &nbsp;/ &nbsp;  [[#HW2|Homework assigment 2]]
-''Lecture 6''&nbsp;&nbsp; ODE, interpolation<br>
+:::[http://wiki.scinethpc.ca/wiki/images/f/f0/diffuse.cc diffuse.cc (course project source file)] &nbsp;/ &nbsp; [http://wiki.scinethpc.ca/wiki/images/f/f0/plotdata.py plotdata.py (corresponding python movie generator)]
-''Lecture 7''&nbsp;&nbsp; Development tools: debugging and profiling<br>
-''Lecture 8''&nbsp;&nbsp; Objects in Python, linking C++ and Python
+===''Lecture 5:'' Object oriented programming===
+:::[[Media:Lecture5-2013.pdf|Slides]]
+:::Recordings of this lecture are missing, but you could view the videos of SciNet's [[One-Day Scientific C++ Class]], in particular the parts on classes, polymorphism, and inheritance.
+===''Lecture 6:'' ODE, interpolation===
+:::[[File:Lecture6-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture6-2013/lecture6-2013.html]]
+:::[[Media:ScientificComputing2013-Lecture5-ODE.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture6-2013/lecture6-2013.mp4 Video recording] &nbsp;/ &nbsp; [[#HW3|Homework assigment 3]]
+===''Lecture 7:'' Development tools: debugging and profiling===
+:::[[File:Lecture7-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture7-2013/lecture7-2013.html]]
+:::[[Media:ScientificComputing2013-Debugging.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture7-2013/lecture7-2013.mp4 Video recording]
+===''Lecture 8:'' Objects in Python, linking C++ and Python===
+:::[[File:Lecture8-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture8-2013/lecture8-2013.html]]
+:::[[Media:Lecture8-2013.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture8-2013/lecture8-2013.mp4 Video recording]
 ==Homework assignments==
@@ Line 102: / Line 121: @@
 </pre>
 should give a nice contour plot of a 2-dimensional gaussian.
-* Email in your source code and the git log file of all your commits by email by at 9:00 am Thursday Jan 24th, 2013.
+* Email in your source code, makefile and the "git log" output of all your commits by email by at 9:00 am Thursday Jan 24th, 2013. Please zip or tar these files together as one attachment, with a file name that includes your name and "HW1".
 ===HW2===
-'''''TBA'''''
+'''''Refactor legacy code to a modular project with unit tests'''''
+In class, today, we talked about modular programming and testing, and the project we’ll be working on for the next three weeks. This homework will start advancing on that project by working on the “legacy” code given to us by our supervisor ([http://wiki.scinethpc.ca/wiki/images/f/f0/diffuse.cc diffuse.cc]), with a corresponding python plotting script ([http://wiki.scinethpc.ca/wiki/images/f/f0/plotdata.py plotdata.py]), and whipping it into shape before we start adding new physics.
+* Start a git repository for this project, and add the two files.
+* Create a Makefile and add it to the repository.
+* Since we have no tests, run the program with console output redirected to a file:
+:<pre>$ diffuse > original-output.txt</pre>
+<br>''It turns out the code has a bug that can make the output different when the same code is run again, which obviously would not be good for a baseline test. Replace 'float error;' by 'float error=0.0;' to fix this.''
+* Also save the two .npy output files, e.g. to original-data.npy and original-theory.npy. The triplet of files (original-output.txt, original-data.npy and original-theory.npy) serve as a baseline integrated test (add these to repository).
+* Then write a 'test' target in your makefile that:
+** Runs 'diffuse' with output to a new file.
+** Compares the file with the baseline test file, and compare the .npy files.
+:: (hint: the unix command diff or cmp can compare files).
+* First refactoring: Move the global variables into the main routine.
+* ''Chorus: Test your modified code, and commit.''
+* Second refactoring: Extract a diffusion operator routine, that gets called from main.
+* ''Chorus''
+* Create a .cc/.h module for the diffusion operator.
+* ''Chorus''
+* Add two tests for the diffusion operator: for a constant and for a linear input field (<tt>rho[i][j]=a*i+b*j</tt>). Add these to the test target in the makefile.
+* ''Chorus''
+* More refactoring: Extract three more .cc/.h modules:
+** for output (should not contain hardcoded filenames)
+** computation of the theory
+** and for the array allocation stuff.
+* ''Chorus''
+* Describe, but don't implement in the .h and .cc, what would be appropriate unit tests for these three modules.
+Email in your source code and the git log file of all your commits as a .zip or .tar file by email to rzon@scinethpc.ca and ljdursi@scinethpc.ca by 9:00 am on Thursday January 31, 2013.
 ===HW3===
-'''''TBA'''''
+This week, we learned about object oriented programming, which fits nicely within the modular programming idea.  In this homework, we are going to use some of it to restructure our code and get it ready to add the tracer particle, the goal of the course project.
+The goal will be to have an instance of a <tt>Diffusion</tt> class,
+as well as an instance of <tt>Tracer</tt>, which for now will be a
+free particle moving as ('''x'''(t),'''y'''(t)) = ('''x'''(0) +
+'''vx''' t, '''y'''(0) + '''vy''' t), without any coupling yet (we
+will handle this next week).
+To be more specific:
+* Clean up your code, using the feedback from your HW2 grading, such that the modules are as independent as possible.
+* If you have not done so yet, add comments to the header files of your modules to explain exactly what each function does (without going into implementation details), what its arguments mean and what it returns (unless it's a void function, of course).
+* Objectify the <tt>main</tt> routine, by creating a class <tt>Diffusion</tt>.
+* Put this class in its own module (declaration in .h, implementation in .cc). For instance, the declaration could be
+<pre>
+// diffusion.h
+#ifndef DIFFUSIONH
+#define DIFFUSIONH
+#include <fstream>
+class Diffusion {
+  public:
+    Diffusion(float x1, float x2, float D, int numPoints);
+    void init(float a0, float sigma0); // set initial field
+    void timeStep(float dt);           // solve diff. equation over dt
+    void toFile(std::ofstream& f);     // write to file (binary,no npyheader)
+    void toScreen();                   // report a line to screen
+    float getRho(int i, int j);        // get a value of the field
+    ~Diffusion();
+  private:
+    float*** rho;
+    ...
+};
+#endif
+</pre>
+(this is not supposed to be prescriptive.)
+* In the implementation file you'd have things like
+<pre>
+// diffusion.cc
+#include "diffusion.h"
+...
+void Diffusion::timeStep(float dt)
+{
+   // code for the timeStep ...
+}
+...
+</pre>
+(note the inclusion of the module's header file on the top of the implementation, so the class is declared).
+* Let <tt>int main()</tt> have the same functionality as before, but now by defining the parameters of the run, creating an object of this class, setting up file streams, and taking time steps and writing out by using calls to member functions of this object.
+* Additionally, write a class <tt>Tracer</tt> which for now implements a free particle in 2d. Something like:
+<pre>
+class Tracer {
+  public:
+    Tracer(float x1, float x2);
+    void init(float x0, float y0, float vx, float vy);
+    void timeStep(float dt);           // solve diff. equation over dt
+    void toFile(std::ofstream& f);     // write to file (binary,no npyheader)
+    void toScreen();                   // report a line to screen
+    ~Tracer();
+  private:
+    ...
+};
+</pre>
+:The timeStep implementation can in this case use the infamous forward Euler integration scheme, because it happens to be exact here.
+:When it comes to output to a npy file, let's view the the data of the tracer particle at one point in time as a 2x2 matrix <tt>[[x,y],[vx,vy]]</tt>, so we can use much of the npy output code that we used for the diffusion field, which was a (numPoints+2)x(numPoints+2) matrix.
+* This class too should be its own module (Often, "one class, one module" is a good paradigm, though occasionally you'll have closely related classes).
+* Add some code to int main to  have the Tracer particle evolve at the same time as the diffusion field (although the two are completely uncoupled).
+* Keep using git and make, run the tests that you have regularly to make sure your program still works.
+Note that because we've now set up our program in a modular fashion, you can do
+different parts of this assignment in any order you want.  For instance, to wrap your head around object oriented programming, you may like implementing the tracer particle first, so that your diffusion code stays intact.  Or you might want to wait with commenting until the end if you think you'll have to change a module for this assignment.
+Email in your source code and the git log file of all your commits as a .zip or .tar file by email to rzon@scinethpc.ca and ljdursi@scinethpc.ca by
+<span style="color:#ee3300">3:00 pm on Friday February 8, 2013</span>.
 ===HW4===
-'''''TBA'''''
+In this homework, we are going to implement the class project of a tracer particle coupled to a diffusion equation.
+The full specification of the physical problem is [[Media:ScClassProject.pdf|here]].
+* Augment the tracer particle to include a force in the x and in the y direction, and a friction coefficient alpha, which at first can be constant.
+* Implement the so-called leapfrog integration algorithm for the tracer particle
+:::v &larr; v + f(v) &Delta;t / m
+:::r &larr; r + v &Delta;t
+:where v,r, and f are 2d vectors and f(v) is the total, velocity dependence force speficied in the class project, i.e., the sum of the external force F=qE and the friction force -&alpha;v.<br/>(Note: the v dependence of f make this strictly not a leapfrog integration, but we'll ignore that here.)
+* Further augment the tracer class with a member function 'couple' which takes a diffusion field as input, and adjusts the friction constant.
+* Your implementation of the 'couple' member function will need to interpolate the diffusion field to the current position of the particle. Use [[Media:CppInterpolation.tgz|this interpolation module]].
+* Rewrite your main routine so that before tracer's time step, one calls the coupling. You may need to modify the Diffusion class a bit to get <tt>rho[active]</tt> out.
+* For simplicity, use the same time step for both the diffusion and the tracer particle.
+* Keep using git and make.
+You will hand in your source code, makefiles and the git log file of all your commits by email by <span style="color:#ee3300">9:00 am on Thursday February 21, 2013</span>.  Email the files, preferably zipped or tarred, to rzon@scinethpc.ca and ljdursi@scinethpc.ca.
 =Part 2: Numerical Tools for Physical Scientists=
@@ Line 130: / Line 262: @@
 ==Topics==
-''Lecture 9&nbsp;''&nbsp;&nbsp; Numerics<br>
+===''Lecture 1:'' Numerics ===
-''Lecture 10''&nbsp;&nbsp; Random numbers<br>
+:::[[File:Lecture9-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture9-2013/lecture9-2013.html]]
-''Lecture 11''&nbsp;&nbsp; Numerical integration and ODEs<br>
+:::[[Media:Lecture9-2013-Numerics.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture9-2013/lecture9-2013.mp4 Video recording]
-''Lecture 12''&nbsp;&nbsp; Molecular Dynamics<br>
-''Lecture 13''&nbsp;&nbsp; Linear Algebra part I <br>
+===''Lecture 2:'' Random numbers ===
-''Lecture 14''&nbsp;&nbsp; Linear Algebra part II and PDEs<br>
+:::[[File:Lecture10-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture10-2013/lecture10-2013.html]]
-''Lecture 15''&nbsp;&nbsp; Fast Fourier Transform<br>
+:::[[Media:Lecture10-2013-PRNG.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture10-2013/lecture10-2013.mp4 Video recording] &nbsp;/ &nbsp;[http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW1_2 Homework assignment 1]
-''Lecture 16''&nbsp;&nbsp; FFT for real and multidimensional data
+===''Lecture 3:'' Numerical integration and ODEs ===
+:::[[File:Lecture11-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture11-2013/lecture11-2013.html]]
+:::[[Media:Lecture11-2013.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture11-2013/lecture11-2013.mp4 Video recording]
+===''Lecture 4:'' Molecular Dynamics ===
+:::[[File:Lecture12-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture12-2013/lecture12-2013.html]]
+:::[[Media:Lecture12-2013.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture12-2013/lecture12-2013.mp4 Video recording]  &nbsp;/ &nbsp;[http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW2_2 Homework assignment 2]
+===''Lecture 5:'' Linear Algebra part I ===
+:::[[Media:Lecture13-2013.pdf|Slides (combined with lecture 6)]]
+===''Lecture 6:'' Linear Algebra part II and PDEs===
+:::[[File:Lecture14-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture14-2013/lecture14-2013.html]]
+:::[[Media:Lecture13-2013.pdf|Slides (combined with lecture 5)]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture14-2013/lecture14-2013.mp4 Video recording]  &nbsp;/ &nbsp;[http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW3_2 Homework assignment 3]
+===''Lecture 7:'' Fast Fourier Transform===
+:::[[File:Lecture15-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture15-2013/lecture15-2013.html]]
+:::[[Media:Lecture15-2013.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture15-2013/lecture15-2013.mp4 Video recording]  &nbsp;/ &nbsp;[[Media:Sincfftw.cc|example code]]
+===''Lecture 8:'' FFT for real and multidimensional data===
+:::[[File:Lecture15-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture16-2013/lecture16-2013.html]]
+:::[[Media:Lecture16-2013.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture16-2013/lecture16-2013.mp4 Video recording]  &nbsp;/ &nbsp; [http://wiki.scinethpc.ca/wiki/index.php/Scientific_Computing_Course#HW4_2 Homework assignment 4]
+==Homework Assignments==
+===HW1===
+This week's homework consists of two assignments.
+''Assignment 1''
+* Consider the sequence of numbers: 1 followed by 10<sup>8</sup> values of 10<sup>-8</sup>
+* Should sum to 2
+* Write code which sums up those values in order. What answer does it get?
+* Add to program routine which sums up values in reverse order. Does it get correct answer?
+* How would you get correct answer?
+* Submit code, Makefile, text file with answers.
+''Assignment 2''
+* Implement an linear congruential generator with a = 106, c = 1283, m = 6075 that generates random numbers from 0..1
+* Using that and MT: generate 10,000 pairs (dx, dy) with dx, dy each in -0.1 .. +0.1. Generate histograms of dx and dy (say 200 bins). Does it look okay? What would you expect variation to be?
+* For 10,000 points: take random walks from (x,y)=(0,0) until exceed radius of 2, then stop. Plot histogram of final angles for the two psuedo random number generators. What do you see?
+* Submit makefile, code, plots, git log.
+Both assignments due on Thursday Feb 28th, 2013, at 9:00 am. Email the files to rzon@scinethpc.ca and ljdursi@scinethpc.ca.
+===HW2===
+''Assignment 1''
+* Compute numerically (using the GSL):
+::&int;<sub>0</sub><sup>3</sup> f(x) &nbsp;dx
+:(that is the integral of f(x) from x=0 to x=3)
+:with
+::f(x) = ln(x) sin(x) e<sup>-x</sup>
+:using three different methods:
+# Extended Simpsons' rule
+# Gauss-Legendre quadrature
+# Monte Carlo sampling
+*Hint: what is f(0)?
+*Compare the convergence of these methods by increasing number of function evaluations.
+*Submit makefile, code, plots, version control log.
+''Assignment 2''
+* Using an adaptive 4th order Runge-Kutta approach, with a relative accuracy of 1e-4, compute the solution for t = [0,100] of the following set of coupled ODEs (Lorenz oscillator)
+::dx/dt = &sigma;(y - x)
+::dy/dt = (&rho;-z)x-y
+::dz/dt = xy - &beta;z
+:with &sigma;=10; &beta;=8/3; &rho; = 28, and with initial conditions
+::x(0) = 10
+::y(0) = 20
+::z(0) = 30
+* Hint: study the GSL documentation.
+*Submit makefile, code, plots, version control log.
+Both assignments due on Thursday Mar 7th, 2013, at 9:00 am. Email the files to rzon@scinethpc.ca and ljdursi@scinethpc.ca.
+===HW3===
+Part 1:
+The time-explicit formulation of the 1d diffusion equation looks like this:
+<math>
+\displaystyle
+\begin{eqnarray*}
+q^{n+1} & = & q^n + \frac{D \Delta t}{\Delta x^2}
+\left (
+\begin{matrix}
+-2 & 1 \\
+& -2 & 1 \\
+& 1 & -2 & 1 \\
+&  &  & \cdots & \\
+&  &  & 1 & -2 & 1 \\
+&  &  & & 1 & -2 \\
+\end{matrix}
+\right ) q^n \\
+& = & \left ( 1 + \frac{D \Delta t}{\Delta x^2} A \right ) q^n
+\end{eqnarray*}
+</math>
+what are the eigenvalues of the matrix A?   What modes would we expect to be amplified/damped by this operator?
+* Consider 100 points in the discretization (eg, A is 100x100)
+* Calculate the eigenvalues and eigenvectors (using D__EV ; which sort of matrix are we using here?)
+* Plot the modes with the largest and smallest absolute-value of eigenvalues, and explain their physical significance
+* The numerical method will become unstable with one eigenmode $v$ begins to grow uncontrollably whenever it is present, e.g.
+<math> \displaystyle \frac{D \Delta t}{\Delta x^2} A v = \frac{D \Delta t}{\Delta x^2} \lambda v > v</math>.
+In a timestepping solution, the only way to avoid this for a given physical set of parameters and grid size is to reduce the timestep, <math>\Delta t</math>.   Use the largest absolute value eigenvalue to place a constraint on <math>\Delta t</math> for stability.
+Part 2:
+Using the above constraint on $\Delta t$, for a 1d grid of size 100 (eg, a 100x100 matrix A), using lapack, evolve this PDE. Plot and explain results.
+* Have an initial condition of <math>q(x=0,t=0) = 1</math>, and <math>q(t=0)</math> everywhere else being zero (eg, hot plate just turned on at the left)
+* Take ~100 timesteps and plot the the evolution of $q(x,t)$ at 5 times over that period.
+* You’ll want to use a matrix multiply to compute the matrix-vector multiply ( http://www.gnu.org/software/gsl/manual/html_node/Level-2-GSL-BLAS-Interface.html). Do multiply in double precision (D__MV). Which  should you use?
+* The GSL has a cblas interface, http://www.gnu.org/software/gsl/manual/html_node/Level-2-GSL-BLAS-Interface.html ; an example of its use can be found here http://www.gnu.org/software/gsl/manual/html_node/GSL-CBLAS-Examples.html
+Important things to know about lapack:
+* If you are using an nxn array, the “leading dimension” of the array is n. (This argument is so that you could work on sub-matrices if you wanted)
+* Have to make sure the 2d array is contiguous block of memory
+* You'll (presumably) want to use the C bindings for LAPACK - [http://www.netlib.org/lapack/lapacke.html lapacke].  Note that the usual C arrays are row-major.
+Here's a simple example of calling a LAPACKE routine; note that how the matrix is described (here with a pointer to the data, a leading dimension, and the number of rows and columns) will vary with different types of matrix:
+<source lang="cpp">
+#include <iostream>
+#include <mkl_lapacke.h>
+double **matrix(int n,int m);
+void free_matrix(double **a);
+int main (int argc, const char * argv[])
+{
+   const int n=5;             // number of rows, columns of the matrix
+   const int m = n;           // nrows
+   const int leading_dim_A=n; // leading dimension (# of cols for row major);
+                              // lets us operate on sub-matrices in principle
+   const int leading_dim_b=n; // similarly for b
+   double **A;
+   double *b;
+   b = new double[leading_dim_b];
+   A = matrix(n,leading_dim_A);
+   for (int i=0; i<n; i++)
+       for (int j=0; j<leading_dim_A; j++)
+            A[i][j] = 0.;
+   // let's do a trivial solve
+   // It should be pretty clear that the solution to this system
+   // is x = {0,1,2...n-1}
+   for (int i=0; i<leading_dim_A; i++) {
+        A[i][i] = 2.;
+   }
+   for (int i=0; i<leading_dim_b; i++) {
+        b[i]    = 2*i;
+   }
+   const char transpose='N';     //solve Ax=b, not A^T x = b
+   const int  nrhs = 1;          //  we're only solving 1 right hand side
+   int info;
+   // Call DGELS; b will be overwritten with the value of x.
+   info = LAPACKE_dgels(LAPACK_COL_MAJOR,transpose,m,n,nrhs,
+                          &(A[0][0]),leading_dim_A, &(b[0]),leading_dim_b);
+   // print results
+   for(int i=0;i<n;i++)
+   {
+      if (i != n/2)
+        std::cout << "    " << b[i] << std::endl;
+      else
+        std::cout << "x = " << b[i] << std::endl;
+   }
+   return(info);
+}
+double **matrix(int n,int m) {
+   double **a = new double * [n];
+   a[0] = new double [n*m];
+   for (int i=1; i<n; i++)
+         a[i] = &a[0][i*m];
+   return a;
+}
+void free_matrix(double **a) {
+   delete[] a[0];
+   delete[] a;
+}
+</source>
+===HW4===
+''Assignment 1''
+Trigonometric interpolation uses a n point Fourier series to find values at intermediate points. It is one way of downscaling data, and was a motivation for Gauss, to be applied to planetary motion.
+The way it works is:
+# You fourier-transform your data
+# You add frequecies above the Nyquist frequency (in absolute values), but set all the amplitudes of the new frequencies to zero.
+# Note that the frequencies are stored such that eg. f<sub>n-1</sub> is a low frequency -1.
+# The resulting 2n array can be back transformed, and now gives an interpolated signal.
+For this assignment, write an application that will read in an image from a binary file into a 2d double precision array (this will require converting from bytes to doubles), and creates an image twice the size in all directions using trigonometric interpolation. Use a real-to-half-complex version of the fftw (note: in 2d, this version of the fftw mixes fourier components with the same physical magnitude of their wave number k, so this will work).
+You can process the red, green and blue values separately.
+''Assignment 2''
+Write an application which reads an image and performs a low pass filter on the image, i.e., any fourier components with magnitudes k larger than n/8 are to be set to zero, after which the fourier inverse is taken and the image is written out to disk again. Use the same fft technique as in the first assignment.
+'''Input image'''
+Use [[Media:gauss256.tgz|this image of Gauss]].
+'''Image format:'''
+Use the following simple PPM format:
+First line (ascii): 'P6\n'<br>
+Second line, in ascii, 'width height\n'<br>
+Third line, in ascii, 'maxcolorvalue\n' (this is typically just 255)<br>
+Following that, in binary, are byte-triplets with the red, green and blue values of each pixel.<br>
+Note: in C, the 'unsigned char' data type matches the concept of a byte best (for most machines anyway).
+In fact, between the first and second line, one can have comment lines that start with '#'.
 =Part 3: High Performance Scientific Computing=
@@ Line 150: / Line 538: @@
 You will need to bring a laptop with a ssh facility. Hands-on parts will be done on SciNet's GPC cluster.
-For those who don't have a SciNet account yet, the instructions can be found at http://wiki.scinethpc.ca/wiki/index.php/Essentials\#Accounts
+For those who don't have a SciNet account yet, the instructions can be found at http://wiki.scinethpc.ca/wiki/index.php/Essentials#Accounts
 ==Dates==
@@ Line 157: / Line 545: @@
 ==Topics==
-''Lecture 17''&nbsp;&nbsp; Intro to Parallel Computing<br>
+===''Lecture 1:'' Introduction to Parallel Programming ===
-''Lecture 18''&nbsp;&nbsp; Parallel Computing Paradigms<br>
+:::[[File:Lecture17-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture17-2013/lecture17-2013.html]]
-''Lecture 19''&nbsp;&nbsp; Shared Memory Programming with OpenMP, part 1<br>
+:::[[Media:Lecture17-2013.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture17-2013/lecture17-2013.mp4 Video recording]
-''Lecture 20''&nbsp;&nbsp; Shared Memory Programming with OpenMP part 2<br>
-''Lecture 21''&nbsp;&nbsp; Distributed Parallel Programming with MPI, part 1<br>
+===''Lecture 2:'' Parallel Computing Paradigms ===
-''Lecture 22''&nbsp;&nbsp; Distributed Parallel Programming with MPI, part 2<br>
-''Lecture 23''&nbsp;&nbsp; Distributed Parallel Programming with MPI, part 3<br>
+:::[[File:Lecture18-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture18-2013/lecture18-2013.html]]
-''Lecture 24''&nbsp;&nbsp; Hybrid OpenMPI+MPI Programming
+:::[[Media:Lecture18-2013.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture18-2013/lecture18-2013.mp4 Video recording] &nbsp;/ &nbsp; [[#HW1_3|homework 1]]
+===''Lectures 3,4:''  Shared Memory Programming with OpenMP, part 1,2===
+:::[[File:Lecture20-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture20-2013/lecture20.html]]
+:::[[Media:Lecture19-2013.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture20-2013/lecture20.mp4 Video recording] &nbsp;/ &nbsp; [[#HW2_3|homework 2]]
+===''Lecture 5:'' Distributed Parallel Programming with MPI, part 1===
+:::[[File:Lecture21-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture21-2013/lecture21.html]]
+:::[[Media:Lecture21-2013.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture21-2013/lecture21.mp4 Video recording]
+===''Lecture 6:'' Distributed Parallel Programming with MPI, part 2===
+:::[[File:Lecture22-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture22-2013/Untitled.html]]
+:::[[Media:Lecture22-2013.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture22-2013/Untitled.mp4 Video recording] &nbsp;/ &nbsp; [[#HW3_3|homework 3]]
+===''Lecture 7:'' Distributed Parallel Programming with MPI, part 3===
+:::[[File:Lecture23-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture23-2013/lecture23-2013.html]]
+:::[[Media:Lecture23-2013.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture23-2013/lecture23-2013.mp4 Video recording]
+===''Lecture 8:'' Hybrid OpenMPI+MPI Programming===
+:::[[File:Lecture24-2013-FirstFrame.png|180px|link=http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture24-2013/lecture24-2013.html]]
+:::[[Media:Lecture24-2013.pdf|Slides]] &nbsp;/ &nbsp; [http://support.scinet.utoronto.ca/CourseVideo/SCcourse/lecture24-2013/lecture24-2013.mp4 Video recording] &nbsp;/ &nbsp; [[#HW4_3|homework 4]]
+== Homework assignments ==
+=== HW1 ===
+* Read the SciNet tutorial (as it pertains to the GPC)
+* Read the GPC Quick Start.
+* Get the first set of code:
+<source lang="bash">
+   $ cd $SCRATCH
+   $ git clone /scinet/course/sc3/homework1
+   $ cd homework1
+   $ source setup
+   $ make
+</source>
+*This contains threaded program 'blurppm' and 266 ppm images to be blurred. Usage:
+<source lang="bash">
+  blurppm INPUTPPM OUTPUTPPM BLURRADIUS NUMBEROFTHREADS
+</source>
+* Simple test:
+<source lang="bash">
+  $ qsub -l nodes=1:ppn=8,walltime=2:00:00 -I -X -qdebug
+  $ cd $SCRATCH/homework1
+  $ time blurppm 001.ppm new001.ppm 30 1
+  real  0m52.900s
+  user  0m52.881s
+  sys   0m0.008s
+  $ display 001.ppm &
+  $ display new001.ppm &
+</source>
+''Assignment 1''
+* Time blurppm with a BLURRADIUS ranging from 1 to 41 in steps of 4, and for NUMBEROFTHREADS ranging from 1 to 16.  Record the (real) duration of each run.
+* Plot the duration as a function of NUMBEROFTHREADS, as well as  the speed-up and efficiency.
+* Submit script and plots of the duration, speedup and effiency as a function of NUMBEROFTHREADS.
+''Assignment 2''
+* Use GNU parallel to run blurppm on all 266 images with a radius of 41.
+* Investigate different scenarios:
+:# Have GNU parallel run 16 at a time with just 1 thread.
+:# Have GNU parallel run 8 at a time with 2 threads.
+:# Have GNU parallel run 4 at a time with 4 threads.
+:# Have GNU parallel run 2 at a time with 8 threads.
+:# Have GNU parallel run 1 at a time with 16 threads.
+:Record the total time it takes in each of these scenarios.
+* Repeat this with a BLURRADIUS of 3.
+* Submit scripts, timing data  and plots.
+=== HW2 ===
+In the course materials ( /scinet/course/ppp/nbodyc or nbodyf ) there is the source code for a serial N-body integrator.  This, like the molecular dynamics you've seen earlier, calculates long-range forces by particles on all of the other particles.
+OpenMP the force calculation, and present timing results for 1,4, and 8 threads compared to the serial version.  Note that you can turn off graphic output by removing the "USEPGPLOT = -DPGPLOT" line in Makefile.inc in the top level directory.
+Begin by doubling the work by _not_ calculating two forces at once (eg, not making use of f<sub>ji</sub> = -f<sub>ij</sub>), and simply parallelizing the outer force loop.  Then find a way to implement the forces efficiently but also in parallel.  Is there any other part of the problem which could usefully be parallelized?
+=== HW3 ===
+In the same area (  /scinet/course/ppp/diffusion/diffusion.c ) there is a 1d diffusion problem of a form you may recognize from earlier modules.   Your task is to parallelize this with MPI.  I'd encourage you to use the graphics output while you're developing this (use -DPGPLOT on the compile line), and then omit the graphics and do timings with a larger totpoints (say 16000) and run timings on 1, 2, 4, and 8 processors.   Note that for this simple problem, we don't necessarily expect a huge speedup; it's already pretty fast.
+I'd suggest doing this in steps:
+* include mpi.h, add mpi_init/finalize/comm_size/comm_rank, and compile with mpicc and make sure it runs;
+* calculate the local number of points from the total number of points and the size (and possibly rank); treat the case where the total number of points isn't divisible by the number of processors however you like, but make sure you're consistent about it.
+* Once you've calculated the local number of points, you won't need the variable totpoints; no arrays will be declared of that size, no plots will be made of that size, etc.   Make those changes, compile, and run on (say) 2 procs.
+* Now you'll find that you'll need to figure out the local xleft and local xright of the domain; again, once this is done you won't need to know the global variables any more.   Make those changes, compile, and run.
+* Finally, after the "old" boundary condition setting, do the internal boundary conditions, as in our example on class on Tuesday with sending messages around to our neighbour.
+=== HW4 ===
+* Take the diffusion code of last week’s homework and add OpenMP to the loops that you expect do most of the computational work.
+* Analyze your code’s scaling on a single GPC node, by timing 64 cases, varying both OMP NUM THREADS and the number of MPI processes from 1 to 8.
+* Plot the result and explain what you see.
+* Try this on a pair of GPC nodes as well. Let OMP_NUM_THREADS take values 1,2,4,8, while adjusting the number of mpi processes to 16/OMP_NUM_THREADS.
+* Plot the timing results and explain what you see.
+* Email code, makefile, git log, plots, together with the explanations in a file called explain.txt, by Tuesday April 23.
 =Links=
@@ Line 175: / Line 660: @@
 ==C/C++==
 * [[One-Day Scientific C++ Class]] at SciNet
+* C++ library reference: http://www.cplusplus.com/reference
 * C preprocessor: http://www.cprogramming.com/tutorial/cpreprocessor.html
+* Boost: http://www.boost.org
+* Boost Python tutorial: http://www.boost.org/doc/libs/1_53_0/libs/python/doc/tutorial/doc/html/index.html
 ==Git==
+* Git: http://git-scm.com
+* Version Control: [http://support.scinet.utoronto.ca/CourseVideo/PPPcourse/Thursday_Morning_BP_Revision_Control/Thursday_Morning_BP_Revision_Control.mp4 Video]/ [[Media:Snug_techtalk_revcontrol.pdf | Slides]]
 * Git cheat sheet from Git Tower: http://www.git-tower.com/files/cheatsheet/Git_Cheat_Sheet_grey.pdf
 ==Python==
-* Enthought python distribution: http://www.enthought.com/products/edudownload.php
+* Python: http://www.python.org
+* IPython: http://ipython.org
+* Matplotlib: http://www.matplotlib.org
+* Enthought python distribution: http://www.enthought.com/products/edudownload.php<br/>
+(this gives you numpy, matplotlib and ipython all installed in one fell swoop)
 * Intro to python from software carpentry: http://software-carpentry.org/4_0/python
 * Tutorial on matplotlib: http://conference.scipy.org/scipy2011/tutorials.php#jonathan
+* Npy file format: https://github.com/numpy/numpy/blob/master/doc/neps/npy-format.txt
+* Boost Python tutorial: http://www.boost.org/doc/libs/1_53_0/libs/python/doc/tutorial/doc/html/index.html
+==ODEs==
+* Integrators for particle based ODEs (i.e. molecular dynamics): http://www.chem.utoronto.ca/~rzon/simcourse/partmd.pdf. <br>'''Focus on 4.1.4 - 4.1.6 for practical aspects.'''
+* Numerical algorithm to solve ODEs (General) in ''Numerical Recipes for C'': http://apps.nrbook.com/c/index.html Chapter 16
+==Interpolation (2D) ==
+* Interpolation in ''Numerical Recipes for C'': http://apps.nrbook.com/c/index.html Pages 123-128
+* Wikipedia pages on [http://en.wikipedia.org/wiki/Bilinear_interpolation Bilinear Interpolation] and [http://en.wikipedia.org/wiki/Bicubic_interpolation Bicubic Interpolation] are not bad either.
+==BLAS==
+* [http://www.tacc.utexas.edu/tacc-projects/gotoblas2 gotoblas]
+* [http://math-atlas.sourceforge.net/ ATLAS]
+==LAPACK==
+* http://www.netlib.org/lapack
+==GSL==
+* GNU Scientific Library: http://www.gnu.org/s/gsl
+==FFT==
+* FFTW: http://www.fftw.org
+==Top500==
+* TOP500 Supercomputing Sites: http://top500.org
+==OpenMP==
+* OpenMP (open multi-processing) application programming interface for shared memory programming: http://openmp.org
+==GNU parallel==
+* Official citation: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.
+* [[Media:Tech-talk-gnu-parallel.pdf|Slides of the SciNet TechTalk on Gnu Parallel (14 Nov 2012)]]
+* The documentation for GNU parallel can be found at http://www.gnu.org/software/parallel/
+* Its man page can be found here http://www.gnu.org/software/parallel/man.html
+* The man page is also available on the GPC when the gnu-parallel module is loaded, with the command <code>$ man parallel</code>. The man page contains options, such as how to make sure the output is not all scrambled, and examples.
+==SciNet==
+Anything on this wiki, really, but specifically:
+* [[Essentials|SciNet Essentials]]
+* [[GPC Quickstart]]
+* [[Media:SciNet_Tutorial.pdf |SciNet User Tutorial]]
+* [[Software and Libraries]]
+==Other Resources==
+* [http://galileo.phys.virginia.edu/classes/551.jvn.fall01/goldberg.pdf What Every Computer Scientist Should Know About Floating-Point Arithmetic] - the classic (and extremely comprehensive) overview of the basics of floating point math.   The first few pages, in particular, are very useful.
+* [http://arxiv.org/abs/1005.4117 Random Numbers In Scientific Computing: An Introduction] by Katzgraber.   A very lucid discussion of pseudo random number generators for science.

Difference between revisions of "Scientific Computing Course"

Latest revision as of 15:20, 24 November 2015

Syllabus

About the course

Location and Times

Instructors and office hours

Grading scheme

Sign up

Part 1: Scientific Software Development

Prerequisites

Dates

Topics (with lecture slides and recordings)

Lecture 1: C++ introduction

Lecture 2: More C++, build and version control

Lecture 3: Python and visualization

Lecture 4: Modular programming, refactoring, testing

Lecture 5: Object oriented programming

Lecture 6: ODE, interpolation

Lecture 7: Development tools: debugging and profiling

Lecture 8: Objects in Python, linking C++ and Python

Homework assignments

HW1

HW2

HW3

HW4

Part 2: Numerical Tools for Physical Scientists

Prerequisites

Dates

Topics

Lecture 1: Numerics

Lecture 2: Random numbers

Lecture 3: Numerical integration and ODEs

Lecture 4: Molecular Dynamics

Lecture 5: Linear Algebra part I

Lecture 6: Linear Algebra part II and PDEs

Lecture 7: Fast Fourier Transform

Lecture 8: FFT for real and multidimensional data

Homework Assignments

HW1

HW2

HW3

HW4

Part 3: High Performance Scientific Computing

Prerequisites

Dates

Topics

Lecture 1: Introduction to Parallel Programming

Lecture 2: Parallel Computing Paradigms

Lectures 3,4: Shared Memory Programming with OpenMP, part 1,2

Lecture 5: Distributed Parallel Programming with MPI, part 1

Lecture 6: Distributed Parallel Programming with MPI, part 2

Lecture 7: Distributed Parallel Programming with MPI, part 3

Lecture 8: Hybrid OpenMPI+MPI Programming

Homework assignments

HW1

HW2

HW3

HW4

Links

Unix

C/C++

Git

Python

ODEs

Interpolation (2D)

BLAS

LAPACK

GSL

FFT

Top500

OpenMP

GNU parallel

SciNet

Other Resources

Navigation menu

Search