Scientific Software Development Course
Syllabus
The first of SciNet's for-credit courses (Phys 2109 modular course credit / Ast 3100 mini-course credit) started in November, with the first lecture held in SciNet's conference room and the subsequent lectures moved to the Astronomy building (see below). Dates are Nov 4, 11, 18, and 25, 9:30-11:30am.
The goal of the first course is to have students learn the basics of best practice for writing maintainable, modular scientific programs. At the end of minicourse I, "Scientific Software Development", students arriving with fairly modest scientific programming knowledge will leave being able to:
- Do basic software development in a Linux-like environment
- Write a modular scientific program in C
- Read and write Makefiles for building large pieces of software
- Install simple libraries on their systems
- Use git for basic version control for software development (or paper writing)
- Use Python for basic visualization and data manipulation
- Debug C programs with gdb/ddd
- Write simple C++ and Python programs
- Use gprof or Python profiling to find performance "hot spots"
The course will require 4-6 hours per week of reading and homework.
Required Software
Each lecture will have a hands-on component; students are strongly encouraged to bring laptops (contact us if this will be a problem). Windows, Mac, or Linux laptops are all fine, but some software must be installed *before* the first lecture:
On Windows laptops only, Cygwin (http://www.cygwin.com/) must be installed; ensure that the development tools (gcc/g++/gfortran, gdb), git, and the X environment (Xorg) are installed.
On Mac laptops, ensure that the development tools (Xcode) are installed.
On Linux, ensure that packages for the gcc compiler suite (gcc/g++/gfortran), gdb, and git are installed.
On all platforms, the Enthought Python Distribution (http://www.enthought.com/products/edudownload.php) must be installed.
Students who aren't already comfortable working in a shell / terminal environment should work through at least the first three 10-minute lectures at http://software-carpentry.org/4_0/shell/.
Course outline
The classes will cover the material as follows; homework is due by email at noon on the Thursday before the following class.
Week 1 - Intro to software development: basics of C, make, git. HW1: modular, multifile programming and make.
Week 2 - Simple parabolic PDEs; modular programming, refactoring, and testing. Simple visualization using Python. HW2: refactoring, testing, and debugging a simple PDE solver.
Week 3 - Structures in C; simple ODE solvers and interpolation. HW3: tracer particle evolution.
Week 4 - Going further with C++ and Python; profiling. HW4: "porting" tracer particle evolution to C++ or Python.
Evaluation
Evaluation will be based entirely on the four homeworks, weighted equally.
Location (CHANGED!)
First Lecture: SciNet offices at 256 McCaul, 2nd Floor.
Because of the higher-than-expected student turnout, the remaining three lectures of part 1 of SciNet's scientific computing course will be held at:
Astronomy and Astrophysics Building, 50 St. George Street, Room AB 107. Same day and time: Fridays, 9:30-11:30am.
Office Hours
The instructors will have office hours on Monday and Wednesday afternoons, 3pm-4pm, starting the week of the first class.
Location: SciNet offices at 256 McCaul, 2nd Floor.
- Mon, Oct 31, 3pm
- Wed, Nov 2, 3pm
- Mon, Nov 7, 3pm
- Wed, Nov 9, 3pm
- Mon, Nov 14, 3pm
- Wed, Nov 16, 3pm
- Mon, Nov 21, 3pm
- Wed, Nov 23, 3pm
- Mon, Nov 28, 3pm
- Wed, Nov 30, 3pm
Lectures
Lecture 1 (slides and video)
Intro to C, make and version control
Slides
Lecture 2 (slides)
Python, modular programming, course project, diffusion PDE, debugging, testing
(The recording unfortunately crashed.)
Slides
Homeworks
Homework 1
We’ve learned basic C programming, how to use make and Makefiles to build projects, and how to use git locally for version control. In this first assignment, you’ll use these to make a multi-file C program, built with make, which computes and outputs a data file.
Start a new git repository, and begin writing a C program which gets an array size and a standard deviation from user input, allocates a 2d array, and stores in it a 2d Gaussian with its maximum at the centre of the array and the given standard deviation (in units of grid points). The program then outputs that array to a text file, frees the array, and exits. The 2d array creation/freeing routines should be in one file (with an associated header file), the Gaussian calculation in another (ditto), and the output routine in a third, with the main program calling each of these. The package should come with a Makefile which lets make build the software. (Note that you can start with everything in one file, with hardcoded values for the size and standard deviation and a static array, and then refactor things into multiple files and add the “features” of user input and allocated arrays once the simplest version is working.)
The output text file should contain just the data in text format, with a row of the file corresponding to a row of the array and with whitespace between the numbers. You can include comments in the text file if you like, on lines that begin with the “#” character.
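The output routine can be quite short. In this sketch (again with a hypothetical name, write_2d_text, and assuming the row-pointer array layout), each array row becomes one line of space-separated numbers, and a single '#' comment line records the dimensions; numpy.genfromtxt skips such lines by default because its comments argument defaults to '#'.

```c
#include <stdio.h>

/* Hypothetical: write an nrows x ncols array to a text file, one array row
   per line, numbers separated by single spaces, preceded by a '#' comment.
   Returns 0 on success, -1 if the file could not be opened or closed. */
int write_2d_text(const char *filename, double **a, int nrows, int ncols)
{
    FILE *fp = fopen(filename, "w");
    if (fp == NULL)
        return -1;
    fprintf(fp, "# %d rows x %d columns\n", nrows, ncols);
    for (int i = 0; i < nrows; i++) {
        for (int j = 0; j < ncols; j++)
            fprintf(fp, "%s%.10g", j > 0 ? " " : "", a[i][j]);  /* no leading space */
        fputc('\n', fp);
    }
    return fclose(fp) == 0 ? 0 : -1;   /* fclose flushes; report write errors */
}
```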
As a test, you should be able to use the ipython executable that came with your Enthought Python Distribution to read your data and plot it. If your data file is named ‘data.txt’, running the following from the terminal:
    $ ipython -pylab
    In [1]: data = numpy.genfromtxt('data.txt')
    In [2]: contour(data)
should give you a nice contour plot of a 2-dimensional Gaussian.
You will hand in your source code and the git log file of all your commits (and you will commit regularly!) by email by next Thursday at noon, so we can review the assignments and talk about any commonly occurring issues in Friday’s class. Email the files to rzon@scinet.utoronto.ca and ljdursi@scinet.utoronto.ca.
Homework 2
Today, we talked about modular programming and testing, and the project we’ll be working on for the next three weeks. This homework will start advancing on that project by working on the “legacy” code given to us by our prof, and whipping it into shape before we start adding new physics.
If you haven’t already, create a git repository that contains the source code (diffuse2.c) and any related files. (By the way, a decent Git cheatsheet can be found at http://www.git-tower.com/files/cheatsheet/Git_Cheat_Sheet_grey.pdf. Since you’re doing all local work, you can ignore the bits about “update and publish” and “merge and release”.)
For next Thursday, you’ll have refactored this into some sensible structure, with multiple files, and a Makefile. You’ll at least have factored out the main evolution code (which implements the diffusion operator on the 2d grid), and probably the array stuff, plotting stuff, and theory stuff as well. This should go pretty quickly, as it’s similar to last week’s work, and we’ll probably get a good start on it in class.
You’ll then write two tests for the evolution code, with a separate make target which compiles the diffusion tests. Running that executable should tell you whether the diffusion operator passed the tests.
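Since diffuse2.c itself isn't reproduced here, the following is only a sketch under assumptions: a hypothetical diffuse_step function doing one explicit (forward-time, centred-space) step of the 5-point diffusion stencil on an n x n grid stored row-major in a flat array, with boundary values held fixed. The two tests check properties any correct diffusion operator of this kind should have: a uniform field stays unchanged (its Laplacian is zero), and a single spike away from the boundaries spreads to its four neighbours while the total is conserved.

```c
/* Hypothetical: one explicit step of 2d diffusion on an n x n grid stored
   row-major, unew = u + alpha * (5-point Laplacian of u), boundaries fixed.
   alpha = D*dt/dx^2 (must be <= 0.25 for stability). */
void diffuse_step(int n, const double *u, double *unew, double alpha)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            int k = i * n + j;
            if (i == 0 || j == 0 || i == n - 1 || j == n - 1) {
                unew[k] = u[k];   /* fixed boundary value */
            } else {
                double lap = u[k - n] + u[k + n] + u[k - 1] + u[k + 1] - 4.0 * u[k];
                unew[k] = u[k] + alpha * lap;
            }
        }
}

/* Compare doubles with a small absolute tolerance. */
static int close_enough(double a, double b)
{
    double d = a - b;
    return (d < 0 ? -d : d) < 1e-12;
}

/* Test 1: a uniform field has zero Laplacian, so it must not change. */
int test_uniform_unchanged(void)
{
    enum { N = 8 };
    double u[N * N], unew[N * N];
    for (int k = 0; k < N * N; k++)
        u[k] = 2.5;
    diffuse_step(N, u, unew, 0.2);
    for (int k = 0; k < N * N; k++)
        if (!close_enough(unew[k], 2.5))
            return 0;
    return 1;
}

/* Test 2: a spike far from the boundary spreads to its 4 neighbours,
   and the total amount is conserved. */
int test_spike_conserves_sum(void)
{
    enum { N = 9 };
    double u[N * N] = {0}, unew[N * N];
    const double alpha = 0.1;
    int c = (N / 2) * N + N / 2;   /* index of the centre point */
    u[c] = 1.0;
    diffuse_step(N, u, unew, alpha);
    double sum = 0.0;
    for (int k = 0; k < N * N; k++)
        sum += unew[k];
    return close_enough(sum, 1.0) && close_enough(unew[c], 1.0 - 4.0 * alpha)
        && close_enough(unew[c + 1], alpha);
}

/* Returns the number of failed tests. */
int run_diffusion_tests(void)
{
    return !test_uniform_unchanged() + !test_spike_conserves_sum();
}
```

A small driver file with a main that calls run_diffusion_tests() and prints the result would then get its own make target, separate from the main program.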
Next, you’ll read up on interpolation and ODEs (the references will be given in class or soon afterwards) and write the interface and a stub implementation for an interpolation routine which, given a 2d array of values and a floating-point (x,y) pair in grid coordinates, calculates an interpolated value of the field at that point. There are a number of interpolation schemes that could make sense; just document what you’ve done. If the coordinate pair is out of bounds, return an error code. Include at least two test routines for this interpolation code, and have a separate make target for the interpolation test. Since the implementation is only a stub, the tests will necessarily fail for now.
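The assignment only asks for an interface, a stub and failing tests, but for reference, here is one possible implementation choice: bilinear interpolation (see the Interpolation links below). The interface is an assumption: the field f is stored row-major as f[i*ny + j] with i along x and j along y, (x, y) are in grid coordinates, and the hypothetical function interpolate_bilinear returns 0 on success or -1 (a hypothetical error code) when the point lies outside the grid.

```c
/* Hypothetical interface: bilinear interpolation of a field f stored
   row-major as f[i * ny + j], i = x index (0..nx-1), j = y index (0..ny-1).
   Returns 0 and sets *result on success; returns -1 if (x, y) is outside
   the grid or the grid is too small to interpolate. */
int interpolate_bilinear(const double *f, int nx, int ny,
                         double x, double y, double *result)
{
    /* Written with ! so that NaN coordinates are also rejected. */
    if (nx < 2 || ny < 2 || !(x >= 0.0 && x <= nx - 1) || !(y >= 0.0 && y <= ny - 1))
        return -1;

    int i = (int)x;                /* x >= 0 here, so truncation == floor */
    int j = (int)y;
    if (i == nx - 1) i = nx - 2;   /* on the last grid line: use the cell before it */
    if (j == ny - 1) j = ny - 2;

    double fx = x - i, fy = y - j; /* fractional position inside the cell */
    double f00 = f[i * ny + j],     f10 = f[(i + 1) * ny + j];
    double f01 = f[i * ny + j + 1], f11 = f[(i + 1) * ny + j + 1];

    /* Weighted average of the four surrounding grid values. */
    *result = (1 - fx) * (1 - fy) * f00 + fx * (1 - fy) * f10
            + (1 - fx) * fy * f01 + fx * fy * f11;
    return 0;
}
```

Because bilinear interpolation reproduces any field of the form a + bx + cy + dxy exactly, a linear test field such as f = 2i + 3j gives tests with a known answer; a stub that simply returns -1 for every call makes such tests fail, as the assignment expects.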
Finally, get the legacy code up and fully running with the plotting package it was supposed to work with: http://www.astro.caltech.edu/~tjp/pgplot/. The installation instructions are a bit odd but clear. Then add -DPGPLOT as a compile-time flag; you’ll also have to add link flags, and probably a compile flag, to link against the new library. (This is something you’ll have to do often: install a particular package in order to run some piece of software. This exercise is intended to walk you through it.)
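The -DPGPLOT flag works through conditional compilation: code wrapped in #ifdef PGPLOT is only compiled when that macro is defined. A minimal sketch of the pattern, using PGPLOT's C binding (cpgplot.h, cpgbeg, cpgend); the wrapper names plotting_enabled and show_field and the /XWINDOW device are assumptions, and the actual drawing calls are left out. The build lines would be something like gcc -DPGPLOT -I<pgplot dir> -c plot.c for compiling and -L<pgplot dir> -lcpgplot -lpgplot -lX11 -lgfortran -lm for linking, but the exact paths and libraries depend on how PGPLOT was built on your system.

```c
#include <stdio.h>

#ifdef PGPLOT
#include "cpgplot.h"   /* C binding shipped with PGPLOT */
#endif

/* Hypothetical: returns 1 if compiled with -DPGPLOT, else 0. */
int plotting_enabled(void)
{
#ifdef PGPLOT
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical wrapper: open a PGPLOT device if plotting was compiled in,
   otherwise just report that plotting is skipped. */
void show_field(void)
{
#ifdef PGPLOT
    if (cpgbeg(0, "/XWINDOW", 1, 1) != 1) {   /* 1 means success */
        fprintf(stderr, "could not open PGPLOT device\n");
        return;
    }
    /* ... drawing calls for the field would go here ... */
    cpgend();
#else
    printf("compiled without -DPGPLOT: skipping plot\n");
#endif
}
```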
The marking scheme will be 20% git log, 20% Makefile, 20% refactoring, 30% tests, and 10% getting pgplot to work. Office hours will again be Mon and Wed, but this time both will be held by Ramses. You can always email us with questions.
You will hand in your source code and the git log file of all your commits by email by next Thursday at noon, so we can review the assignments and talk about any commonly occurring issues in Friday’s class. Email the files to rzon@scinet.utoronto.ca and ljdursi@scinet.utoronto.ca.
Links
Unix
- Cygwin: http://www.cygwin.com
- Intro to unix shell from software carpentry: http://software-carpentry.org/4_0/shell
Git
- Git cheat sheet from Git Tower: http://www.git-tower.com/files/cheatsheet/Git_Cheat_Sheet_grey.pdf
Python
- Enthought python distribution: http://www.enthought.com/products/edudownload.php
- Intro to python from software carpentry: http://software-carpentry.org/4_0/python
- Tutorial on matplotlib: http://conference.scipy.org/scipy2011/tutorials.php#jonathan
PGPLOT
- Pgplot plotting library: http://www.astro.caltech.edu/~tjp/pgplot
Interpolation (2D)
- Interpolation in Numerical Recipes for C: http://apps.nrbook.com/c/index.html, pages 123-128
- Wikipedia pages on Bilinear Interpolation and Bicubic Interpolation are not bad either.
ODEs
- Integrators for particle-based ODEs (i.e. molecular dynamics): http://www.chem.utoronto.ca/~rzon/simcourse/partmd.pdf. Focus on sections 4.1.4-4.1.6 for practical aspects.
- Numerical algorithms to solve ODEs (general) in Numerical Recipes for C: http://apps.nrbook.com/c/index.html, Chapter 16
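As a simple illustration of the kind of integrator these references cover (our own choice of method, not necessarily the one the course uses), here is the explicit midpoint method, a second-order Runge-Kutta scheme, assuming a tracer particle simply moves with a prescribed velocity field, dx/dt = v(x). The callback type velocity_fn and the example field stretch_field are hypothetical; in the tracer-particle homework the velocity would presumably come from interpolating a gridded field, for instance with the interpolation routine from Homework 2.

```c
/* Hypothetical velocity-field callback: sets (*vx, *vy) to the velocity
   at position (x, y). */
typedef void (*velocity_fn)(double x, double y, double *vx, double *vy);

/* One step of the explicit midpoint method (second-order Runge-Kutta)
   for a tracer particle obeying dx/dt = v(x). */
void rk2_midpoint_step(velocity_fn v, double *x, double *y, double dt)
{
    double vx, vy;
    v(*x, *y, &vx, &vy);              /* slope at the start of the step */
    double xm = *x + 0.5 * dt * vx;   /* half step to the midpoint */
    double ym = *y + 0.5 * dt * vy;
    v(xm, ym, &vx, &vy);              /* slope at the midpoint */
    *x += dt * vx;                    /* full step using the midpoint slope */
    *y += dt * vy;
}

/* Example field v = (x, 0); the exact solution is x(t) = x0 * exp(t). */
void stretch_field(double x, double y, double *vx, double *vy)
{
    (void)y;   /* unused */
    *vx = x;
    *vy = 0.0;
}
```

For the example field, one step of size dt multiplies x by 1 + dt + dt^2/2, matching the Taylor series of exp(dt) through second order, which is what "second-order" means here.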