Scientific Software Development Course

Syllabus

The first of SciNet's for-credit courses (Phys 2109 modular course credit / Ast 3100 mini-course credit) will start in November and will be held in SciNet's conference room at 256 McCaul Street, on the 2nd floor. Lectures run 9:30-11:30am on four Fridays: Nov 4, 11, 18, and 25.

The goal of the first course is to have students learn the basics of best practice for writing maintainable, modular scientific programs. At the end of minicourse I, "Scientific Software Development", students arriving with fairly modest scientific programming knowledge will be able to:

  • Do basic software development in a Linux-like environment
  • Write a modular scientific program in C
  • Read and write Makefiles for building large pieces of software
  • Install simple libraries on their systems
  • Use git for basic version control for software development (or paper writing)
  • Use Python for basic visualization and data manipulation
  • Debug C programs with gdb/ddd
  • Write simple C++ and Python programs
  • Use gprof or Python profiling to find performance "hot spots"

The course will require 4-6 hours each week spent on reading and homework.

Required Software

Each lecture will have a hands-on component; students are strongly encouraged to bring laptops. (Contact us if this will be a problem.) Windows, Mac, or Linux laptops are all fine, but some software will have to be installed *before* the first lecture:

On Windows laptops only, Cygwin (http://www.cygwin.com/) will have to be installed; ensure that the development tools (gcc/g++/gfortran, gdb), git, and the X environment (Xorg) are installed.

On Mac laptops, ensure that the development tools (Xcode) are installed.

On Linux, ensure that packages for the gcc compiler suite (gcc/g++/gfortran), gdb, and git are installed.

On all platforms, the Enthought Python distribution ( http://www.enthought.com/products/edudownload.php ) must be installed.

Students who aren't already comfortable with working in a shell / terminal environment should work through at least the first three 10-minute lectures at http://software-carpentry.org/4_0/shell/ .

Course outline

The classes will cover the material as follows; homework is due by email at noon on the Thursday before the following class.

Week 1 - Intro to software development: basics of C, make, git. HW1: modular, multifile programming and make.

Week 2 - Simple parabolic PDEs; modular programming, refactoring, and testing. Simple visualization using Python. HW2: Refactoring, testing, and debugging a simple PDE solver.

Week 3 - Structures in C; simple ODE solvers and interpolation. HW3: tracer particle evolution.

Week 4 - Going further with C++ and Python; profiling. HW4: "porting" tracer particle evolution to C++ or Python.

Evaluation

Evaluation will be based entirely on the four homework assignments, with equal weighting given to each.

Location (CHANGED!)

First Lecture: SciNet offices at 256 McCaul, 2nd Floor.

Because of the higher than expected turnout, the remaining three lectures of part 1 of SciNet's scientific computing course will be held at:

   Astronomy and Astrophysics Building
   50 St. George Street
   Room AB 107
   Same time and date: Friday 9:30 - 11:30.

Office Hours

The instructors will have office hours on Monday and Wednesday afternoons, 3pm-4pm, starting the week of the first class.

Location: SciNet offices at 256 McCaul, 2nd Floor.

  • Mon, Oct 31, 3pm
  • Wed, Nov 2, 3pm
  • Mon, Nov 7, 3pm
  • Wed, Nov 9, 3pm
  • Mon, Nov 14, 3pm
  • Wed, Nov 16, 3pm
  • Mon, Nov 21, 3pm
  • Wed, Nov 23, 3pm
  • Mon, Nov 28, 3pm
  • Wed, Nov 30, 3pm

Lectures

Lecture 1 (slides and video)

Intro to C, make and version control 
Click for Video
Slides

Lecture 2 (slides)

Lecture 2 Slides

Homeworks

Homework 1

We’ve learned programming in basic C, use of make and Makefiles to build projects, and local use of git for version control. In this first assignment, you’ll use these to make a multi-file C program, built with make, which computes and outputs a data file.

Start a new git repository, and begin writing a C program which gets an array size and a standard deviation from user input, allocates a 2d array, and stores in it a 2d Gaussian with its maximum at the centre of the array and with the given standard deviation (in units of grid points in that array). The program then outputs that array to a text file, frees the array, and exits. The 2d array creation/freeing routines should be in one file (with an associated header file), the Gaussian calculation should be in another (ditto), and the output routine should be in a third, with the main program calling each of these. The package should come with a Makefile which lets make build the software. (Note that you can start with everything in one file, with hardcoded values for the size and standard deviation and a static array, and then refactor things into multiple files and add the “features” of user input and allocated arrays once the simplest part is working.)
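
As a starting point for the array file only, here is a minimal sketch of one common way to allocate and free a 2d array in C. The function names alloc_2d and free_2d are illustrative assumptions, not names required by the assignment, and other layouts (for example, one malloc per row) would work as well.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate an n x n array of doubles as one contiguous block plus a
   table of row pointers, so that a[i][j] indexing works.
   Returns NULL if memory cannot be allocated.
   (alloc_2d / free_2d are hypothetical names for this sketch.) */
double **alloc_2d(int n)
{
    assert(n > 0);   /* a non-positive size is a programming error */

    double **a = malloc((size_t)n * sizeof *a);
    if (a == NULL)
        return NULL;

    a[0] = malloc((size_t)n * (size_t)n * sizeof **a);
    if (a[0] == NULL) {
        free(a);
        return NULL;
    }

    /* Point each row at its slice of the contiguous block. */
    for (int i = 1; i < n; i++)
        a[i] = a[0] + (size_t)i * (size_t)n;

    return a;
}

/* Free an array obtained from alloc_2d; passing NULL is a no-op. */
void free_2d(double **a)
{
    if (a == NULL)
        return;
    free(a[0]);   /* the data block */
    free(a);      /* the row-pointer table */
}
```

The matching header file would declare just these two functions, and the main program would call alloc_2d after reading the size, check the result for NULL, and call free_2d before exiting.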

The output text file should contain just the data in text format, with a row of the file corresponding to a row of the array and with whitespace between the numbers. You can include comments in the text file if you like, on lines that begin with the “#” character.
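
The file format described above could be produced by something like the following sketch. The function name write_array, its return convention, and the choice of number format are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Write an n x n array as text: one array row per line, values
   separated by single spaces, preceded by one '#' comment line.
   Returns 0 on success, -1 on an I/O error.
   (write_array is a hypothetical name for this sketch.) */
int write_array(const char *filename, double **a, int n)
{
    assert(filename != NULL && a != NULL && n > 0);

    FILE *fp = fopen(filename, "w");
    if (fp == NULL)
        return -1;

    /* Comment line describing the contents. */
    fprintf(fp, "# %d x %d array\n", n, n);

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            fprintf(fp, "%s%.8e", (j == 0) ? "" : " ", a[i][j]);
        fputc('\n', fp);
    }

    /* fclose can report write errors that were deferred by buffering. */
    return (fclose(fp) == 0) ? 0 : -1;
}
```

Scientific notation with a fixed number of digits keeps every value readable regardless of its magnitude; any whitespace-separated format would satisfy the assignment equally well.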

As a test, you should be able to use the ipython executable that came with your Enthought Python distribution to read your data and plot it. If your data file is named ‘data.txt’, running the following from the terminal:

$ ipython -pylab

In [1]: data = numpy.genfromtxt('data.txt') 

In [2]: contour(data) 

should give you a nice contour plot of a 2-dimensional Gaussian.

You will hand in your source code and the git log file of all your commits (and you will commit regularly!) by email by next Thursday at noon, so we can review the assignments and talk about any commonly occurring issues in Friday’s class. Email the files to rzon@scinet.utoronto.ca and ljdursi@scinet.utoronto.ca.

Homework 2

Today, we talked about modular programming and testing, and the project we’ll be working on for the next three weeks. This homework will start advancing on that project by working on the “legacy” code given to us by our prof, and whipping it into shape before we start adding new physics.

If you haven’t already, create a git repository that contains the source code (diffuse2.c) and any related files. (By the way, a decent Git cheatsheet can be found at the link that follows. Since you’re doing all local work, you can ignore the bits about “update and publish” and “merge and release”: http://www.git-tower.com/files/cheatsheet/Git_Cheat_Sheet_grey.pdf )

For next Thursday, you’ll have refactored this into some sensible structure, with multiple files, and a Makefile. You’ll at least have factored out the main evolution code (which implements the diffusion operator on the 2d grid), and probably the array stuff, plotting stuff, and theory stuff as well. This should go pretty quickly, as it’s similar to last week’s work, and we’ll probably get a good start on it in class.

You’ll then write two tests for the evolution code, and add a separate make target which compiles the diffusion tests. Running the resulting executable will tell you whether the diffusion operator passed the tests.
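
To make the idea of testing the evolution code concrete, here is a rough sketch of two such tests. Since diffuse2.c is not reproduced here, diffuse_step below is a hypothetical stand-in (one explicit step of a five-point diffusion stencil on a row-major grid, boundaries left unchanged); the routine you factor out of the legacy code will have a different name and signature, and the tests would need adapting to it.

```c
#include <assert.h>

/* HYPOTHETICAL stand-in for the factored-out evolution routine:
   one explicit diffusion step on the interior of an n x n row-major
   grid. Boundary values are copied unchanged. */
void diffuse_step(const double *in, double *out, int n, double alpha)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            int k = i * n + j;
            if (i == 0 || j == 0 || i == n - 1 || j == n - 1)
                out[k] = in[k];
            else
                out[k] = in[k] + alpha * (in[k - n] + in[k + n]
                                        + in[k - 1] + in[k + 1]
                                        - 4.0 * in[k]);
        }
}

/* Compare two doubles with a small absolute tolerance. */
static int close_to(double a, double b)
{
    double d = a - b;
    return d > -1e-12 && d < 1e-12;
}

/* Test 1: a uniform field has nothing to diffuse and must not change. */
void test_uniform_unchanged(void)
{
    enum { N = 5 };
    double in[N * N], out[N * N];
    for (int k = 0; k < N * N; k++)
        in[k] = 3.0;
    diffuse_step(in, out, N, 0.1);
    for (int k = 0; k < N * N; k++)
        assert(close_to(out[k], 3.0));
}

/* Test 2: a unit spike loses 4*alpha in one step, and each of its
   four neighbours gains alpha. */
void test_spike_spreads(void)
{
    enum { N = 5 };
    const double alpha = 0.1;
    double in[N * N] = {0.0}, out[N * N];
    int c = 2 * N + 2;            /* centre point (2, 2) */
    in[c] = 1.0;
    diffuse_step(in, out, N, alpha);
    assert(close_to(out[c], 1.0 - 4.0 * alpha));
    assert(close_to(out[c - 1], alpha) && close_to(out[c + 1], alpha));
    assert(close_to(out[c - N], alpha) && close_to(out[c + N], alpha));
}
```

A small test driver whose main calls both functions and then prints a success message would be enough for the separate make target: a failed assert aborts the program with a message naming the failing line, so a clean run means both tests passed.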

Next, you’ll read up on interpolation and ODEs (the references will be given in class or soon afterwards) and write the interface and stub implementation for an interpolation routine which, when given a 2d array of values and a floating-point (x,y) pair in grid coordinates, will calculate an interpolated value of the field at that point. There are a number of interpolation schemes which could make sense; just document what you’ve done. If the coordinate pair is out of bounds, return an error code. Include at least two test routines for this interpolation code, and have a separate make target for the interpolation test. The tests will necessarily fail at this stage, since the implementation is only a stub.
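
One possible shape for such an interface and stub is sketched below. The function name interpolate, the error-code names, and the convention that field[i][j] sits at grid coordinates (i, j) are all assumptions for illustration; the assignment leaves these choices to you.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes (hypothetical names for this sketch). */
enum {
    INTERP_OK = 0,
    INTERP_OUT_OF_BOUNDS = 1,
    INTERP_NOT_IMPLEMENTED = 2
};

/* Interpolate field (nx by ny) at the point (x, y), given in grid
   coordinates so that field[i][j] sits at (i, j). On success the
   result is stored in *value and INTERP_OK is returned. */
int interpolate(double **field, int nx, int ny,
                double x, double y, double *value)
{
    assert(field != NULL && value != NULL);

    /* Written so that NaN coordinates are rejected as well. */
    if (!(x >= 0.0 && x <= nx - 1 && y >= 0.0 && y <= ny - 1))
        return INTERP_OUT_OF_BOUNDS;

    /* Stub: the actual interpolation scheme is not written yet. */
    return INTERP_NOT_IMPLEMENTED;
}

/* Each test returns 1 on pass and 0 on fail. */

/* Points outside the grid must produce the error code. */
int test_interp_out_of_bounds(void)
{
    double r0[2] = {1.0, 1.0}, r1[2] = {1.0, 1.0};
    double *f[2] = {r0, r1};
    double v = 0.0;
    return interpolate(f, 2, 2, -0.5, 0.5, &v) == INTERP_OUT_OF_BOUNDS
        && interpolate(f, 2, 2, 0.5, 1.5, &v) == INTERP_OUT_OF_BOUNDS;
}

/* A constant field must interpolate to that constant everywhere. */
int test_interp_constant_field(void)
{
    double r0[2] = {1.0, 1.0}, r1[2] = {1.0, 1.0};
    double *f[2] = {r0, r1};
    double v = 0.0;
    if (interpolate(f, 2, 2, 0.5, 0.5, &v) != INTERP_OK)
        return 0;
    return v > 1.0 - 1e-12 && v < 1.0 + 1e-12;
}
```

With the stub in place, the bounds test can already pass, while the constant-field test fails until a real scheme replaces the INTERP_NOT_IMPLEMENTED return. Returning pass/fail values instead of aborting lets the test program report every failing test in one run.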

Finally, get the legacy code up and fully running with the plotting package that it was supposed to work with, pgplot: http://www.astro.caltech.edu/~tjp/pgplot/ . The installation instructions are a bit odd but clear. Then add -DPGPLOT as a compile-time flag; you’ll also have to add link flags, and probably a compile flag, so that the build can find the new library. (This is something you’ll have to do often: install a particular package in order to run some piece of software. This part of the assignment is intended to walk you through that.)
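
For readers who have not met compile-time flags before, the sketch below shows how a -DPGPLOT flag typically switches plotting code on and off. It assumes the code uses pgplot's C binding (the cpgplot.h header, with its cpgopen and cpgend calls); the wrapper names plot_available, plot_start, and plot_finish are hypothetical, and the legacy code may organize this differently.

```c
#include <assert.h>
#include <stdio.h>

#ifdef PGPLOT
#include <cpgplot.h>   /* pgplot's C binding; only needed with -DPGPLOT */
#endif

static int plot_open = 0;   /* tracks whether plot_start has been called */

/* Returns 1 if plotting support was compiled in, 0 otherwise. */
int plot_available(void)
{
#ifdef PGPLOT
    return 1;
#else
    return 0;
#endif
}

/* Open the plot device; returns 0 on success, -1 on failure.
   Without -DPGPLOT this only prints a notice. */
int plot_start(void)
{
#ifdef PGPLOT
    if (cpgopen("/xwindow") <= 0)   /* cpgopen returns > 0 on success */
        return -1;
#else
    fprintf(stderr, "built without -DPGPLOT; plotting disabled\n");
#endif
    plot_open = 1;
    return 0;
}

/* Close the plot device (a no-op without -DPGPLOT). */
void plot_finish(void)
{
    assert(plot_open);   /* catches finish-without-start bugs */
#ifdef PGPLOT
    cpgend();
#endif
    plot_open = 0;
}
```

Compiled without the flag, the program builds and runs with no dependency on the library. Compiled with -DPGPLOT, the compiler additionally needs an include path to wherever cpgplot.h was installed (the "compile flag"), and the link step typically needs that directory plus the cpgplot and pgplot libraries, and usually X11 and Fortran runtime libraries as well; the exact flags depend on your installation.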

The marking scheme will be 20% git log, 20% Makefile, 20% refactoring, 30% tests, and 10% getting pgplot to work. Office hours will again be Mon and Wed, but this time both will be held by Ramses. You can always email us with questions.


You will hand in your source code and the git log file of all your commits by email by next Thursday at noon, so we can review the assignments and talk about any commonly occurring issues in Friday’s class. Email the files to rzon@scinet.utoronto.ca and ljdursi@scinet.utoronto.ca.