Introduction To Performance
Serial Performance
Worrying about parallel performance before the code performs well with a single task doesn't make much sense! Profiling your code while it runs with one task lets you spot serial "hot spots" for optimization, and gives you a more detailed understanding of where your program spends its time. Useful tools include:
/bin/time
gprof
vtune (Intel)
peekperf, hpmcount (p6)
Parallel Performance
Speedup
Efficiency
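As a minimal sketch of how these two quantities are usually computed, assuming the common definitions (speedup is the single-task run time divided by the run time on p tasks, and efficiency is that speedup divided by p). The function names and the timings below are hypothetical and only illustrate the arithmetic; they are not measurements from any real machine.

```python
def speedup(t_serial, t_parallel):
    """Speedup: how many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_tasks):
    """Efficiency: speedup per task; 1.0 would mean perfect scaling."""
    return speedup(t_serial, t_parallel) / n_tasks


# Hypothetical wall-clock times in seconds, keyed by number of tasks.
# These numbers are made up for illustration only.
timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}

if __name__ == "__main__":
    t1 = timings[1]
    for p in sorted(timings):
        s = speedup(t1, timings[p])
        e = efficiency(t1, timings[p], p)
        print(f"tasks={p:2d}  speedup={s:5.2f}  efficiency={e:4.2f}")
```

Efficiency that falls as the task count grows, as in these made-up timings, is the usual sign that communication or serial sections are starting to dominate.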
Strong Scaling
Weak Scaling
Common OpenMP Performance Problems
Common MPI Performance Problems
Overuse of MPI_BARRIER
Many Small Messages
Typically, the time it takes for a message of size n to get from one node to another can be expressed in terms of a latency l and a bandwidth b, roughly as t(n) = l + n/b. For small messages, the latency can dominate the cost of sending (and processing!) the message. By bundling many small messages into one, you can amortize that cost over many messages, reducing the time spent communicating.
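The effect of bundling can be sketched with a simple linear cost model, assuming a message of n bytes takes latency plus n divided by bandwidth. The latency and bandwidth values below are assumptions chosen only for illustration, not measurements of any particular network, and the function names are hypothetical.

```python
def message_time(n_bytes, latency, bandwidth):
    """Modeled time (seconds) to send one message of n_bytes."""
    return latency + n_bytes / bandwidth


def time_separate(count, n_bytes, latency, bandwidth):
    """Modeled time to send `count` messages of n_bytes each, one at a time."""
    return count * message_time(n_bytes, latency, bandwidth)


def time_bundled(count, n_bytes, latency, bandwidth):
    """Modeled time to send the same data as a single bundled message."""
    return message_time(count * n_bytes, latency, bandwidth)


# Assumed network parameters, for illustration only.
LATENCY = 1e-6      # seconds per message
BANDWIDTH = 1e9     # bytes per second

if __name__ == "__main__":
    count, n_bytes = 1000, 8  # e.g. 1000 messages of one 8-byte value each
    separate = time_separate(count, n_bytes, LATENCY, BANDWIDTH)
    bundled = time_bundled(count, n_bytes, LATENCY, BANDWIDTH)
    print(f"separate: {separate:.3e} s")
    print(f"bundled:  {bundled:.3e} s")
    print(f"saved:    {separate - bundled:.3e} s")
```

In this model the bandwidth term is the same either way, so the saving is exactly (count - 1) latencies. Real networks also add per-message processing overhead, which this sketch ignores.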