Pragmatic Concurrency for Python

Tutors:

       Eilif Muller (eilif dot mueller at epfl dot ch)
       Bastian Venthur (mail at venthur dot de)
Topics covered
Exercises

The purpose of these exercises is not to achieve dramatic speed-ups (a laptop is not the right hardware for that), but rather to run and modify a few examples, become comfortable with the APIs, and implement some simple parallel programs.

parallel_materials.tar.gz

1) Running MPI programs

Write a simple Python program that imports mpi4py.MPI and displays COMM_WORLD.rank, COMM_WORLD.size and MPI.Get_processor_name() on each process. It is always handy to have such a program around to verify that the MPI environment is working as expected. In a distributed environment, the processor name will also tell you whether your MPI execution was spawned across machine boundaries, and how many processes are allocated per machine.

Note: To run your mpi4py program, start it like any other MPI program, i.e. as follows (X is the number of processes):

$ mpiexec -n X python <program.py>

2) Matrix Multiplication

Four implementations of matrix multiplication are available in the source tarball (subdirectory “matmul”): “ipython_” is an IPython version, “mpi_” is an MPI version, and “mp_*” are multiprocessing versions, with and without shared NumPy arrays. The IPython version needs a running cluster, which can be started with:

$ ipcluster start -n X

Here X is the number of worker processes (IPython engines) to start.

3) Parallelization of mandelbrot

In the source tar-ball under “mandelbrot” is a serial implementation of a mandelbrot plotter.

a) Using decomposition techniques similar to those in the Matrix Multiplication example, parallelize the serial Mandelbrot plotter using each of mpi4py, IPython and multiprocessing.

b) Load balancing - the Mandelbrot computation has the property that computing some pixels takes much longer than others.

First, quantify the degree of imbalance by gathering and plotting the distribution of execution times per pixel. Assuming you used chunked decomposition as for matrix multiplication, how does this per-pixel imbalance translate into a per-chunk imbalance?

Second, can you modify the decomposition of the problem so that each worker receives a more equal workload?

Hints:

* ipython: read up on the LoadBalancedView here: http://ipython.org/ipython-doc/rel-0.13/parallel/parallel_task.html
* mpi4py: a pure mpi4py approach is trickier. One method might be to use asynchronous messaging (Isend, Irecv) to a set of workers and let a master (e.g. rank 0) re-assign work as workers complete.
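
As one way to reason about the decomposition question before touching any parallel framework, the following serial sketch compares contiguous row chunks with interleaved (cyclic) rows. It is not taken from the course materials: the function names, the image size, the plotted region and the use of iteration counts as a stand-in for measured execution time are all assumptions made for illustration.

```python
def escape_count(c, max_iter=100):
    """Iterations before |z| exceeds 2, capped at max_iter."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def row_work(y, width=60, height=40, max_iter=100):
    """Total iteration count for one image row (assumed proxy for run time)."""
    # Assumed region: real axis -2..1, imaginary axis -1..1.
    im = -1.0 + 2.0 * y / (height - 1)
    return sum(
        escape_count(complex(-2.0 + 3.0 * x / (width - 1), im), max_iter)
        for x in range(width)
    )


def block_rows(height, n_workers):
    """Contiguous chunks: each worker gets one solid band of rows."""
    base, extra = divmod(height, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        chunks.append(list(range(start, stop)))
        start = stop
    return chunks


def cyclic_rows(height, n_workers):
    """Interleaved chunks: worker w gets rows w, w + n_workers, ..."""
    return [list(range(w, height, n_workers)) for w in range(n_workers)]


def imbalance(chunks, work):
    """Heaviest worker's load divided by the mean load (1.0 is perfect)."""
    loads = [sum(work[y] for y in rows) for rows in chunks]
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    height, n_workers = 40, 4
    work = [row_work(y, height=height) for y in range(height)]
    print("block :", round(imbalance(block_rows(height, n_workers), work), 2))
    print("cyclic:", round(imbalance(cyclic_rows(height, n_workers), work), 2))
```

Interleaving is a static fix: it works here because neighbouring rows cost about the same. The dynamic approaches named in the hints (a load-balanced view, or a master handing out work on demand) do not rely on that property.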

4) IPython map-reduce

Using the IPython approach, get a collection of processes to count the occurrences of each word in a collection of documents, and then reduce the results to a total count per word on the master process.

See also: http://en.wikipedia.org/wiki/MapReduce, http://labs.google.com/papers/mapreduce.html
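
The shape of the map and reduce steps can be sketched independently of IPython. The version below is only an illustration: it swaps in multiprocessing.Pool for the IPython engines, and the helper names (count_words, merge_counts, word_count) and the sample documents are hypothetical. The map_fn parameter is there so that a different parallel map could be passed in; a map method of an IPython view should fit the same slot, though that is an assumption not tested here, and the engines would need the same imports.

```python
import re
from collections import Counter
from functools import reduce
from multiprocessing import Pool


def count_words(text):
    """Map step: word frequencies for a single document."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def merge_counts(a, b):
    """Reduce step: combine two partial counts into one."""
    return a + b


def word_count(documents, map_fn=map):
    """Run the map step with map_fn, then reduce on the calling process."""
    return reduce(merge_counts, map_fn(count_words, documents), Counter())


if __name__ == "__main__":
    # Hypothetical sample documents, one string per document.
    docs = ["the quick brown fox", "the lazy dog", "The fox again"]
    with Pool(processes=2) as pool:
        totals = word_count(docs, map_fn=pool.map)
    print(totals.most_common(2))
```

Only the per-document Counter objects travel back to the master, so the reduce step stays cheap even when the documents themselves are large.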

Lecture material

parallel_talk.pdf