Genius Needed Software Programmer + Cluster Help Needed


VIP Member
Jan 8, 2007

I am currently doing my advanced project on the finite-difference time-domain (FDTD) method. One of my tasks is to look at parallel methods, i.e. clusters, to make the code run faster. Does anybody know where I can find useful information? Searching the Internet is a minefield. I need to take into account how the machines process the threads of the code, and any concurrency issues. If anybody has any ebooks or links to useful documents, it would be very much appreciated.



Inactive User
Jun 30, 2007
Back in the Toon
Well, to start off, have a look at the Wikipedia page here. There's a list of GPLed FDTD solvers on it (some of which are parallel), so you can see how others have done it. The most computationally demanding part of the calculation (and therefore the bit you want to parallelize first) will most likely be some sort of matrix diagonalization. There are literally thousands of books, papers and sites about different algorithms for doing this sort of thing. The main things to consider are the type of matrix (dense, sparse, pseudo diagonal etc.), the amount of memory available to each machine (can you fit the whole matrix in memory on all machines, will you need to dump it to a temporary file, or could you use something like Global Arrays for distributed memory?) and network latency.
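To get a feel for the memory question, a back-of-the-envelope estimate helps. This is only a rough sketch: it assumes a dense square matrix of 8-byte doubles, and the function names are made up for illustration. It compares keeping a full copy on every machine with spreading the matrix evenly over the nodes.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def dense_matrix_gib(n, bytes_per_element=8):
    """Size in GiB of a dense n x n matrix (assumes 8-byte doubles by default)."""
    return n * n * bytes_per_element / GIB


def per_node_gib(n, nodes, replicated):
    """Memory needed on each node.

    replicated=True  : every node holds the whole matrix.
    replicated=False : the matrix is split evenly across the nodes.
    """
    total = dense_matrix_gib(n)
    return total if replicated else total / nodes


if __name__ == "__main__":
    n, nodes = 32768, 16
    print("full copy per node :", per_node_gib(n, nodes, replicated=True), "GiB")
    print("distributed per node:", per_node_gib(n, nodes, replicated=False), "GiB")
```

If the replicated figure is bigger than the RAM on one machine, you are into distributed memory or temporary files, and a sparse matrix changes the sums completely.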

I expect you'll need to learn a bit about MPI, which is the parallelization technique generally used on clusters. MPI-2 adds one-sided communication (remote memory access), so that might be useful. The MIT Press books on MPI are pretty good. I bought them, but I'm sure you can 'obtain' them from the ether.

In summary, I'd start off learning about MPI (plenty of tutorials and documentation on the web). After that, take a look at some of the packages listed on the Wikipedia page to get an idea of what is possible.

The golden rule in parallelizing a piece of software is to be as coarse as you can. Split the work into blocks that are as big as possible, so that each block/CPU/node can be left alone to do its work for as long as possible. Doing it this way minimizes the interprocess communication and improves scalability. Of course there are exceptions and caveats, but if you start off thinking like that you'll do okay.
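As a rough illustration of the coarse-blocks idea, here is a toy 1D FDTD-style update in plain Python. It is a sketch under assumptions, not how any particular solver does it: the units are normalized, the boundaries are simply held fixed, all the function names are invented, and the "nodes" are just lists in one process. Each block owns a contiguous chunk of the grid and only needs one value from a neighbour per exchange, while the work per block scales with the block size.

```python
import math


def split(n, parts):
    """Contiguous [lo, hi) index ranges, as equal in size as possible."""
    base, extra = divmod(n, parts)
    ranges, lo = [], 0
    for p in range(parts):
        hi = lo + base + (1 if p < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def step_serial(e, h, c=0.5):
    """One leapfrog step on the whole grid; end values of E are held fixed."""
    n = len(e)
    h = [h[i] + c * (e[i + 1] - e[i]) if i < n - 1 else h[i] for i in range(n)]
    e = [e[i] + c * (h[i] - h[i - 1]) if 0 < i < n - 1 else e[i] for i in range(n)]
    return e, h


def step_blocks(e_blocks, h_blocks, c=0.5):
    """Same step, but each block only sees its own data plus one halo value."""
    nb = len(e_blocks)
    # Exchange 1: every block receives the first E value of its right neighbour.
    e_halo = [e_blocks[b + 1][0] if b < nb - 1 else None for b in range(nb)]
    for b in range(nb):
        e, h = e_blocks[b], h_blocks[b]
        ext = e + ([e_halo[b]] if e_halo[b] is not None else [])
        h_blocks[b] = [h[i] + c * (ext[i + 1] - ext[i]) if i + 1 < len(ext) else h[i]
                       for i in range(len(e))]
    # Exchange 2: every block receives the last updated H value of its left neighbour.
    h_halo = [h_blocks[b - 1][-1] if b > 0 else None for b in range(nb)]
    for b in range(nb):
        e, h = e_blocks[b], h_blocks[b]
        m, new_e = len(e), []
        for i in range(m):
            on_edge = (b == 0 and i == 0) or (b == nb - 1 and i == m - 1)
            if on_edge:
                new_e.append(e[i])  # fixed boundary value
            else:
                left_h = h[i - 1] if i > 0 else h_halo[b]
                new_e.append(e[i] + c * (h[i] - left_h))
        e_blocks[b] = new_e
    return e_blocks, h_blocks


def run_demo(n=64, parts=4, steps=100):
    """Run both versions from the same Gaussian pulse and return the final E fields."""
    e0 = [math.exp(-((i - n / 2) ** 2) / 20.0) for i in range(n)]
    e, h = list(e0), [0.0] * n
    ranges = split(n, parts)
    e_blocks = [e0[lo:hi] for lo, hi in ranges]
    h_blocks = [[0.0] * (hi - lo) for lo, hi in ranges]
    for _ in range(steps):
        e, h = step_serial(e, h)
        e_blocks, h_blocks = step_blocks(e_blocks, h_blocks)
    return e, [x for block in e_blocks for x in block]


if __name__ == "__main__":
    serial, blocked = run_demo()
    print("blocked result matches serial:", serial == blocked)
```

On a real cluster the two halo exchanges would become MPI sends and receives between neighbouring ranks. The point of the sketch is the ratio: the bigger each block, the more arithmetic you get per value shipped over the network.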

Have fun :)