I faced the same problem with supercomputers. When many computers are connected in parallel and communicate through MPI (Message Passing Interface), two major problems arise: first latency, and then bandwidth.
I expect to be able to get rid of this problem soon.
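To make the latency and bandwidth point concrete, here is a rough sketch of the usual simple cost model for sending one message: a fixed per-message latency plus the message size divided by the bandwidth. The function name and the link numbers below are assumptions picked only for illustration; they are not measurements from my cluster.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to send one message: fixed latency plus size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


# Hypothetical link parameters, for illustration only (not measured values).
LATENCY_S = 100e-6      # assumed 100 microseconds of overhead per message
BANDWIDTH_BPS = 10e6    # assumed 10 MB/s of sustained throughput

small = transfer_time(8, LATENCY_S, BANDWIDTH_BPS)          # one 8-byte value
large = transfer_time(8_000_000, LATENCY_S, BANDWIDTH_BPS)  # an 8 MB buffer

# Small messages are dominated by latency, large ones by bandwidth.
print(f"8 B message:  {small * 1e6:.1f} microseconds")
print(f"8 MB message: {large:.4f} seconds")
```

Under this model, many tiny messages pay the latency cost every time, so batching them into fewer, larger messages usually helps until the bandwidth term takes over.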
Now, coming to performance.
First, for a single Raspberry Pi module:
PMCPI, 900 million iterations, single node
Now, for 32 Raspberry Pi nodes,
for the same number of iterations