I cannot change my implementation now because the code is almost complete.
I wrote the program the way it appears in the sample because I have many restrictions on the datatypes I can send and receive
among the processors in the cluster.
The problem with my implementation is that the data must be loaded from a file by one processor (the coordinator), arranged properly (divided equally among the processors for a balanced load), and then distributed to all worker processors.
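To make the "divide equally" step concrete, here is a minimal sketch of one common way to compute a balanced split. It assumes C++ (the post does not name a language), and `Chunk` and `split_evenly` are hypothetical names for illustration, not taken from the actual program.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: the slice of the input that one worker receives.
struct Chunk {
    std::size_t offset; // index of the first element in the slice
    std::size_t count;  // number of elements in the slice
};

// Splits `total` elements across `workers` processes so that slice sizes
// differ by at most one element. The first (total % workers) workers get
// one extra element each.
std::vector<Chunk> split_evenly(std::size_t total, std::size_t workers) {
    assert(workers > 0); // precondition: at least one worker
    std::vector<Chunk> chunks;
    chunks.reserve(workers);
    const std::size_t base = total / workers;
    const std::size_t extra = total % workers;
    std::size_t offset = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t count = base + (w < extra ? 1 : 0);
        chunks.push_back({offset, count});
        offset += count;
    }
    return chunks;
}
```

For 10 elements and 3 workers this gives slices of 4, 3, and 3 elements starting at offsets 0, 4, and 7. If the program happens to use MPI (an assumption, since the post does not say), these counts and offsets correspond to the `sendcounts` and `displs` arguments of `MPI_Scatterv`.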
I just wanted to free some memory on the coordinator between some code blocks so that I can try even bigger data inputs for my results.
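As a rough illustration of freeing a buffer between code blocks, here is a sketch that assumes the coordinator holds its data in a `std::vector` (an assumption; the post does not show the container). `release_buffer` is a hypothetical helper name. Note that `clear()` alone keeps the allocated capacity, whereas swapping with an empty temporary hands the storage back to the allocator.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: releases the heap storage owned by `v`.
// The temporary empty vector takes over v's buffer via swap and
// frees it when the temporary is destroyed at the end of the statement.
template <typename T>
void release_buffer(std::vector<T>& v) {
    std::vector<T>().swap(v);
}
```

This would be called on the coordinator once the data has been sent to the workers, for example `release_buffer(input);` for a vector named `input`. Whether the process's memory footprint as seen by the operating system actually shrinks depends on the allocator, so it is worth measuring rather than assuming.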
I'll try to work on it a bit more with a visual debugger / disassembler that I have, although it is far more sophisticated than what I need.