Distributed computation

I have an easily parallelizable workload that I would like to run on multiple computers on the same network.
Ideally I'd like it to work something like this:

* A manager process at a known location listens for incoming connections from workers.
* The manager handles the details of work assignment to online workers, storing results, networking, etc. It merely asks a loaded module for details of how to partition the workload.
* A worker process connects to the manager and spins up threads according to the computer's capabilities, calling into a module in each thread to do the computation and then sending completed results back to the manager.
* Again, the worker takes care of the low-level details such as networking and threading.

The code in the loaded module (my code) should be not much more complex than this:

//Called from manager
Workload get_workload();

//Called from worker
Result process_workload(Workload &);

//Called from manager
void results_complete(const std::vector<Result> &);
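
To make the idea concrete, here is a rough single-machine sketch of how a worker might drive such a module. Everything beyond the three function names above is an assumption made for illustration: the Workload and Result fields, the Manager class (an in-process stand-in for the networked manager), the use of std::optional (C++17) to signal "no work left", and the helper run_locally. It is not how any existing library does it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical payload types; the post leaves them unspecified.
struct Workload { std::uint64_t begin; std::uint64_t end; };
struct Result   { std::uint64_t sum; };

// Module code, called from worker threads.
// Example computation: sum the integers in [begin, end).
Result process_workload(Workload &w) {
    std::uint64_t sum = 0;
    for (std::uint64_t i = w.begin; i < w.end; ++i) sum += i;
    return Result{sum};
}

// In-process stand-in for the manager (assumption: a real one would
// sit behind a network connection). Hands out chunks of [0, total).
class Manager {
public:
    Manager(std::uint64_t total, std::uint64_t chunk)
        : next_(0), total_(total), chunk_(chunk) {
        assert(chunk > 0 && "chunk size must be positive");
    }

    // Returns the next chunk, or std::nullopt once all work is assigned.
    std::optional<Workload> get_workload() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (next_ >= total_) return std::nullopt;
        Workload w{next_, std::min(next_ + chunk_, total_)};
        next_ = w.end;
        return w;
    }

    // Stores one completed result.
    void submit(Result r) {
        std::lock_guard<std::mutex> lock(mutex_);
        results_.push_back(r);
    }

    // Only safe to call after all worker threads have been joined.
    const std::vector<Result> &results() const { return results_; }

private:
    std::mutex mutex_;
    std::uint64_t next_, total_, chunk_;
    std::vector<Result> results_;
};

// Worker side: one thread per hardware thread, each pulling work
// until the manager runs out. Returns the combined sum.
std::uint64_t run_locally(std::uint64_t total, std::uint64_t chunk) {
    Manager manager(total, chunk);
    // hardware_concurrency() may return 0 if it cannot be determined.
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < n; ++i) {
        threads.emplace_back([&manager] {
            while (auto w = manager.get_workload()) {
                manager.submit(process_workload(*w));
            }
        });
    }
    for (auto &t : threads) t.join();

    // Plays the role of results_complete(): combine all results.
    std::uint64_t grand_total = 0;
    for (const Result &r : manager.results()) grand_total += r.sum;
    return grand_total;
}
```

With this sketch, run_locally(1000, 64) sums 0 through 999 in chunks of 64 and returns 499500. The part I'm actually missing is replacing the two Manager calls with something that goes over the network.
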

Does anyone know of a system/library/whatever that does something like this, or how I could search for it?
Message Passing Interface (MPI) could implement manager-workers, but not quite like you describe. MPI is a library.

Schedulers like HTCondor and SLURM could launch jobs (including MPI) on remote machines.
Thanks, I'll look into it. What keywords should I use when searching for stuff like this? "Remote scheduler"?
I'm interested in this as well. I have a similar use case for which I currently use boost::interprocess (can't just use threads because a library I'm relying on for computation is not thread-safe), but I want to move to a distributed version. I was looking for 'RPC HPC' and 'actor model HPC' and came across Charm++ ( http://charmplusplus.org/ ). Looks nice but haven't had time to evaluate it yet. In any case, if you settle on something please share.
closed account (E0p9LyTq)
helios wrote:
I have an easily parallelizable workload that I would like to run on multiple computers on the same network.

You should look at the source code for the BOINC Manager, Berkeley Open Infrastructure for Network Computing.


Volunteer Mac developers are needed; the current solo one is leaving the project.


Note, this is non-paying. Good bullet point on the resume, though.

I run several BOINC projects on my PCs when I am not using them.
HTCondor and SLURM have their own pages:

They are not directly linked to what you run. SLURM assumes that every node sees the same (shared) path.

An MPI application uses the MPI library for interprocess communication (over the network). If one process fails, the whole application is bound to fail due to race conditions. (I presume; I haven't touched MPI-based code within this millennium.)

There are MPI applications that have single-threaded processes. All communication is via MPI.
Others use one multi-threaded process per machine and use MPI only for inter-machine communication.
closed account (z05DSL3A)
Part of the Intel Performance Libraries is an MPI library; I guess you could check out their docs and sample code...