Friday, July 27, 2018

Distribution System Pattern

Intro

Hi all! This blog post continues the "new patterns ideas" series. In these posts I suggest new patterns that utilize the latest technology available in computer programming. The well-known patterns from the Gang of Four book are all great and are widely used. But they were created a long time ago. They didn't have concurrency, futures and other stuff. I call them "new pattern ideas" because they are not actually patterns at this moment only ideas. I'm sure some programmers have used similar structures in their programs trying to solve similar problems.

Patterns help us in many ways. For those who don't know who to solve a problem they may show the way. For those who have already solved a problem they help explain the solution to others. In my opinion the best programs can be represented as a group of patterns linked together.

Motivation

The picture above shows the motivation for this pattern. You have a number of tasks which can be executed in parallel. The tasks must be distributed to workers and the results the workers produce add up to one final result. All the tools for this task are already available in Java: CompletableFuture, ExecutorService. This pattern will introduce some clarity how to use these tools.

Applicability

Use this pattern when you have a big task that can be decomposed into many small tasks that can be executed in parallel. Execution of one small task creates one small result. All the small results add up to produce one final result. 

Structure

Tasks

The tasks should be organized hierarchically like this:
Leaves contain the Task objects that do the actual work. The main idea is that the task execution must return a result and CompletableFuture uses Supplier in supplyAsync methods. Consequently it is easier to use task objects implementing Supplier. All the leaves have one execute method that returns a CompletableFuture received by passing the task object to CompletableFuture.supplyAsync (possibly also passing its own executor).  
Folders allow for grouping of tasks and ordered execution. For example one group of tasks may be executed before the other. It is important to remember though that all the tasks return results of one type or at least inherited from one type. Folders call execute on all of their children and return a CompletableFuture that is the result of CompletableFuture.allOf(...). If necessary folders may contain other folders.
The root is used for administrative purposes. It may contain the executor service for the tasks and also is the main entry point for the distribution system. It should have something like an execute method that begins the whole execution process and calls execute on all folders. 
Here is the call hierarchy:

execute always returns a CompletableFuture. 

Accumulating Results

Accumulating results is a very important part of this pattern. The idea is to create an interface like this:

public interface IDistribAccumulator<R, F, E> {

    public void addResult(R newResult);

    public List<R> getAllResults();

    public F getResult();

    public void addError(E newError);

    public List<E> getErrors();
}

We'll call this object the accumulator. Here R is the type of the task result, F the final result of this operation, E the type of an error. The leaf adds some handling to the task by means of CompletableFuture.handle. If the task completed without errors it adds the result to the accumulator. If an exception was thrown it adds an error to the accumulator. Here is a sample code of the execute method of the Leaf:

public CompletableFuture<R> execute() {
        CompletableFuture<R> execFuture = CompletableFuture.supplyAsync(task, es);
        return execFuture.handle((R arg0, Throwable t) -> {
            if (arg0 != null) {
                accumulator.addResult(arg0);
            }
            if (t != null) {
                accumulator.addError(new <Throwable wrapper>(t));
            }
            return arg0;
        });
    }

Tasks remain decoupled from the leaves and the work is done. 

Users of this distribution system pattern will implement this interface and may provide a way to get the final result from the individual results when all the tasks have completed execution. 

In progress status

Also a common issue is how to report to the user the status when the work is in progress. Some tasks have completed execution, some are in progress, some are waiting in the queue and the UI must show the current status. The solution is to use the accumulator. It can return the tasks that are already completed and even the results for them. It is of course desirable that the accumulator returns an unmodified list from getAllResults to avoid any issues. 


How to create the structure

It makes sense to create a Builder to create the distribution system. The instance of the Builder is created with some parameters and then the Builder uses these parameters to create first the root, then the folders and then the tasks. 

Structure mapped to motivation


Participants

TaskRoot - the entry point to the distribution system. Contains shared objects and begin the execution
TaskFolder - contains individual tasks. Used for grouping tasks.
TaskLeaf - contains the actual task object. Executes it, adds the result or the error to the accumulator, returns the CompletableFuture.
Task - the actual piece of work. All tasks are independent (at least within a folder) and can be run in parallel.
Accumulator - accumulates the results from individual tasks and returns the final result.

Consequences

  1. Clear structure. After Java more or less streamlined the process of parallel execution by adding ExecutorService, CompletableFuture and other facilities a lot of people have begun using these facilities. But in order to successfully implement a solution a clear structure is necessary. This pattern offers such a clear structure.
  2. Reduced coupling. The structure to execute the tasks and get results is decoupled from the actual tasks. The objects that must be implemented are the tasks themselves, the accumulator and the builder. 
  3. Code reuse. The main structure can be written once and use multiple times by supplying the task objects, the accumulator and the builder. 

Implementation Considerations

This pattern is a concurrent pattern. Consequently it is very important to synchronize all the methods that can be accessed by multiple threads. Mostly these are the methods in the accumulator. It may be best if all of them are synchronized. 
The parts that need to be changed in any concrete implementation are the tasks, the accumulator and the builder. All the other parts of the system are fairly general and can be reused.

Final Thoughts

There is an actual implementation of this pattern and it is working quite well. Maybe in one of the next posts I'll talk about it.

No comments:

Post a Comment