Tuesday, November 3, 2009

On Task Parallelism, Recursive Splitting, Discrete Event

Out of the three patterns, I would have expected Task Parallelism to be the easiest read. However the overall feeling was that it provided an abundance of details. The examples are various, a bit too elaborate and sometimes a bit too complex. There are around two tables and one diagram to support the importance of understanding implementation platform characteristics. I suppose that the complexity of the pattern writing form comes from its applicability to a numerous class of problems combined with the wish of the authors to cover a lot material. In the end, I would say that this pattern constitutes more of a good reference on Task Parallelism.

I found recursive splitting very intuitive because of its similarity to algorithms that can be expressed recursively or the commonly known Divide and Conquer (e.g. many flavors of sorting). In such cases, usually a problem is recursively split into smaller problems until the problem is small enough to solve directly. Similarly the solutions to the recursive splitting problems can be viewed as solving recursively generated tasks (e.g. quick sort or binary tree search). Recursive splitting is particularly useful when the knowledge of the hardware resources available is not always known, because it offers an algorithmic strategy that produces tasks but does not know about handling them. The handling of the generated tasks can then be handled by implementation strategy patterns like task queue and fork join etc, strategies that I was also already aware of from the previous readings on patterns on parallelism. The major concerns with mapping such recursive structures to parallel platforms are determining recursion depth vs. computation per node, load balancing and locality considerations. Defining a small base case for the recursion generates a large amount of concurrent tasks and keeps the hardware busy, however, the efficiency per thread can be low. Defining larger base cases can improve the computational efficiency per thread by keeping the overheads low, but might not generate enough concurrency. For instance, in the quicksort example, the base case is too small – an array of size 1 is sorted by default. This is not the ideal scenario – a better option is to use small efficient sorts like insertion sort or a serial version of quicksort within a task if the array size is less than a threshold (depending on the problem size and machine capabilities). I also found particularly interesting the description of possible optimizations to improve locality.

I was not familiar with the discrete event pattern. The pattern seems to be a generalization of the Pipeline pattern. In the Discrete Event pattern, there is no restriction to a linear structure, no restriction that the flow of data would be one-way, and the interaction takes place at irregular and sometimes unpredictable intervals. Sometimes it is desirable to compose existing, possibly sequential, program components that interact in possibly irregular ways into a parallel program without changing the internals of the components. For problems such as this, it might make sense to base a parallel algorithm on defining a task (or a group of tightly coupled tasks) for each component, or in the case of discrete-event simulation, simulation entity. Interaction between these tasks is then based on the ordering constraints determined by the flow of data between them. Load balancing is a difficult problem in this pattern due to its potentially irregular structure and possible dynamic nature. Some infrastructures that support this pattern allow task migration so that the load can be balanced dynamically at runtime. The solution is based on expressing the data flow using events as abstractions, with each event having a task that generates it and a task that processes it and defining ordering constraints between the tasks. Each task is an instance of the Facade pattern by providing a consistent event-based interface to the component. Optimistic and pessimistic approaches are used to deal with out of order invents. It is also possible for systems using this pattern to reach deadlock. In these cases, a middle-ground solution is to use timeouts instead of accurate deadlock detection, and is often the best approach.

No comments:

Post a Comment