Wednesday, September 30, 2009

On A Java Fork/Join Framework

Fork/Join is a simple design technique acting as the parallel version of the well known divide and conquer algorithm. As the name suggests the algorithm is based on two basic operations, fork - by which the current task starts a new parallel task, and join, which forces the current task not to proceed until the current subtask has completed. These two methods confer the algorithm its recursive nature, with tasks repeatedly splitting subtasks until they are small enough to solve using simple sequential methods.

While at a first glance, any framework (e.g. pthreads, java threads) supporting threads creation and ways to make them wait on their completion would seem a good candidate for implementing the Fork/Join framework, standard thread frameworks are in general to heavy to support the Fork/Join programs. One reason for that is that the synchronization requirements for fork/join tasks are more limited than those for regular threads, so the overhead associated with tracking blocked general purpose threads is wasted. In fact, the significance of the thread overhead is increasing with the granularity of the tasks. Consequently, there should be a good balance between task granularity and thread overhead that should maximize the effect of parallelism. Cilk, one of the first frameworks to tackle these problems, implements fork/join support on top of an operating system’s basic thread or process mechanisms. FJTask framework offers a variant of the design used in Cilk, but with a Java implementation, benefiting from its portability. The core design principle of these frameworks is to map the task to threads as an operating system maps threads to CPUs, but exploit the simplicity, regularity, and constraints of fork/join programs in performing the mapping.

FJTask architecture is simple: a pool of standard threads, as many as CPUs on the system is allocated, with each worker processing tasks held in an internal deque. The JVM and OS should be trusted to allow the mapping of tasks to different CPUs. Fork/join tasks (FJTasks) act as lightweight executable classes, by implementing the Runnable interface and its run method. The scheduling mechanic, inspired from Cilk, is the core of FJTasks framework. Each worker thread maintains runnable tasks in its internal deque, supporting push, pop, take operations. Subtasks generated in tasks run by a given worker thread are pushed onto that worker's own deque. Workers process their own deques in LIFO, or can steal tasks from others workers (using a FIFO), when they are out of tasks. When idle, workers enter a special priority adjustment sequence with attempts to get new tasks, which may eventually end up with the worker blocking until another task is invoked from top level. The mechanism is great because it reduces contention by having stealers operate on the opposite side of the deque as owners. Also, with this scheme programs adopting small task granularities for base actions are likely to run faster than those that only use coarse grained partitioning.

A potential drawback of having such a framework implemented is Java is related to garbage collection. At the first thought, with fork/join programs generating huge numbers of tasks, GC may come in handy when all the tasks are done processing and should quickly turn into garbage. However if garbage generation rates force frequent collections, this may affect the scalability of the framework, because stopping threads for collection takes time approximately proportional to the number of running threads.

No comments:

Post a Comment