Tuesday, November 10, 2009

On Shared Queue, Speculation, Digital Circuits

Shared queue: a queue is very common data structure that can be shared among multiple units of execution in a parallel program. Its implementation should tradeoff between simple synchronization protocols and increased concurrency. The easiest approach employs a single synchronization construct, however such a protocol most likely will encompass too much of the shared queue in that single synchronization construct, thus increasing the chances that UEs will remain blocked waiting to access the queue and will limit available concurrency. Fine tuning can increase the concurrency by providing concurrent access to multiple parts of the queue however this approach tends to be more complicated and thus error prone. On machines where the memory hierarchy of the underlying hardware and the number of UEs are limited, a distributed queue can avoid contention and decrease the parallel overhead. One idea of implementation is the distributed queue used in fork/join. Here each thread from the thread pool is associated a non-blocking distributed queue. When a thread generates a new task, it is put in its own task queue. When a thread executes a task, it first tries to take one from its own queue. If it is empty, it randomly chooses another thread and tries to steal one from that threads’ queue. As each task tries to take a new task first from its own task and there is no single hot spot, it eliminates the performance degradation.

I found speculation quite an interesting pattern. I was especially familiar with hardware speculation and particularly with branch predication, which is present in almost all major desktop CPUs. With branch predication, for each branch instruction, one path is speculatively chosen based on the past behavior of the program. This can be extended to speculatively execute both branches and terminate one of them when the result of the branch predicate is available. Speculation as a parallel algorithmic strategy is similar but it is usually at a much higher granularity. The main assumption in the speculation algorithmic strategy pattern is that we can break long dependency chains by ignoring them. Creating independent tasks is then a speculative action as ignoring these dependencies may not always hold good for all inputs. In such cases, we might not be much better off than a serial algorithm, but on average we will be able to take advantage of the available hardware parallelism. Predicates are constructs that are used to keep track of what needs to be true for the decomposition to hold. Predicate validity check is a very important aspect of this pattern. A quick way of checking the predicate validity is essential. A last step of the algorithm is having a commit/recovery mechanism depending on whether the predicate was true/false respectively.

The link assigned to the third pattern reads Digital circuits, while the description attached is actually for the Circuits pattern. Even though related the two are still different patterns. Digital circuits offers solutions for implementing an hardware architecture instruction set that meets the performance and power requirements. The circuits pattern applies to problems where the output is a logical function or bitwise calculation of the input. Since most of the instruction sets are restricted to operate on a word size of data, the pattern provides a way to efficiently compute the output. Digital circuits deals with bitwise operations as well as the arithmetic operations (ADD, SUBTRACT, ALU) on bits or bit-vectors. Therefore, it is a natural fit for the circuits pattern from the view of algorithmic strategies.

No comments:

Post a Comment