Concurrency bugs, such as atomicity violations and deadlocks, are some of the most difficult bugs to detect and diagnose due to their non-deterministic nature. While on uniprocessor machines solutions based on logic time have been used in order to deal with the nondeterminism caused by the timing variations, on multi-processors (SMPs and multi-cores), besides thread scheduling, signals and asynchronous events, timing variations exert another major influence on thread interleaving. Traditional solutions in the area of multi-processors suggest deterministic replay and try to record almost every inter-thread interaction such as synchronization and shared memory communication, in addition to those data necessary for uni-processor replay. Because hardware modification always comes with a high cost of fabrication and increased hardware complexity software solutions are generally preferred.
Next to CHESS, PRES is another novel software-only approach that was proposed for exposing concurrency bugs. However, while both navigate a non-deterministic execution space, CHESS tries to find unexplored interleavings to test, while PRES uses execution sketches as guidelines and feedback from failed replays to get closer to reproducing a target concurrency bug. In order to lower production-run recording overhead, PRES relaxes the traditional (perhaps idealistic) objective of “reproducing the bug on the first replay attempt”. It achieves that by recording only partial execution information during the production run. In a subsequent step, using an intelligent partial-information based replayer that systematically explores the unrecorded nondeterministic space via multiple coordinated replay, PRES attempts to reconstruct the complete information necessary for bug reproduction. After a bug is reproduced successfully once after several replay attempts, PRES can then reproduce it with 100% probability on every subsequent replay for diagnostic purposes.
PRES leverages three types of information: inter-thread global order from the production run, intra-thread local information from the production run, and feedback from previous unsuccessful replays. In order to record this information PRES uses five sketching (recording) methods that trade reproduction probability for recording overhead at different levels, and two basic schemes (Base and RW) that represent two extremes of the tradeoff for comparison. PI-Replayer, the central component of the architecture, conducts a repeated and automated process, i.e., replaying, generating feedback, using feedback, and replaying again. After an unsuccessful replay run, the system first identifies all dynamic race instruction pairs in the previous unsuccessful replay using an happens-before race detection algorithm and filters out all ordered-determined production run races. Next, from the suspect set, PI-Replayer follows certain policies to flip the execution order of one pair in the next replay and executes next replays deterministically until the suspect racing pair is reached (a deterministic execution is feasible because every non-deterministic event in the previous replay was recorded). The above steps may be repeated many times until the bug is reproduced. The mistakes made by early replays will be corrected, and PI-Replay will gradually get closer to bug reproduction. Since there are a limited number of execution paths and therefore a finite number of races, repeating the above process will eventually find a correct interleaving and successfully reproduce the bug.
While PRES’s overheads are significantly lower than those of previous methods, they may still be considered high for some performance-critical applications, so further optimizations may be useful. Nevertheless, PRES seems to be of great help in reproducing concurrency bugs on multi-processors systems.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment