Wednesday, October 21, 2009

On Dense Liniar Algebra, Graph Algorithms, Monte Carlo

I must admit that before reading about OPL I was not at all familiar with computation patterns in general. After doing a bit of internet research I came to understand the following about computational patterns. They are not exactly design patterns. OPL’s structural design patterns for instance, define the structure of a system, without indicating what is actually computed. Computational patterns (dwarfs) however, embody the generalized idiom for parallelism used to implement these structural patterns. The analogy is with civil engineering, where structural patterns would describe a factory’s physical structure and general workflow while computational patterns describe the factory’s machinery, flow of resources, and work-products. As with OPL, structural and computational patterns can be combined in the “pattern language” to provide a template for architecting arbitrarily complex parallel software systems.

I expect that the Graph Algorithm Pattern must have been a straightforward read for everybody. Once the graph concepts are clear, most of the details of the first three steps recommend by the pattern are clear: recognizing the problem, determining the data structure, defining the temporary data structures to store traversal variables. Very similar challenges are to be faced with every sequential graph problem. However, I found particularly interesting the graph related aspects of parallelism scattered throughout the four steps. Since for many graph structures there is a limited number of parallelization opportunities, I think that this information is very important. For instance, based on knowledge graph theory, we can choose to eliminate some edges and scope the traversal operation into independent localized partitions such that the partitions can be solved in parallel. Each type of graph is presented in the paper with its specific parallelization opportunities.

Dense Linear Algebra Pattern also seems to be well written. I especially appreciated the figures in this pattern, which support the good understanding of the pattern applied to matrix multiplication. In the Monte Carlo description, one can also get a very good visual understanding of the pattern.

One difference I noticed in the Dense Linear Algebra Pattern compared with Graph Algorithm and Monte Carlo is the way in which the Solution section is structured. In comparison with the latter two patterns, which offer a formal guideline as a sequence of steps in which a particular problem should be approached, Dense Linear Algebra Pattern’s solution suggests the usage of linear algebra libraries or BLAS routines and templates. Finally, if no such library is available or the problem is too specialized, the pattern goes over details on how to structure computation in order to optimize utilization of the memory hierarchy. These are all valuable and interesting details however I am still missing a formal sequence of steps to tell me how to approach this type of problems. For instance, I would have expected something like: 1) recognize linear algebra operations to solve 2) figure out if linear algebra library is available and offers this particular operation 3) figure out whether structuring data to pass as input to library does not lead to potentially suboptimal algorithms 4) if that is the case implement yourself optimized operation using domain specific language.

On Armstrong Thesis Ch 4

A side effect is any computation or operation by a procedure that is not the primary purpose of that procedure. Obviously sharing data between concurrent processes represents a context favoring side-effects, because the view on the shared resource in each process can easily be compromised by another’s action on the same resource. And the problem is not breaking preconditions and invariants, because this can happen in sequential code as well; it is the fact that, in concurrent code, they can be broken even in the middle of performing a certain function, after the invariants or preconditions have been verified by the concurrent code. Consequently, our process can easily run into side effects. After checking preconditions and invariants sequential code is guaranteed to perform the function until the end. Of course, sequential code is also concerned with the manipulation of hardware, which also involves side effects. In the Erlang sequential subset there are a few operations with side-effects, but they are virtually never needed.

Making processes and concurrency part of the Erlang programming language as opposed to relying on the host operating system has a number of advantages. Concurrent programs run identically on different OSs, all issues of synchronization and interaction between processes are the same irrespective of the properties of the host operating system. Also, the separation from OS allows Erlang processes to be implemented as light weight processes without the overhead associated to context switching (thus improved performance). Besides performance, this separation or very little need for an OS, offers a lot of flexibility, an Erlang system being easily ported on specialized environments as for example embedded systems.

I mainly agree with Armstrong saying that “abstracting out concurrency is one of the most powerful means available for structuring large software systems”, an idea definitely supporting the suggested dirty/clean paradigm for structuring code for highly reliable systems. However, I wonder whether Armstrong’s idea should be generalized to any types of software systems or it should only be applied to systems that aim for high reliability.

I recently read a very interesting article where the authors were trying to foresee the future of programming languages. “How high level can we get?” was the question posted by the authors. The current trends favor high level abstractions in everything, higher level languages and tools, from compiler to programming environment. However, it is to not really clear whether languages should take Erlang’s path and change fundamentally to handle distributed systems and concurrency. In fact, the authors even give a good example of a system architecture where concurrency should not be abstracted away altogether: distributed systems. In these systems, local resources (memory and objects etc) are generally fast to access, however accessing remote resources may be hundreds of times slower, or even worse, the resources may not be available at all. These differences should not be abstracted away and hidden for the programmer. An abstraction that allows us to handle different resources using the same techniques is fine, one which prevents us from knowing how our resources are allocated or created is not.

Probably a reasonable idea would be for a language to stay away from concurrency and other such restrictive abstractions or support them only through add-on libraries to facilitate what is a new programming paradigm. Obviously Erlang took the opposite path and built concurrency support into the core language. Would this be among the reasons Erlang not being mainstream?

Tuesday, October 20, 2009

Event based, Implicit invocation & Map-Reduce

Map-Reduce

Interestingly enough the idea of the MapReduce pattern is based on the “map” and “reduce” combinators from a functional language like Lisp. In Lisp, a map takes as input a function and a sequence of values. It then applies the function to each value in the sequence. A reduce combines all the elements of a sequence using a binary operation. For example, it can use "+" to add up all the elements in the sequence. Inspired by this concepts the MapReduce algorithm developed within Google offers a mechanism for processing large amounts of raw data, for example, crawled documents or web request logs.

The pattern is mainly related to Master Worker pattern. The Master-Worker parallel structural pattern is well-known for its load balancing characteristics when there are a large number of (relatively) small requests. In the big picture the mapping phase is carried out by a structure very similar to the Master Worker. Additionally MapReduce exhibits the reduce step, which it is fundamentally a concurrent operation with communication costs that are difficult to hide. Scalable reduction algorithms as PLPP and Reduction are crucial to use in this phase. The key to the load balancing characteristics of the Master-Worker is the dynamic assignment of requests to idle workers. In this structure workers execute at their own speed, taking and processing requests when they need more work. In order to avoid the large scheduling overhead of workers, MapReduce however may employ a slightly different mapping strategy. For instance, if the cost of computing the tasks are similar across the full set of tasks, if they can be ordered from large time to small time, a static round-robin schedule may more adequate.

Event based, Implicit invocation

Implicit invocation architectures differ from explicit invocation systems in that implicit invocation system components use events to communicate with each other. Connectors in such architectures are bindings between events and component methods. Because these bindings are determined dynamically at runtime, components are loosely coupled; there is no compile time determination of which method calls will be made. Loose coupling offers software architects the great benefit of increased flexibility and maintainability: new components can be added by simply registering them as event listeners.

Loosely coupled components work together, but do not rely on each other to do their own job. The interaction policy is separate from the interacting components, providing flexibility. Components can be introduced into a system simply by registering them for events of the system, aiding greatly in reusability. Introduction of new components does not require change in other component interfaces, providing scalability as new features are added. Overall, implicit invocation eases system evolution.

Event-based, implicit invocation is an example of a well-crafted architectural style with high cohesion and loose coupling. As such, it is one of the more broadly accepted architectural styles in software engineering. Examples of implicit invocation systems abound, including virtually all modern operating systems, integrated development environments, and database management systems.

Monday, October 19, 2009

On Armstrong Thesis Ch 2

Joe Armstrong advocates an architecture of inter-process communication solely via messages with no data sharing as the way to construct fault isolating systems. He claims this architecture also facilitates parallelism, however parallelism is not here the main goal. Obviously there is a potential efficiency associated with such an architecture, since components share nothing so a number of independent concurrent processes can be implemented on a multi-processor or run on a distributed network of processors. However parallelism is not a feasible approach when the application cannot easily be partitioned into a number of truly independent tasks. Employing parallelism enforces a new set of design constraints on the application, besides constraints for fault tolerance (e.g. no strong data dependencies between tasks).

Concurrent processes with no data sharing providing a strong measure of fault isolation, is the main design decision of the architecture. In this respect, the message passing abstraction hides underlying state changes that may be used in the implementation of sending messages. This is to limit the consequences of a software error. An important example of an architecture using a similar idea for enforcing fault isolation is the Microkernel architecture. The main goal of the microkernels is to make operating systems more reliable and more secure. With this in mind microkernels break the typical monolithic kernel architecture into a combination of minimal kernel running the privileged code and several servers running the not trusted code. The communication between the kernel and the server components is done through message passing.

A characteristic feature of microkernels is minimalism. Minimalism requires that OS’s trusted computing base (TCB) should be kept minimal. As the kernel (the code that executes in the privileged mode of the hardware) is always part of the TCB, minimizing it is natural in a security-driven and fault tolerant design. I mostly embraced Armstrong philosophy of designing a fault tolerant system. I guess it is a good approach of dealing with the problem of fault isolation while keeping a reasonable behavior. Looking at the microkernel approach I feel that the principle of minimalism should be also taken into account when designing the hierarchy of tasks. I agree that the most complex tasks should be organized on the first level that should be dealt with first. But how about the granularity of each level and applying the principle of minimalism of Microkernels in between layers? Employing such a design philosophy would mean breaking otherwise complex tasks from upper levels into a set of subtasks that can be downgraded to inferior levels in the task hierarchy. Consequently, the task hierarchy tree will tend to grow deeper. The task decomposition should of course keep the same invariants and behavior of the application. Designing a tool to visualize the task hierarchy would offer a very good illustration of this principle. Applications with deep hierarchical task trees will likely be more reliable than the rest.

Erlang makes no assumptions about reliable message passing. The programmer must write his or her application so that it works in the presence of unreliable message passing. I guess Armostron’s point is that since there is no guarantee for a mechanism not to fail, he advocates moving the responsibility of dealing with the reliability of passing messages between components, from the language or framework, into the components themselves and make it part of the philosophy of designing reliable components with respect to any type of interaction with the environment.

Thursday, October 15, 2009

On OPL Patterns

The first versions of the Pipes & Filters and Layered Systems patterns seem to have been written in the GOF form, which I personally prefer because it is a very structured form, breaking up the pattern into many headings: Intent, Motivation, Applicability, Structure, Participants, Collaborations, Consequences, Implementation, Sample Code, Known Uses, and Related Patterns. The GOF pattern are quite large, but if the user does not assume familiarity with the pattern the amount of information is very useful.

The new Pipes & Filters and Layered Systems patterns are written in the Compact form, which, in my opinion, for people closer to the development process, may not be enough. I understand that the intent is that longer pattern languages will benefit from this form because it is easier to read quickly, the structure of the overall language can be understood easier. However, since there is no doubt that both forms offer you at least a section with the crux of the problem (e.g. Intent/Motivation in GOF and Problem/Solution in Compact form), it feels like this short form supports mainly pattern theoreticians, than those who use the patterns in practice. And we all know that nothing is more useful than a code example. I cannot help but remember again a quote from Christopher Alexander: “People who study design methods without also practicing design are almost always frustrated designers who have no sap in them, who have lost, or never had, the urge to shape things”.

Iterative Refinement Pattern helps one exploit the concurrency implied by computational structures consisting of a sequence of high level steps which repeat until some exit condition is met, with each step, comprising a number of mostly independent computations. The Bulk Synchronous Processing (BSP) model, proposed by Leslie Valiant in 1990, is a decomposition explicit, mapping implicit model with communication being implied by the location of the processes and synchronization taking place across the whole program. BSP is designed specifically to support the Iterative Refinement Pattern.

A BSP machine consists of an arbitrary number of processors, each with local memory, connected by an interconnection network, providing functionality as data delivery and barrier synchronization of the processors. A BSP program is divided into supersteps, with each superstep consisting of a local computation for each individual processor, a global message transmission from each processor to any subset of the others and a barrier synchronization. At the end of the superstep, the transmitted messages become available as local data for the next superstep. Because communication all happens together at the end of the computation phase of the superstep, it is possible to perform automatic optimization of the communications pattern.

Coming to the question of what was hardest to understand about the Iterative Refinement Pattern, I feel I should mention again the minimalistic form in which the pattern was written. I think that understanding the solution to the iterative refinement pattern, and how the four parts (computation initialization, the sequence of computational steps, the collection of tasks that execute inside a step, the exit condition) work, an implementation example would have helped a lot.

Wednesday, October 14, 2009

On CHESS

“Heisenbugs” are bugs resulted from unexpected interference among threads, which are typically extremely difficult to reproduce and eliminate. This type of bugs may show up in systems that otherwise have been running for months and their cause may be as trivial as for example adding some extra debug statements. CHESS is a tool able to reproduce such bugs, which has been integrated into many code bases inside Microsoft.

The core design principle in CHESS is that, when attached to a concurrent program, it takes complete control over the scheduling of threads and asynchronous events, thereby capturing all the interleaving nondeterminism in the program. There are two main benefits resulting from such a design. First, in case of executions resulting in errors, CHESS is able to reproduce the erroneous thread interleaving. And secondly, CHESS offers methods for systematic exploration of possible thread interleavings. Consequently it can force every run of the program to follow a different thread interleaving, thus increasing the chances for finding errors in existing tests. Another consequence is that CHESS can find in simple configurations bugs that would otherwise be exhibited by more complex configurations, thus rendering useless artificial system stress tests. The tradeoff for having this design is that CHESS has to deal with the complexity of the concurrency APIs; it should understand the precise signatures of those threading and synchronization functions.

CHESS’s primary goals are capturing all the nondeterministic choices during the execution of the program and exposing them to a search engine that systematically enumerate the range of all possible executions of the concurrent program. This is the basis of reproducing and replay of a chosen concurrent execution. The challenge here is not only to integrate CHESS with complex APIs but also to make sure that it does not introduce extra behavior that is not related to the concurrent program.

The most important component of CHESS, the scheduler, is implemented by redirecting calls to concurrency primitives to alternate implementations from a wrapper library. Including an implementation of these primitives as part of the program is a design choice that may simplify the schedule which now needs to understand only simpler system primitives that the lock implementation uses. However the tradeoff with this approach is that it does not allow the scheduler to expose all the nondeterminism in the lock implementation.

Another design decision meant to simplify the implementation of the scheduler is to abstract the execution of the concurrent program using Lamport’s happens-before graph, which is a graph that captures the relative execution order of threads in a concurrent program. This approach offers a common framework for reasoning about all synchronization primitives used in the program which abstracts away the timing of instructions in the execution. In order to build the tree, CHESS determines whether a task may be disabled by executing a potentially blocking API call, then it labels each call to API by a triplet of task, synchronization variable and operation, and finally informs the scheduler about the creation and termination of a task.

One CHESS innovation is the way it deals with data races: Instead of using a dynamic data-race detection tool, CHESS schedules the outcome of data-races indirectly by enforcing single threaded execution. Consequently since no two threads can concurrently access memory locations, all data-races occur in the order in which CHESS schedule the threads. This may slow down the execution a bit but it can be compensated by running multiple CHESS instances in parallel, each exploring a different part of the interleaving state space.

Monday, October 12, 2009

On Reentrancer

Reentrancer is an Eclipse based refactoring tool that deals with reentrancy related problems in Java programs. The main refactoring steps are as follows:

- Handling library globals. Since removing access to Library globals is kind of hard to achieve, Reentrancer only reports possibly problematic accesses to global library state and leaves it to the programmer to handle them.

- Static fields are replaced by getter and setter methods and every access to these fields are replaced by calls to these accessor methods. This step is mainly designed to support the lazy initialization step. Since the transformation does not preserve static initializer semantics in all cases, Reentrancer have formulated preconditions that ensure that the initializers in an input program are safe for refactoring. Reentrancer's analyses for checking the refactoring preconditions are implemented based on the WALA framework, which is primarily used to compute call graphs from static initializers to discover their transitive callees.

- Reentracer then replaces static initializers with explicit lazy initialization. Static initializer methods are an impediment to reentrancy because they are only executed once, upon the first use of the declaring class. With the help of thread local state, the lazy initialization step will be executed once per thread. Since lazy initialization may alter the point at which initialization code executes thus potentially altering program behavior, Reentrancer performs a static analysis to reveal such cases and issues warnings to the programmer if needed.

- In the next step, global state is replaced with thread-local state with the effect that each thread will get its own copy of this state. This is accomplished by wrapping each appropriate static field in a java.lang.ThreadLocal object and using the methods ThreadLocal.get() and ThreadLocal.set() to read/write the value of the wrapped object.

- The final transformation creates a fresh thread for each execution of the application, thus ensuring that each execution observes gets a different copy of the static fields (now made thread local) and the lazy initialization is executed once per thread.

Reentrancer has been evaluated by observing its behavior on several single-threaded, non-reentrant benchmarks. For each individual benchmark, the reentrancy problems have been first confirmed to exist, either by observing failures when running existing tests in parallel or by writing new tests that exposed problems. After refactoring and checking that the preconditions are still met, the refactored version was run in parallel on a dual-core processor, which yielded performance benefits.

In the recent years, significant advances have been made in the area of automated tool support for refactoring. However, Reentrancer is the first enabling semi-automatic refactoring support for making existing Java programs reentrant.

On BA Chapter 14 (Reading the classics)

I guess the main idea of the chapter is more like a conclusion of the all BA book. Read the classics, get inspired from the good old beautiful architectures, but try to stay practical. This is because the difficult thing about theoretical ideas that also have some power is that they also need to be practical.

Underlying the complexity of beautiful architectures, one can usually discover some simple and elegant principles of functional organization and functional order. Discovering these principles takes intellectual engagement, crucial part of the experience and pleasure of architecture. Interestingly enough some of these principles may contradict each other in different contexts; however this should not diminish their importance to one’s baggage of experience.

An interesting example given by the author is how different classics look at inheritance. Formally, OOP principles would advocate making commonality explicit by using inheritance (“is-a”) whereas the GoF favors object composition (“has”) over class inheritance. One can still remember what a great impact had frameworks as MFC on Windows programming. Being able to just subclass classes supplied by the framework and avoid all the complexity of Win32 programming was a great achievement. At the same by carefully analyzing the “is-a” inheritance pattern, one can notice that it is really hard to come up with class hierarchies that really fit this pattern. Very often we notice methods in base classes that do not necessarily make sense in the derived class. This goes beyond the scope of public inheritance and it is usually a sign of bad design.

Smalltalk, with its innovative interface (when programming was done in a monochrome text terminal), has become a classic in itself: a pure object oriented language. Today’s popular object oriented languages, with their primitive types, are not purely object oriented. In Smalltalk, it is possible to create, modify or delete classes during runtime. This is basically the concept of metaprogramming and it is mainly possible because everything is an object. In Smalltalk we also have the notion of latent typing, which is the only typing mechanism available. In fact, latent typing is at the basis of generic programming via templates in C++. With latent typing, types are defined implicitly by what they do and by their interfaces. This offers another alternative to polymorphism because any type can be used anywhere it offers methods fitting the context. One can easily see the relationship to templates.

As classic famous buildings influence young architects, Smalltalk has left his mark on the computer science community. Adamant but beautiful, innovative in its programming model but not compromising, rich in programming concepts that were later embraced by many of the modern programming languages, Smalltalk is a classic that set examples in many ways and has been imitated by more practical approaches. As with building architecture, software architecture is a chaotic adventure at the end of which, beauty and usefulness should balance in harmony. And the recipe for this is reading the classic and practicing and experiencing on your own. Christopher Alexander, the architect father of design patterns, wisely noticed that “people who study design methods without also practicing design are almost always frustrated designers who have no sap in them, who have lost, or never had, the urge to shape things”.

Thursday, October 8, 2009

On ReLooper

Retrofitting parallelism into existing sequential applications is likely to be successful only for certain classes of problems such as for example the so-called embarrassingly parallel problems. These are problems where either computations are large, data sets are large, or some combination of the two, and where the processing that goes on is easily divisible. Re-architecting a complete system for parallelism may take significant amounts of time and efforts. When possible, refactoring for parallelism by using a refactoring tool is a good choice.

The paper introduces ReLooper, an Eclipse based refactoring tool for Java that reduces the burden of analyzing and rewriting parallel loops, and is fast enough to be used interactively. There are two ways in which ReLooper helps Java programmers who want to parallelize their programs by using ParallelArray: it helps them discover when parallelizing a loop is unsafe and it performs the messy conversion of the loop, selecting a good operator from the 132 that come with ParallelArray.

ReLooper relies on Java’s ParallelArray. There are many libraries that target concurrency and parallelism (e.g., Java's ParallelArray, Microsoft's TPL, Intel's TBB, OpenMP, MPI, etc.). The disadvantage of using such a library directly is that the parallel constructs provided by libraries are in general more verbose than parallel constructs provided by programming languages, thus they require many code changes. This is where a refactoring tool comes in handy. In addition, these libraries assume that all parallel computations do not interfere with each other, so they run without any synchronization. It is the programmer’s responsibility to verify non-interference; and this is why ReLooper is interactive, with minimal user input.

Automatic loop parallelization has been a long time topic in the Fortran community, with compilers having various degrees of success. Much of this work is done in the context of numerical computation on scalar arrays and does not deal with the problems posed by sharing heap-allocated array elements, which are common to object oriented languages.

Automatic parallelizing compilers perform full program analysis to find loops that can be safely and automatically parallelized. In contrast ReLooper attempts to parallelize loops explicitly chosen by the programmer, employing a demand driven analysis to signal problems that the human may have missed. The aim here is to quickly and correctly identify dependencies for most cases occurring in practice.

Some of the transformations presented in the paper make the code harder to understand (e.g., replacing a loop with a parallel operation). ReLooper infers the parallel operations based on the kind of array accesses in the loop. If a loop contains accesses triggering more than one operation, ReLooper chooses the most specific operation. This is both simpler to understand, and is faster at runtime.

Wednesday, October 7, 2009

On BA Chapter 13 (OO vs Functional)

Roughly, the differences between functional and objected oriented programming can be summed up as follows: In object oriented programming everything is treated as an object that has state and behavior. Among others this offers modularity and information hiding. In functional programming one is using a set of functions each of which performing a task. Selectively executing these function results in the solution to the problem at hand. While one of OO's first principles is modularity, functional programming is also achieving modularity, by means of systematic use of stateless functions, high level functions (e.g. combinators), lists and other recursively defined types, as well as lazy evaluation. However this is only fine-grain modularization. Functional programming seems to have no contribution to large-grain modularity, which pertains to software architecture.

The author thinks that the stateless nature of functional programming does not seem to affect in any way software architecture. While I agree with this thought, I also feel that this mostly applies to the development view of the 4+1 views. In the process and physical view however, being stateless may become an important feature. This is because purely functional programs have no side effects, which makes them trivially parallelizable. Nevertheless, by looking at how a _good_ functional program is written (e.g. as a set of modules that are not dependent on each other), in the development view it almost feels like it is designed with object oriented principles in mind.

Proponents of functional programming believe that having functions as first-class citizens lets you find more opportunities for abstraction, which means your code is smaller, tighter, more reusable, and more scalable. But having functions as first class citizens can be achieved in OO as well. However, since in the OO context, the only first-class citizens are, at runtime, objects, corresponding in the static structure to classes, a new construction has been designed: the notion of agents. An agent is an object representing a feature of a certain class, ready to be called. Following a parallel with functional languages, agents are high level routine objects; that is the equivalent of combinators (high level functions).

Reusability and dynamic binding are some aspects where OO excels. Depending on the problem, owe to commonalities captured by inheritance, the number of feature definitions may be significantly smaller in OO as if one were to implement the same problem in a functional language. By collecting the features applicable to many variants into correct abstract data types and grouping them toward the top of the inheritance hierarchy, one can benefit from effective code reduction. There seems to be no equivalent to inheritance in functional programming, where it is required to define the variant of every operation for every combinator, repeating any common ones. Dynamic binding is also essential with OO, because it removes the need for client classes to perform multibranch discriminations to perform operations. Dynamic binding solves what traditionally used to be a major source for architecture degradation and obsolescence.

After reading the chapter, my thoughts converge with those of the author’s, that OOD in his modern form, subsumes the functional approach, retaining its benefits while providing more alternatives for extension and reuse.

Monday, October 5, 2009

On BA Chapter 12 (KDE arch)

KDE is working proof that the Open Source Bazaar style is a successful software development model that can yield great technologies comparable and often even superior to some of the most complex commercial software. Now, in its 4.0 version, KDE has learned to deal with all the demands of a modern platform: pervasive concurrency, increasingly distributed processing, easier to use, cleaner and more beautiful interfaces, meaningful responsiveness, clean interactions with the software, high reliability, stability and data safety.

The social and coordinative aspects of the context in which KDE is being developed are even more interesting. In this environment achieving good communication and getting favorable consensus are imperative; these are usually hampered by distance, cultural barriers, one’s own preferences and prejudices. For larger and successful open source projects as KDE its structure is very important, as it needs to hold trademarks, receive donations organize conferences etc. In fact, finding a formal structure, to solve problems KDE faced once it became significant and at the same time it did not hinder the further development of the project, contributed most significantly to the long-term stability of the project community.

The Akonadi framework, one of the so called pillars of the KDE platform aims to provide access to the user’s personal information, the associated metadata, and the relationships between these data, as well as the services operating on them. This information is aggregated from a variety of sources, such as email and groupware servers, web and grid services, local apps that feed into it. Additionally Akonadi caches this information and provides access to it. Most of the important design decisions of the project have been taken over a series of meetings beginning with the meeting that brought the initial fundamental idea, which was the need for a robust, reliable, transactional storage layer, clearly separated from the interface and allowing concurrent multi-client access. In its current version Akonadi-PIM offers a type independent architecture, with a simple integration interface that allows for new types to be easily added by the user. This is achieved by the so-called serializer plug-ins, which are runtime libraries capable of converting back and forth from a type into a binary representation stored in the Akonadi server. From a concurrency point of view, an application generally resides in a different address space and can open one or more connections to the server, each represented internally by a thread. The storage layers rely on a solution inspired by the maildir standard, which allows lock-free ACID access by relying on atomic rename operations on the file system.

Another core library of KDE 4 is ThreadWeaver, a job scheduling library for concurrency, with its main purpose to manage and arbitrate resource usage in multi-threaded environments. By using job sequences and job collections registered in a global application job queue, ThreadWeaver assigns worker threads to the job queue thus decoupling it from the main application thread. Also, instead of relying on the direct implementation of queuing behavior, ThreadWeaver uses queue policies: one based on dependencies between resources and another based on resource restrictions. Priorities can also be used to influence the order of execution as well. Priorities can be changed without changing the job implementation by means of Decorators.

Refactoring Sequential Java Code for Concurrency

Concurrencer is a refactoring tool, which helps programmers refactor sequential java code with the help of j.u.c (java.util.concurrent). Compared with find-and-replace refactoring transformations, its transformations are more complex as they require program analysis and may spawn multiple, non-adjacent, program statements.

The motivation behind implementing such a tool is based on the fact that there are many cases where retrofitting concurrency is easier than rewriting it. Doing this manually is tedious as it may require changing many lines of code; it is error-prone because one can easily choose the wrong refactoring API when multiple of them are available; and at the same time omission-prone as it is always hard to manually choose the most efficient API alternative when more of them are available.

There are three types of behavior preserving refactorings supported in Concurrencer: converting int to AtomicInteger, converting HashMap to ConcurrentHashMap and the most laborious, converting recursion to ForkJoinTask. The first two transformations enable a sequential program to become thread safe, a thread safe program become more scalable, while the third makes a sequential program run concurrently, with obvious performance improvements. When refactoring, it is the programmer’s responsibility to identify all the shared data and a target refactoring. It is then Concurrencer’s responsibility to analyze all the accesses to shared data and apply the transformations in the best possible way.

When refactoring to AtomicInteger and ConcurrentHashMap there are interesting aspects regarding the initialization and field access of the affected variables, however what is worth mentioning is how Concurrencer is dealing with existing synchronization primitives. Typically if the original code contains synchronized accesses around the fields, Concurrencer removes them since this becomes superfluous after the transformation as the new types have built-in thread safety.

The third and the most interesting transformation, relies on the support of the ForkJoinTask framework available in Java 7. Divide-and-conquer programs are the best candidates for this framework offering support for fine grained parallelism in computationally intensive applications. Using the user supplied threshold, which defines the size of the problem that can be solved sequentially, along with the divide-and-conquer method, Concurrencer creates the task java classes by subclassing from RecursiveAction class and encapsulating the parallel computation of the original recursive method, changes the base of the recursion, replaces recursive calls with task instantiations, then executes the parallel tasks and combines the results of the subtasks.

While covering most common refactoring scenarios, Concurrencer is not complete. However even though the approach is neither sound nor complete, it is still useful as it saves programmer’s time overall. Moreover, it is very easy to extend these transformation patterns to cover more refactoring scenarios.