Wednesday, September 30, 2009

On A Java Fork/Join Framework

Fork/Join is a simple design technique that acts as the parallel version of the well-known divide-and-conquer approach. As the name suggests, it is based on two basic operations: fork, by which the current task starts a new parallel subtask, and join, which forces the current task not to proceed until the forked subtask has completed. These two operations give the algorithm its recursive nature, with tasks repeatedly splitting into subtasks until they are small enough to solve using simple sequential methods.
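To make the two operations concrete, here is a minimal sketch that sums an array by recursive splitting. It uses the fork/join classes that ship in java.util.concurrent (ForkJoinPool, RecursiveTask) rather than the FJTask API discussed in this post; the class name SumTask and the THRESHOLD value are assumptions made up for the illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task: sums data[lo, hi) by divide and conquer.
class SumTask extends RecursiveTask<Long> {
    static final int THRESHOLD = 1_000; // assumed cutoff below which we sum sequentially

    private final long[] data;
    private final int lo, hi; // half-open range [lo, hi)

    SumTask(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            // Small enough: solve with a simple sequential loop.
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = lo + (hi - lo) / 2;
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                      // fork: start the left half as a parallel subtask
        long rightSum = right.compute();  // keep working on the right half in this task
        return left.join() + rightSum;    // join: wait until the forked subtask completes
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }
}
```

Calling SumTask.parallelSum(array) returns the same value as a plain loop; the threshold only controls how fine grained the subtasks become.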

At first glance, any framework (e.g. pthreads, Java threads) that supports creating threads and waiting for their completion would seem a good candidate for implementing Fork/Join. In general, however, standard thread frameworks are too heavy to support fork/join programs. One reason is that the synchronization requirements of fork/join tasks are more limited than those of regular threads, so the overhead associated with tracking blocked general-purpose threads is wasted. In fact, the thread overhead becomes more significant as tasks become finer grained. Consequently, task granularity has to be balanced against thread overhead to maximize the benefit of parallelism. Cilk, one of the first frameworks to tackle these problems, implements fork/join support on top of an operating system’s basic thread or process mechanisms. The FJTask framework offers a variant of the design used in Cilk, but with a Java implementation, benefiting from Java’s portability. The core design principle of these frameworks is to map tasks to threads much as an operating system maps threads to CPUs, while exploiting the simplicity, regularity, and constraints of fork/join programs in performing the mapping.

The FJTask architecture is simple: a pool of standard worker threads is allocated, as many as there are CPUs on the system, and each worker processes tasks held in its own internal deque. The JVM and OS are trusted to map these threads onto different CPUs. Fork/join tasks (FJTasks) act as lightweight executable classes that implement the Runnable interface and its run method. The scheduling mechanism, inspired by Cilk, is the core of the FJTask framework. Each worker thread maintains runnable tasks in its deque, which supports push, pop and take operations. Subtasks generated by tasks running on a given worker thread are pushed onto that worker's own deque. Workers process their own deques in LIFO order and, when they run out of tasks, steal tasks from other workers in FIFO order. When idle, workers enter a special priority adjustment sequence of attempts to get new tasks, which may eventually end with the worker blocking until another task is invoked from the top level. The mechanism works well because it reduces contention by having stealers operate on the opposite end of the deque from owners. Also, with this scheme, programs adopting small task granularities for base actions are likely to run faster than those that use only coarse-grained partitioning.
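The access discipline of the per-worker deque can be sketched as follows. This is not the FJTask implementation: the class name WorkerDeque is hypothetical, and a single lock is used for brevity, so the sketch shows only which end each party uses, not the low-contention property described above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, lock-based stand-in for one worker's task deque.
class WorkerDeque {
    private final Deque<Runnable> tasks = new ArrayDeque<>();

    // Owner: push a newly forked subtask onto the top of the deque.
    synchronized void push(Runnable task) {
        tasks.addLast(task);
    }

    // Owner: pop the most recently pushed task (LIFO); returns null when empty.
    synchronized Runnable pop() {
        return tasks.pollLast();
    }

    // Thief: take the oldest task from the opposite end (FIFO); returns null when empty.
    synchronized Runnable take() {
        return tasks.pollFirst();
    }
}
```

With tasks A, B and C pushed in that order, a stealing worker calling take() gets A, while the owner calling pop() gets C. In a divide-and-conquer program the oldest task is usually the largest, so a thief tends to walk away with a big chunk of work.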

A potential drawback of implementing such a framework in Java is related to garbage collection. At first thought, since fork/join programs generate huge numbers of tasks, GC comes in handy: tasks quickly turn into garbage once they are done processing. However, if garbage generation rates force frequent collections, scalability may suffer, because stopping threads for collection takes time approximately proportional to the number of running threads.

On BA Chapter 11 (Emacs Architecture)

With its development beginning in the mid-70s and continuing actively even today, “Emacs text editors are most popular with technically proficient computer users and computer programmers. The most popular version of Emacs is GNU Emacs, a part of the GNU project, which is commonly referred to simply as Emacs.”

Structured as a Model-View-Controller architecture for interactive applications, with the Model, the underlying representation of the data, the View presenting the data to the user and the Controller taking care of the user’s interaction with the View, Emacs delivers a very influential architecture. And this comes from Emacs’s most striking feature, extensibility.

Applications like Eclipse and Firefox are extensible through user extensions. In a similar manner, a user who wants to customize Emacs to meet his own needs has to write his own customization code in Lisp. Emacs Lisp, the flavor of Lisp implemented by Emacs, is key to Emacs’s ability to accommodate a wide range of new functionality. This is mainly because Emacs Lisp acts as an important abstraction boundary, which hides away the complexity of the Lisp interpreter and of the underlying processor architecture. Through the contributions of its users (now isolated from those details), Emacs has grown into a platform rather than a unified whole, a platform comprising a multitude of Lisp packages. In fact, the concept of Emacs is more than that of an editor. Some even say that Emacs was many years ahead of its time.

Another interesting aspect of Emacs is the Emacs compiler, capable of translating Emacs Lisp source files into a special representation known as bytecode. Compared to source files, bytecode files load faster, occupy less space on disk, use less memory when loaded, and run faster. From this description, the analogy with Java, .NET and other bytecode-based platforms seems clear. One can almost claim that Java is an extension of Emacs as a platform concept: it has a byte-compiled, garbage-collected language, a display library, network functionality, etc.

Automatic display management frees Lisp code from handling events in an event loop. This is another important feature that again speaks for the simplicity of integrating new features into Emacs. We see the same automatic display management with JavaScript, which only modifies the DOM tree representing the web page, while the browser takes care of updating the display when needed.

Even from the sole perspective of designing an editor, an analysis of the Emacs architecture reveals design decisions that make Emacs unique. For instance, in order to manipulate text, the Model is a buffer, which is a flat string in which newline characters mark line endings. This is much simpler than the approach of most other editors, where the text is represented as an object, a data structure, a tree, etc. This representation matches perfectly the Emacs Lisp primitive operations on buffers, which can insert, delete, and extract portions of buffers as strings. With its UI design, with frames and windows, with its ability to manipulate commands and their output inside the editor, and with its command line, Emacs as an editor again takes a fresh approach, which makes its proponents swear nothing else comes even close.
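The flat-string model is easy to picture in code. The sketch below is only an illustration in Java, not how Emacs stores text: the class TextBuffer, its method names and its zero-based offsets are all assumptions of the example. The point is that lines are not stored anywhere; they are derived from newline characters in the string.

```java
// Hypothetical flat-string buffer; positions are zero-based character offsets.
class TextBuffer {
    private final StringBuilder text = new StringBuilder();

    // Insert a string at the given offset.
    void insert(int pos, String s) {
        text.insert(pos, s);
    }

    // Delete the characters in the half-open range [start, end).
    void delete(int start, int end) {
        text.delete(start, end);
    }

    // Extract the characters in [start, end) as a string.
    String substring(int start, int end) {
        return text.substring(start, end);
    }

    // Lines are derived by counting newline characters in the flat string.
    int lineCount() {
        int lines = 1;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') lines++;
        }
        return lines;
    }

    String contents() {
        return text.toString();
    }
}
```

Every editing command reduces to these few primitives, which is what makes the model a convenient target for extension code.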

Sunday, September 27, 2009

On OPL

The paper on OPL offers a great survey of patterns restricted to describing the design of parallel software; it is basically concerned with software architecture and ways to design and implement parallel algorithms. Its target audience is the application programmer, not compiler writers or developers of operating systems or parallel libraries. OPL is also not tied to any particular application domain.

OPL is structured as a stacked, layered system that defines five categories: architectural patterns, describing the overall organization of a parallel system and how the computing elements interact; computational patterns, describing the core classes of computations that make up the application; parallel algorithm strategy patterns, covering the methods to exploit concurrency in a parallel application; implementation strategy patterns, covering parallel program organization and common data structures specific to parallel programming; and concurrent execution patterns.

While I was familiar with many of the patterns, I also found some that I did not know well. Therefore, I appreciate the authors’ initiative to define the OPL layers and list the patterns, and I look forward to their next steps, in which they promise to follow up with pattern descriptions and careful review.

Among the implementation strategy patterns, Master-worker/Task-queue is an interesting one. Structurally, the pattern consists of a Master that maintains a task queue and controls a group of processing elements, or workers. Usually, one master and several identical worker components exist and run simultaneously during execution.

In this pattern, the same operation is simultaneously applied in effect to different pieces of data. Operations in each worker component are independent of operations in other components. The structure of the solution involves a central Master that distributes data among workers by request. Parallelism is introduced by having multiple data sets processed at the same time.

The tasks or the data pieces may have different sizes. This means that the independent computations of each task should adapt to the data size to be processed, in order to obtain automatic load-balancing. Also, the coordination of the independent computations has to take up a limited amount of time in order not to impede performance of the processing elements. The solution has to scale over the number of workers. Changes in the number of workers should be reflected by the execution time. Improvement in performance is achieved when execution time decreases.
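A minimal sketch of the pattern, assuming plain Java threads and a shared queue: the master fills the queue with data pieces of different sizes, and identical workers pull the next piece whenever they finish the previous one, which gives a simple form of load balancing. The class name MasterWorker and the sum-of-squares workload are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical master: distributes chunks to workers on request and combines results.
class MasterWorker {
    static long sumOfSquares(List<int[]> chunks, int workerCount) {
        Queue<int[]> queue = new ConcurrentLinkedQueue<>(chunks); // the task queue
        AtomicLong total = new AtomicLong();
        List<Thread> workers = new ArrayList<>();

        for (int w = 0; w < workerCount; w++) {
            Thread worker = new Thread(() -> {
                int[] chunk;
                // Each worker asks for a new task only when it is free.
                while ((chunk = queue.poll()) != null) {
                    long partial = 0;
                    for (int v : chunk) partial += (long) v * v;
                    total.addAndGet(partial); // brief coordination step
                }
            });
            workers.add(worker);
            worker.start();
        }

        try {
            for (Thread worker : workers) worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return total.get();
    }
}
```

The result does not depend on the number of workers, only the execution time does, which is the scaling property the pattern asks for.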

On BA Chapter 10 (Jikes RVM)

The core principle of Jikes RVM is meta-circularity, the property of being a self-hosting runtime. While this is an important principle for compilers, many runtime environments are not written in the language they run, which brings certain limitations. For instance, in a Java runtime written in C or C++, a memory safety bug in the runtime may have serious consequences, even though the Java application itself is memory safe. A self-hosting environment also allows the runtime to easily take advantage of better libraries and abstractions. Very often, the application and the runtime need to communicate with each other. Implementing this functionality is considerably more complex when the runtime and the application running on it are written in different languages. Meta-circularity also helps developers benefit from the features they introduce, by relying on a system where application, runtime and compiler have a consistent view of the system.

Jikes RVM does not include an interpreter. All bytecode must first be translated into native machine code. An initial, basic and non-optimized compilation is performed by the baseline compiler, which relies on lazy compilation. With lazy compilation, methods are compiled the first time they are invoked by the program. In later execution stages, the adaptive optimization system starts detecting program hot spots and selectively recompiles them with Jikes RVM’s optimizing compiler. Selective optimization is the key to enabling the deployment of sophisticated optimizing compilers as dynamic compilers. Since Jikes RVM is written in Java, the implementation of the adaptive optimization system has inherent benefits, such as threads and monitors to structure the code. Consequently, compiler tasks are carried out by separate threaded components that run concurrently in the thread-safe Java environment. Also, the easy-to-understand Java collection libraries confer simplicity on each component, hiding away details of the underlying data structure management.
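The interplay between lazy baseline compilation and selective recompilation can be sketched in a few lines. This is a toy model, not Jikes RVM code: the class AdaptiveCompilerSketch, the Tier enum and the HOT_THRESHOLD value are hypothetical, and a plain invocation counter stands in for whatever profiling the real adaptive optimization system performs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of lazy compilation plus hot-spot recompilation.
class AdaptiveCompilerSketch {
    enum Tier { BASELINE, OPTIMIZED }

    static final int HOT_THRESHOLD = 3; // assumed invocation count that marks a method as hot

    private final Map<String, Tier> compiled = new HashMap<>();
    private final Map<String, Integer> invocations = new HashMap<>();

    // Called on every method invocation; returns the tier the call runs at.
    Tier invoke(String method) {
        // Lazy compilation: baseline-compile a method the first time it is invoked.
        compiled.putIfAbsent(method, Tier.BASELINE);

        // Selective optimization: recompile only methods that turn out to be hot.
        int count = invocations.merge(method, 1, Integer::sum);
        if (count >= HOT_THRESHOLD) {
            compiled.put(method, Tier.OPTIMIZED);
        }
        return compiled.get(method);
    }

    boolean isCompiled(String method) {
        return compiled.containsKey(method);
    }
}
```

A method that is never called is never compiled, and a method called only once never pays for the optimizing compiler.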

Another benefit of Jikes RVM that comes from using Java is the clean threading interface between the language and the operating system threads, which allows Jikes RVM to have different underlying threading models in order to adapt to new programmer behaviors. For garbage collection, Jikes RVM relies on the Memory Management Toolkit (MMTk), which provides a powerful and popular set of precise garbage collectors. As MMTk is also written in Java, it can be directly linked into the code being compiled for efficiency. Thus, during the initial creation of the object representation, MMTk naturally comes into play, providing Jikes RVM with iterators for processing references, object allocation and barrier implementations. With Java’s threading model, all garbage collectors are parallel and integrate with the runtime model.

Jikes RVM provides not only a very performant virtual machine but also a very promising research platform. The extensibility coming from its meta-circular nature provides a great platform for multilanguage virtual machine research, an extension that would also allow aspects of Jikes RVM to be written in different programming languages. Providing aspect-oriented programming within the virtual machine and making the entire virtual machine into an OS to remove barriers to runtime optimization are only a few of the other interesting extensions that can be built into Jikes RVM.

Thursday, September 24, 2009

On BA Chapter 9 (JPC Emulator)

JPC is an x86 emulator written in pure Java. Its greatest feature is abstracting away the underlying hardware and operating system. In other words, x86 code is converted to Java bytecode, which in turn is run by the processor-specific JVM. And since there is no native code in it, JPC can emulate all the standard components of an x86 PC while remaining entirely inside the browser. While this architecture provides great flexibility and the means to isolate the software behind two independent, verified security layers (the Java Applet sandbox and the JVM), designing the architecture was a rather difficult task.

Leaving aside JPC’s biggest challenge, emulation speed (and how the JPC architects tackled Java performance limitations to achieve speed), and also the obvious benefits of JPC that come from running virtual hardware in isolation, I think that JPC as a project has a great vision. I personally liked Bochs and Qemu for their support for debugging kernel code. For instance, Bochs can set breakpoints in any kind of software (even if it is compiled without debugging info!), and provides an additional "debugging out port" you can easily access from within your kernel code to print debug messages. Qemu can also be configured to listen for a "gdb connection" before it starts executing any code, so that the code can be debugged. Those who have spent sleepless nights trying to debug a kernel oops will greatly appreciate these features.

However, I think JPC’s vision goes far beyond being just another emulator. Independence from the host platform and OS and the ability to run a virtual machine over the Web are great achievements. One can have one’s hard drive reside on one’s own server on the Internet, and access it from anywhere in the world by loading a local JPC instance and pointing it to the server. Furthermore, the core emulation task can be carried out on a remote server. With the screen output and user input pushed via the Internet to the virtual machine owner, we can imagine a model where there is no one-to-one mapping from user to virtual machine. We can model this as N users using M virtual machines. With this concept, if a machine is idle, any one of the users can use it, remotely launching a JPC image to work on their personal disk image data. This mostly fits users who use large batch farms to run massively parallel tasks, such as rendering frames, optimizations, etc.

With cloud computing becoming more popular, the JPC designers wonder why not use the millions of idle desktop computers worldwide instead and save the financial and environmental costs of using a datacentre. Old issues associated with cloud computing are easily overcome by JPC’s architecture. Gaining the trust of potential computing power donors would no longer be an issue, as JPC provides a very secure approach. The cost associated with downloading, installing and maintaining foreign software is always a significant problem for the donor and/or system admin, but this is again no issue because all these tasks will be confined to the boundaries of the virtual machine. Last but not least, the hardware and operating systems provided by the donors will most likely be heterogeneous, but this again is not an issue for JPC, which runs on any JVM.

However, the road to creating a hardware- and OS-independent emulator, without compromising the requirement for maximum performance, was hard. In order to harness the complexity of the IA-32 architecture, the JPC designers had to come up with a clear and modular design and at the same time exploit all the little performance tips associated with the Java language. I particularly liked a phrase in which the authors capture the real measure of designing and implementing an emulator as opposed to designing hypervisors. During the design of JPC, they felt like they became “as schizophrenic as the codebase was”. That was because, ultimately, not having access to the memory or processor systems called for different design decisions, aimed at code clarity and modular design. When working within the processor and memory system, the design usually aims for ultimate performance, so breaking modularity or isolation between layers is often a good design choice.

On Adaptive Object-Model Architecture

The Adaptive Object-Model (AOM) architectural style provides an alternative to the usual object-oriented design. While traditional OOD generates classes with attributes and methods for every business entity, AOM does not treat business entities as first-class objects. Instead, AOM represents classes, attributes, relationships and behavior as metadata. Every domain change that in OOD would require recompilation can be performed by the actual users, who can change the metadata. Furthermore, the metadata is interpreted at runtime, which means that if a business rule changes, the change is immediately reflected in the running application. Consequently, the model is adaptable.

Transforming an OOD class hierarchy into an AOM one only makes sense when the behavior between subclasses that would represent business entities is very similar or can be broken out into separate objects. This usually reduces the number of classes in the object model and creates a class structure that does not change. Changing the spec of an AOM application usually means changing the content of the database where metadata is saved. So we can safely say that AOM, when applicable, reduces time-to-market by allowing users to experiment and provide feedback.

The adaptability of the system is implemented by means of a few design patterns such as TypeObject, Properties, Composite and Strategy. Strategy and RuleObjects are usually used to specify naturally complex rules, while combinations of rules (e.g. predicates and sets) are modeled through the Composite pattern. Metadata is usually read and interpreted when the object model is first instantiated and again at runtime, when the business rules are applied. The easiest way to implement the metadata persistence model is by means of object-oriented databases. However, it is also possible to use a relational database model and even XML.
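A minimal sketch of how TypeObject, Properties and Strategy-style rules fit together is given below. The names EntityType and Entity, and the use of java.util.function.Predicate for rules, are assumptions of the example rather than part of the AOM literature; a real system would load this metadata from a database or XML instead of building it in code.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// TypeObject: a business entity type described by data, not by a compiled class.
class EntityType {
    final String name;
    final Map<String, Class<?>> propertyTypes = new LinkedHashMap<>();
    final Map<String, Predicate<Entity>> rules = new LinkedHashMap<>(); // Strategy-style rules

    EntityType(String name) {
        this.name = name;
    }

    EntityType property(String prop, Class<?> type) {
        propertyTypes.put(prop, type);
        return this;
    }

    EntityType rule(String ruleName, Predicate<Entity> rule) {
        rules.put(ruleName, rule);
        return this;
    }
}

// Properties: an instance holds its values in a map checked against the type's metadata.
class Entity {
    final EntityType type;
    private final Map<String, Object> values = new HashMap<>();

    Entity(EntityType type) {
        this.type = type;
    }

    void set(String prop, Object value) {
        Class<?> expected = type.propertyTypes.get(prop);
        if (expected == null || !expected.isInstance(value)) {
            throw new IllegalArgumentException("Bad property " + prop + " for type " + type.name);
        }
        values.put(prop, value);
    }

    Object get(String prop) {
        return values.get(prop);
    }

    // Rules are interpreted at call time, so metadata changes take effect immediately.
    boolean satisfiesAllRules() {
        return type.rules.values().stream().allMatch(rule -> rule.test(this));
    }
}
```

Adding a rule to an EntityType while the program runs changes the outcome of satisfiesAllRules() for existing entities, with no recompilation.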

While very beneficial for systems that change constantly, or for those that want to enable their users to dynamically configure and extend the system, the Adaptive Object Model also has disadvantages. Among these, the most important is the complexity of implementing such a system. Besides the several design patterns involved, the system should also provide new tools and GUIs for defining the objects in the system. Further complexity comes from having to implement the model interpreter and from the fact that two object systems coexist: the AOM model that is interpreted and the interpreter itself, written in an object-oriented programming language. Finally, since an AOM tends to lead to a domain-specific language, all the problems associated with developing a language, such as providing debuggers, version control, etc., become an extra burden for the AOM designer. On the performance side, as with every interpreter, there are certain performance issues associated with AOMs.

Nevertheless, when applied correctly, AOMs are very interesting design models. While developers writing them have split opinions (with the ones understanding it, praising it, and the others claiming that it is too complex), the architects developing AOMs are usually very proud of them.

Monday, September 21, 2009

On BA Chapter 8 (Tandem Architecture)

Tandem is a fault-tolerant computer system, marketed to transaction processing customers such as ATM networks, banks and stock exchanges. Guardian, the OS running on Tandem machines of the NonStop series, was designed in parallel with the hardware, in order to provide fault tolerance with minimal overhead costs.

In many ways, Tandem/16 was a revolutionary system, yet it had an unexpectedly low impact on the industry and on the design of modern machines. This is most probably because Tandem/16 was very different from most systems and was developed in a purely commercial environment.

A key design principle of the Tandem’s fault-tolerant architecture was modularity, both hardware and software being decomposed into modules, acting as units of failure, diagnosis, repair and growth. Modularity was very important for Tandem as a fault tolerant system, because individual modules had to be replaceable online. Furthermore, the isolation that comes with modularity decreases the chances that the failure of one module affects the operation of another. In Tandem, the process model and the messaging system are the two important mechanisms used in implementing fault isolation.

Furthermore, each module is designed according to the Fail Fast principle. By implementing a self-checking mechanism, each module is designed to either work properly or stop the first time it detects a fault. This is imperative for guaranteeing data integrity in the event of a failure.

Another important design principle for Tandem as a fault-tolerant system was Single Failure: when a hardware or software module fails, its functionality is immediately taken over by another one, giving a mean time to repair measured in milliseconds. For instance, for a CPU, there is always a second CPU ready to assume duty in case the first one fails. The same goes for running processes, which always run in process pairs: a primary and a backup process.
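Fail Fast and process pairs can be illustrated together with a small in-process model. This is a sketch under heavy assumptions, not Guardian's mechanism: the classes FailFastModule and ProcessPair are hypothetical, the fault is injected through a test hook, and "checkpointing" is just copying a running total to the backup.

```java
// Hypothetical fail-fast module: it either works properly or stops.
class FailFastModule {
    private long total;       // the state this module maintains
    private boolean stopped;
    private boolean faulty;   // test hook that simulates a latent fault

    void injectFault() {
        faulty = true;
    }

    // Receive a checkpoint of the primary's state.
    void restore(long checkpoint) {
        total = checkpoint;
    }

    long add(long amount) {
        if (stopped) throw new IllegalStateException("module is stopped");
        if (faulty) {
            stopped = true; // fail fast: stop on the first detected fault, before touching state
            throw new IllegalStateException("fault detected");
        }
        total += amount;
        return total;
    }
}

// Hypothetical process pair: a primary does the work, a backup holds the last checkpoint.
class ProcessPair {
    private FailFastModule primary = new FailFastModule();
    private FailFastModule backup = new FailFastModule();
    private int takeovers;

    void failPrimary() {
        primary.injectFault();
    }

    int takeovers() {
        return takeovers;
    }

    long add(long amount) {
        long result;
        try {
            result = primary.add(amount);
        } catch (IllegalStateException fault) {
            primary = backup;              // takeover: the backup assumes duty
            backup = new FailFastModule(); // stand-in for a repaired module rejoining online
            takeovers++;
            result = primary.add(amount);  // retry the request on the new primary
        }
        backup.restore(result);            // checkpoint the new state to the backup
        return result;
    }
}
```

Because the failed module stops before modifying its state, the backup can resume from the last checkpoint without any risk of applying a request twice.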

Tandem is also designed to support online maintenance. Hardware and software can be diagnosed and repaired while the rest of the system continues to deliver services to the user. Hardware components, data and programs can be reintegrated into the system without interrupting the service.

The general feeling is that Tandem was a revolutionary machine, but it was the small things that got in the way of Tandem imposing its vision. Naming issues and certain incompatibilities, such as its unusual concept of interprocess communication, are just a few of the issues that prevented Tandem from being broadly accepted. In the nineties, factors such as computer hardware becoming generally more reliable and significantly faster accelerated the decline of the Tandem architecture.

On The BIG BALL OF MUD

A BIG BALL OF MUD is a casually structured system whose organization, or lack of it, is dictated more by expediency than by design. Its success and popularity speak for the BIG BALL OF MUD as an architecture in its own right. However, the big questions remain: why are these systems, so architecturally undistinguished, still so popular; what makes good programmers build ugly systems; and what can we do to improve them?

The root cause of BIG BALLS OF MUD is complex. Factors such as time constraints, the cost of investing in the architecture of a new domain (whose benefits are initially hard to estimate), the experience and skill of the designer, the inherent complexity of the application domain and the scalability issues associated with design decisions can all contribute to the appearance of BIG BALLS OF MUD. The very nature of software architecture as a hypothesis about the future, one which holds that subsequent change will be confined to the part of the design space encompassed by that architecture, seems to give a philosophical explanation for the existence of BIG BALL OF MUD architectures.

All architectures are prone to structural erosion, from systems that emerge from quick-and-dirty code (THROWAWAY CODE) intended to be used only once and then discarded, to those with well-defined architectures. With time, even clean architectures may become overgrown as PIECEMEAL GROWTH gradually allows elements of the system to sprawl in an uncontrolled fashion. The only way to deal with entropy in software is to refactor it. A sustained commitment to refactoring can keep a system from subsiding into a BIG BALL OF MUD.

It is also important to KEEP IT WORKING, from the point of designing a change through implementation, testing and maintenance. By taking small design steps in any direction, we can make sure that it is never more than a few steps back to a working system. Daily builds and keeping the last working version around are successful maintenance practices. The importance of testing in keeping a system working is emphasized both by the traditional Waterfall approach and by newer techniques such as Extreme Programming.

The dynamics of a growing architecture are very complex. The system itself and all of its components evolve at different rates, with a general tendency for the components that change faster to become distinct from those that change slowly. SHEARING LAYERS form between the components. Identifying these layers, understanding component interactions and grouping components by how similar their change rates are help balance adaptability and stability, forces that are usually in constant tension. Oftentimes, when facing the mess of a BIG BALL OF MUD, the architect has to choose between SWEEPING IT UNDER THE RUG and RECONSTRUCTION. However, distilling meaningful abstractions from a BIG BALL OF MUD is a difficult and demanding task, requiring skill, insight, and persistence. At times, RECONSTRUCTION may seem like the less painful course.

In the end, the authors note that there are good reasons why good programmers build BIG BALLS OF MUD, and accept that expedient programming is, in fact, a state-of-the-art strategy. While they agree that casual architecture is natural during the early stages of a system’s evolution, the authors also hope that there are ways we can do better. The key to finding those ways is learning about the domain and the architectural opportunities looming within it, as the system grows and matures.

Wednesday, September 16, 2009

On the Layers Pattern

Layers is a common architectural pattern, fit for large systems, that structures applications so that they can be decomposed into groups of subtasks, with each group at a particular level of abstraction. It is common sense in software architecture that implementing an application following a layered model has more advantages than the monolithic approach. A direct advantage is logical segmentation, which facilitates team development, incremental coding and testing.

It is essential that within a layer all components work at the same level of abstraction. The main structural property of the pattern is that the services in layer J are only used by layer J+1 and there are no further dependencies between layers. In other words, each individual layer shields all lower layers from direct access from the higher ones.

Finding the right decomposition is not trivial. Defining the layers can be done in bottom-up fashion, following a ‘yo-yo’ approach or else by refining the structure based on a sequence of steps, which involves defining the abstraction criterion for grouping tasks into layers (usually the distance from the platform – the first layer), determining the number of layers, layer naming and task assignment to each layer and finally specification of services between layers. Strategies involving refining seem natural choices as it is generally impossible to define an abstraction criterion right before defining the layers and their services. Defining the components and services first and later forcing the layer architecture based on usage relationships is also problematic since the pattern does not capture an inherent ordering principle. This means that new components, usually added to the architecture as part of system maintenance, may easily break the strict layering principle.

Once the decomposition is defined, the remaining actions are equally important for designing a good layered architecture: specifying the interface to each layer and the communication between layers, structuring the layers, decoupling adjacent layers and, last but not least, designing an error handling strategy. Decoupling layers is usually a nice design exercise. A top-down, one-way coupling can be achieved by fixing the interface and the semantics of the previous layer. While this allows for top-down communication, for bottom-up communication one may use callbacks. In OOD, one can decouple the lower layer from the upper layer, and even have the upper layers change the implementation of lower ones at runtime, by means of base classes. This principle is the basis of Layering Through Inheritance.
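The one-way coupling and the callback mechanism can be sketched with three tiny layers. All class names here (TransportLayer, FramingLayer, ApplicationLayer) and the bracket framing format are invented for the illustration; each layer holds a reference only to the layer directly below it, and the lower layers know nothing about who registered the callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Layer 1 (lowest): raw transport. It notifies whoever is above it through a callback.
class TransportLayer {
    private Consumer<String> onReceive = frame -> {};

    void setReceiveCallback(Consumer<String> callback) {
        onReceive = callback;
    }

    String send(String frame) {
        return "sent:" + frame; // stand-in for real I/O
    }

    void simulateIncoming(String frame) {
        onReceive.accept(frame); // bottom-up communication
    }
}

// Layer 2: framing. Uses only the services of Layer 1.
class FramingLayer {
    private final TransportLayer transport;
    private Consumer<String> onMessage = message -> {};

    FramingLayer(TransportLayer transport) {
        this.transport = transport;
        // Strip the surrounding brackets and pass the payload upward.
        transport.setReceiveCallback(
            frame -> onMessage.accept(frame.substring(1, frame.length() - 1)));
    }

    void setMessageCallback(Consumer<String> callback) {
        onMessage = callback;
    }

    String sendMessage(String message) {
        return transport.send("[" + message + "]"); // top-down communication
    }
}

// Layer 3: application. Uses only the services of Layer 2.
class ApplicationLayer {
    private final FramingLayer framing;
    final List<String> inbox = new ArrayList<>();

    ApplicationLayer(FramingLayer framing) {
        this.framing = framing;
        framing.setMessageCallback(inbox::add);
    }

    String greet(String who) {
        return framing.sendMessage("hello " + who);
    }
}
```

Replacing FramingLayer with another implementation that offers the same two methods would not require any change to TransportLayer.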

I once read about an interesting application of this pattern where the layers are actually used to isolate unrelated concepts that are part of the same application. Having multiple unrelated concepts is common in large systems. The usual tendency to tie these concepts closely together may lead to applications that are hard to implement, change and even understand. Writing a layer that isolates all the concepts from one another solves the issues above with the tradeoff of having a possibly very complex layer, which represents a single point of failure.

Tuesday, September 15, 2009

On BA Chapter 7 (Xen and the Beauty of ...)

After reading the chapter on Xen and realizing that I have to comment on it in the context of an Architecture class, as opposed to the typical OS class where things like performance are the most analyzed concerns, I could not help comparing a few Xen architectural decisions with similar architectures, the most notable being KVM. KVM (for Kernel-based Virtual Machine) is a full virtualization solution for Linux on x86 hardware containing virtualization extensions (Intel VT or AMD-V).

Xen is open source, a decision that most definitely benefited Xen because of the huge amount of software it has been able to harness, from the Linux kernel to the QEMU emulator. The KVM architects made a similar decision by leveraging the Linux kernel and QEMU; however, unlike Xen, they do not modify the kernel code base and ship a patched version of the kernel. They integrate the hypervisor directly into the kernel as a loadable kernel module, thus leaving the kernel structures untouched; the integration is done via the well-defined interface for loadable kernel modules. The separation of concerns and simplicity of KVM come from its clear design: a device driver for managing the virtualization hardware and a user-space, QEMU-based component for emulating PC hardware. It is exactly the lack of such separation that prevented Xen from finding its way into the mainline Linux kernel. The kernel component of KVM has been included in mainline Linux since 2.6.20.

Another Xen design decision was to provide VM management via Xen domain0. In contrast, a KVM virtual machine is simply a Linux process. All of the standard tools apply: one can destroy, pause, and resume a virtual machine with the kill command and even Ctrl-C, or use top to view VM resource usage. This level of reusability confers KVM great flexibility in managing VMs. Also, adhering to the Linux standard interface, already well understood by many users, helps KVM users learn VM management easily.

Last but not least, by adding virtualization capabilities to a standard Linux kernel, KVM can benefit from all the fine-tuning work that has gone and is going into the kernel. While Xen had to deal with a frontend and backend driver model and with driver domains, KVM simply had all this support in the Linux kernel itself. Adding support for a new driver to the Linux kernel automatically means having that driver available to the KVM hypervisor.

Regarding paravirtualization, strongly promoted by Xen as _the_ technique to improve the performance of the guest operating system, it is also supported in KVM. However, as a design decision it is very hard to judge. With its main drawback, guest modification, and with users preferring to run unmodified guests, it is hard to foresee the future of paravirtualization. The only certain fact is that the success of paravirtualization will always begin where the concrete capabilities of hardware-assisted virtualization end. And hardware will always evolve.

Monday, September 14, 2009

On Pipes and Filters

Pipes and Filters is an architectural pattern that structures a system processing a stream of data as a series of components (filters, pipes, etc.), which allows for a greater flexibility in building families of related systems. With this pattern, the task is divided into several sequential steps, with the output of a step being input to a subsequent step. Each individual step is implemented by a filter component that consumes and delivers data incrementally for low latency and parallel processing. Pipes implement the dataflow between adjacent processing steps. Processing pipelines are sequences of filters connected via pipes. The input and output of a system are provided by a data source and a data sink respectively, which are also connected through pipes to the processing pipeline. A filter may enrich, refine or transform its input data. With passive filters the input is pushed in by the previous filter or the output data is pulled by the subsequent one. Active filters pull their input, process and push their output down the pipeline.
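The structure described above can be sketched with Python generators, where each generator plays the role of a passive filter whose output is pulled by the next step. This is only an illustration under my own assumptions: the filter names (strip_blank, to_upper) and the pipeline helper are made up for the example and do not come from the pattern description.

```python
def source(lines):
    """Data source: hands out raw items one at a time."""
    for line in lines:
        yield line


def strip_blank(items):
    """Filter that refines its input by dropping empty lines."""
    for item in items:
        if item.strip():
            yield item


def to_upper(items):
    """Filter that transforms each item."""
    for item in items:
        yield item.upper()


def pipeline(data, *filters):
    """Connect filters in order; each one lazily pulls from the previous."""
    stream = iter(data)
    for f in filters:
        stream = f(stream)
    return stream


def sink(items):
    """Data sink: drives the pipeline by consuming everything."""
    return list(items)
```

For example, sink(pipeline(source(["a", "", "b"]), strip_blank, to_upper)) produces ["A", "B"]. Because generators are lazy, every item flows through all steps before the next one is read, which is the incremental behavior the pattern asks for.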

There are a few benefits associated with the Pipes and Filters architecture. One can investigate intermediate data flowing down the pipeline while still preserving incremental and parallel computation of results by using a T-junction in the pipeline. This approach eliminates the need for using intermediate files to analyze the intermediate data. Exchanging a filter component is very straightforward, even if direct calls between filters are being used instead of separate pipes that synchronize adjacent active filters. Filter recombination is the major benefit. One can rearrange, remove or add new filters in order to create new processing pipelines. For instance, an entire processing pipeline can substitute a single filter in another processing pipeline. By implementing active filters and providing end-user support for the construction of pipelines in the filter hosting platform, one can achieve a great deal of flexibility and reusability. A benefit directly derived from recombination and reusability is rapid prototyping for developers, who are able to implement the rough functionality of the system based on a pipeline architecture then optimize it incrementally. Furthermore, when each filter in the pipeline consumes and produces data incrementally, it is possible to achieve parallel processing by starting active filters in parallel on a multiprocessor system or network. However this pays off only when the cost of the computation carried out by a single filter is higher than the cost of transferring data between filters. Trying to benefit from parallelism in a network environment or in a single processor machine where context switching between threads or processes is usually expensive might not be a good idea.
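Two of the benefits above, the T-junction and filter recombination, can be sketched in a few lines of Python. The helpers tap and compose, and the toy filters double and increment, are hypothetical names chosen for this example only.

```python
def double(items):
    """Toy filter: multiply every item by two."""
    for x in items:
        yield x * 2


def increment(items):
    """Toy filter: add one to every item."""
    for x in items:
        yield x + 1


def tap(items, observer):
    """T-junction: pass items through unchanged while showing each to observer."""
    for item in items:
        observer(item)
        yield item


def compose(*filters):
    """Bundle a whole pipeline so it can stand in for a single filter."""
    def composite(items):
        for f in filters:
            items = f(items)
        return items
    return composite
```

With seen = [], the expression list(increment(tap(double([1, 2, 3]), seen.append))) yields [3, 5, 7] while seen collects the intermediate values [2, 4, 6], with no intermediate file involved. Likewise compose(compose(double, increment), double) is itself a filter, so an entire pipeline substitutes for one step of another.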

Obviously there are liabilities associated with the pattern. However, when the pattern is applied to the right problem space, they do not overshadow its beauty. For instance, in systems where the processing steps need to share a large amount of global state, it is inefficient to push this data down the pipeline. The flexibility associated with using a single format for input/output often results in conversion overheads. Also, the pattern is not recommended for mission-critical applications because efficient error handling is very hard to implement in this architecture.

Sunday, September 13, 2009

On BA Chapter Six (Facebook Arch)

In recent years, with the rise of the Web, data-centric architectures and information technologies have become important subjects in architecture design. New products are motivated by the data users are interested in, while designers are mainly focused on building the “logic” and “display” tiers of the application. A successful example of such an application is Facebook, which manipulates and displays social data such as biographical information, relationship mappings, user media, etc.

However, what makes Facebook an interesting architecture is the fact that the importance of data is recognized beyond the boundaries of the platform. With this in mind, Facebook architects created a concept that improved upon the isolation of a typical n-tier stack by integrating external systems in the form of applications. External app integration is supported by a suite of web services (Facebook API), a query language (FQL), a data-driven markup language (FBML) and some other custom artifacts.

There are two main integration scenarios worth mentioning. The first one is adding Facebook’s powerful social context to external applications, thus eliminating the need for these otherwise separate applications to implement their own social network. With this scenario, web apps, desktop OS apps and apps on other devices can add social context by connecting to an externally available Facebook web service, which implements the Platform API. A beautiful feature of the web service design is the use of metadata (specified in Thrift, an open source cross-language RPC framework) to encapsulate the types and signatures describing the API. Among the main benefits of using metadata are automatic binding, automatic documentation and cross-language synchronization, so that the service can be consumed externally by XML and JSON clients and internally by PHP, Java, C++, etc. User security and privacy, some of the original concerns of the Facebook architecture, are also preserved in this scenario by implementing a simple authentication scheme where the client of the web service sends a “session_key” along with every request.

The second scenario is integrating external applications into Facebook, the social site itself. This way applications benefit from the full power of the social platform by getting more exposure, thus increasing their critical mass of users. When integrating apps into Facebook, the apps themselves are framed as web services providing FBML content to Facebook for display. FBML is a data-driven markup language designed by Facebook that allows developers to provide logic and display from the application stack and additionally add requests for platform protected data. As a web service client, Facebook then renders FBML entirely in its trusted server environment.

The platform architecture is in fact more complex. Besides FBML and Thrift, Facebook provides more custom artifacts designed to ease the integration of external apps. FBJS, the platform’s JavaScript emulation suite, the design of the platform cookies, and FQL, a simple query language wrapper around Facebook’s internal data, are only a few examples reflecting the rich capabilities of Facebook.

Thursday, September 10, 2009

On BA Chapter Five (ROA architectures)

C. Brancusi: "Architecture is inhabited sculpture; we are forced to endure the choices that we make for quite some time".

In this fifth chapter, the authors not only describe another “beautiful” architecture, the resource-oriented architecture (ROA) and its close friend the Web, but also advocate a change in the way we look at and design software architectures, one that should provide us with more benefits in the long term.

With this new approach, the accent is put on designing architectures by focusing on information as a first-class citizen, as opposed to getting tied into specific bindings usually enforced by popular technologies (e.g. J2EE, .NET, SOAP), an approach that makes changes almost impossible without breaking existing clients. Their belief is that by connecting systems following the information-centric model, the user can only gain in efficiency, simplicity and flexibility, features that made the Web one of the most popular architectures.

In the resource-oriented style, clients issue logical requests for named resources, which are then resolved and transferred back to the requestor in one form or another by a resource-oriented engine. Resolving the named request may be an action such as executing a database query or, in the case of RESTful web services, a bit of functionality in charge of managing information.

Resilience to change is one of the notable features of the Web that is also inherited by ROA. Existing clients are not affected by changes made to the resource-oriented engine in order to accommodate new clients. This behavior is achieved by having resources retain their identity (e.g. URLs) and by picking the physical representation of the resource within the context of resolving a request. Another important concern addressed by ROA is security. While many existing SOAs restrict access to services based on identity or role, they rarely restrict access to the data flowing through these services. In contrast, by making a clear distinction between identifying resources and resolving them for different contexts, ROA enforces security directly on the data. Information is passed across application boundaries as references, and each context provides enough information for the ROA engine to decide whether or not to produce the information for a particular client. Last but not least, since it builds on Web/REST, ROA also achieves separation of concerns among the abstractions describing the interaction with clients, by isolating the resources, the actions performed by clients on the resources, and the representation of the resources when returned to clients.
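The split between identifying a resource and resolving it for a given context can be illustrated with a toy engine in Python. Everything here is an assumption made for the example: the ResourceEngine class, the "role" and "accept" context keys, and the two representations are hypothetical and do not correspond to any real ROA product.

```python
import json


class ResourceEngine:
    """Toy resolver: stable logical names, per-request representation and access."""

    def __init__(self):
        self._resources = {}  # logical name -> (data, roles allowed to see it)

    def register(self, name, data, allowed_roles):
        self._resources[name] = (data, set(allowed_roles))

    def resolve(self, name, context):
        data, allowed = self._resources[name]
        # Security is enforced on the data itself, using the request context.
        if context.get("role") not in allowed:
            raise PermissionError(f"{name} is not available in this context")
        # The physical representation is picked only now, while resolving.
        fmt = context.get("accept", "json")
        if fmt == "json":
            return json.dumps(data, sort_keys=True)
        if fmt == "text":
            return ", ".join(f"{k}={v}" for k, v in sorted(data.items()))
        raise ValueError(f"unsupported representation: {fmt}")
```

After engine.register("urn:customer:42", {"name": "Ada", "tier": "gold"}, ["support"]), a support client asking for "text" and one asking for "json" both use the same name; adding a third representation later would not disturb either of them.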

One last word on REST vs. SOAP: even though they are often compared, since both are perceived as means of manipulating information, they are essentially different. While SOAP is only a technology for invoking behavior, REST is all about managing information, not about invoking arbitrary behavior through URLs. Feeling that the CRUD pattern implemented in Web-based architectures by the four HTTP verbs is not enough is a clear indication that the architecture is focused on managing behavior rather than information.

Wednesday, September 9, 2009

On BA Chapter 3 (Darkstar architecture)

RJ: "Project Darkstar was supposed to show how to build scalable game servers. However, the architecture doesn't seem very closely tied to games. In what way is the architecture aimed at games?"

Nevertheless, Darkstar offers an environment and structures supporting a massively multiplayer model where load scaling, load balancing, fault tolerance and minimizing latency are major concerns.

As far as game interaction with the clients goes, Darkstar offers a mechanism based on a set of servers, each holding copies of the game logic and of its behavior represented by the associated services. These copies can each process different events, the coordination between them being taken care of by the Darkstar environment. Also, the communication among players in and out of the game, crucial for MMOGs, and between internal tasks is supported by communication channels (implementing game validation techniques), which are part of the Darkstar infrastructure.

Furthermore, there are similarities with traditional game engines in that Darkstar is also concerned with improving object localization during the game. However, in contrast with traditional engines, which mostly use geographical decomposition (decomposing the world into areas that are assigned to servers during game development), Darkstar uses a novel approach where co-location is based on runtime information and can be adjusted based on the current playing pattern.

Compared with the traditional approach, where the game engine keeps significant game state in memory in order to reduce latency, Darkstar offers an efficient automatic persistence mechanism based on storing the entire game state in the data store, together with improved data caching and coherence. This mechanism allows for fail-over and for hiding server errors, as well as eliminating game rollbacks. Additionally, the interaction with the environment is transactional, which helps eliminate some bugs typically associated with traditional game engines (e.g. duplication, etc.)

Sunday, September 6, 2009

On the 4+1 Architectural View Model

RJ: "Suppose you wanted to document the architecture of some system. How would the ideas in the 4+1 views paper help you? Or would they?"

The 4+1 views are very useful for documenting a system from the architectural point of view. The 4+1 model offers a way of looking at different aspects of the system in isolation, thus easing the complexity of analyzing the system.

The first four views represent the logical, development, process and physical aspects of the architecture. Highlighting some aspect while intentionally suppressing others offers a systematic and standardized way for documenting the various layers of the system. The fifth view is represented by use cases and scenarios that might further help with documenting the other views. Furthermore, when studied and documented together, the four plus one views make sure that no important aspect of the system is overlooked.

The separation of concerns achieved by the 4+1 model makes documentation structured around it a great tool for software architects as well as other stakeholders of the architecture: end-users, developers, systems engineers, project managers, testers, etc. Documents created using the 4+1 view process are easily used by all members of the development team.

Last but not least, the fact that one can naturally map UML diagrams to each individual model view speaks again for the 4+1 as being a great model for structuring one's documentation. For instance: class, communication and sequence diagrams can represent the logical view, component and package diagrams - the development view, activity diagrams - the process view, the deployment diagram - the physical view and finally the use case diagrams are a great tool to represent scenarios.

Friday, September 4, 2009

On BA Chapter Two

The Messy Metropolis was not a good architecture by any metric, and its flaws went beyond the design. They have their roots in the development process and company culture. Under these conditions any new architecture would be doomed to fail. In contrast, the Design Town, even though a very similar software project to Messy Metropolis, was the result of many positive factors: intentional upfront design by experienced designers, a carefully chosen team responsible for the overall design of the software, good project management and generally making the right decisions at the appropriate time.

However, I feel that the two architectures are hardly comparable. There is no such thing as an inherently good or bad architecture; it is only a matter of how fit the architecture is for its stated purpose. Having “deliver yesterday” as the sole goal for the Messy Metropolis is like having no goal at all, and Messy Metropolis should be given another name: Ghost Town, a town where maybe only a very experienced designer would “draw” faster in front of a management with such demands and where the good designer’s days would still be numbered.

Judging software architectures should be done only in the context of their specific goals. Furthermore, comparing architectures should be done only within the same context, when the architectures have similar purposes. A legitimate example of a comparison of two architectures may be the debate on REST vs. SOAP. Both REST and SOAP have a broadly similar general purpose: devising a Web Service Architecture. REST can be considered the architectural style of the World Wide Web. A major difference between the REST point of view and that of the SOAP-based web services architecture is that REST views the Web as an information system in its own right. In contrast, SOAP sees the Web as simply a transport protocol. SOAP is mainly driven by software vendors that are customer-oriented, while the WWW was mainly supported by non-commercial users. Accommodating legacy systems, often associated with SOAP or other customer-driven architectures, is a concern that competes considerably with simplicity and flexibility. An architecture as simple, elegant and flexible as the Web could not have come out of such a relationship. REST inherently benefits from many of the good features of the Web architecture.

Wednesday, September 2, 2009

On BA chapter 1

Chapter one is trying to shed light on the question of what software architecture is. While being very similar to building architecture in the way that it consists of a set of structures designed to help the stakeholders see how their concerns are satisfied, software architecture is still fundamentally different through its dynamic and interactive nature, where the number of interactive components, the environment where they are running and the way the performers interact with the system play a major role.

In the context of software, because it highlights some details by abstracting away from others, the architecture is then seen as a subset of design. In other words, architecture is concerned with the relationship between components, and the externally visible properties of the system components, while design is additionally concerned with their internal structure.

Getting a solid architecture is about making the right decisions. These decisions should be made intentionally rather than just “letting the architecture emerge”. Early decisions, documented and made explicitly with the entire system, its stakeholders and its evolution in mind, are key to getting a good architecture. Having the architecture reflect one set of design ideas, as opposed to many good but uncoordinated and independent sets of ideas, gives the system conceptual integrity (Brooks 1995), which makes the architecture maintainable. There is also a bidirectional relationship between architectures with conceptual integrity and the organizational pattern, in that good architecture can influence organization and good organization results in conceptual integrity.

As confusing as it may sound, the first concern of the architect is neither functionality nor quality, but understanding the stakeholders and their concerns and prioritizing them. Funders, architects, developers, project managers, marketing, users, testers and technical support may all have something to say about the chosen architecture. Furthermore, besides functional and quality requirements, individual systems may have their own additional critical concerns (such as changeability, capacity, security, etc.)

In the end the authors go over architectural structures, which are formal means of addressing each particular concern, by specifying relationships between individual components of a system. The most notable types of structures are the ones dealing with information hiding, data access, structures dealing with relationships between processes and Uses structures, dealing with relationships between programs in a system.

The overall impression from the chapter is probably one most of us have already experienced. There is no single correct architecture and no single correct answer; it is all a game of trade-offs and compromises that make the architecture good only relative to the stakeholders involved and their own specific concerns.

Tuesday, September 1, 2009

On "A Field Guide to Boxology"

The paper offers an interesting overview and classification of the software architectural styles. The authors argue that these styles are appropriate to certain classes of problems but not suitable for all of them. The designer should then employ careful discrimination when choosing a suitable architecture.

For the first classification, Pipe-and-Filter Systems and Dataflow networks, even though the ideas are very intuitive, except for Unix pipes and filters I found it hard to recall immediate applications of the styles to specific problems.

The second classification however, Cooperative Message-Passing Processes, offered a number of styles ranging from very common ones such as the Client-Server architecture to styles whose application is very specific to certain application domains (e.g. distributed systems, parallel computing, etc.)

Below I will elaborate a bit on “Token passing along edges in a graph” and its application in implementing mutual exclusion in distributed systems.

Distributed processes need to coordinate their activities. When multiple processes share a resource or a collection of resources, then distributed mutual exclusion is required to prevent interference and keep the access to resources consistent. Token-passing algorithms are being used to implement distributed mutual exclusion of a shared resource.

Depending on the way the control is transferred, two types of token passing algorithms can be distinguished:

1) Central server algorithms. This algorithm simply employs a server that grants permission to enter the critical section in the form of a permission token. In order to enter the critical section processes send a token request and wait for a token reply. If no other process has the token the server replies immediately with the token granting permission to the current process, otherwise the server queues the request. On exiting the critical section the process holding the token sends it back to the server.

2) Ring-based token algorithms. With this approach the control is passed from process to process without a server being involved. The processes are arranged in a logical ring and communication channels are available between a process and its immediate neighbor in the ring. The permission token is passed from process to process in a single direction, around the ring. Processes that do not want to enter the critical section simply pass the token along. However, if a process requires access to the critical section, it retains the token. Upon exiting the critical section the process sends the token to its neighbor.
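Both variants can be sketched as small single-threaded simulations in Python. This is only a model of the message flow under my own assumptions: there is no real network, the names TokenServer and ring_schedule are hypothetical, and a real implementation would also have to deal with lost tokens and crashed processes.

```python
from collections import deque


class TokenServer:
    """Central server: grants the single token or queues the request."""

    def __init__(self):
        self.holder = None      # process currently holding the token
        self.waiting = deque()  # queued requests, served in arrival order

    def request(self, pid):
        """Return True if the token is granted at once, else queue pid."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid):
        """Holder returns the token; return the next process granted, if any."""
        if self.holder != pid:
            raise ValueError(f"{pid} does not hold the token")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


def ring_schedule(n, wants, start=0):
    """One trip of the token around a ring of n processes, starting at start.

    Returns the order in which the processes listed in wants enter the
    critical section; everyone else just passes the token on.
    """
    entered = []
    pending = set(wants)
    holder = start
    for _ in range(n):
        if holder in pending:
            entered.append(holder)   # retain the token, use the resource
            pending.discard(holder)
        holder = (holder + 1) % n    # send the token to the neighbor
    return entered
```

With the server, a second request made while p1 holds the token is queued and granted when p1 releases it. With the ring, ring_schedule(4, {0, 2, 3}, start=1) gives the order 2, 3, 0: access follows the position in the ring rather than the order of the requests, which is the main behavioral difference between the two schemes.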