zamolx3s: On Pipes and Filters

Pipes and Filters is an architectural pattern that structures a system that processes a stream of data as a series of components (filters, pipes, etc.), which allows greater flexibility in building families of related systems. With this pattern, the task is divided into several sequential steps, with the output of one step being the input to the next. Each step is implemented by a filter component that consumes and delivers data incrementally, which enables low latency and parallel processing. Pipes implement the data flow between adjacent processing steps. A processing pipeline is a sequence of filters connected by pipes. The input and output of the system are provided by a data source and a data sink respectively, which are also connected to the processing pipeline through pipes. A filter may enrich, refine or transform its input data. A passive filter either has its input pushed in by the previous filter or has its output pulled by the subsequent one. An active filter pulls its input, processes it and pushes its output down the pipeline.
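To make the structure concrete, here is a minimal sketch of a pull-based pipeline in Python, assuming generators are an acceptable stand-in for passive filters whose output is pulled by the next step. The names source, strip_blank, to_upper and sink are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, List


def source(lines: Iterable[str]) -> Iterator[str]:
    """Data source: feeds items into the pipeline one at a time."""
    for line in lines:
        yield line


def strip_blank(items: Iterable[str]) -> Iterator[str]:
    """Filter that refines its input by dropping blank items."""
    for item in items:
        if item.strip():
            yield item


def to_upper(items: Iterable[str]) -> Iterator[str]:
    """Filter that transforms each item."""
    for item in items:
        yield item.upper()


def sink(items: Iterable[str]) -> List[str]:
    """Data sink: pulls everything through the pipeline and collects it."""
    return list(items)


# Each nested call plays the role of a pipe between adjacent steps.
result = sink(to_upper(strip_blank(source(["alpha", "", "beta"]))))
```

Nothing is computed until the sink starts pulling, and each item travels through all the filters before the next one is read, which is the incremental behaviour described above.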

There are a few benefits associated with the Pipes and Filters architecture. By placing a T-junction in the pipeline, one can inspect intermediate data flowing down the pipeline while still preserving incremental and parallel computation of results. This eliminates the need for intermediate files to analyze the intermediate data. Exchanging a filter component is very straightforward, even if direct calls between filters are used instead of separate pipes that synchronize adjacent active filters. Filter recombination is the major benefit. One can rearrange, remove or add filters in order to create new processing pipelines. For instance, an entire processing pipeline can replace a single filter in another processing pipeline. By implementing active filters and providing end-user support for the construction of pipelines in the filter hosting platform, one can achieve a great deal of flexibility and reusability. A benefit directly derived from recombination and reusability is rapid prototyping: developers can implement the rough functionality of the system based on a pipeline architecture and then optimize it incrementally. Furthermore, when each filter in the pipeline consumes and produces data incrementally, it is possible to achieve parallel processing by starting active filters in parallel on a multiprocessor system or network. However, this pays off only when the cost of the computation carried out by a single filter is higher than the cost of transferring data between filters. For that reason, trying to benefit from parallelism may not be worthwhile in a network environment, or on a single-processor machine where context switching between threads or processes is usually expensive.
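Recombination and the T-junction can be sketched in a few lines of Python, again assuming generator-based filters. The helpers pipeline and tap, and the sample filters, are hypothetical names introduced for this example; they are not part of any particular library.

```python
from typing import Callable, Iterable, Iterator, List

Filter = Callable[[Iterable[int]], Iterator[int]]


def pipeline(*filters: Filter) -> Filter:
    """Combine several filters into one; the result is itself a filter."""
    def run(items: Iterable[int]) -> Iterator[int]:
        for f in filters:
            items = f(items)
        return iter(items)
    return run


def tap(observed: List[int]) -> Filter:
    """T-junction: record a copy of each item and pass it on unchanged."""
    def run(items: Iterable[int]) -> Iterator[int]:
        for item in items:
            observed.append(item)
            yield item
    return run


def drop_odd(items: Iterable[int]) -> Iterator[int]:
    for x in items:
        if x % 2 == 0:
            yield x


def increment(items: Iterable[int]) -> Iterator[int]:
    for x in items:
        yield x + 1


def double(items: Iterable[int]) -> Iterator[int]:
    for x in items:
        yield x * 2


seen: List[int] = []
inner = pipeline(increment, tap(seen), double)  # a reusable sub-pipeline
outer = pipeline(drop_odd, inner)               # inner is used as a single filter
result = list(outer([1, 2, 3, 4]))
```

Because a combined pipeline has the same shape as a single filter, it can be dropped into another pipeline without changes, and the tap exposes the intermediate values without writing them to a file.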

Obviously, there are liabilities associated with the pattern. However, when the pattern is applied to the right problem space, they do not overshadow its beauty. For instance, when the pattern is applied to a system whose processing steps need to share a large amount of global state, pushing this data down the pipeline is inefficient. The flexibility associated with using a single format for input and output often results in conversion overheads. Also, the pattern is not recommended for mission-critical applications, because efficient error handling is very hard to implement in this architecture.
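One way to see why error handling is awkward, under the assumption that filters process data incrementally as in the sketches of this pattern: by the time a filter fails, earlier items have already reached the sink. The following Python sketch uses hypothetical names (parse_ints, collect) purely to illustrate that situation.

```python
from typing import Iterable, Iterator, List


def parse_ints(items: Iterable[str]) -> Iterator[int]:
    """Filter that converts text to integers; int() raises ValueError on bad input."""
    for item in items:
        yield int(item)


def collect(items: Iterable[int], out: List[int]) -> None:
    """Sink that delivers each item as soon as it arrives."""
    for item in items:
        out.append(item)


delivered: List[int] = []
failed = False
try:
    collect(parse_ints(["1", "2", "x", "4"]), delivered)
except ValueError:
    # The failure surfaces only after part of the output was delivered.
    failed = True
```

After the failure, the sink holds a partial result and the pipeline has no built-in way to roll it back or resume from the faulty item; any such recovery has to be designed separately.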

zamolx3s

Monday, September 14, 2009
