Structured Parallel Computation in Structured Documents
D. B. Skillicorn (Queen's University, Kingston, Canada)
Abstract: Document archives contain large amounts of data to which sophisticated queries are applied. The size of archives and the complexity of evaluating queries makes the use of parallelism attractive. The use of semantically-based markup such as SGML makes it possible to represent documents and document archives as data types. We present a theory of trees and tree homomorphisms, modelling structured text archives and operations on them, from which it can be seen that: many apparently unrelated tree operations are homomorphisms, homomorphisms can be described in a simple parameterised way that gives standard sequential and parallel implementations for them, and certain special classes of homomorphisms have parallel implementations of practical importance. In particular, we develop an algorithm for path expression search, a novel powerful query facility for structured text, taking time logarithmic in the text size. This algorithm is the first example of a new algorithm discovered using homomorphic skeletons over data types.
Keywords: categorical data type, parallel algorithms, query evaluation, software development methodology, structured text
Categories: D.1.3, D.2.2, D.3.3, H.2.3, H.3.3, I.7.2