Syntax, Parsing and Production of Natural Language in a Framework of Information Compression by Multiple Alignment, Unification and Search
J. Gerard Wolff (University of Wales, UK)
This article introduces the idea that information compression by multiple alignment, unification and search (ICMAUS) provides a framework within which natural language syntax may be represented in a simple format and the parsing and production of natural language may be performed in a transparent manner.
In this context, multiple alignment has a meaning which is similar to its meaning in bioinformatics but with significant differences, while unification means a simple merging of matching patterns, a meaning which is related to but simpler than the meaning of that term in logic. The concept of search in the present context means search for alignments which are `good' in terms of information compression, using heuristic methods or arbitrary constraints (or both) to restrict the size of the search space.
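The sense of unification as "a simple merging of matching patterns" can be illustrated with a minimal sketch. This is not the SP61 algorithm: the function names `unify` and `best_match` are hypothetical, Python's standard `difflib.SequenceMatcher` stands in for the alignment step, and the count of merged symbols is used only as a crude proxy for the compression achieved.

```python
from difflib import SequenceMatcher


def unify(pattern_a, pattern_b):
    """Merge two symbol sequences, keeping one copy of each matched symbol.

    Returns (merged, saved), where `saved` is the number of symbols that
    were merged. It serves here as a rough stand-in for compression.
    """
    matcher = SequenceMatcher(a=pattern_a, b=pattern_b, autojunk=False)
    merged = []
    saved = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Matching symbols are unified: a single copy is retained.
            merged.extend(pattern_a[i1:i2])
            saved += i2 - i1
        else:
            # Unmatched symbols from both patterns are kept as they are.
            merged.extend(pattern_a[i1:i2])
            merged.extend(pattern_b[j1:j2])
    return merged, saved


def best_match(new_pattern, old_patterns):
    """Exhaustive 'search': return the old pattern that unifies best.

    A real system would need heuristics or constraints to limit the search
    space; this toy version simply tries every candidate.
    """
    return max(old_patterns, key=lambda old: unify(new_pattern, old)[1])


if __name__ == "__main__":
    new = ["the", "cat", "sat"]
    old = [["the", "dog", "sat"], ["a", "dog", "ran"]]
    print(best_match(new, old))
    print(unify(new, old[0]))
```

In this toy version, a candidate that shares more symbols with the new pattern yields a larger saving and is therefore preferred, which is the intuition behind treating an alignment as `good' in terms of information compression.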
These concepts are embodied in a software model, SP61. The organisation and operation of the model are described and a simple example is presented showing how the model can achieve parsing of natural language.
Notwithstanding the apparent paradox of `decompression by compression', the ICMAUS framework, without any modification, can produce a sentence by decoding a compressed code for the sentence. This is illustrated with output from the SP61 model. The article includes four other examples: one of the parsing of a sentence in French and three from the domain of English auxiliary verbs. These examples show how the ICMAUS framework and the SP61 model can accommodate `context sensitive' features of syntax in a relatively simple and direct manner.
An important motivation for this research is the possibility of developing the ICMAUS framework as a unifying framework for diverse aspects of computing in addition to those described in this article. Other aspects which appear to fall within the scope of the ICMAUS framework, but which are outside the scope of this article, include the representation of natural language semantics, best-match pattern recognition and information retrieval, deductive and probabilistic reasoning, planning and problem solving, and unsupervised inductive learning.
Keywords: MDL, MML, information compression, multiple alignment, natural language, parsing, production, syntax, unification