Integrative Discovery of Multifaceted Sequence Patterns by Frame-Relayed Search and Hybrid PSO-ANN
Sing-Wu Liou (National Yunlin University of Science and Technology, Taiwan)
Chia-Ming Wang (National Yunlin University of Science and Technology, Taiwan)
Yin-Fu Huang (National Yunlin University of Science and Technology, Taiwan)
Abstract: For de novo pattern mining in genomic sequences, the main issues are constructingpattern definition model (PDM) and mining sequence patterns (MSP). The representations of PDMs and the discovery of patterns are functionally dependent; the performances thus dependon the adopted PDMs. The popular PDMs provide only descriptive patterns; they lack multifaceted considerations. Many of existing MSP methods are tied up with the exclusively devisedPDMs, and the specialized and sophisticated models make the mined results hard to be reused. In this research, an integrative pattern mining system is proposed, which consists of a computation-oriented PDM (CO-PDM) and general-purpose MSP (GP-MSP) methods. The CO-PDM defines four computational concerns (CCs) as facets of MSP: expression (E), location (L), range (R)and weight (W), which are integrated into a frame-relayed pattern model (FRPM). The GP-MSP develops a frame-relayed search strategy to resolve the ELR-CCs firstly, with the aids of critical-parameter automating (CPA) procedure; and then the W-CC is determined by hybridizing particle swarm optimization (PSO) and artificial neural network (ANN). The proposed FRPM andGP-MSP had been implemented and applied to 22,448 human introns; from the results, all the well-known patterns were recovered and some new ones were also discovered. Furthermore, theeffectiveness of identified patterns were verified by a two-layered k-nearest neighbor (k-NN) classifier; the average precision and recall are 0.88 and 0.92, respectively. By the case study, theintegrative PDM-MSP system is believed to be effective and reliable; it is optimistic the proposed CO-PDM and GP-MSP are both widely applicable and reusable for mining sequence patterns inthe eukaryotic protein-coding genes.
Keywords: computation-oriented pattern definition model, computational concerns, frame-relayed pattern mode, multifaceted sequence patterns, pattern mining
Categories: I.2.4, I.2.6, I.5.2, J.3