Volume 19 / Issue 4

DOI:   10.3217/jucs-019-04-0563

 

A Semi-Supervised Ensemble Learning Method for Finding Discriminative Motifs and its Application

Thi Nhan Le (Japan Advanced Institute of Science and Technology, Japan)

Tu Bao Ho (Japan Advanced Institute of Science and Technology, Japan)

Saori Kawasaki (Japan Advanced Institute of Science and Technology, Japan)

Tatsuo Kanda (Chiba University, Japan)

Katsuhiko Takabayashi (Chiba University, Japan)

Shuang Wu (Chiba University, Japan)

Osamu Yokosuka (Chiba University, Japan)

Abstract: Finding discriminative motifs has recently received much attention in biomedicine, as such motifs allow us to characterize and distinguish two different classes of sequences. In biomedical applications, the number of labeled sequences is often very limited, while a large number of unlabeled sequences is usually available. Current methods for discriminative motif finding are powerful and effective on large labeled datasets, but they do not perform well on small ones. In this paper, we present a semi-supervised ensemble method for finding discriminative motifs. It is based on the SLUPC algorithm, a separate-and-conquer search method that discovers motifs of the type 'discriminative one occurrence per sequence'. The proposed method, named E-SLUPC (Ensemble SLUPC), uses SLUPC to search for discriminative motifs in an extended labeled dataset that contains both the labeled data and unlabeled data with predicted labels. Strongly discriminative and frequent motifs characterizing the two outcome classes of hepatitis C virus treatment (sustained viral response and non-sustained viral response) were detected and analyzed. Furthermore, the experimental evaluation shows that our method performs considerably well in the common setting of medical research, where labeled data are usually difficult to obtain.

Keywords: NS5A region, discriminative motif, ensemble learning, hepatitis C virus, self-training technique, separate-and-conquer search

Categories: I.5
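
The idea of extending a small labeled dataset with unlabeled sequences that carry predicted labels can be illustrated with a minimal Python sketch. This is not the SLUPC or E-SLUPC algorithm, whose details are not given in the abstract. It assumes a deliberately simple stand-in: each ensemble member scores k-mers by presence or absence per sequence for one value of k, and an unlabeled sequence is added only when all members agree on its label. All function names, the scoring rule, and the toy sequences are hypothetical.

```python
from collections import Counter

def kmers(seq, k):
    """Return the set of distinct k-mers in seq (presence only, not counts)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def motif_scores(labeled, k):
    """Score each k-mer as (fraction of class-1 sequences containing it)
    minus (fraction of class-0 sequences containing it).
    Assumes both classes are non-empty."""
    pos = [s for s, y in labeled if y == 1]
    neg = [s for s, y in labeled if y == 0]
    pos_counts = Counter(m for s in pos for m in kmers(s, k))
    neg_counts = Counter(m for s in neg for m in kmers(s, k))
    return {m: pos_counts[m] / len(pos) - neg_counts[m] / len(neg)
            for m in set(pos_counts) | set(neg_counts)}

def predict(seq, scores, k):
    """Predict 1 or 0 from the sign of the summed k-mer scores; None on a tie."""
    total = sum(scores.get(m, 0.0) for m in kmers(seq, k))
    if total > 0:
        return 1
    if total < 0:
        return 0
    return None

def extend_labeled(labeled, unlabeled, ks=(2, 3, 4)):
    """Self-training step (hypothetical): add an unlabeled sequence with its
    predicted label only if every ensemble member predicts the same class."""
    members = [(k, motif_scores(labeled, k)) for k in ks]
    extended = list(labeled)
    for seq in unlabeled:
        votes = {predict(seq, scores, k) for k, scores in members}
        if len(votes) == 1 and None not in votes:
            extended.append((seq, votes.pop()))
    return extended

def best_motif(labeled, k):
    """Return the k-mer most characteristic of class 1 (ties broken alphabetically)."""
    scores = motif_scores(labeled, k)
    return max(sorted(scores), key=scores.get)

# Toy data, invented for illustration only.
labeled = [("AAACCC", 1), ("AAAGGG", 1), ("TTTCCC", 0), ("TTTGGG", 0)]
unlabeled = ["AAACGC", "TTTGCG", "CCCGGG"]
extended = extend_labeled(labeled, unlabeled)
top = best_motif(extended, 3)
```

Requiring unanimous votes is one conservative way to limit label noise in the extended dataset; a real system might instead use a confidence threshold or repeat the extension over several rounds.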