An OCR Free Method for Word Spotting in Printed Documents: the Evaluation of Different Feature Sets
Israel Rios (Pontifical Catholic University of Parana, Brazil)
Alceu de Souza Britto Jr (Pontifical Catholic University of Parana, Brazil)
Alessandro Lameiras Koerich (Pontifical Catholic University of Parana, Brazil)
Luis Eduardo Soares Oliveira (Federal University of Parana, Brazil)
Abstract: An OCR-free word spotting method for printed documents is developed and evaluated under a strong experimental protocol. For this purpose, a complete OCR-free word spotting method was implemented, and a database containing document images and their corresponding ground-truth text files was created. The protocol, based on 800 document images, allows the three feature sets used to represent the word image to be compared under the same experimental conditions. In addition, a tuning process for the document segmentation step is proposed, which significantly reduces processing time.
Keywords: document retrieval, word recognition, word spotting
Categories: I.5, I.7