Table-form Extraction with Artefact Removal
Luiz Antônio Pereira Neves (PUCPR, Brazil)
João Marques de Carvalho (PUCPR, Brazil)
Jacques Facon (PUCPR, Brazil)
Flávio Bortolozzi (PUCPR, Brazil)
Abstract: In this paper we present a novel methodology for recognizing the layout structure of handwritten filled table-forms. The methodology locates line intersections, corrects wrong intersections produced by what we call artefacts (overlapping data, broken segments and smudges), and extracts the correct table-form cells while using as little prior knowledge of the table-form as possible. To improve layout structure recognition, a novel artefact identification and deletion method is also proposed. To evaluate the effectiveness of the methodology, a database of 350 handwritten filled table-form images damaged by different types of artefacts was used. Experiments show that the artefact identification method improves the performance of the table-form structure extractor, which reached a success rate of 85%.
Keywords: document segmentation, handwritten data, table-form extraction, table-form recognition
Categories: I.4, I.4.6, I.7, I.7.m