Volume 15 / Issue 18

 
DOI: 10.3217/jucs-015-18-3364

 

Automatically Deciding if a Document was Scanned or Photographed

Gabriel Pereira e Silva (Federal University of Pernambuco, Brazil)

Marcelo Thielo (HP Labs, Brazil)

Rafael Dueire Lins (Federal University of Pernambuco, Brazil)

Brenno Miro (Federal University of Pernambuco, Brazil)

Steven J. Simske (HP Labs, USA)

Abstract: Portable digital cameras are widely used by students and professionals in many fields as a practical way to digitize documents. Tools such as PhotoDoc enable the batch processing of such documents, performing automatic border removal and perspective correction. To the human eye, a true-color document processed by PhotoDoc looks very similar to a scanned one. However, when a batch of documents is binarized automatically, those digitized with portable cameras and those digitized with scanners exhibit different features, so knowing the source of each document is fundamental to processing it successfully. This paper presents a classification strategy that distinguishes scanned documents from photographed ones. Tested on over 16,000 documents, it achieved a correct classification rate of over 99.96%.

Keywords: MPEG-7, Web-based services, XML, content-based multimedia retrieval, hypermedia systems, multimedia, semantic web

Categories: H.3.1, H.3.2, H.3.3, H.3.7, H.5.1