Document Style Census for OCR

George Nagy; Prateek Sarkar

doi:10.1109/DIAL.2004.1263245

Proceedings. First Workshop on Document Image Analysis for Libraries

Year: 2004, Pages: 134

Authors

George Nagy, Rensselaer Polytechnic Institute
Prateek Sarkar, Palo Alto Research Center

Abstract

Four methods of converting paper documents to computer-readable form are compared with regard to hypothetical labor cost: keyboarding, omnifont OCR, style-specific OCR, and style-constrained or style-adaptive OCR. The best choice is determined primarily by (1) the reject rates of the various OCR systems at a given error rate, (2) the fraction of the material that must be labeled for training the system, and (3) the cost of partitioning the material according to style. For large corpora, sampling strategies are proposed both for estimating conversion costs and for taking advantage of style homogeneity.
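
The trade-off described in the abstract can be illustrated with a minimal sketch. The paper's own cost model is not reproduced on this page, so everything below is an assumption made for illustration: the additive cost formula, the names `Method`, `labor_cost`, `cheapest`, and `METHODS`, and every numeric value. The sketch only shows how the three factors named above (reject rate, labeled fraction, partitioning cost) could combine into a labor cost for each of the four methods.

```python
from dataclasses import dataclass


@dataclass
class Method:
    """One conversion method; all parameter values used below are invented."""
    name: str
    reject_rate: float       # fraction of characters rejected and keyed by hand
    labeled_fraction: float  # fraction of the material labeled for training
    partition_cost: float    # fixed cost of sorting the material by style


def labor_cost(m: Method, n_chars: int,
               key_cost: float = 1.0, label_cost: float = 1.0) -> float:
    """Assumed additive model: hand-keyed rejects + training labels + partitioning."""
    rejects = m.reject_rate * n_chars * key_cost
    labels = m.labeled_fraction * n_chars * label_cost
    return rejects + labels + m.partition_cost


def cheapest(methods, n_chars: int) -> Method:
    """Return the method with the lowest assumed labor cost for this corpus size."""
    return min(methods, key=lambda m: labor_cost(m, n_chars))


# Hypothetical parameters; keyboarding is modeled as rejecting every character.
METHODS = [
    Method("keyboarding", 1.00, 0.00, 0.0),
    Method("omnifont OCR", 0.10, 0.00, 0.0),
    Method("style-specific OCR", 0.02, 0.01, 5000.0),
    Method("style-adaptive OCR", 0.04, 0.00, 0.0),
]

if __name__ == "__main__":
    for n in (100_000, 1_000_000):
        print(n, cheapest(METHODS, n).name)
```

With these invented numbers, the fixed partitioning cost makes style-specific OCR unattractive for the smaller corpus and worthwhile for the larger one, which is the kind of dependence on corpus size that a cost estimate from a sample would have to capture.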

Related Articles

A Framework for Document Specific Error Detection and Corrections in Indic OCR
2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)
An Embedded OCR Software Architecture for Enhancing Portability
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)
Hybrid OCR Techniques for Cursive Script Languages - A Review and Applications
Computational Intelligence, Communication Systems and Networks, International Conference on
Performance of Document Image OCR Systems for Recognizing Video Texts on Embedded Platform
Computational Intelligence and Communication Networks, International Conference on
Optical Font Recognition for Multi-Font OCR and Document Processing
Proceedings. Tenth International Workshop on Database and Expert Systems Applications. DEXA 99
Document image OCR accuracy prediction via latent Dirichlet allocation
2015 13th International Conference on Document Analysis and Recognition (ICDAR)
α-Soft: An English Language OCR
2010 Second International Conference on Computer Engineering and Applications (ICCEA 2010)
A Novel Approach for Skew Estimation of Document Images in OCR System
International Conference on Computer Graphics, Imaging and Visualization (CGIV'05)
An Omnifont Open-Vocabulary OCR System for English and Arabic
IEEE Transactions on Pattern Analysis & Machine Intelligence
Improving Handwritten OCR with Augmented Text Line Images Synthesized from Online Handwriting Samples by Style-Conditioned GAN
2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR)
