
piker on July 11, 2017 | on: Using Tesseract OCR with Python

One alternative is to chop the image into smaller images (with something like ImageMagick) based on each value's likely location in the document, then OCR those. Tesseract gives you a confidence score for what it recognizes, so you can iterate over possible templates (or shrink/expand the crops) until you get an [edit: aggregate] confidence you're comfortable with.
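
A rough sketch of that loop in Python, not the commenter's actual code. It assumes Pillow for cropping (instead of ImageMagick) and pytesseract's image_to_data for per-word confidences. The template layout, the field names, the 80.0 threshold, and averaging as the "aggregate" are all illustrative assumptions.

```python
from statistics import mean


def tesseract_words(image):
    """Return (text, confidence) pairs for one cropped image via pytesseract."""
    import pytesseract  # imported lazily so the rest of the sketch runs without it

    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)
        if text.strip() and conf >= 0:  # conf is -1 on rows that are not words
            words.append((text, conf))
    return words


def ocr_with_template(image, template, ocr=tesseract_words):
    """OCR every field box of one template; return (values, aggregate confidence)."""
    values, confs = {}, []
    for field, box in template.items():
        words = ocr(image.crop(box))  # box is (left, upper, right, lower)
        values[field] = " ".join(text for text, _ in words)
        # Assumption: a field with no recognised words counts as zero confidence.
        confs.append(mean(conf for _, conf in words) if words else 0.0)
    return values, (mean(confs) if confs else 0.0)


def best_template(image, templates, threshold=80.0, ocr=tesseract_words):
    """Try templates in order; stop at the first one that reaches the threshold."""
    best = None
    for name, template in templates.items():
        values, score = ocr_with_template(image, template, ocr)
        if best is None or score > best[2]:
            best = (name, values, score)
        if score >= threshold:
            break
    return best


# Hypothetical usage (file name and pixel boxes are made up):
#   from PIL import Image
#   templates = {"layout_a": {"amount": (900, 200, 1300, 280)}}
#   name, values, score = best_template(Image.open("cheque.png"), templates)
```

Shrinking or expanding a crop fits the same loop: generate a few variants of each box and add them as extra templates.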

kumartanmay on July 13, 2017

Thanks for the suggestion. I'll try it and share the results here.

kumartanmay on July 13, 2017

Except for the size of the cheque and the position of the magnetic characters, none of the text on cheques is standardised in India. Hence cropping might only stand a chance for the magnetic characters.
