
One alternative is to chop the image into smaller images (with something like ImageMagick) based on each value's likely location in the document, then OCR those. Tesseract reports a confidence score with its output, so you can iterate over candidate templates (or shrink/expand the crops) until you get an [edit: aggregate] confidence you're comfortable with.
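
A rough sketch of that loop, in case it helps. Everything here is an assumption for illustration: the template layout (field name to crop box), the padding steps, the threshold of 80, and taking the minimum field confidence as the "aggregate". The OCR call is passed in as a function so the loop itself does not depend on any particular library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
# Hypothetical OCR hook: takes a crop box, returns (text, confidence 0-100).
OcrFn = Callable[[Box], Tuple[str, float]]


@dataclass
class TemplateResult:
    name: str
    fields: Dict[str, str]
    confidence: float  # aggregate over fields (here: the weakest field)


def grow(box: Box, pad: int) -> Box:
    """Expand a crop box by `pad` pixels on every side, clamped at 0."""
    left, top, right, bottom = box
    return (max(0, left - pad), max(0, top - pad), right + pad, bottom + pad)


def read_with_template(name: str, template: Dict[str, Box], ocr: OcrFn,
                       pads: Tuple[int, ...] = (0, 4, 8)) -> TemplateResult:
    """OCR every field of one template, keeping the best padding per field."""
    fields: Dict[str, str] = {}
    confs = []
    for field_name, box in template.items():
        best_text, best_conf = "", -1.0
        for pad in pads:
            text, conf = ocr(grow(box, pad))
            if conf > best_conf:
                best_text, best_conf = text, conf
        fields[field_name] = best_text
        confs.append(best_conf)
    return TemplateResult(name, fields, min(confs) if confs else 0.0)


def best_template(templates: Dict[str, Dict[str, Box]], ocr: OcrFn,
                  threshold: float = 80.0) -> Optional[TemplateResult]:
    """Try templates in order; stop at the first one that clears the threshold."""
    for name, template in templates.items():
        result = read_with_template(name, template, ocr)
        if result.confidence >= threshold:
            return result
    return None  # nothing was confident enough; fall back to manual review
```

With pytesseract, the `ocr` hook could crop the page with Pillow's `Image.crop(box)` and call `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)`, then average the `conf` entries (skipping the `-1` rows that mark non-word elements). Using the minimum across fields is a conservative choice: one badly read field fails the whole template.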


Thanks for the suggestion. I'll try it and share the results here.


Except for the size of the cheque and the position of the magnetic characters, none of the text on cheques is standardised in India. So with fixed crop regions there's a real chance we'd end up chopping through characters.



