Mulkiya-OCR is an experimental app. I've used the open source Tesseract OCR engine to read text from images (for example, an image of a mulkiya) so that data entry can be automated and made as accurate as possible. The code is open source on GitHub.
For those who are not familiar with the word "Mulkiya": it's an Arabic word that means vehicle registration card or vehicle ownership certificate.
- Install the tesseract-ocr open source OCR engine.
- Run git clone firstname.lastname@example.org:minhajkk/mulkiya-ocr.git to clone the repo from GitHub.
- Run npm install to install the server-side dependencies locally.
- Run bower install to install the client-side Bower dependencies.
- Run npm start to run the application.
The application is built on Node.js and AngularJS. More details about the stack are listed below.
Server Side Dependencies (NPM)
- multer: a Node.js middleware for handling multipart/form-data.
- expressjs: a web application framework.
- node-tesseract: a simple wrapper for the Tesseract OCR package for Node.js.
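The server-side pieces listed above fit together roughly as follows: multer saves the uploaded image to disk, the request handler passes the saved file's path to node-tesseract, and the recognized text goes back to the client as JSON. This is a minimal sketch, not the app's actual code. The route path, the field name `file`, and the helper name `createOcrHandler` are assumptions made for illustration. The OCR wrapper is passed in as a parameter so the handler can be exercised without Tesseract installed.

```javascript
// Sketch of an upload-then-OCR request handler (hypothetical helper name).
// In a real app, `ocr` would be require('node-tesseract') and the handler
// would sit behind multer, for example:
//   app.post('/api/ocr', upload.single('file'), createOcrHandler(ocr));
// The route path '/api/ocr' and the field name 'file' are assumptions.
function createOcrHandler(ocr) {
  return function handleUpload(req, res) {
    // With disk storage, multer exposes the saved upload as req.file.path.
    if (!req.file || !req.file.path) {
      return res.status(400).json({ error: 'No image uploaded' });
    }
    // node-tesseract exposes process(imagePath, callback(err, text)).
    ocr.process(req.file.path, function (err, text) {
      if (err) {
        return res.status(500).json({ error: 'OCR failed' });
      }
      // Trim the surrounding whitespace that OCR output often carries.
      res.json({ text: text.trim() });
    });
  };
}
```

Injecting `ocr` rather than requiring it inside the handler is a design choice for this sketch only; it keeps the example independent of the Tesseract binary being present on the machine.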
Client Side Dependencies (Bower)
- ui-router: a robust routing provider.
- angular-material: material design support.
- ng-file-upload: an AngularJS file upload directive that also lets mobile devices capture images with the camera.
Mulkiya-OCR is deployed on Heroku at https://mulkiya-ocr.herokuapp.com/. Heroku provides a great step-by-step tutorial for deploying a Node.js app in minutes; have a look if you're interested in using Heroku.
Deployment was the main challenge for this application, since I had to install tesseract-ocr on the server before deploying. Heroku's multiple buildpack support made that possible, and using multiple buildpacks on this app was a great experiment.
Some useful links
- Using Multiple Buildpacks for an App
- Implementing OCR for Heroku Rails app
- Use multiple buildpacks on your app
Future improvements
- Manipulate the image before processing, for example by increasing or reducing brightness and contrast, so that text can be extracted more easily.
- Develop algorithms to extract the right information from the mulkiya, such as model, car, year, etc.
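Both ideas above can be sketched in a few lines of plain JavaScript. The first function applies a linear brightness and contrast adjustment to 8-bit grayscale pixel values. The second pulls a couple of fields out of OCR text with regular expressions. Everything here is an assumption for illustration: the function names, the `Model:` label, and the idea that the first four-digit number starting with 19 or 20 is the year. A real mulkiya layout may need quite different patterns.

```javascript
// Linear brightness/contrast adjustment on 8-bit grayscale values.
// factor > 1 stretches contrast around mid-gray (128); offset > 0 brightens.
// Results are rounded and clamped to the 0..255 range.
function adjustBrightnessContrast(pixels, factor, offset) {
  return pixels.map(function (value) {
    var adjusted = (value - 128) * factor + 128 + offset;
    return Math.max(0, Math.min(255, Math.round(adjusted)));
  });
}

// Extract a few fields from raw OCR text.
// The label and patterns are hypothetical, not taken from a real mulkiya.
function extractFields(ocrText) {
  // A line such as "Model: Corolla" (case-insensitive, ':' or '-' separator).
  var modelMatch = /^\s*model\s*[:\-]\s*(.+?)\s*$/im.exec(ocrText);
  // The first standalone four-digit number beginning with 19 or 20.
  var yearMatch = /\b(?:19|20)\d{2}\b/.exec(ocrText);
  return {
    model: modelMatch ? modelMatch[1] : null,
    year: yearMatch ? Number(yearMatch[0]) : null
  };
}
```

A field that cannot be found comes back as `null`, so the caller can fall back to manual entry for that field instead of trusting a bad guess.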