An application that extracts meaningful data from files of any type.
For end users.
An environment for end users is currently being set up.
- Upload a file using the frontend.
- Tesseract will extract the text available in the uploaded file.
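
The extraction step can be pictured as a call to the Tesseract command-line tool. The sketch below is an assumption about how such a call could look, not the application's actual code: the function names `build_tesseract_command` and `extract_text` are hypothetical, and it assumes the `tesseract` executable is on the PATH.

```python
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the argument list for a Tesseract CLI call.

    Passing "stdout" as the output base makes Tesseract print the
    recognized text instead of writing it to a .txt file.
    """
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def extract_text(image_path, lang="eng"):
    """Run Tesseract on an image and return the recognized text."""
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    return result.stdout.strip()
```

Calling `extract_text("scan.png")` (a hypothetical file name) would return the recognized text as a string, provided Tesseract and the English language data are installed.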
 
For developers.
The application has a number of dependencies. Kindly ensure the following are installed on your machine:
- Python
- Python packages (complete details provided below)
- MongoDB
- MongoDB Compass (optional; alternatives are available)
- Tesseract
- Git
 
- Python
- Tesseract
- Mongo
- Compass
- Git
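
Whether the command-line dependencies are installed can be checked with a few lines of Python. This is a minimal sketch, not part of the project: the executable names in `REQUIRED_TOOLS` are assumptions and may differ by platform or installer (for example, `python3` instead of `python`), and `find_missing_tools` is a hypothetical helper.

```python
import shutil

# Executable names are assumptions; they may differ per platform or install.
REQUIRED_TOOLS = ["python", "mongod", "tesseract", "git"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

A tool reported as missing may still be installed but not added to the PATH, which is common for MongoDB and Tesseract on Windows.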
- Install Python if it is not already installed. Add Python to your environment variables and check the version:

C:\Users\username> python
Python 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:43:08) [MSC v.1926 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
- Install MongoDB if it is not already installed.
- Install MongoDB Compass (client).
- Go to the MongoDB bin folder and run the server:

C:\Program Files\MongoDB\Server\4.4\bin> mongod

It will be available on port 27017.
- Open Compass and connect to the database:
 
mongodb://localhost:27017
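
Before opening Compass, it can help to confirm that something is listening at that address. The following is a rough sketch using only the standard library; `parse_host_port` and `is_reachable` are hypothetical helpers, and a successful TCP connection only shows that the port is open, not that the listener is actually MongoDB.

```python
import socket
from urllib.parse import urlsplit


def parse_host_port(uri, default_port=27017):
    """Split a simple single-host URI such as mongodb://localhost:27017."""
    parts = urlsplit(uri)
    return parts.hostname, parts.port or default_port


def is_reachable(uri, timeout=2.0):
    """Return True if a TCP connection to the URI's host and port succeeds."""
    host, port = parse_host_port(uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False
```

With `mongod` running, `is_reachable("mongodb://localhost:27017")` should return `True`; if it returns `False`, the server is probably not started or is bound to a different port.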
- Install Tesseract.
- Clone the repository:
 
git clone https://github.com/SandeepBalachandran/Pytheract.git
- Change into the cloned repository:

cd Pytheract
- If you are using Pipenv, set up the virtual environment and start it as follows:

pipenv install
- Run Flask:

set FLASK_APP=app.py
set FLASK_ENV=development
flask run

It will be available on port 5000.
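
The `set` commands are specific to the Windows command prompt. As a platform-independent alternative, the same variables can be passed from a small Python launcher. This is only a sketch under assumptions: `flask_environment` and `run_flask` are hypothetical names, and it assumes Flask is installed in the active environment and that `app.py` is in the current directory.

```python
import os
import subprocess
import sys


def flask_environment(app="app.py", env="development", base=None):
    """Return a copy of the environment with the Flask variables set."""
    environment = dict(os.environ if base is None else base)
    environment["FLASK_APP"] = app
    environment["FLASK_ENV"] = env
    return environment


def run_flask():
    """Start the development server; blocks until the server is stopped."""
    # `python -m flask run` behaves like the `flask run` command.
    return subprocess.run(
        [sys.executable, "-m", "flask", "run"],
        env=flask_environment(),
        check=True,
    )
```

Calling `run_flask()` from the repository root would then start the server without changing the variables of the surrounding shell.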
 
- Extract text from PDF files.
- Extract text from zip files that contain both images and PDF files.
- Show the webcam in the UI.
- Capture an image and extract text from the captured image.
- Use regex to locate specific content, e.g. email addresses and phone numbers.
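
To illustrate the regex item, here is a minimal sketch of locating email addresses and phone numbers in extracted text. The patterns and the `locate` function are illustrative assumptions, not the project's implementation; they are deliberately simple and will miss or over-match some real-world formats.

```python
import re

# Deliberately simple patterns for illustration; real-world email and phone
# formats vary widely and would need stricter, locale-aware rules.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s-]{7,}\d")


def locate(text):
    """Return the email addresses and phone numbers found in `text`."""
    return {
        "emails": EMAIL_PATTERN.findall(text),
        "phones": PHONE_PATTERN.findall(text),
    }


sample = "Contact jane.doe@example.com or call +1 555-010-9999."
print(locate(sample))
# {'emails': ['jane.doe@example.com'], 'phones': ['+1 555-010-9999']}
```

OCR output often contains stray spaces or misread characters, so patterns like these would likely need tuning against real extracted text.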
 
Please check the Contributing Guidelines before contributing.
