An end-to-end Deep Learning approach for table detection and structure recognition from invoice documents
Finding Tables is an automatic table recognition method for interpreting tabular data in document images. We present an improved, deep learning-based, end-to-end approach that solves both table detection and structure recognition, using a model pretrained on TableBank and fine-tuned on our invoice data. For detection, Finding Tables uses a Faster R-CNN High-Resolution Network to locate table regions. For structure recognition, we propose a novel approach that leverages state-of-the-art methods in NLP: we use LayoutLM, a BERT-based model, to process the text in the image and map it to question-answer pairs, which we then transform into JSON files.
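To make the last step more concrete, here is a minimal sketch of how token-level predictions could be grouped into question-answer pairs and written out as JSON. It assumes FUNSD-style BIO labels (B-QUESTION, I-QUESTION, B-ANSWER, I-ANSWER, O) and pairs each question with the next answer in reading order; the label names, the function names, and the pairing rule are all illustrative assumptions, not necessarily what run_inference.py does.

```python
import json


def group_spans(words, labels):
    """Merge BIO-tagged words into (kind, text) spans, skipping "O" tokens."""
    spans = []
    open_kind = None  # kind of the span currently being extended, if any
    for word, label in zip(words, labels):
        if label == "O":
            open_kind = None
            continue
        prefix, kind = label.split("-", 1)
        if prefix == "I" and open_kind == kind:
            spans[-1][1].append(word)  # continue the current span
        else:
            spans.append((kind, [word]))  # start a new span
            open_kind = kind
    return [(kind, " ".join(tokens)) for kind, tokens in spans]


def pair_questions_and_answers(spans):
    """Pair each QUESTION span with the first ANSWER span that follows it."""
    pairs = []
    question = None
    for kind, text in spans:
        if kind == "QUESTION":
            question = text
        elif kind == "ANSWER" and question is not None:
            pairs.append({"question": question, "answer": text})
            question = None
    return pairs


# Made-up example input (assumed label scheme, not real model output).
words = ["Invoice", "No:", "12345", "Date:", "01/02/2021", "Thank", "you"]
labels = ["B-QUESTION", "I-QUESTION", "B-ANSWER", "B-QUESTION", "B-ANSWER", "O", "O"]

pairs = pair_questions_and_answers(group_spans(words, labels))
print(json.dumps(pairs, indent=2))
```

The simple "next answer wins" rule ignores layout; a real pipeline would likely also use the bounding boxes to decide which answer belongs to which question.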
Setting up LayoutLM:
git clone -b remove_torch_save https://github.com/NielsRogge/unilm.git
pip install unilm/layoutlm
git clone https://github.com/huggingface/transformers.git
pip install ./transformers
The code was developed with the following library versions:
torch==1.7.0+cu101
torchvision==0.8.1+cu101
CUDA 10.1
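Before installing the remaining packages, it can help to confirm that the environment matches these versions. The following is a small sketch using only the standard library (Python 3.8+); the helper names are our own, and an exact string match is an assumption you may want to relax.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this README.
EXPECTED = {"torch": "1.7.0+cu101", "torchvision": "0.8.1+cu101"}


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_versions(expected):
    """Map each package name to a tuple (expected, installed, matches)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        report[package] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, found, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{package}: expected {wanted}, found {found} [{status}]")
```

This only checks the Python packages; the CUDA toolkit version has to be verified separately.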
To install Detectron2:
pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html
To install the other dependencies:
pip install -r requirements.txt
Please download the weights for Detectron2 and LayoutLM and place them in the data folder:
Detectron2: Detectron_finetuned_model_weights
LayoutLM: LayoutLM weights
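A quick way to confirm that the downloads ended up in the right place is sketched below. The two file names in the example are hypothetical placeholders, since the actual names depend on the files you downloaded; replace them accordingly.

```python
from pathlib import Path


def find_missing(data_dir, filenames):
    """Return the names in `filenames` that are not regular files inside `data_dir`."""
    folder = Path(data_dir)
    return [name for name in filenames if not (folder / name).is_file()]


if __name__ == "__main__":
    # Hypothetical file names: replace with the names of the downloaded weight files.
    expected = ["detectron2_weights.pth", "layoutlm_weights.pt"]
    missing = find_missing("data", expected)
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All weight files found.")
```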
To run our model on a custom image, change into the repository folder and run the command "python run_inference.py 00017.PNG", replacing 00017.PNG with the path of your image file.
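To process a whole folder of images, the same command can be called in a loop. This is a minimal sketch that assumes run_inference.py takes a single image path as its only argument, as in the command above; the script name batch_inference.py and the helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def find_images(folder):
    """Return image files directly inside `folder`, sorted by path (case-insensitive suffix match)."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def build_command(image_path):
    """Build the documented command: python run_inference.py <path of image file>."""
    return [sys.executable, "run_inference.py", str(image_path)]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python batch_inference.py <folder with images>")
    else:
        for image in find_images(sys.argv[1]):
            print("Running inference on", image)
            # check=True stops the loop at the first image that fails.
            subprocess.run(build_command(image), check=True)
```

Run it from the repository folder so that run_inference.py is found, for example "python batch_inference.py my_invoices".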
Original Image:
Detecting Images:
Example from Structure Recognition:
If you have trouble getting it to work, please feel free to contact us or raise an issue.
Neham ([email protected]) & Tanay ([email protected])