Subex Hackathon

An end-to-end deep learning approach for table detection and structure recognition in invoice documents

Results: 1st Place out of 150+ participants

1. Introduction

Finding Tables is an automatic table recognition method for interpreting tabular data in document images. We present an improved, end-to-end deep learning approach that solves both table detection and structure recognition. For detection, Finding Tables uses a Faster R-CNN High-Resolution Network, pretrained on TableBank and fine-tuned on our invoice data, to locate table regions. For structure recognition, we propose a novel approach that leverages state-of-the-art methods from NLP: we use LayoutLM, a BERT-based model, to process the text in the image and map it to question-answer pairs, which we then convert into JSON files.
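The last step described above, turning question-answer labels into JSON, can be sketched roughly as follows. This is only an illustration, not the code used in this repository: the label names `QUESTION`, `ANSWER` and `OTHER`, the function `pair_questions_with_answers`, and the simple "a question is followed by its answer" pairing rule are all assumptions made for the example.

```python
import json


def pair_questions_with_answers(labeled_words):
    """Pair each run of QUESTION words with the ANSWER words that follow it.

    `labeled_words` is a list of (text, label) tuples in reading order.
    The label names are hypothetical placeholders for the model's output.
    """
    pairs = {}
    question, answer = [], []
    for text, label in labeled_words:
        if label == "QUESTION":
            # A new question begins once the previous one has an answer.
            if answer:
                pairs[" ".join(question)] = " ".join(answer)
                question, answer = [], []
            question.append(text)
        elif label == "ANSWER" and question:
            answer.append(text)
        # Words with any other label are ignored.
    if question and answer:
        pairs[" ".join(question)] = " ".join(answer)
    return pairs


# Hypothetical model output for a small invoice header.
words = [
    ("Invoice", "QUESTION"), ("No", "QUESTION"), ("12345", "ANSWER"),
    ("Date", "QUESTION"), ("01/02/2021", "ANSWER"),
    ("Thank", "OTHER"), ("you", "OTHER"),
]
print(json.dumps(pair_questions_with_answers(words), indent=2))
```

A real invoice would need a more careful pairing rule, for example one that uses the bounding boxes of the words; in this sketch, consecutive questions without an answer between them are simply merged into one key.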

2. Setting it all up

Setting up LayoutLM:

git clone -b remove_torch_save https://github.com/NielsRogge/unilm.git
pip install unilm/layoutlm
git clone https://github.com/huggingface/transformers.git
pip install ./transformers

The code was developed with the following library versions:

torch==1.7.0+cu101
torchvision==0.8.1+cu101
CUDA 10.1
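Before installing the remaining packages, it can help to confirm that the environment matches the versions listed above. The following is a minimal sketch, assuming Python 3.8 or newer (for `importlib.metadata`); the `EXPECTED` dictionary and both helper functions are illustrative names and are not part of this repository. It does not check the CUDA version.

```python
from importlib import metadata

# Versions taken from the list above; the dictionary is only for illustration.
EXPECTED = {"torch": "1.7.0+cu101", "torchvision": "0.8.1+cu101"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected):
    """Map each package whose version differs to (expected, installed)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


# Report every package that is missing or has a different version.
for package, (wanted, found) in find_mismatches(EXPECTED).items():
    print(f"{package}: expected {wanted}, found {found}")
```

If nothing is printed, both packages are installed at the listed versions.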

Installing Detectron2:

pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html

Installing other dependencies

pip install -r requirements.txt

Please download the weights for Detectron2 and LayoutLM and place them in the data folder.

Detectron2: Detectron_finetuned_model_weights

LayoutLM: LayoutLM weights

3. Running inference

To test the model on a custom image, change into the repository folder and run the following command, replacing 00017.PNG with the path to your image file:

python run_inference.py 00017.PNG
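To process several images in one go, the single-image command can be wrapped in a small loop. The sketch below is not part of this repository: the helper names `build_commands` and `run_all`, the list of file extensions, and the example folder name are assumptions. It relies only on the fact that `run_inference.py` takes the image path as its single argument.

```python
import subprocess
import sys
from pathlib import Path

# Extensions treated as images; this set is an assumption for the example.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_commands(image_dir, script="run_inference.py"):
    """Build one `python run_inference.py <image>` command per image file."""
    images = sorted(
        path for path in Path(image_dir).iterdir()
        if path.suffix.lower() in IMAGE_SUFFIXES
    )
    return [[sys.executable, script, str(path)] for path in images]


def run_all(image_dir):
    """Run inference on every image in `image_dir`, stopping on the first failure."""
    for command in build_commands(image_dir):
        subprocess.run(command, check=True)
```

Calling `run_all("my_invoices")` from inside the repository folder would then run inference on each image in the hypothetical `my_invoices` folder, one after another.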

4. Examples

Original Image:

Detecting Images:

Example from Structure Recognition:

If you have trouble getting it to work, please feel free to contact us or raise an issue.

Neham ([email protected]) & Tanay ([email protected])
