Can inference be done without boxes and transcripts using PICK-pytorch? #105
The model accepts both the image and the bounding boxes with their corresponding transcripts as input; you can't rely on the image alone.
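For context, the per-image boxes_and_transcripts annotation is a plain-text file with one segment per line, roughly index,x1,y1,x2,y2,x3,y3,x4,y4,transcript (check the repo's data README for the exact columns); the rows below are made-up examples:
'''
1,84,26,412,26,412,64,84,64,Invoice No: 0001
2,84,80,260,80,260,110,84,110,Date: 2020-09-24
'''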
Hi ziodos, thanks for replying. I partially agree with your input: if a user wants to extract a new field that the PICK-pytorch model was not trained on, its box_transcripts would indeed have to be passed to test.py. But the question remains: how can box predictions be obtained from the trained PICK-pytorch model? Please let me know if there is any way I can modify the script to obtain the predicted bounding boxes after training the model. Hoping for your reply at the earliest.
Hi ziodos, I did not get your input on the question I asked above. Please let me know if any possibility exists; I am even willing to write the code myself if it does. Pushpa.
Hi authors,
For OCR (including detection and recognition):
For layout analysis (e.g., PICK):
Hi tengerye,
Hi mrtranducdung, the PICK model does not run inference automatically from just an image as input; along with the image, the corresponding bbox and text-transcript annotations must be provided at prediction time. There is no need to give class labels, as the model predicts them automatically. The only possibility, if you need to predict from an input image alone, is to apply OCR techniques first. Please refer to the notebook below for an auto-inference implementation; in particular, see its inference code section.
1) https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb#scrollTo=vm3sGnBsL64o
I am also sending the code I implemented; it follows most of the logic from the notebook above to perform automatic inference with the LayoutLM transformer model, so you can adapt the same logic for the PICK transformer model. regards,
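In outline, the OCR step looks something like the rough sketch below (not the original attachment): run pytesseract on the image and write a PICK-style boxes_and_transcripts file. The index,x1,y1,...,x4,y4,transcript column order is assumed from the PICK-pytorch data format and should be verified against the repo.
'''
# Rough sketch: OCR an image with pytesseract and write a PICK-style
# boxes_and_transcripts tsv. Column order is assumed from the PICK-pytorch
# data format; verify against the repo's data README before use.
import pytesseract
from pytesseract import Output
from PIL import Image

def image_to_pick_tsv(image_path: str, tsv_path: str, min_conf: float = 30.0) -> None:
    """OCR one image and dump word-level boxes/transcripts in PICK's tsv layout."""
    data = pytesseract.image_to_data(Image.open(image_path), output_type=Output.DICT)
    rows = []
    for i, word in enumerate(data["text"]):
        word = word.strip()
        # Drop empty tokens and low-confidence detections (conf is -1 on non-word rows).
        if not word or float(data["conf"][i]) < min_conf:
            continue
        x, y = data["left"][i], data["top"][i]
        w, h = data["width"][i], data["height"][i]
        # Tesseract gives axis-aligned boxes; expand to the clockwise 4-point
        # polygon (x1,y1,...,x4,y4) that PICK's annotation files use.
        box = (x, y, x + w, y, x + w, y + h, x, y + h)
        rows.append(f"{len(rows) + 1}," + ",".join(map(str, box)) + f",{word}")
    with open(tsv_path, "w", encoding="utf-8") as f:
        f.write("\n".join(rows))
'''
test.py can then be pointed at the folder of generated tsv files exactly as in a normal run.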
Hi pushpalatha1405, |
Hi mrtranducdung, can you give a bit more detail on the OCR model you are using to get the (bbox, transcripts) pairs at prediction stage, if you are okay sharing the details? regards,
Hi pushpalatha,
Got it! Thanks very much, mrtranducdung. regards,
Hi mrtranducdung, I am revisiting this issue, where you use the Tesseract OCR model to extract bboxes and transcripts and then convert them to the PICK annotation format for automatic model inference. My question is:
a) What about document complexity? What if the document has a very complex structure, like utility bills? Can PICK really predict the fields appropriately if the above logic is used? If annotations are created in some form for these n utility bills (word-wise or sentence-wise, given the complex structure and the huge number of documents), would you still recommend the above solution of applying an OCR model, extracting boxes and transcripts, converting them to the PICK annotation format, and performing automatic model inference? How well does it work?
Please share your experience on this; I need your input because I am building a fully fledged, robust auto-inference pipeline using pytesseract or an EasyOCR model, but I am stuck in a dilemma due to the complex document structure. If any other solution exists to build auto inference using PICK, please share it. Awaiting your reply. regards,
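For the word-wise vs sentence-wise question, one option I am weighing is grouping Tesseract's word boxes by its (block, paragraph, line) indices so each PICK segment is a whole text line rather than a single word. A rough sketch follows; the grouping keys come from pytesseract's image_to_data output, and everything else is an assumption to tune per document type.
'''
# Sketch: group Tesseract word boxes into line-level segments by the
# (block_num, par_num, line_num) indices that image_to_data returns.
import pytesseract
from pytesseract import Output
from PIL import Image

def words_to_lines(image_path: str):
    """Return [(box, transcript)] with one entry per detected text line."""
    data = pytesseract.image_to_data(Image.open(image_path), output_type=Output.DICT)
    groups = {}
    for i, word in enumerate(data["text"]):
        if not word.strip():
            continue
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        groups.setdefault(key, []).append(i)
    lines = []
    for key in sorted(groups):
        idxs = groups[key]
        # Union of the word boxes on this line.
        x1 = min(data["left"][i] for i in idxs)
        y1 = min(data["top"][i] for i in idxs)
        x2 = max(data["left"][i] + data["width"][i] for i in idxs)
        y2 = max(data["top"][i] + data["height"][i] for i in idxs)
        transcript = " ".join(data["text"][i].strip() for i in idxs)
        lines.append(((x1, y1, x2, y2), transcript))
    return lines
'''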
Hi wenwenyu,
I prepared my custom dataset in the PICK-pytorch format and trained it with the models used in PICK-pytorch. The training score after 100 epochs is around 69%, and the test score is around mEF 0.7150, using the test.py script below (from PICK-pytorch):
'''
python test.py \
  --checkpoint /datadrive/PICK-pytorch/saved/models/PICK_Default/test_0924_145754/checkpoint-epoch100.pth \
  --boxes_transcripts /datadrive/PICK-pytorch/predictions/boxes_and_transcripts \
  --images_path /datadrive/PICK-pytorch/predictions/images \
  --output_folder /datadrive/PICK-pytorch/output_pred \
  --batch_size 1 --gpu 0
'''
Now my question: when I build an end-to-end inference pipeline, I should only need to provide the image and the checkpoint-epoch100.pth file, and then get the corresponding entity extractions in the form of a json/txt file together with the bounding-box coordinates.
But why do I need to provide the boxes_and_transcripts annotations again during inference?
Is there any way I can use PICK-pytorch for inference by giving only the checkpoint file and image path, and getting the predictions back as text and images with bounding boxes?
Please let me know if any solution exists. I want to use the PICK-pytorch model in our product (after all this progress, having trained and tested on my custom dataset), but the barrier is having to pass boxes_and_transcripts to test.py; what I am after is something like the sketch below.
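For concreteness, a minimal sketch of the wrapper I have in mind. The OCR helper image_to_pick_tsv is hypothetical (e.g., the pytesseract sketch earlier in this thread) and is exactly the missing piece; everything else reuses the existing test.py and the paths from my command above.
'''
# Sketch of the end-to-end wrapper: OCR each image into a PICK-style tsv,
# then shell out to the existing PICK-pytorch test.py.
import subprocess
from pathlib import Path

from ocr_to_pick import image_to_pick_tsv  # hypothetical module wrapping the OCR step

def run_pick_inference(images_dir: str, tsv_dir: str, checkpoint: str, output_dir: str) -> None:
    # 1) OCR every image into a boxes_and_transcripts tsv (adjust the glob to your extensions).
    for image_path in sorted(Path(images_dir).glob("*.jpg")):
        image_to_pick_tsv(str(image_path), str(Path(tsv_dir) / f"{image_path.stem}.tsv"))
    # 2) Run the existing PICK-pytorch test script on the generated annotations.
    subprocess.run([
        "python", "test.py",
        "--checkpoint", checkpoint,
        "--boxes_transcripts", tsv_dir,
        "--images_path", images_dir,
        "--output_folder", output_dir,
        "--batch_size", "1",
        "--gpu", "0",
    ], check=True)

run_pick_inference(
    "/datadrive/PICK-pytorch/predictions/images",
    "/datadrive/PICK-pytorch/predictions/boxes_and_transcripts",
    "/datadrive/PICK-pytorch/saved/models/PICK_Default/test_0924_145754/checkpoint-epoch100.pth",
    "/datadrive/PICK-pytorch/output_pred",
)
'''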
Hoping for the reply at the earliest
regards,
Pushpalatha M