urd Language [Bad Results] [Fine Tuning] [Jupyter Notebook] #357

Open
IrtazaIjaz opened this issue Oct 17, 2023 · 1 comment

Comments

@IrtazaIjaz

Hi everyone,

I'm working on improving Tesseract's accuracy for the Urdu language. I used the code below to get the output; however, the result was extremely bad.

Code:

from PIL import Image
import pytesseract

# Load the image
image_path = 'image path'
image = Image.open(image_path)

# Perform OCR with the "urd" (Urdu) language model
text = pytesseract.image_to_string(image, lang="urd")

# Print the extracted text
print(text)

Code Output:

ےل ۸ ا ہج مم لہ
ح4

۱ :

< شی : فریادی ےک سک شون یت رکا

کانزی ہے پیرن پپرپسیسک رتصو رکا

کا وکا سحخت مال بے تنما یز وھ

مکنا شا ۱لا ہے بر ےبشیا

جنر ہے تار شوقی تق دنساپاے

سے تھی سے با ہڑے زم می کا

1 بی دائمشنیدلنٹیں قد ریا بیاۓ

ما عنقاسے بے الم تق یرک
سک پوں الب !اسیری میم بی ان زیر ہا
ٹڈ ےک گنس درد ًٌے علغ مری رکا

+جویم+مجھےمسور ہج

+پ وپ پودمچھھھوم
ا جس

کے ہے ہے ا

I have attached the picture which I have used.

(Attached image: page_10)

Next, I used this project to fine-tune a model on my data, following the steps in the README guide. However, the output of the newly trained model is also disappointing. Please see the details below:

2 Percent improvement time=122, best error was 6.777 @ 5093
At iteration 5215/8900/8900, Mean rms=0.454%, delta=0.009%, char train=4.594%, word train=22.277%, skip ratio=0%,  New best char error = 4.594 wrote checkpoint.

2 Percent improvement time=123, best error was 6.777 @ 5093
At iteration 5216/9000/9000, Mean rms=0.451%, delta=0.011%, char train=4.574%, word train=22.234%, skip ratio=0%,  New best char error = 4.574 wrote checkpoint.

2 Percent improvement time=99, best error was 6.57 @ 5117
At iteration 5216/9100/9100, Mean rms=0.445%, delta=0.011%, char train=4.551%, word train=22.102%, skip ratio=0%,  New best char error = 4.551 wrote checkpoint.

2 Percent improvement time=99, best error was 6.57 @ 5117
At iteration 5216/9200/9200, Mean rms=0.445%, delta=0.011%, char train=4.516%, word train=21.932%, skip ratio=0%,  New best char error = 4.516 wrote checkpoint.

At iteration 5216/9300/9300, Mean rms=0.441%, delta=0.01%, char train=4.57%, word train=22.123%, skip ratio=0%,  New worst char error = 4.57 wrote checkpoint.

At iteration 5217/9400/9400, Mean rms=0.429%, delta=0.009%, char train=4.531%, word train=21.893%, skip ratio=0%,  New worst char error = 4.531 wrote checkpoint.

At iteration 5219/9500/9500, Mean rms=0.433%, delta=0.012%, char train=4.526%, word train=21.786%, skip ratio=0%,  New worst char error = 4.526 wrote checkpoint.

2 Percent improvement time=103, best error was 6.57 @ 5117
At iteration 5220/9600/9600, Mean rms=0.428%, delta=0.012%, char train=4.506%, word train=21.732%, skip ratio=0%,  New best char error = 4.506 wrote best model:data/irt/checkpoints/irt4.506_5220.checkpoint wrote checkpoint.

At iteration 5220/9700/9700, Mean rms=0.425%, delta=0.011%, char train=4.509%, word train=21.698%, skip ratio=0%,  New worst char error = 4.509 wrote checkpoint.

2 Percent improvement time=103, best error was 6.57 @ 5117
At iteration 5220/9800/9800, Mean rms=0.421%, delta=0.01%, char train=4.472%, word train=21.54%, skip ratio=0%,  New best char error = 4.472 wrote checkpoint.

2 Percent improvement time=103, best error was 6.57 @ 5117
At iteration 5220/9900/9900, Mean rms=0.408%, delta=0.008%, char train=4.425%, word train=21.366%, skip ratio=0%,  New best char error = 4.425 wrote checkpoint.

2 Percent improvement time=104, best error was 6.57 @ 5117
At iteration 5221/10000/10000, Mean rms=0.401%, delta=0.008%, char train=4.415%, word train=21.333%, skip ratio=0%,  New best char error = 4.415 wrote checkpoint.

Finished! Error rate = 4.415
lstmtraining \
--stop_training \
--continue_from data/irt/checkpoints/irt_checkpoint \
--traineddata data/irt/irt.traineddata \
--model_output data/irt.traineddata
Loaded file data/irt/checkpoints/irt_checkpoint, unpacking...
م۱پالچیےک ب جت2ےغ پاششحیسک ۔-٭ ہ ن خ.سیک ۔خبتا×لنیتےض ہکش یندک
ئینتےہ

|۱ :2ت٦‎

ی ب ا تات ک اضا ئ ششاشں راک ےہ ی اک

ار اراہنین ک تہ ےیاک

'اپپ انت ہ چ نااک

یضا حسہجنتشکک یک

ائااذا ہاہا ہاک کنخے ےا ن یاےا٣‏

ھ-ھین۳یعییچیییقےیہشض اخاق 6آ یےیحتا

ی:ے.- ابضاخےااشش ادس اخاذ ا ا ا ہاک

ابا ئئئ ہال اچچناآاک م ک3یےةخ ام ت3ت-تۂ اراش اخیحاک
ایکئبہیبیئئبرمہمہبپےہمپ بپ با س ملا شن۳یدک
اکافائ ا شت۳ہک نبفحقشقق (ن ”ک اکاشسشرشق اک ابدح ش می ےاک

مب ربا ب ر ر شب پ ور بریریمہ6۸غےغ

ف چییی ھ یہ یںو۔بں ی او۳
:- مایپاسیش خ تنےدہ ۔.۔ز‌نثات۹“‪ہ

اریہذ تی (یعبک
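
To make the training log above easier to read, the `char train` and `word train` figures can be pulled out of each `At iteration` line. The following is a minimal sketch, not part of the training tools: the function name `parse_training_log` and the dictionary keys are made up for illustration, and the regular expression assumes only the line format shown in the log above. These figures are errors measured on the training data, so on their own they do not show how the model behaves on an unseen page such as page_10.

```python
import re

# Matches lines such as:
# At iteration 5215/8900/8900, Mean rms=0.454%, delta=0.009%,
# char train=4.594%, word train=22.277%, skip ratio=0%, ...
LINE_RE = re.compile(
    r"At iteration (\d+)/(\d+)/(\d+), "
    r"Mean rms=([\d.]+)%, delta=([\d.]+)%, "
    r"char train=([\d.]+)%, word train=([\d.]+)%"
)


def parse_training_log(text):
    """Return one dict per 'At iteration' line found in the log text."""
    rows = []
    for match in LINE_RE.finditer(text):
        first, second, third, rms, delta, char_err, word_err = match.groups()
        rows.append({
            # The three slash-separated counters, kept exactly as logged.
            "iteration": (int(first), int(second), int(third)),
            "mean_rms": float(rms),
            "delta": float(delta),
            "char_train": float(char_err),  # character error on training data, in %
            "word_train": float(word_err),  # word error on training data, in %
        })
    return rows


if __name__ == "__main__":
    sample = (
        "At iteration 5215/8900/8900, Mean rms=0.454%, delta=0.009%, "
        "char train=4.594%, word train=22.277%, skip ratio=0%,  "
        "New best char error = 4.594 wrote checkpoint.\n"
    )
    for row in parse_training_log(sample):
        print(row["iteration"], row["char_train"], row["word_train"])
```

Comparing these numbers with an error rate measured on pages that were not used for training would show whether the gap between the 4.4% training error and the output above comes from the model or from the input image.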

Please help me to improve the results.

Thanks

@chaitanya-srikanth

How did you use the fine-tuned model for inference? Can anyone share a code script for that?
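
One possible way to run inference with a fine-tuned model from Python is sketched below. It is not the issue author's script. It assumes the `lstmtraining --stop_training` command shown in the issue wrote `data/irt.traineddata`, that Tesseract, pytesseract and Pillow are installed, and that the directory is passed to Tesseract through pytesseract's `config` argument using the `--tessdata-dir` option. The helper names `build_config` and `ocr_with_finetuned_model`, and the file name in the usage comment, are hypothetical.

```python
import shlex


def build_config(tessdata_dir, psm=None):
    """Build the extra command-line options handed to tesseract via pytesseract."""
    # shlex.quote protects directory names that contain spaces or backslashes.
    options = ["--tessdata-dir", shlex.quote(tessdata_dir)]
    if psm is not None:
        # Optional page segmentation mode; left unset by default.
        options += ["--psm", str(psm)]
    return " ".join(options)


def ocr_with_finetuned_model(image_path, lang="irt", tessdata_dir="data"):
    """Run OCR with <tessdata_dir>/<lang>.traineddata (assumed locations)."""
    # Imported here so build_config can be used without these packages installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(
            image, lang=lang, config=build_config(tessdata_dir)
        )


# Usage (hypothetical file name):
# print(ocr_with_finetuned_model("page_10.png"))
```

The `lang` value has to match the file name of the `.traineddata` file without its extension, which is why `irt` is used here instead of `urd`.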
