Can't install under miniconda #3

Closed
johnlockejrr opened this issue Nov 6, 2024 · 28 comments

Comments

@johnlockejrr

Pip subprocess error:
ERROR: Ignored the following versions that require a different python version: 0.22.0 Requires-Python >=3.9; 0.22.0rc1 Requires-Python >=3.9; 0.23.0 Requires-Python >=3.10; 0.23.0rc0 Requires-Python >=3.10; 0.23.0rc2 Requires-Python >=3.10; 0.23.1 Requires-Python >=3.10; 0.23.2 Requires-Python >=3.10; 0.23.2rc1 Requires-Python >=3.10; 0.24.0 Requires-Python >=3.9; 0.24.0rc1 Requires-Python >=3.9; 0.25.0rc0 Requires-Python >=3.10; 0.25.0rc1 Requires-Python >=3.10; 1.11.0 Requires-Python <3.13,>=3.9; 1.11.0rc1 Requires-Python <3.13,>=3.9; 1.11.0rc2 Requires-Python <3.13,>=3.9; 1.11.1 Requires-Python <3.13,>=3.9; 1.11.2 Requires-Python <3.13,>=3.9; 1.11.3 Requires-Python <3.13,>=3.9; 1.11.4 Requires-Python >=3.9; 1.12.0 Requires-Python >=3.9; 1.12.0rc1 Requires-Python >=3.9; 1.12.0rc2 Requires-Python >=3.9; 1.13.0 Requires-Python >=3.9; 1.13.0rc1 Requires-Python >=3.9; 1.13.1 Requires-Python >=3.9; 1.14.0 Requires-Python >=3.10; 1.14.0rc1 Requires-Python >=3.10; 1.14.0rc2 Requires-Python >=3.10; 1.14.1 Requires-Python >=3.10; 1.2.0 Requires-Python >=3.9; 1.2.1 Requires-Python >=3.9; 1.2.1rc1 Requires-Python >=3.9; 1.25.0 Requires-Python >=3.9; 1.25.1 Requires-Python >=3.9; 1.25.2 Requires-Python >=3.9; 1.26.0 Requires-Python <3.13,>=3.9; 1.26.1 Requires-Python <3.13,>=3.9; 1.26.2 Requires-Python >=3.9; 1.26.3 Requires-Python >=3.9; 1.26.4 Requires-Python >=3.9; 1.3.0 Requires-Python >=3.9; 1.5.0 Requires-Python >=3.9; 1.6.0 Requires-Python >=3.9; 1.6.0rc1 Requires-Python >=3.9; 1.7.0 Requires-Python >=3.10; 11.0.0 Requires-Python >=3.9; 2.0.0 Requires-Python >=3.9; 2.0.1 Requires-Python >=3.9; 2.0.2 Requires-Python >=3.9; 2.1.0 Requires-Python >=3.10; 2.1.0rc1 Requires-Python >=3.10; 2.1.1 Requires-Python >=3.10; 2.1.2 Requires-Python >=3.10; 2.1.3 Requires-Python >=3.10; 2.14.1 Requires-Python >=3.9; 2.15.0 Requires-Python >=3.9; 2.15.1 Requires-Python >=3.9; 2.15.2 Requires-Python >=3.9; 2.16.0 Requires-Python >=3.9; 2.16.1 Requires-Python >=3.9; 2.16.2 
Requires-Python >=3.9; 2.17.0 Requires-Python >=3.9; 2.17.1 Requires-Python >=3.9; 2.18.0 Requires-Python >=3.9; 2.36.0 Requires-Python >=3.9; 2023.12.9 Requires-Python >=3.9; 2023.7.18 Requires-Python >=3.9; 2023.8.12 Requires-Python >=3.9; 2023.8.25 Requires-Python >=3.9; 2023.8.30 Requires-Python >=3.9; 2023.9.18 Requires-Python >=3.9; 2023.9.26 Requires-Python >=3.9; 2024.1.30 Requires-Python >=3.9; 2024.2.12 Requires-Python >=3.9; 2024.4.18 Requires-Python >=3.9; 2024.4.24 Requires-Python >=3.9; 2024.5.10 Requires-Python >=3.9; 2024.5.22 Requires-Python >=3.9; 2024.5.3 Requires-Python >=3.9; 2024.6.18 Requires-Python >=3.9; 2024.7.2 Requires-Python >=3.9; 2024.7.21 Requires-Python >=3.9; 2024.7.24 Requires-Python >=3.9; 2024.8.10 Requires-Python >=3.9; 2024.8.24 Requires-Python >=3.9; 2024.8.28 Requires-Python >=3.9; 2024.8.30 Requires-Python >=3.9; 2024.9.20 Requires-Python >=3.10; 3.0.0 Requires-Python >=3.9; 3.0.1 Requires-Python >=3.9; 3.0.2 Requires-Python >=3.9; 3.10.0rc1 Requires-Python >=3.10; 3.2 Requires-Python >=3.9; 3.2.0 Requires-Python >=3.9; 3.2.0b1 Requires-Python >=3.9; 3.2.0b2 Requires-Python >=3.9; 3.2.0b3 Requires-Python >=3.9; 3.2.0rc1 Requires-Python >=3.9; 3.2.1 Requires-Python >=3.9; 3.2rc0 Requires-Python >=3.9; 3.3 Requires-Python >=3.10; 3.3rc0 Requires-Python >=3.10; 3.4 Requires-Python >=3.10; 3.4.1 Requires-Python >=3.10; 3.4.2 Requires-Python >=3.10; 3.4rc0 Requires-Python >=3.10; 3.8.0 Requires-Python >=3.9; 3.8.0rc1 Requires-Python >=3.9; 3.8.1 Requires-Python >=3.9; 3.8.2 Requires-Python >=3.9; 3.8.3 Requires-Python >=3.9; 3.8.4 Requires-Python >=3.9; 3.9.0 Requires-Python >=3.9; 3.9.0rc2 Requires-Python >=3.9; 3.9.1 Requires-Python >=3.9; 3.9.1.post1 Requires-Python >=3.9; 3.9.2 Requires-Python >=3.9
ERROR: Could not find a version that satisfies the requirement torch==1.13.0+cu116 (from versions: 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.4.0, 2.4.1)
ERROR: No matching distribution found for torch==1.13.0+cu116

failed

CondaEnvException: Pip failed
@johnlockejrr
Author

johnlockejrr commented Nov 6, 2024

I changed 'torch==1.13.0+cu116' to 'torch>=1.13' etc. and it installed. Now it has been hanging for about 40 minutes, with my GPU at 100%, at:

2024-11-06 14:18:03,658 INFO total_param is 53486096
2024-11-06 14:18:04,048 INFO Loading train loader...
2024-11-06 14:18:04,949 INFO Loading val loader...

@johnlockejrr
Author

I lowered the train/eval batch size a little and the trainer started quickly.

@johnlockejrr
Author

How to interfere with the model after training?

@johnlockejrr
Author

I'm here right now, should I stop training or keep it going?

2024-11-07 12:40:19,611 INFO Val. loss : 4.357   CER : 0.1706    WER : 0.4524
2024-11-07 12:44:33,318 INFO Iter : 28100        LR : 0.00081    training loss : 21.38781
2024-11-07 12:48:47,350 INFO Iter : 28200        LR : 0.00081    training loss : 20.09475
2024-11-07 12:53:00,896 INFO Iter : 28300        LR : 0.00081    training loss : 21.57617
2024-11-07 12:57:15,007 INFO Iter : 28400        LR : 0.00081    training loss : 21.11300
2024-11-07 13:01:28,937 INFO Iter : 28500        LR : 0.00081    training loss : 19.47327
2024-11-07 13:05:43,701 INFO Iter : 28600        LR : 0.00081    training loss : 21.46478
2024-11-07 13:09:57,206 INFO Iter : 28700        LR : 0.00081    training loss : 20.34710
2024-11-07 13:14:11,406 INFO Iter : 28800        LR : 0.00081    training loss : 20.37044
2024-11-07 13:18:25,441 INFO Iter : 28900        LR : 0.00080    training loss : 21.23569
2024-11-07 13:22:40,404 INFO Iter : 29000        LR : 0.00080    training loss : 20.43095
2024-11-07 13:25:10,455 INFO WER improved from 0.4505 to 0.4501!!!
2024-11-07 13:25:11,027 INFO Val. loss : 4.360   CER : 0.1713    WER : 0.4501
2024-11-07 13:29:24,362 INFO Iter : 29100        LR : 0.00080    training loss : 19.21192
2024-11-07 13:33:38,064 INFO Iter : 29200        LR : 0.00080    training loss : 18.39133
2024-11-07 13:37:51,473 INFO Iter : 29300        LR : 0.00080    training loss : 21.14774
2024-11-07 13:42:03,594 INFO Iter : 29400        LR : 0.00080    training loss : 19.76837
2024-11-07 13:46:16,520 INFO Iter : 29500        LR : 0.00080    training loss : 18.73728
2024-11-07 13:50:30,535 INFO Iter : 29600        LR : 0.00080    training loss : 21.03761
2024-11-07 13:54:44,174 INFO Iter : 29700        LR : 0.00079    training loss : 21.91326
2024-11-07 13:58:57,444 INFO Iter : 29800        LR : 0.00079    training loss : 19.83856
2024-11-07 14:03:12,634 INFO Iter : 29900        LR : 0.00079    training loss : 19.21149
2024-11-07 14:07:25,860 INFO Iter : 30000        LR : 0.00079    training loss : 18.59232
2024-11-07 14:09:51,321 INFO Val. loss : 4.206   CER : 0.1677    WER : 0.4525
2024-11-07 14:14:04,789 INFO Iter : 30100        LR : 0.00079    training loss : 20.52481
2024-11-07 14:18:18,877 INFO Iter : 30200        LR : 0.00079    training loss : 21.18321
2024-11-07 14:22:32,695 INFO Iter : 30300        LR : 0.00079    training loss : 21.03689
2024-11-07 14:26:47,309 INFO Iter : 30400        LR : 0.00078    training loss : 21.53148
2024-11-07 14:31:01,122 INFO Iter : 30500        LR : 0.00078    training loss : 19.73960
2024-11-07 14:35:14,995 INFO Iter : 30600        LR : 0.00078    training loss : 20.60052
2024-11-07 14:39:29,221 INFO Iter : 30700        LR : 0.00078    training loss : 20.76044
2024-11-07 14:43:43,012 INFO Iter : 30800        LR : 0.00078    training loss : 19.76154
2024-11-07 14:47:59,959 INFO Iter : 30900        LR : 0.00078    training loss : 21.51893
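
One way to approach the stop-or-continue question is to pull the validation lines out of the log and look at the trend. A minimal sketch, assuming the log format shown above; the function name `parse_val_metrics` is made up for illustration and is not part of the repository.

```python
import re

# Matches lines such as:
# 2024-11-07 12:40:19,611 INFO Val. loss : 4.357   CER : 0.1706    WER : 0.4524
VAL_LINE = re.compile(
    r"Val\. loss\s*:\s*(?P<loss>[\d.]+)\s+CER\s*:\s*(?P<cer>[\d.]+)\s+WER\s*:\s*(?P<wer>[\d.]+)"
)


def parse_val_metrics(log_text):
    """Return a list of (val_loss, cer, wer) tuples in log order."""
    metrics = []
    for line in log_text.splitlines():
        match = VAL_LINE.search(line)
        if match:
            metrics.append(
                (float(match["loss"]), float(match["cer"]), float(match["wer"]))
            )
    return metrics


log = """\
2024-11-07 12:40:19,611 INFO Val. loss : 4.357   CER : 0.1706    WER : 0.4524
2024-11-07 13:22:40,404 INFO Iter : 29000        LR : 0.00080    training loss : 20.43095
2024-11-07 13:25:11,027 INFO Val. loss : 4.360   CER : 0.1713    WER : 0.4501
2024-11-07 14:09:51,321 INFO Val. loss : 4.206   CER : 0.1677    WER : 0.4525
"""

metrics = parse_val_metrics(log)
best_cer = min(cer for _, cer, _ in metrics)
```

In the excerpt above the CER stays between 0.1677 and 0.1713 across three validation passes, so plotting the full history would show whether it has really plateaued.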

@johnlockejrr
Author

johnlockejrr commented Nov 7, 2024

Any help on how to interfere with the trained model? I know it is a ViT model, but as far as I can see it is "modded".

@YutingLi0606
Owner

Hi, thank you for your interest in our project!

I'm glad to hear that you've already solved some problems. However, I didn't quite understand your question about 'how to interfere.' Did you mean 'how to run inference after training'? If so, you can run test.py using the command in the read.sh file:

python3 test.py --exp-name read \
--max-lr 1e-3 \
--train-bs 128 \
--val-bs 8 \
--weight-decay 0.5 \
--mask-ratio 0.4 \
--attn-mask-ratio 0.1 \
--max-span-length 8 \
--img-size 512 64 \
--proj 8 \
--dila-ero-max-kernel 2 \
--dila-ero-iter 1 \
--proba 0.5 \
--alpha 1 \
--total-iter 100000 \
READ

Feel free to ask for help if you have any other questions~

Best,
Yuting

@johnlockejrr
Author

johnlockejrr commented Nov 8, 2024

Thank you for your reply! I mean: after training, how can I run inference with the model to recognize text from a single image, rather than run the test set?
Something like this (TrOCR in the example):

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
import requests
from PIL import Image

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

# load image from the IAM dataset
url = "https://fki.tic.heia-fr.ch/static/img/a01-122-02.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

pixel_values = processor(image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)

generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

@johnlockejrr
Author

johnlockejrr commented Nov 8, 2024

I came up with this code, but I'm not sure it is correct. Should ralph contain all the characters in the training dataset?
I find it difficult because the model.HTR_VT module only has create_model, nothing like load_model.

import torch
import argparse
import os
import re
from PIL import Image
from collections import OrderedDict
from utils import utils
from model import HTR_VT
from torchvision import transforms

def load_model(model_path, device, nb_cls=92, img_size=(512, 64)):
    # Initialize the model
    model = HTR_VT.create_model(nb_cls=nb_cls, img_size=img_size[::-1])

    # Load the checkpoint
    ckpt = torch.load(model_path, map_location=device)
    model_dict = OrderedDict()
    pattern = re.compile('module.')

    # Process the checkpoint to match model keys
    for k, v in ckpt['state_dict_ema'].items():
        if re.search("module", k):
            model_dict[re.sub(pattern, '', k)] = v
        else:
            model_dict[k] = v

    # Filter out incompatible keys
    pretrained_dict = {k: v for k, v in model_dict.items() if k in model.state_dict() and model.state_dict()[k].shape == v.shape}
    model.load_state_dict(pretrained_dict, strict=False)  # strict=False allows skipping incompatible layers
    model = model.to(device)
    model.eval()
    return model

def preprocess_image(image_path, img_size=(512, 64)):
    # Load the image
    image = Image.open(image_path).convert('L')  # Convert to grayscale

    # Resize the image
    image = image.resize(img_size)

    # Convert image to tensor and normalize
    transform = transforms.Compose([
        transforms.ToTensor(),  # Convert to Tensor (scales values to [0, 1])
        transforms.Normalize(mean=[0.5], std=[0.5])  # Normalize to [-1, 1] (optional, can adjust as needed)
    ])

    image_tensor = transform(image).unsqueeze(0)  # Add batch dimension
    return image_tensor

def infer_text(model, image_tensor, device, converter):
    image_tensor = image_tensor.to(device)
    with torch.no_grad():
        preds = model(image_tensor)

    preds = preds.permute(1, 0, 2).contiguous()  # Adjust dimensions for decoding
    _, preds_index = preds.max(2)

    # Assume length is the maximum time steps for each item in the batch
    length = [preds_index.size(0)] * preds_index.size(1)

    # Decode the predictions
    preds_str = converter.decode(preds_index, length)
    return preds_str

def main():
    parser = argparse.ArgumentParser(description="HTR_VT Inference")
    parser.add_argument('--model-path', type=str, required=True, help="Path to the trained model .pth file")
    parser.add_argument('--image-path', type=str, required=True, help="Path to the input image for inference")
    args = parser.parse_args()

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    # Load the model
    model = load_model(args.model_path, device)

    # Convert characters for decoding
    ralph = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"  # Example character set
    converter = utils.CTCLabelConverter(ralph)

    # Preprocess the image
    image_tensor = preprocess_image(args.image_path)

    # Inference
    recognized_text = infer_text(model, image_tensor, device, converter)

    print("Recognized Text:", recognized_text)

if __name__ == '__main__':
    main()
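
For reference, the decoding step in `infer_text` relies on greedy CTC decoding: repeated indices are collapsed and the blank symbol is dropped. A minimal sketch in plain Python, assuming index 0 is the blank and characters start at index 1. That layout is an assumption about `CTCLabelConverter`, not something copied from the repository's `utils` module.

```python
def ctc_greedy_decode(indices, alphabet, blank=0):
    """Collapse repeats, then drop blanks, from a per-time-step argmax sequence."""
    chars = []
    previous = blank
    for idx in indices:
        # Emit a character only when the index changes and is not the blank.
        if idx != blank and idx != previous:
            chars.append(alphabet[idx - 1])  # index 0 is reserved for the blank
        previous = idx
    return "".join(chars)


alphabet = "abc"  # toy alphabet for illustration
decoded = ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0, 0, 3], alphabet)
```

If the alphabet passed to the converter differs from the one used in training, every index maps to the wrong character even when the model itself is fine.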

@johnlockejrr
Author

johnlockejrr commented Nov 8, 2024

I believe I'm totally wrong. I ran my script with your trained model on one of your test images and...

(htr) incognito@DESKTOP-H1BS9PO:~/HTR-VT$ python infere.py --model-path YutingLi0606-best_CER.pth --image-path htr.png
infere.py:16: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(model_path, map_location=device)
Recognized Text: ['lwlglolUlwlKlklgKgllxlglllUlelgl0lgwjwlTlglxlglll']

@johnlockejrr
Author

Update:

I'm definitely doing something wrong. Every time I run the script I get a different output:

(htr) incognito@DESKTOP-H1BS9PO:~/HTR-VT$ python infere.py --model-path YutingLi0606-best_CER.pth --image-path htr.png
Recognized Text: ['Quv01cVvcVCnz51vcv9']
(htr) incognito@DESKTOP-H1BS9PO:~/HTR-VT$ python infere.py --model-path YutingLi0606-best_CER.pth --image-path htr.png
Recognized Text: ['zpzSz4zzLzxzzfzjzLzTzzjzozLzWzzLxLzzzxz4']
(htr) incognito@DESKTOP-H1BS9PO:~/HTR-VT$ vi infere.py
(htr) incognito@DESKTOP-H1BS9PO:~/HTR-VT$ python infere.py --model-path YutingLi0606-best_CER.pth --image-path htr.png
Recognized Text: ['1N1a11G19111F11L1F141G11K111L11aA']
(htr) incognito@DESKTOP-H1BS9PO:~/HTR-VT$ python infere.py --model-path YutingLi0606-best_CER.pth --image-path htr.png
Recognized Text: ['FETxHmCfQPQPgvQxC']
(htr) incognito@DESKTOP-H1BS9PO:~/HTR-VT$ python infere.py --model-path YutingLi0606-best_CER.pth --image-path htr.png
Recognized Text: ['yyxyAyCyy3yy3yryyyryCymywyByyyyxy']
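
One possible cause of output that changes between runs: filtering the checkpoint by shape and then loading with `strict=False` silently skips any layer whose shape does not match, so that layer keeps its random initialization. A minimal sketch of a check that reports such keys, working on plain name-to-shape mappings. The helper name `report_skipped` and the shapes below are illustrative assumptions.

```python
def report_skipped(ckpt_shapes, model_shapes):
    """Return (missing, mismatched) keys that a shape filter would drop."""
    missing = sorted(k for k in model_shapes if k not in ckpt_shapes)
    mismatched = sorted(
        k for k in model_shapes
        if k in ckpt_shapes and tuple(ckpt_shapes[k]) != tuple(model_shapes[k])
    )
    return missing, mismatched


# Illustrative shapes only; with real tensors, build the mappings with
# {name: tuple(t.shape) for name, t in state_dict.items()}.
ckpt_shapes = {"head.weight": (80, 768), "head.bias": (80,), "norm.weight": (768,)}
model_shapes = {"head.weight": (90, 768), "head.bias": (90,), "norm.weight": (768,)}

missing, mismatched = report_skipped(ckpt_shapes, model_shapes)
```

A non-empty result means part of the network would stay randomly initialized; loading the unfiltered checkpoint with `strict=True` surfaces the same problem as an error instead.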

@johnlockejrr
Author

I could use some help.

@johnlockejrr
Author

Ok, with such support I think I just have to move on...

@YutingLi0606
Owner

Hi, see line 42 of valid.py:
preds_str = converter.decode(preds_index.data, preds_size.data)
You can try printing preds_str.

Hope it helps,
Yuting

@johnlockejrr
Author

Isn't there a script that just takes an image and recognizes the text with the newly trained model? I don't want to re-validate, I just want to run inference and use the model. It is hard to use because it is not a modified ResNet-18 model.

Could you share a simple code sample showing how to load and use the model?

@wuzike

wuzike commented Jan 10, 2025

Hello, may I ask whether you have reproduced the code of this paper? I have some questions I would like to ask you.

@johnlockejrr
Author

johnlockejrr commented Jan 10, 2025

I tried to, but there is no code for inference and mine doesn't work. There is no support whatsoever, so it is a waste of time.

@wuzike

wuzike commented Jan 11, 2025

Thank you for sharing. I have read the comments above and would like to know whether you were able to train this model. I am still unable to run train.py at the moment.

@wuzike

wuzike commented Jan 11, 2025

I learned from the comments above that you have successfully trained this model, and I would like to know how you trained it. Is the result consistent with the paper?

@johnlockejrr
Author

Yes, I have trained it, but I can't say whether the result is consistent because I can't see the result. This model doesn't have a script, or a method described in the source, for running inference or prediction. Look at the sources in this GitHub repo and you'll see.

@wuzike

wuzike commented Jan 11, 2025

Thank you for sharing. It took me several days but I couldn't run this model. After listening to your advice, I think it's time for me to give up.

@johnlockejrr
Author

You are right! There is no example of how to run it. I asked the dev, and look at his answer above: useless...

@YutingLi0606
Owner

> Thank you for sharing. It took me several days but I couldn't run this model. After listening to your advice, I think it's time for me to give up.

Hey guys~ don't give up, I am here. I will solve all of your problems one by one. Please stop spreading defamatory comments…

  1. I will add inference code to load and use the model.
  2. What's your issue during the training stage?

@wuzike

wuzike commented Jan 14, 2025

Thank you for your reply. Currently, I am busy creating a webpage and will probably review the model again next week. If I encounter any problems, I will contact you. I am truly grateful for your reply.

@johnlockejrr
Author

johnlockejrr commented Jan 14, 2025

> > Thank you for sharing. It took me several days but I couldn't run this model. After listening to your advice, I think it's time for me to give up.
>
> Hey guys~ don't give up, I am here. I will solve all of your problems one by one. Please stop spreading defamatory comments…
>
>   1. I will add inference code to load and use the model.
>   2. What's your issue during the training stage?

No defamation or offence intended. We just want to use your code, but there is no information on how to do that.
And another issue: once you start training a model, it keeps training and never stops. Is there a way to use early stopping or something similar? Will the trainer stop when accuracy stops improving?

@YutingLi0606
Owner

> > > Thank you for sharing. It took me several days but I couldn't run this model. After listening to your advice, I think it's time for me to give up.
> >
> > Hey guys~ don't give up, I am here. I will solve all of your problems one by one. Please stop spreading defamatory comments…
> >
> >   1. I will add inference code to load and use the model.
> >   2. What's your issue during the training stage?
>
> No defamation or offence intended. We just want to use your code, but there is no information on how to do that. And another issue: once you start training a model, it keeps training and never stops. Is there a way to use early stopping or something similar? Will the trainer stop when accuracy stops improving?

Hey there! I’ve added the new example.py script in our latest update. You can grab the checkpoint from Google Drive and load it directly into our model.

Regarding early stopping, we didn't use it during training. If you feel that 100k iterations is too long, you can cut it down to 40k. On the READ2016 dataset, stopping at 40k iterations only moved the CER from 3.9 to about 4.0, which isn't a big difference.
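
For anyone who still wants early stopping, a minimal patience-based sketch that could be wrapped around the validation step. The class name `EarlyStopper`, the `patience` and `min_delta` parameters, and the CER values are illustrative assumptions; none of this is part of the repository.

```python
class EarlyStopper:
    """Signal a stop when the monitored metric has not improved for `patience` checks."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, value):
        """Record one validation result; return True when training should stop."""
        if value < self.best - self.min_delta:
            self.best = value
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


stopper = EarlyStopper(patience=2, min_delta=0.001)
history = [0.1706, 0.1713, 0.1677, 0.1680, 0.1679]  # illustrative CER values
stopped_at = None
for check, cer in enumerate(history):
    if stopper.step(cer):
        stopped_at = check
        break
```

Here `min_delta` keeps small fluctuations in CER from resetting the counter; a larger `patience` is safer when validation is noisy.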

@johnlockejrr
Author

johnlockejrr commented Jan 15, 2025

I tried your script with my fine-tuned model and got this:

(htr) incognito@DESKTOP-H1BS9PO:~/HTR-VT/example$ python example.py --data_path ../data/iam/lines/ --pth_path ../output/iam/best_CER.pth --train_data_list ../data/iam/train.ln --image_path ./line_28.png
example.py:45: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(args.pth_path, map_location='cpu')
Traceback (most recent call last):
  File "example.py", line 79, in <module>
    main()
  File "example.py", line 55, in main
    model.load_state_dict(model_dict, strict=True)
  File "/home/incognito/miniconda3/envs/htr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2215, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for MaskedAutoencoderViT:
        size mismatch for head.weight: copying a param with shape torch.Size([80, 768]) from checkpoint, the shape in current model is torch.Size([90, 768]).
        size mismatch for head.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([90]).

UPDATE:

I didn't look carefully. I changed nb_cls to 80 and it works, and really well!

(htr) incognito@DESKTOP-H1BS9PO:~/HTR-VT/example$ python example.py --nb_cls 80 --data_path ../data/iam/lines/ --pth_path ../output/iam/best_CER.pth --train_data_list ../data/iam/train.ln --image_path ./line_28.png
example.py:45: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(args.pth_path, map_location='cpu')
Recognized_text: ܫܡܰܥ ܕ݁ܶܝܢ ܗܶܪܳܘܕ݂ܶܣ ܡܰܠܟ݁ܳܐ܂ ܘܶܐܬ݁ܬ݁ܙܺܝܥ܃ ܘܟ݂ܽܠܳܗ ܐܽܘܪܺܫܠܶܡ ܥܰܡܶܗ
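
The class count can be read from the checkpoint instead of being guessed, since the first dimension of `head.weight` is the number of output classes (the traceback above shows `[80, 768]`). A sketch that works on any mapping of parameter names to objects with a `.shape`; the helper name `infer_nb_cls` and the stand-in tensor type are illustrative assumptions.

```python
from collections import namedtuple


def infer_nb_cls(state_dict, head_key="head.weight"):
    """Return the number of output classes stored in a checkpoint's head layer."""
    # Checkpoints saved from DataParallel prefix every key with "module.".
    for key in (head_key, "module." + head_key):
        if key in state_dict:
            return int(state_dict[key].shape[0])
    raise KeyError(f"{head_key!r} not found in checkpoint")


# Stand-in for a tensor so the sketch runs without PyTorch.
FakeTensor = namedtuple("FakeTensor", "shape")
state_dict = {"module.head.weight": FakeTensor((80, 768))}
nb_cls = infer_nb_cls(state_dict)
```

With a real checkpoint, `state_dict` would be something like `torch.load(path, map_location='cpu')['state_dict_ema']`. The `state_dict_ema` key is taken from the script earlier in this thread and may differ in other checkpoints.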

@johnlockejrr
Author

johnlockejrr commented Jan 15, 2025

@YutingLi0606 are you working on, or do you know of, a good segmentation model for text line detection, so that I can feed the detected lines to this model for recognition?

Segmentation model -> crop lines -> send lines to HTR-VT model -> recognized text
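
The pipeline above can be sketched as plain function composition. `detect_lines`, `crop` and `recognize` are hypothetical callables standing in for a line-segmentation model, an image crop, and HTR-VT inference; none of them exist in the repository under these names.

```python
def recognize_page(page, detect_lines, crop, recognize):
    """Run detection, cropping and recognition; return text lines top to bottom."""
    # Boxes are assumed to be (left, top, right, bottom) tuples.
    boxes = sorted(detect_lines(page), key=lambda box: (box[1], box[0]))
    return [recognize(crop(page, box)) for box in boxes]


# Toy stand-ins so the sketch runs without any model.
fake_page = {"lines": {(0, 40, 100, 60): "second line", (0, 10, 100, 30): "first line"}}
text = recognize_page(
    fake_page,
    detect_lines=lambda page: list(page["lines"]),
    crop=lambda page, box: page["lines"][box],
    recognize=lambda line_image: line_image,
)
```

Sorting by the top coordinate is a simplification that only suits single-column pages; multi-column layouts need a proper reading-order step.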

@johnlockejrr
Author

johnlockejrr commented Jan 15, 2025

Is there a way to extract the alphabet from the dataset, as CTCLabelConverter does, and hard-code it in a script so the model can be used on its own? For example, when the dataset is not available and you only have the model.

UPDATE: Yes, there is.

  • Extract all the characters in the dataset into an alphabet string, then pass it to the converter:
alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
converter = utils.CTCLabelConverter(alphabet)
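
A minimal sketch of the extraction step, assuming the ground-truth transcriptions are available as plain strings. The helper name `build_alphabet` is made up, and sorting is an assumption: the character order must match whatever order was used during training, otherwise the decoded indices map to the wrong characters.

```python
def build_alphabet(transcriptions):
    """Collect every distinct character in the transcriptions, in sorted order."""
    chars = set()
    for text in transcriptions:
        chars.update(text)
    return "".join(sorted(chars))


labels = ["abba", "cab 12"]  # illustrative transcriptions
alphabet = build_alphabet(labels)
```

The resulting string can be saved next to the checkpoint and passed to `utils.CTCLabelConverter(alphabet)` as in the lines above, so the dataset is no longer needed at inference time.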
