
Adjusting Configurations or Model to Prevent Bounding Boxes from Being Smaller Than Objects in Large-Scale Cases #53

Open
ByungilYun opened this issue Nov 22, 2024 · 1 comment

Comments


ByungilYun commented Nov 22, 2024

I am using the doclayout-yolo model, and I have observed an issue where the bounding boxes are slightly smaller than the actual objects in cases where the objects occupy a significant portion of the image.

To address this, should I focus on modifying the model's configuration or on changing aspects of the model itself? Specifically:

Are there any hyperparameters (e.g., anchor box dimensions, IoU threshold) that should be adjusted in the config file to better accommodate large-scale objects?

Is it necessary to modify the training pipeline, loss function, or the way bounding box regression is handled to ensure better alignment for larger objects?

Would training on a dataset with larger-scale objects help mitigate this issue, and if so, how should I structure the dataset or annotations for optimal results?

Does the doclayout-yolo model perform multi-scale training by default? If not, how can I enable or configure multi-scale training effectively to handle objects of varying scales?

Any guidance on how to resolve these issues would be greatly appreciated.

@JulioZhao97
Collaborator

Hello, thanks for your feedback!
The issue you mention is a known limitation of the current version of our model. In my view it is largely due to the input image resolution and the model size: the current model and its receptive field are relatively limited, so the detection box can end up smaller than a very large object.
Below are some suggestions based on my experience:

  1. Smaller resolution. For example, you can set the resolution to 960; this is the most effective fix for the issue you mention, but it may reduce mAP.
  2. Larger model. You can try the large model (the current one is medium), but this is not as effective as lowering the resolution.
  3. Multi-scale. The current model is not trained with multi-scale training. You can test on a public dataset such as DocLayNet (which uses 1120) and see whether inference at smaller scales resolves the issue.
  4. Training data. As far as I know, very large objects (whole-page level) are rare in the current training data; we will also look into whether this is a contributing cause.

In summary, these are my current thoughts; we will try to fix this issue in our next version. If you have any updates or suggestions, please let me know!
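Until a retrained model is available, a practical inference-side workaround for boxes that come out slightly tight is to pad each predicted box by a small fraction of its size and clamp it to the image bounds. A minimal sketch in plain Python (the `expand_boxes` helper and the 2% default padding are assumptions, not part of doclayout-yolo; boxes are in xyxy pixel coordinates):

```python
def expand_boxes(boxes, img_w, img_h, pad_frac=0.02):
    """Grow each (x1, y1, x2, y2) box by pad_frac of its own width/height,
    clamped so the result stays inside the img_w x img_h image."""
    out = []
    for x1, y1, x2, y2 in boxes:
        pw = (x2 - x1) * pad_frac  # horizontal padding per side
        ph = (y2 - y1) * pad_frac  # vertical padding per side
        out.append((
            max(0.0, x1 - pw),
            max(0.0, y1 - ph),
            min(float(img_w), x2 + pw),
            min(float(img_h), y2 + ph),
        ))
    return out

# Example: a 100x100 box padded by 10% per side in a 200x200 image
print(expand_boxes([(10, 10, 110, 110)], 200, 200, pad_frac=0.1))
# -> [(0.0, 0.0, 120.0, 120.0)]
```

This only masks the symptom (it does not make the regressor more accurate, and it can over-grow boxes that were already correct), but it is cheap to try while experimenting with the resolution and model-size suggestions above.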
