I am using the doclayout-yolo model, and I have observed an issue where the bounding boxes are slightly smaller than the actual objects in cases where the objects occupy a significant portion of the image.
To address this issue, should I focus on modifying the model's configuration or adjusting certain aspects of the model itself? Specifically:
Are there any hyperparameters (e.g., anchor box dimensions, IoU threshold) that should be adjusted in the config file to better accommodate large-scale objects?
Is it necessary to modify the training pipeline, loss function, or the way bounding box regression is handled to ensure better alignment for larger objects?
Would training on a dataset with larger-scale objects help mitigate this issue, and if so, how should I structure the dataset or annotations for optimal results?
Does the doclayout-yolo model perform multi-scale training by default? If not, how can I enable or configure multi-scale training effectively to handle objects of varying scales?
Any guidance on how to resolve these issues would be greatly appreciated.
Hello, thanks for your feedback!
The issue you mention is a known limitation of the current version of our model. In my view, it is largely due to the input image resolution and the model size: the current model and its receptive field are relatively limited, so for very large objects the predicted box can end up smaller than the object itself.
Below are some suggestions based on my experience:
Smaller resolution. For example, you can set the input resolution to 960; this is the most effective fix for the issue you describe, although it may lead to a drop in mAP. See the inference sketch after this list.
Larger model. You can try the large variant (the released model is the medium one), but this is not as effective as lowering the resolution.
Multi-scale training. The current model is not trained with multi-scale augmentation. You could experiment on a public dataset such as DocLayNet (which uses an input size of 1120) and check whether multi-scale training with smaller scales resolves the issue; a hedged training sketch follows this list.
Training data. As far as I know, very large objects (whole-page level) are rare in the current training data; we will also look into whether this contributes to the problem.
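For suggestions 1 and 2, here is a minimal inference sketch, assuming the `doclayout_yolo` package exposes the Ultralytics-style `YOLOv10` interface shown in the repository README; the checkpoint path and image name are placeholders, not real file names:

```python
from doclayout_yolo import YOLOv10

# Placeholder checkpoint path; for suggestion 2 you would point this at a
# larger ("l"-scale) checkpoint instead of the released medium one.
model = YOLOv10("path/to/doclayout_yolo_checkpoint.pt")

# Run prediction at the reduced 960 input size from suggestion 1.
results = model.predict(
    "page.jpg",       # placeholder document image
    imgsz=960,        # smaller input resolution than the default 1024
    conf=0.2,         # confidence threshold
    device="cuda:0",  # or "cpu"
)

# Inspect the predicted boxes to check whether they now cover the full object.
for box in results[0].boxes:
    print(box.xyxy, int(box.cls), float(box.conf))
```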
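For suggestion 3, a hedged fine-tuning sketch, assuming the doclayout-yolo fork inherits the Ultralytics `train()` API; the dataset YAML and checkpoint path are placeholders, and whether this fork accepts a `multi_scale` flag is an assumption you should verify against its training code:

```python
from doclayout_yolo import YOLOv10

model = YOLOv10("path/to/doclayout_yolo_checkpoint.pt")  # placeholder checkpoint

model.train(
    data="doclaynet.yaml",  # placeholder dataset config (e.g. DocLayNet-style annotations)
    imgsz=1120,             # DocLayNet input size mentioned above
    epochs=50,
    batch=16,
    multi_scale=True,       # assumption: random input rescaling during training; if the
                            # fork lacks this flag, the standard Ultralytics `scale`
                            # augmentation (e.g. scale=0.5) is the closest alternative
)
```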
In summary, these are my current thoughts; we will try to fix this issue in the next version. If you have any updates or suggestions, please let me know!