Commit 75f2c3f

🐛 bug fix on OCR generation
1 parent 788dccc commit 75f2c3f

1 file changed: +1 -0 lines changed

src/arc_spice/data/multieurlex_utils.py

@@ -67,6 +67,7 @@ def extract_articles(
 
 def _make_ocr_data(text: str) -> list[tuple[Image.Image, str]]:
     text_split = text.split()
+    text_split = [text for text in text_split if text != "" and text != " "]
     generator = GeneratorFromStrings(text_split, count=len(text_split))
     return list(generator)
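
The filtering step added in this commit can be tried in isolation, without the image generator. The following is a minimal sketch only: `filter_tokens` is a hypothetical helper name that does not exist in the repository, and the sample string is made up for illustration. One thing the sketch makes visible is that `str.split()` called with no arguments already discards empty and whitespace-only pieces, so the filter changes the result only when the tokens come from an explicit separator such as `split(" ")`.

```python
def filter_tokens(tokens: list[str]) -> list[str]:
    """Drop empty and single-space tokens (hypothetical helper mirroring the added line)."""
    return [tok for tok in tokens if tok != "" and tok != " "]


sample = "hello   world "  # made-up input with repeated and trailing spaces

# Default whitespace split: runs of whitespace collapse, no empty tokens appear.
default_split = sample.split()

# Explicit separator: every extra space produces an empty token.
explicit_split = sample.split(" ")

print(default_split)
print(explicit_split)
print(filter_tokens(explicit_split))
```

Under these assumptions, the filtered list is what would be passed on to the generator, so no empty string is ever requested as an image.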
