Thank you for this very useful resource. I've noticed two potential issues with the words in the scraped data:
Hyphen omission: Hyphens from the original dictionary entries seem to be missing. For example, the first word in the txt file is aalugalog, but the entry in the dictionary is aalug-alog.
Space omission: Spaces between words in phrases from the original dictionary also seem to be missing, so phrases are scraped as single words. For example, "patay na Bulan" is scraped as pataynaBulan, and "patay na hayop" is scraped as pataynahayop.
Kind regards,
Dimitra
Hi @dkalantzi, I'm building an application like Duolingo, but for the native dialects here in the Philippines.
I came across this Tagalog web scraper by @raymelon, so here is a response to your issue:
You can change the regular expression on line 161 of collect_tagalog.py.
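I haven't reproduced the exact expression from line 161 here, so the following is only a minimal sketch of the kind of change I mean. The pattern names and the `clean_entry` helper are hypothetical and not taken from collect_tagalog.py; the idea is simply to stop the cleanup pattern from removing hyphens and spaces.

```python
import re

# Hypothetical patterns for illustration; the real expression in
# collect_tagalog.py may look different.

# Removes every non-letter character, including hyphens and spaces.
STRIP_ALL = re.compile(r"[\W\d_]")

# Removes digits, underscores, and punctuation, but keeps hyphens and spaces.
KEEP_HYPHEN_SPACE = re.compile(r"[^\w\- ]|[\d_]")


def clean_entry(raw: str, pattern: re.Pattern = KEEP_HYPHEN_SPACE) -> str:
    """Clean a scraped dictionary entry using the given removal pattern."""
    cleaned = pattern.sub("", raw)
    # Collapse repeated spaces and trim the ends.
    return re.sub(r" +", " ", cleaned).strip()


print(clean_entry("aalug-alog", STRIP_ALL))      # aalugalog
print(clean_entry("aalug-alog"))                 # aalug-alog
print(clean_entry("patay na Bulan", STRIP_ALL))  # pataynaBulan
print(clean_entry("patay na Bulan"))             # patay na Bulan
```

If the scraper relies on the stricter behaviour elsewhere (for example, to drop entry numbers), the pattern would need to be adapted to that; this sketch only covers the two cases reported above.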