The stopwords list was extended to include country names when we attempted the clustering. We should not remove country names in the stopwords approach, because they give more context.
Testing revealed that country names were also missing from the list of 'English words', so we needed to add a list of countries there to ensure they make it through pre-processing.
The next consideration Nathan raised was countries with spaces in their names, e.g. "Democratic Republic of the Congo". These will still not survive pre-processing intact, because our TDM currently stores unigrams only, not bigrams or longer n-grams.
As a short-term solution, we can split the multi-word country names into separate words in the input so that their key words are retained, e.g. a search would pick up the word "Congo".
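A minimal sketch of the short-term approach described above, not the project's actual pre-processing code. The names `ENGLISH_WORDS`, `STOPWORDS`, `COUNTRIES`, `country_tokens` and `preprocess` are hypothetical, and the tiny word lists are stand-ins for the real ones. It assumes pre-processing keeps only tokens found in an allowed-word list and then drops stopwords:

```python
import re

# Hypothetical stand-ins for the real word lists used in pre-processing.
ENGLISH_WORDS = {"aid", "programme", "in", "the", "of", "and", "water"}
STOPWORDS = {"the", "of", "in", "and"}  # note: no country names in here
COUNTRIES = ["Kenya", "Sierra Leone", "Democratic Republic of the Congo"]


def country_tokens(countries):
    """Split every country name into lowercase single words (unigrams)."""
    tokens = set()
    for name in countries:
        tokens.update(re.findall(r"[a-z]+", name.lower()))
    return tokens


def preprocess(text, english_words, stopwords, countries):
    """Keep tokens that are English words or country words, minus stopwords."""
    allowed = english_words | country_tokens(countries)
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t in allowed and t not in stopwords]


result = preprocess(
    "Aid programme in the Democratic Republic of the Congo and Kenya",
    ENGLISH_WORDS,
    STOPWORDS,
    COUNTRIES,
)
print(result)
```

Because each country name is split into unigrams before being added to the allowed list, "congo" survives on its own and can be matched by a search, while filler words such as "of" and "the" are still removed as stopwords. The country name is not kept as a single phrase; that would need bigrams or longer n-grams in the TDM.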