Determine word relation #1

Open
gunthercox opened this issue Nov 29, 2014 · 0 comments

@gunthercox (Owner)

There needs to be a way to programmatically determine how closely related two words are. The solution to this issue will work similarly to the "six degrees of separation" game.

For each word there are n other words, each related to it by some number of degrees (the number of steps needed to get from one word to the other).

  • Example: Dangerous is related to Evil by 3 degrees
  • Dangerous --> Threatening --> Sinister --> Evil
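One way to compute the degree of relation is a breadth-first search over a graph whose edges connect directly related words: the first time the search reaches the target word, the number of hops taken is the degree. The sketch below assumes such a graph already exists; the `RELATED` dictionary and the `degrees_of_relation` function are hypothetical names, and the edges simply encode the Dangerous to Evil chain from the example above.

```python
from collections import deque

# Hypothetical relation graph: each word maps to the words judged to be
# directly related to it (one degree apart). Edges are illustrative only.
RELATED = {
    "dangerous": {"threatening"},
    "threatening": {"dangerous", "sinister"},
    "sinister": {"threatening", "evil"},
    "evil": {"sinister"},
}


def degrees_of_relation(graph, start, goal):
    """Return the number of degrees (edges) on the shortest path from
    start to goal, or None if the two words are not connected."""
    if start == goal:
        return 0
    visited = {start}
    queue = deque([(start, 0)])  # (word, degrees from start)
    while queue:
        word, depth = queue.popleft()
        for neighbor in graph.get(word, ()):
            if neighbor == goal:
                return depth + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return None


print(degrees_of_relation(RELATED, "dangerous", "evil"))  # 3
```

Breadth-first search is used because it explores words in order of distance, so the first path found is guaranteed to be the shortest one.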

Data

The following is a sample from a data set generated by counting the total occurrences of each word across 30 early American novels.

Below is an example of a cluster of related words in the data set that a human can readily identify. Most of the words in this count range (143 to 144 occurrences) are associated with containing something. While there are a few outliers, the general pattern is evident.

cup,144
proposed,144
busy,144
gathered,144
bottle,144
chin,143
pockets,143
yard,143
wedding,143
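The rows above are in `word,count` form, sorted by descending count. A minimal sketch of how such a file could be produced, assuming the novels are available as plain-text strings and that lowercase runs of letters and apostrophes are an acceptable definition of a word (the tokenizer and the names `count_words` and `to_csv` are assumptions, not the tool actually used to build this data set):

```python
import csv
import io
import re
from collections import Counter


def count_words(texts):
    """Sum word occurrences across an iterable of text strings."""
    counts = Counter()
    for text in texts:
        # Assumed tokenizer: lowercase, then keep runs of letters/apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def to_csv(counts):
    """Render counts as 'word,count' rows, most frequent first."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for word, total in counts.most_common():
        writer.writerow([word, total])
    return out.getvalue()


novels = ["The cup was full.", "A bottle, a cup, and a yard."]
print(to_csv(count_words(novels)))
```

`Counter.most_common()` with no argument returns every word sorted by count, which matches the descending order of the sample rows.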

Metadata

  • The data document contains 84,595 words.
  • Common words such as "is", "a", and "the" make up the 55 most common words in the document, each occurring more than 8,810 times.
  • Words that occur fewer than 50 times are listed after the 5,200th row of the document.
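Statistics like these can be read straight off the sorted file. The sketch below assumes the data document has been loaded as a list of `(word, count)` tuples in descending order of count; `first_row_below` and `words_above` are hypothetical helper names. Run against the full data set, `first_row_below(rows, 50)` and `words_above(rows, 8810)` would correspond to the last two figures above; here they are exercised on the sample rows only.

```python
def first_row_below(rows, threshold):
    """Return the 1-based row number of the first (word, count) pair whose
    count is below threshold. Assumes rows are sorted by descending count."""
    for row_number, (_, count) in enumerate(rows, start=1):
        if count < threshold:
            return row_number
    return None


def words_above(rows, minimum):
    """Return the words that occur more than `minimum` times."""
    return [word for word, count in rows if count > minimum]


# The sample rows from the Data section.
rows = [
    ("cup", 144), ("proposed", 144), ("busy", 144), ("gathered", 144),
    ("bottle", 144), ("chin", 143), ("pockets", 143), ("yard", 143),
    ("wedding", 143),
]

print(first_row_below(rows, 144))  # 6
print(words_above(rows, 143))      # ['cup', 'proposed', 'busy', 'gathered', 'bottle']
```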

Hypothesis

It is possible that trends exist between the sentiment of a statement and how commonly its words are used in a language.

Trend examples: [WORD, TOTAL, LINE], where TOTAL is the word's occurrence count and LINE is its row in the data document.
Note: the more positive words occur more frequently.

  • attack, 182, 1812
  • hate, 182, 1813
  • darn, 2, 37785
  • love, 2246, 175
  • happy, 775, 448
  • beautiful, 881, 408
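A quick way to check the note above against these six examples is to average the totals for each sentiment group. The positive/negative labels in `SENTIMENT` are hand-assigned assumptions, and six words are far too few to establish a trend; a real test would need a sentiment lexicon applied to the full data set.

```python
from statistics import mean

# Trend examples from this issue as (word, total, line).
EXAMPLES = [
    ("attack", 182, 1812),
    ("hate", 182, 1813),
    ("darn", 2, 37785),
    ("love", 2246, 175),
    ("happy", 775, 448),
    ("beautiful", 881, 408),
]

# Assumed, hand-assigned sentiment labels: +1 positive, -1 negative.
SENTIMENT = {"attack": -1, "hate": -1, "darn": -1,
             "love": 1, "happy": 1, "beautiful": 1}


def mean_total_by_sentiment(examples, sentiment):
    """Return (mean total of positive words, mean total of negative words)."""
    positive = [total for word, total, _ in examples if sentiment[word] > 0]
    negative = [total for word, total, _ in examples if sentiment[word] < 0]
    return mean(positive), mean(negative)


pos, neg = mean_total_by_sentiment(EXAMPLES, SENTIMENT)
print(f"positive mean: {pos:.1f}, negative mean: {neg:.1f}")
```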