Determine word relation #1

gunthercox · 2014-11-29T22:45:33Z

There needs to be a way to programmatically determine how related two words are. The solution to this issue will work similarly to the game of six degrees.

For each word there are n other words, each related to it by some number of degrees.

Example: Dangerous is related to Evil by 3 degrees:
Dangerous --> Threatening --> Sinister --> Evil
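
One way to measure degrees like this is a breadth-first search over a graph of directly related words. The sketch below is only an illustration: the `RELATED` graph and the `degrees_between` function are hypothetical names, and the graph contains nothing beyond the chain from the example. A real graph would have to be built from some source of word relations.

```python
from collections import deque

# Hypothetical relation graph: each word maps to the words directly related to it.
# Only the Dangerous -> Evil chain from the example is included here.
RELATED = {
    "dangerous": ["threatening"],
    "threatening": ["dangerous", "sinister"],
    "sinister": ["threatening", "evil"],
    "evil": ["sinister"],
}


def degrees_between(graph, start, goal):
    """Return the fewest hops from start to goal, or None if they are unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        word, degree = queue.popleft()
        for neighbor in graph.get(word, []):
            if neighbor == goal:
                return degree + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, degree + 1))
    return None


print(degrees_between(RELATED, "dangerous", "evil"))  # 3
```

Breadth-first search visits words in order of distance, so the first time the goal is reached the hop count is the smallest possible.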

Data

The following is a sample from a data set generated by counting the occurrences of each word across 30 early American novels.

Below is an example of a humanly identifiable cluster of related words that occurs in the data set. Most of the words in this range share a meaning associated with containing something. While there are a few outliers, the general pattern is evident.

cup,144
proposed,144
busy,144
gathered,144
bottle,144
chin,143
pockets,143
yard,143
wedding,143
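
A data set in this `word,count` shape could be produced roughly as follows. This is a minimal sketch, not the script that generated the data above: the tokenizing rule (lowercase, runs of letters and apostrophes) and the `count_words` name are assumptions, and the two short strings stand in for the full text of the novels.

```python
import re
from collections import Counter


def count_words(texts):
    """Sum word occurrences across an iterable of text strings."""
    counts = Counter()
    for text in texts:
        # Assumed tokenizing rule: lowercase, then runs of letters/apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Stand-in strings; the real input would be the full text of each novel.
sample_texts = ["The cup and the bottle.", "A busy yard; the cup fell."]

# Print the most frequent words in the same word,count format as the sample.
for word, total in count_words(sample_texts).most_common(3):
    print(f"{word},{total}")
```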

Metadata

The data document contains 84595 words.
Common words such as "is", "a", and "the" make up the 55 most frequent words in the document, each occurring over 8810 times.
Words that occur fewer than 50 times are listed after the 5200th row in the document.
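
Figures like these can be read off the document directly, assuming it is a `word,count` file sorted by descending count. The sketch below uses a few rows from the sample as an in-memory stand-in for the real file; `load_rows` and `first_row_below` are hypothetical helper names.

```python
import csv
import io

# A few rows in the same word,count format as the sample; the real file
# would be opened from disk instead of this in-memory stand-in.
DATA = "cup,144\nproposed,144\nbusy,144\nchin,143\npockets,143\n"


def load_rows(handle):
    """Read word,count rows into a list of (word, count) tuples."""
    return [(word, int(count)) for word, count in csv.reader(handle)]


def first_row_below(rows, threshold):
    """Return the 1-based row number of the first count below threshold, or None."""
    for row_number, (_, count) in enumerate(rows, start=1):
        if count < threshold:
            return row_number
    return None


rows = load_rows(io.StringIO(DATA))
print(len(rows))                   # number of rows in the stand-in data
print(first_row_below(rows, 144))  # row where counts first drop below 144
```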

Hypothesis

It is possible that trends exist between the sentiment of a statement and how commonly its words are used in the language.

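Trend examples, in the format [WORD, TOTAL, LINE], where TOTAL is the number of occurrences and LINE is the word's row in the data document.
Note the higher frequency of the more positive words.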

attack, 182, 1812
hate, 182, 1813
darn, 2, 37785
love, 2246, 175
happy, 775, 448
beautiful, 881, 408
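
A quick check of the hypothesis on these six rows is to compare the average total of the positive words against the rest. In this sketch the positive/negative labels are assigned by hand and are an assumption, not part of the data set; `mean_total` is a hypothetical helper. Six words are far too few to confirm a trend, so this only shows the shape of the comparison.

```python
from statistics import mean

# (word, total, line) rows copied from the examples above.
ROWS = [
    ("attack", 182, 1812),
    ("hate", 182, 1813),
    ("darn", 2, 37785),
    ("love", 2246, 175),
    ("happy", 775, 448),
    ("beautiful", 881, 408),
]

# Hand-assigned sentiment labels (an assumption, not part of the data set).
POSITIVE = {"love", "happy", "beautiful"}


def mean_total(rows, words):
    """Average occurrence total over the rows whose word is in `words`."""
    return mean(total for word, total, _ in rows if word in words)


negative = {word for word, _, _ in ROWS} - POSITIVE
print(mean_total(ROWS, POSITIVE))  # average total of the positive words
print(mean_total(ROWS, negative))  # average total of the remaining words
```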
