A social network graph represents social relations between entities; it is a model of a social network. As in any graph, each node represents an individual, and each link between two nodes represents a social relation (friendship, a follower-followee relation, etc.). The social graph has been referred to as “the global mapping of everybody and how they’re related”.
The dataset comes from the Stanford Large Network Dataset Collection. It is reasonably sized, with 7,126 nodes. The total number of possible edges in the network is 50,772,750 (7,126 × 7,125 ordered node pairs), of which only 35,324 are present. The classes are therefore highly skewed, so we sampled 35,324 absent node pairs and labelled them as missing links to balance the two classes.
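The balancing step above can be sketched as follows. This is a minimal illustration on a toy graph, not the project's actual code: the function name `sample_missing_links`, the seed, and the toy edge list are all assumptions, and edges are treated as ordered pairs, which matches the 7,126 × 7,125 count.

```python
import random

def sample_missing_links(nodes, present_edges, k, seed=0):
    """Sample k ordered pairs (u, v), u != v, that are NOT present edges."""
    rng = random.Random(seed)
    present = set(present_edges)
    n = len(nodes)
    # Number of absent ordered pairs available for sampling.
    max_missing = n * (n - 1) - len(present)
    if k > max_missing:
        raise ValueError("not enough absent pairs to sample")
    missing = set()
    while len(missing) < k:
        u, v = rng.choice(nodes), rng.choice(nodes)
        if u != v and (u, v) not in present:
            missing.add((u, v))
    return sorted(missing)

# Toy directed graph (hypothetical data, not the Stanford dataset).
nodes = [0, 1, 2, 3, 4]
present_edges = [(0, 1), (1, 2), (2, 0), (3, 4)]

# Draw as many missing links as there are present links.
missing_edges = sample_missing_links(nodes, present_edges, k=len(present_edges))

# Label 1 = present link, 0 = missing link; the two classes are now balanced.
labelled = [(e, 1) for e in present_edges] + [(e, 0) for e in missing_edges]

# The same pair count for a 7,126-node graph.
possible_pairs = 7126 * 7125
```

Rejection sampling is cheap here because the graph is sparse: almost every randomly drawn pair is absent, so few draws are wasted.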
We used these 70,648 edges (present and missing) to extract various features, such as:
- PageRank
- Shortest Path
- Follower & Followee Counts
- Inter/Common Followers & Followee Counts
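
As a rough sketch of how the features listed above could be computed for one node pair, the following uses only the Python standard library on a toy graph. The function names, the damping factor of 0.85, the iteration count, and the choice to skip the direct edge when measuring the shortest path are assumptions made for illustration; graph libraries such as NetworkX provide ready-made PageRank and shortest-path routines.

```python
from collections import deque

def build_adjacency(edges):
    """Return (followees, followers) for directed edges u -> v (u follows v)."""
    followees, followers = {}, {}
    for u, v in edges:
        for node in (u, v):
            followees.setdefault(node, set())
            followers.setdefault(node, set())
        followees[u].add(v)
        followers[v].add(u)
    return followees, followers

def pagerank(followees, damping=0.85, iterations=50):
    """Plain power-iteration PageRank; dangling nodes spread rank uniformly."""
    n = len(followees)
    rank = {node: 1.0 / n for node in followees}
    for _ in range(iterations):
        dangling = sum(rank[u] for u, out in followees.items() if not out)
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in followees}
        for u, out in followees.items():
            for v in out:
                new_rank[v] += damping * rank[u] / len(out)
        rank = new_rank
    return rank

def shortest_path_length(followees, source, target):
    """BFS hop count from source to target, ignoring the direct edge
    source -> target (an assumption: otherwise every present link scores 1).
    Returns -1 if no such path exists."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in followees[node]:
            if node == source and nxt == target:
                continue  # skip the direct edge
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return -1

def pair_features(u, v, followees, followers, rank):
    """Feature dictionary for the candidate link u -> v."""
    return {
        "pagerank_u": rank[u],
        "pagerank_v": rank[v],
        "shortest_path": shortest_path_length(followees, u, v),
        "followers_u": len(followers[u]),
        "followees_u": len(followees[u]),
        "followers_v": len(followers[v]),
        "followees_v": len(followees[v]),
        "common_followers": len(followers[u] & followers[v]),
        "common_followees": len(followees[u] & followees[v]),
    }

# Toy directed graph (hypothetical data).
edges = [(0, 1), (1, 2), (2, 0), (0, 2), (3, 0)]
followees, followers = build_adjacency(edges)
rank = pagerank(followees)
features = pair_features(0, 2, followees, followers, rank)
```

Each labelled pair would yield one such dictionary, and stacking them gives the feature matrix for the classifiers.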
We standardised the data because features such as the number of followers have a much wider distribution than PageRank and would otherwise dominate it. The data was split into a training set (70%) and a test set (30%). For the classification problem, we trained three models: Logistic Regression, Random Forest, and Support Vector Machine.
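
The standardise, split, and train steps might look like the sketch below. It assumes scikit-learn, which the text does not name, and uses a small synthetic feature matrix in place of the real one; the hyperparameters, the random seeds, and the stratified split are likewise illustrative choices rather than the project's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the feature matrix (hypothetical data).
rng = np.random.default_rng(0)
n_samples = 400
y = rng.integers(0, 2, size=n_samples)  # 1 = present link, 0 = missing link
follower_count = rng.poisson(50, size=n_samples) + 30 * y      # wide range
page_rank = rng.normal(1e-4, 2e-5, size=n_samples) + 2e-5 * y  # tiny range
X = np.column_stack([follower_count, page_rank])

# 70% training, 30% test; stratifying keeps the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Support Vector Machine": SVC(),
}

scores = {}
for name, model in models.items():
    # The scaler is fitted on the training data only, then applied to the test data.
    clf = make_pipeline(StandardScaler(), model)
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # accuracy on the held-out 30%
```

Putting the scaler inside the pipeline avoids leaking test-set statistics into training. Random Forest is largely insensitive to feature scale, so standardisation matters mainly for Logistic Regression and the Support Vector Machine.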