GDELT NoSQL project
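Replication Factor (RF) = 3, Write (W) = QUORUM (2 replicas), Read (R) = ONE (1 replica)
W + R = RF (2 + 1 = 3). Strong consistency would require W + R > RF, so this setup gives eventual consistency.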
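The consistency rule above can be checked with a few lines of Python. This is only an illustrative sketch: the function names `quorum` and `is_strongly_consistent` are hypothetical helpers, not part of the project or of any Cassandra driver.

```python
def quorum(rf: int) -> int:
    """Number of replicas in a Cassandra QUORUM: floor(RF / 2) + 1."""
    return rf // 2 + 1


def is_strongly_consistent(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read set overlaps every write set, i.e. W + R > RF."""
    return write_replicas + read_replicas > rf


RF = 3
W = quorum(RF)  # QUORUM with RF = 3 means 2 replicas
R = 1           # ONE

print(is_strongly_consistent(RF, W, R))           # False: 2 + 1 is not > 3
print(is_strongly_consistent(RF, W, quorum(RF)))  # True: 2 + 2 > 3
```

Reading at QUORUM instead of ONE would make reads strongly consistent, at the cost of contacting two replicas per read.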
Scripts to automate the cluster creation:
1 - : Bootstrap Cassandra on cluster creation
2 - : Configure the cluster (number and types of instances, etc.) and link it to Spark and Cassandra
3 - : Launch to create the EMR cluster
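The project's own scripts are not reproduced here. As a rough idea of what such a configuration could contain, the sketch below builds the keyword arguments for the `run_job_flow` call of boto3's EMR client. Every concrete value (cluster name, release label, instance types, roles, S3 path) is a placeholder assumption, and `build_emr_request` is a hypothetical helper.

```python
def build_emr_request(bootstrap_script_s3_path: str, core_nodes: int = 2) -> dict:
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All values below are illustrative placeholders, not the project's
    real configuration.
    """
    return {
        "Name": "gdelt-cluster",        # assumed cluster name
        "ReleaseLabel": "emr-5.30.0",   # assumed EMR release
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_nodes},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Cassandra is not an EMR application, so it is installed by a
        # bootstrap action that runs a script on every node.
        "BootstrapActions": [
            {"Name": "install-cassandra",
             "ScriptBootstrapAction": {"Path": bootstrap_script_s3_path, "Args": []}},
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_emr_request("s3://my-bucket/bootstrap_cassandra.sh", core_nodes=3)
print(request["Instances"]["InstanceGroups"][1]["InstanceCount"])
```

With AWS credentials configured, the request could then be submitted with `boto3.client("emr").run_job_flow(**request)`.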
For data from the events and mentions tables:
For data from the GKG table:
Q1: Find the number of articles and events for a triplet (Date, Country, Language)
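A minimal in-memory sketch of this aggregation, assuming the events and mentions data have already been joined into flat rows. The row layout and the `count_by_triplet` helper are hypothetical; the real project would run the equivalent query on Spark and Cassandra.

```python
from collections import defaultdict

# Hypothetical joined rows: (date, event country, article language, event id, article URL)
rows = [
    ("2021-01-01", "FR", "fra", 1, "http://example.org/a"),
    ("2021-01-01", "FR", "fra", 1, "http://example.org/b"),
    ("2021-01-01", "FR", "fra", 2, "http://example.org/c"),
    ("2021-01-01", "US", "eng", 3, "http://example.org/d"),
]


def count_by_triplet(rows):
    """Return {(date, country, language): (article count, event count)}."""
    events = defaultdict(set)
    articles = defaultdict(set)
    for day, country, language, event_id, url in rows:
        key = (day, country, language)
        events[key].add(event_id)   # distinct events
        articles[key].add(url)      # distinct articles
    return {key: (len(articles[key]), len(events[key])) for key in events}


counts = count_by_triplet(rows)
print(counts[("2021-01-01", "FR", "fra")])  # (3, 2)
```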
Q2: Find events of an actor in the past 6 months
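The filter behind this query can be sketched as follows. The field names (`actor1`, `actor2`, `day`) and the `recent_events` helper are assumptions for illustration, and six months is approximated as 183 days.

```python
from datetime import date, timedelta


def recent_events(events, actor, today, days=183):
    """Return events involving `actor` in the last `days` days (about six months)."""
    cutoff = today - timedelta(days=days)
    return [
        e for e in events
        if actor in (e["actor1"], e["actor2"]) and cutoff <= e["day"] <= today
    ]


# Hypothetical sample events
events = [
    {"id": 1, "day": date(2021, 5, 1), "actor1": "FRANCE", "actor2": "GERMANY"},
    {"id": 2, "day": date(2020, 9, 1), "actor1": "FRANCE", "actor2": "ITALY"},
    {"id": 3, "day": date(2021, 4, 15), "actor1": "SPAIN", "actor2": "FRANCE"},
    {"id": 4, "day": date(2021, 4, 20), "actor1": "SPAIN", "actor2": "ITALY"},
]

found = recent_events(events, "FRANCE", today=date(2021, 6, 1))
print([e["id"] for e in found])  # [1, 3]
```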
Q3: Find the actors with the most negative or most positive views for a given (Date, Country, Language)
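One way to read this query is to average a tone score per actor for the chosen date, country and language, then sort. The sketch below assumes each row carries a numeric `tone` field; the field names and the `rank_actors_by_tone` helper are hypothetical.

```python
from collections import defaultdict


def rank_actors_by_tone(rows, day, country, language):
    """Return (actor, average tone) pairs, most negative first."""
    tones = defaultdict(list)
    for r in rows:
        if (r["day"], r["country"], r["language"]) == (day, country, language):
            tones[r["actor"]].append(r["tone"])
    averages = {actor: sum(v) / len(v) for actor, v in tones.items()}
    return sorted(averages.items(), key=lambda item: item[1])


# Hypothetical sample rows
rows_q3 = [
    {"day": "2021-01-01", "country": "FR", "language": "fra", "actor": "MACRON", "tone": -2.0},
    {"day": "2021-01-01", "country": "FR", "language": "fra", "actor": "MACRON", "tone": -4.0},
    {"day": "2021-01-01", "country": "FR", "language": "fra", "actor": "MERKEL", "tone": 1.0},
    {"day": "2021-01-01", "country": "FR", "language": "fra", "actor": "MERKEL", "tone": 3.0},
    {"day": "2021-01-02", "country": "FR", "language": "fra", "actor": "BIDEN", "tone": 9.0},
]

ranked = rank_actors_by_tone(rows_q3, "2021-01-01", "FR", "fra")
print(ranked[0])   # most negative: ('MACRON', -3.0)
print(ranked[-1])  # most positive: ('MERKEL', 2.0)
```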
Q4: Find the actors, countries and organizations that are the most divisive on a given date
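The text does not say how divisiveness is measured. The sketch below assumes one possible definition, the spread (population standard deviation) of the tone scores attached to an entity on the chosen date; this metric, the field names and the `most_divisive` helper are all assumptions.

```python
from collections import defaultdict
from statistics import pstdev


def most_divisive(rows, day, top=3):
    """Return the `top` entities whose tone scores vary the most on `day`."""
    tones = defaultdict(list)
    for r in rows:
        if r["day"] == day:
            tones[r["entity"]].append(r["tone"])
    # Assumed metric: a larger spread of tone means more divided opinions.
    spread = {entity: pstdev(values) for entity, values in tones.items()}
    return sorted(spread.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical sample rows
rows_q4 = [
    {"day": "2021-01-01", "entity": "ACTOR_A", "tone": -5.0},
    {"day": "2021-01-01", "entity": "ACTOR_A", "tone": 5.0},
    {"day": "2021-01-01", "entity": "ORG_B", "tone": 1.0},
    {"day": "2021-01-01", "entity": "ORG_B", "tone": 1.0},
]

top_entities = most_divisive(rows_q4, "2021-01-01")
print(top_entities[0][0])  # ACTOR_A
```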
Q5: The evolution of relations between countries
Part 1: Based on actors' names (Table events)
Part 2: Based on actors' countries (Table mentions)
Part 3: Based on articles written in a country about another one (Table GKG)
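Whichever table is used, the evolution of a relation can be sketched as a time series of average tone between two countries. The example below assumes rows with a source country, a target country, a day and a tone score; the field names and the `monthly_relation` helper are hypothetical, and the relation is treated as symmetric.

```python
from collections import defaultdict


def monthly_relation(rows, country_a, country_b):
    """Return {YYYY-MM: average tone} for rows linking the two countries."""
    pair = {country_a, country_b}
    by_month = defaultdict(list)
    for r in rows:
        if {r["source"], r["target"]} == pair:
            by_month[r["day"][:7]].append(r["tone"])  # keep the YYYY-MM prefix
    return {month: sum(v) / len(v) for month, v in sorted(by_month.items())}


# Hypothetical sample rows
rows_q5 = [
    {"day": "2021-01-10", "source": "FR", "target": "DE", "tone": 2.0},
    {"day": "2021-01-20", "source": "DE", "target": "FR", "tone": 4.0},
    {"day": "2021-02-05", "source": "FR", "target": "DE", "tone": -1.0},
    {"day": "2021-02-06", "source": "FR", "target": "US", "tone": 7.0},
]

series = monthly_relation(rows_q5, "FR", "DE")
print(series)  # {'2021-01': 3.0, '2021-02': -1.0}
```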