Skip to content

KGKallasmaa/GDELT-Graphql-Analysis

Repository files navigation

GDELT-Graphql-Analysis GitHub license

In this project we're analysing GDELT data with GraphQL. GDELT is a free, constantly-updating data source that publishes world-event data every 15 minutes.

The project was done by:

  1. Riccardo [https://github.com/riccardotommasini]
  2. Maxim [https://github.com/MaximSantalov]
  3. Karl-Gustav [https://github.com/KGKallasmaa]

The project was part of the Big Data Management course at the University of Tartu. You can read more about on Medium

Required software

Docker
Python 3.5
Node.js

Running the project locally, from scratch

All of the commands in this block should be run sequentially each int its separate Terminal window.

  1. Starting the Kafka cluster and database
bash kafka_cluster.sh
  1. Starting the Kafka producer
bash producer.sh
  1. Starting the Kafka consumer
bash consumer.sh
  1. Starting the production-server
bash server.sh

Running the development server (after server.sh)

nodemon src/server.jsx

Queries

Navigate to localhost:3000. There you can find the GraphQl GUI. It's advisable that you study src/graphql/schema.jsx before hand.

There are currently 10 queries:

  1. everything() -> returns every value in the database

  2. top_nr_source(n:Int) -> returns top n value with the most sources

  3. get_results_between_time_periods(FractionDate_start:Float,FractionDate_end:Float) -> returns the results between 2 dates

  4. get_results_between_tones(min_tone:Float,max_tone:Float) -> returns the results between 2 tone values

  5. get_actions_month(month:String) -> returns the actions within a given month

  6. get_data_with_n_events_happend_in_dates(n:Int, start_SQLDATE:String, end_SQLDATE:String) -> returns the values that happened between two dates and that had at least n events in a month

  7. get_top_n_actors_with_most_mentions_per_day(n:Int,start_SQLDATE:String,end_SQLDATE:String) -> returns n actors per day between the two dates sorted by the nr of mentions

  8. get_top_n_negative_actors_near_location(n:Int, actor1Geo_Lat:Float,actor1Geo_Long:Float, start_SQLDATE:String,end_SQLDATE:String) -> returns the top n values for every day between two dates that happened within 100 km of the given location

  9. find_n_most_powerful_actor_events_using_pagerank_between_two_dates(n:Int,start_SQLDATE:String,end_SQLDATE:String) -> returns the most powerful actors between two dates determined by the PageRank algorithm

  10. find_n_most_powerful_domains_between_two_dates(n:Int,start_SQLDATE:String,end_SQLDATE:String,Geo_Lat:Float,Geo_Long:Float) -> returns n most powerful news sites within 1,000 km of the given location between the 2 dates

Executing queries

Example 1

{
  everything {
    GLOBALEVENTID
  }
}
{
  "data": {
    "everything": [
      {
        "GLOBALEVENTID": "932366174"
      },
      {
        "GLOBALEVENTID": "932366175"
      },
     ...
        ]
    }
}

Example 2

{
  get_top_n_actors_with_most_mentions_per_day(n: 5, start_SQLDATE: "20200520", end_SQLDATE: "20200701") {
    SQLDATE
    events {
      Actor1Name
    }
  }
}
{
  "data": {
    "get_top_n_actors_with_most_mentions_per_day": [
      {
        "SQLDATE": "20200531",
        "events": [
          {
            "Actor1Name": "CROATIA"
          },
          {
            "Actor1Name": "AMIT"
          },
          {
            "Actor1Name": "LAWYER"
          },
          {
            "Actor1Name": "AMIT"
          },
          {
            "Actor1Name": "NEW SOUTH WALES"
          }
        ]
      },
      ...
    ]
  }
}

Example 3

{
  find_n_most_powerful_domains_between_two_dates(n: 5,
                                                 start_SQLDATE: "20200601", end_SQLDATE: "20200701",
                                                 Geo_Lat: 51.5074, Geo_Long: 0.1278)
}
{
  "data": {
    "find_n_most_powerful_domains_between_two_dates": [
      "express.co.uk",
      "famagusta-gazette.com",
      "telegraph.co.uk",
      "dw.com",
      "sbs.com.au"
    ]
  }
}

Contributing

We're happy if you want to contribute to this project. Github Super-linter analyses the code before hand.

Problems

With docker-compose : 'ERROR: Version in "./docker-compose.yml" is unsupported' (1) sudo apt-get remove docker-compose OR sudo rm /usr/local/bin/docker-compose

(2) sudo curl -L "https://github.com/docker/compose/releases/download/1.23.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

(3) sudo chmod +x /usr/local/bin/docker-compose

(4) sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose

About

Analysing GDELT data with GraphQL

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published