GDELT-Graphql-Analysis

In this project we're analysing GDELT data with GraphQL. GDELT is a free, constantly-updating data source that publishes world-event data every 15 minutes.

The project was done by:

Riccardo [https://github.com/riccardotommasini]
Maxim [https://github.com/MaximSantalov]
Karl-Gustav [https://github.com/KGKallasmaa]

The project was part of the Big Data Management course at the University of Tartu. You can read more about it on Medium.

Required software

Docker
Python 3.5
Node.js

Running the project locally, from scratch

Run the commands in this section in order, each in its own terminal window.

Starting the Kafka cluster and database

bash kafka_cluster.sh

Starting the Kafka producer

bash produce.sh

Starting the Kafka consumer

bash consume.sh

Starting the production server

bash server.sh

Running the development server (after server.sh)

nodemon src/server.jsx

Queries

Navigate to localhost:3000, where you will find the GraphQL GUI. It is advisable to study src/graphql/schema.jsx beforehand.

There are currently 10 queries:

everything() -> returns every value in the database
top_nr_source(n:Int) -> returns the top n values with the most sources
get_results_between_time_periods(FractionDate_start:Float,FractionDate_end:Float) -> returns the results between 2 dates
get_results_between_tones(min_tone:Float,max_tone:Float) -> returns the results between 2 tone values
get_actions_month(month:String) -> returns the actions within a given month
get_data_with_n_events_happend_in_dates(n:Int, start_SQLDATE:String, end_SQLDATE:String) -> returns the values that happened between two dates and that had at least n events in a month
get_top_n_actors_with_most_mentions_per_day(n:Int,start_SQLDATE:String,end_SQLDATE:String) -> returns n actors per day between the two dates, sorted by the number of mentions
get_top_n_negative_actors_near_location(n:Int, actor1Geo_Lat:Float,actor1Geo_Long:Float, start_SQLDATE:String,end_SQLDATE:String) -> returns the top n values for every day between two dates that happened within 100 km of the given location
find_n_most_powerful_actor_events_using_pagerank_between_two_dates(n:Int,start_SQLDATE:String,end_SQLDATE:String) -> returns the most powerful actors between two dates determined by the PageRank algorithm
find_n_most_powerful_domains_between_two_dates(n:Int,start_SQLDATE:String,end_SQLDATE:String,Geo_Lat:Float,Geo_Long:Float) -> returns n most powerful news sites within 1,000 km of the given location between the 2 dates
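For readers unfamiliar with PageRank, which the find_n_most_powerful_actor_events_using_pagerank_between_two_dates query relies on, here is a minimal power-iteration sketch in Python. It is not the project's implementation: the edge list (pairs of actors that appear in the same event), the damping factor of 0.85, and the names pagerank and top_n are assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Rank the nodes of a directed graph by power iteration.

    edges: iterable of (source, target) pairs, for example
    hypothetical (Actor1Name, Actor2Name) pairs taken from events.
    """
    edges = list(edges)
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}

    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in out_links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def top_n(edges, n):
    """Return the n highest-ranked nodes, best first."""
    ranks = pagerank(edges)
    return sorted(ranks, key=ranks.get, reverse=True)[:n]
```

With the toy edges [("A", "C"), ("B", "C"), ("C", "A")], node C collects rank from both A and B, so it comes out on top.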

Executing queries

Example 1

{
  everything {
    GLOBALEVENTID
  }
}

{
  "data": {
    "everything": [
      {
        "GLOBALEVENTID": "932366174"
      },
      {
        "GLOBALEVENTID": "932366175"
      },
      ...
    ]
  }
}
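The same query can also be sent from a script instead of the GUI. The sketch below assumes the server accepts GraphQL over HTTP POST at http://localhost:3000/graphql. This README only mentions localhost:3000, so the /graphql path and the helpers build_payload and run_query are assumptions, not part of the project.

```python
import json
import urllib.request

# Assumed endpoint: the README only states that the GUI lives at localhost:3000.
GRAPHQL_URL = "http://localhost:3000/graphql"

EVERYTHING_QUERY = "{ everything { GLOBALEVENTID } }"


def build_payload(query, variables=None):
    """Encode a GraphQL query as the JSON body of an HTTP POST request."""
    body = {"query": query}
    if variables:
        body["variables"] = variables
    return json.dumps(body).encode("utf-8")


def run_query(query, url=GRAPHQL_URL):
    """POST the query and return the decoded JSON response.

    Needs the server from server.sh to be running.
    """
    request = urllib.request.Request(
        url,
        data=build_payload(query),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Once the server is up, run_query(EVERYTHING_QUERY) should return a dictionary shaped like the JSON response shown above, provided the assumed endpoint is correct.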

Example 2

{
  get_top_n_actors_with_most_mentions_per_day(n: 5, start_SQLDATE: "20200520", end_SQLDATE: "20200701") {
    SQLDATE
    events {
      Actor1Name
    }
  }
}

{
  "data": {
    "get_top_n_actors_with_most_mentions_per_day": [
      {
        "SQLDATE": "20200531",
        "events": [
          {
            "Actor1Name": "CROATIA"
          },
          {
            "Actor1Name": "AMIT"
          },
          {
            "Actor1Name": "LAWYER"
          },
          {
            "Actor1Name": "AMIT"
          },
          {
            "Actor1Name": "NEW SOUTH WALES"
          }
        ]
      },
      ...
    ]
  }
}
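The SQLDATE arguments in this example are strings in YYYYMMDD form, as "20200520" suggests. A small sketch for building and validating such strings in Python follows. The helper names to_sqldate, from_sqldate, and sqldate_range are hypothetical and do not come from the project.

```python
from datetime import date, datetime, timedelta


def to_sqldate(day):
    """Format a date as a YYYYMMDD string."""
    return day.strftime("%Y%m%d")


def from_sqldate(text):
    """Parse a YYYYMMDD string; raises ValueError for an invalid date."""
    return datetime.strptime(text, "%Y%m%d").date()


def sqldate_range(start, end):
    """List every SQLDATE from start to end, both inclusive."""
    first, last = from_sqldate(start), from_sqldate(end)
    return [
        to_sqldate(first + timedelta(days=offset))
        for offset in range((last - first).days + 1)
    ]
```

Validating the strings before sending a query catches mistakes such as "20200231" on the client side.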

Example 3

{
  find_n_most_powerful_domains_between_two_dates(n: 5,
                                                 start_SQLDATE: "20200601", end_SQLDATE: "20200701",
                                                 Geo_Lat: 51.5074, Geo_Long: 0.1278)
}

{
  "data": {
    "find_n_most_powerful_domains_between_two_dates": [
      "express.co.uk",
      "famagusta-gazette.com",
      "telegraph.co.uk",
      "dw.com",
      "sbs.com.au"
    ]
  }
}
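Two of the queries filter events by distance from a latitude/longitude pair (100 km and 1,000 km). This README does not say how the server computes that distance. One common approach is the haversine great-circle formula, sketched below in Python. The mean Earth radius of 6371 km and the function names haversine_km and within_radius are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(lat, lon, center_lat, center_lon, radius_km):
    """True if (lat, lon) lies no further than radius_km from the centre."""
    return haversine_km(lat, lon, center_lat, center_lon) <= radius_km
```

For example, within_radius(48.8566, 2.3522, 51.5074, 0.1278, 1000) checks whether a point in Paris falls inside the 1,000 km radius around the coordinates used in Example 3.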

Contributing

We're happy if you want to contribute to this project. GitHub Super-Linter analyses the code beforehand.

Problems

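With docker-compose: 'ERROR: Version in "./docker-compose.yml" is unsupported'

This error means the installed docker-compose is too old for the compose file. Replace it with a newer release:

(1) sudo apt-get remove docker-compose OR sudo rm /usr/local/bin/docker-compose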

(2) sudo curl -L "https://github.com/docker/compose/releases/download/1.23.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

(3) sudo chmod +x /usr/local/bin/docker-compose

(4) sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
