In this project we analyse GDELT data with GraphQL. GDELT is a free, continuously updated data source that publishes world-event data every 15 minutes.
The project was done by:
- Riccardo
- Maxim
- Karl-Gustav
The project was part of the Big Data Management course at the University of Tartu. You can read more about it on Medium.
Python 3.5
All of the commands in this block should be run sequentially, each in its own terminal window.
- Starting the Kafka cluster and database
- Starting the Kafka producer
- Starting the Kafka consumer
- Starting the production-server
nodemon src/server.jsx
Navigate to localhost:3000. There you can find the GraphQL GUI. It's advisable to study src/graphql/schema.jsx beforehand.
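Queries can also be sent from a script instead of the GUI. The following is a minimal sketch, not part of the project: it assumes the server accepts standard GraphQL-over-HTTP POST requests at the hypothetical path /graphql on localhost:3000 (check src/server.jsx for the real route). The helper names are illustrative, and the selected fields are taken from the example outputs in this README.

```python
import json
import urllib.request

# Assumption: the endpoint path is hypothetical; adjust it to match src/server.jsx.
GRAPHQL_URL = "http://localhost:3000/graphql"


def to_sqldate(day):
    """Format a datetime.date as a YYYYMMDD string, e.g. "20200520"."""
    return day.strftime("%Y%m%d")


def mentions_query(n, start, end):
    """Build a get_top_n_actors_with_most_mentions_per_day query string."""
    return (
        "{ get_top_n_actors_with_most_mentions_per_day("
        'n: %d, start_SQLDATE: "%s", end_SQLDATE: "%s") '
        "{ SQLDATE events { Actor1Name } } }"
    ) % (n, to_sqldate(start), to_sqldate(end))


def build_request(query, url=GRAPHQL_URL):
    """Wrap a GraphQL query string in a JSON POST request."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, url=GRAPHQL_URL):
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(query, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `run_query("{ everything { GLOBALEVENTID } }")` should return a dictionary with a top-level "data" key, provided the endpoint assumption holds.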
There are currently 10 queries:
everything() -> returns every value in the database
top_nr_source(n:Int) -> returns the top n values with the most sources
get_results_between_time_periods(FractionDate_start:Float,FractionDate_end:Float) -> returns the results between two dates
get_results_between_tones(min_tone:Float,max_tone:Float) -> returns the results between two tone values
get_actions_month(month:String) -> returns the actions within a given month
get_data_with_n_events_happend_in_dates(n:Int, start_SQLDATE:String, end_SQLDATE:String) -> returns the values that happened between two dates and that had at least n events in a month
get_top_n_actors_with_most_mentions_per_day(n:Int,start_SQLDATE:String,end_SQLDATE:String) -> returns n actors per day between the two dates, sorted by the number of mentions
get_top_n_negative_actors_near_location(n:Int, actor1Geo_Lat:Float,actor1Geo_Long:Float, start_SQLDATE:String,end_SQLDATE:String) -> returns the top n values for every day between two dates that happened within 100 km of the given location
find_n_most_powerful_actor_events_using_pagerank_between_two_dates(n:Int,start_SQLDATE:String,end_SQLDATE:String) -> returns the most powerful actors between two dates determined by the PageRank algorithm
find_n_most_powerful_domains_between_two_dates(n:Int,start_SQLDATE:String,end_SQLDATE:String,Geo_Lat:Float,Geo_Long:Float) -> returns the n most powerful news sites within 1,000 km of the given location between the two dates
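Two of the queries above filter by distance from a latitude/longitude pair (100 km and 1,000 km). This README does not say how the server computes that distance, so the sketch below is only one common way to do it: the haversine great-circle formula with a mean Earth radius of 6,371 km. The function names are illustrative and do not come from the project.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(lat, lon, center_lat, center_lon, radius_km):
    """True if (lat, lon) lies within radius_km of the centre point."""
    return haversine_km(lat, lon, center_lat, center_lon) <= radius_km
```

For example, `within_radius(48.8566, 2.3522, 51.5074, 0.1278, 1000)` checks whether an event at those coordinates would fall inside a 1,000 km radius of the location used in Example 3.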
Example 1
everything {
"data": {
"everything": [
"GLOBALEVENTID": "932366174"
"GLOBALEVENTID": "932366175"
Example 2
get_top_n_actors_with_most_mentions_per_day(n: 5, start_SQLDATE: "20200520", end_SQLDATE: "20200701") {
events {
"data": {
"get_top_n_actors_with_most_mentions_per_day": [
"SQLDATE": "20200531",
"events": [
"Actor1Name": "CROATIA"
"Actor1Name": "AMIT"
"Actor1Name": "LAWYER"
"Actor1Name": "AMIT"
"Actor1Name": "NEW SOUTH WALES"
Example 3
find_n_most_powerful_domains_between_two_dates(n: 5,
start_SQLDATE: "20200601", end_SQLDATE: "20200701",
Geo_Lat: 51.5074, Geo_Long: 0.1278)
"data": {
"find_n_most_powerful_domains_between_two_dates": [
We're happy if you want to contribute to this project. GitHub Super-Linter analyses the code beforehand.
With docker-compose: 'ERROR: Version in "./docker-compose.yml" is unsupported'
(1) sudo apt-get remove docker-compose OR sudo rm /usr/local/bin/docker-compose
(2) sudo curl -L "$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
(3) sudo chmod +x /usr/local/bin/docker-compose
(4) sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose