NoSQL_Project

GDELT NoSQL project

Architecture

Cassandra Configuration

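Replication factor (RF) = 3, write consistency (W) = QUORUM (2 replicas), read consistency (R) = ONE (1 replica).

W + R = RF (2 + 1 = 3), so a read is not guaranteed to hit a replica that holds the latest write. The cluster therefore provides eventual consistency; strong consistency would require W + R > RF.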
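The consistency rule behind these settings can be checked with a few lines of arithmetic. This is a minimal sketch; the helper names `quorum` and `is_strongly_consistent` are illustrative and are not part of the project's scripts.

```python
def quorum(rf: int) -> int:
    """Number of replicas in a Cassandra QUORUM: a majority of RF."""
    return rf // 2 + 1


def is_strongly_consistent(rf: int, w: int, r: int) -> bool:
    """Reads always overlap the latest write only when W + R > RF."""
    return w + r > rf


RF = 3
W = quorum(RF)  # QUORUM write -> 2 replicas
R = 1           # ONE read -> 1 replica

print(is_strongly_consistent(RF, W, R))           # False: eventual consistency
print(is_strongly_consistent(RF, W, quorum(RF)))  # True: QUORUM reads would overlap
```

With RF = 3, raising the read level to QUORUM would give strong consistency at the cost of slower reads; keeping reads at ONE favors read latency.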

EMR Automation

Scripts to automate the cluster creation:

1 - bootstrap_cassandra.sh : Bootstraps Cassandra when the cluster is created
2 - cluster_configuration.py : Configures the cluster (number and types of instances, etc.) and links it to Spark and Cassandra
3 - create_cluster.sh : Run this script to create the EMR cluster
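To illustrate what such a cluster configuration might look like, here is a hedged sketch that only builds the request dictionary accepted by boto3's EMR `run_job_flow` call. It is not the content of cluster_configuration.py: the function name, cluster name, release label, instance type, node count, IAM role names and S3 path below are all assumptions chosen for the example.

```python
def build_emr_config(name, instance_type, core_count, bootstrap_path):
    """Build a request dict for EMR's run_job_flow (all values are illustrative)."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.29.0",        # assumed EMR release
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_count},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # The bootstrap action runs on every node at creation time,
        # which is where a Cassandra install script would be hooked in.
        "BootstrapActions": [
            {"Name": "Install Cassandra",
             "ScriptBootstrapAction": {"Path": bootstrap_path, "Args": []}},
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default roles
        "ServiceRole": "EMR_DefaultRole",
    }


config = build_emr_config(
    name="gdelt-cluster",                                      # hypothetical
    instance_type="m5.xlarge",                                 # hypothetical
    core_count=3,                                              # hypothetical
    bootstrap_path="s3://my-bucket/bootstrap_cassandra.sh",    # hypothetical
)
# With AWS credentials configured, the dict could be submitted with:
#   boto3.client("emr").run_job_flow(**config)
print(config["Instances"]["InstanceGroups"][1]["InstanceCount"])
```

Keeping the configuration as a plain dictionary makes it easy to review or version before anything is launched.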

Data Loading and preprocessing

For the events and mentions tables: https://github.com/sarah911/NoSQL_Project/blob/master/2E4E4Q6WY/note.json

For the gkg table: https://github.com/sarah911/NoSQL_Project/blob/master/2E1J1S7FX/note.json

Queries

Q1: Find the number of articles and events for a triplet (Date, Country, Language)
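As a rough sketch of the aggregation behind Q1, the snippet below counts distinct articles and events per (date, country, language) key in plain Python. The row layout and the sample rows are made up for illustration; the project's actual tables and Spark code live in the notebooks linked above.

```python
from collections import defaultdict


def count_by_triplet(rows):
    """rows: iterable of (date, country, language, event_id, article_url).

    Returns {(date, country, language): (n_articles, n_events)}.
    """
    articles = defaultdict(set)
    events = defaultdict(set)
    for date, country, language, event_id, url in rows:
        key = (date, country, language)
        articles[key].add(url)     # distinct articles mentioning an event
        events[key].add(event_id)  # distinct events covered
    return {key: (len(articles[key]), len(events[key])) for key in events}


# Made-up sample rows, for illustration only.
rows = [
    ("20190101", "FR", "fra", 1, "http://example.org/a"),
    ("20190101", "FR", "fra", 1, "http://example.org/b"),
    ("20190101", "FR", "fra", 2, "http://example.org/c"),
    ("20190101", "US", "eng", 3, "http://example.org/d"),
]
counts = count_by_triplet(rows)
print(counts[("20190101", "FR", "fra")])  # (3, 2): 3 articles, 2 events
```

In Cassandra, the same triplet would be a natural candidate for the partition key of a pre-aggregated table, so that Q1 becomes a single-partition read.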

Q2: Find events of an actor in the past 6 months
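The six-month window in Q2 can be sketched as follows. This is an illustration with a hypothetical row layout of (actor, event_date, event_id) and made-up sample rows; the helper names are not taken from the project.

```python
import calendar
from datetime import date


def months_ago(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`, clamping the day."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def recent_events(rows, actor, today, months=6):
    """Keep the event ids of `actor` dated within the last `months` months."""
    start = months_ago(today, months)
    return [event_id for name, event_date, event_id in rows
            if name == actor and start <= event_date <= today]


# Made-up sample rows, for illustration only.
rows = [
    ("ACTOR_A", date(2019, 11, 2), 10),
    ("ACTOR_A", date(2019, 3, 15), 11),  # older than six months
    ("ACTOR_B", date(2019, 12, 1), 12),  # different actor
]
print(recent_events(rows, "ACTOR_A", today=date(2019, 12, 31)))  # [10]
```

In a Cassandra table, partitioning by actor and clustering by event date would let this filter run as a range query on a single partition.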

Q3: Find actors with the most negative or positive views based on (Date, Country, Language)

Q4: Find the actors, countries and organizations that are the most divisive on a given date

Q5: The evolution of relations between countries

Part 1: Based on actors' names (Table events)

Part 2: Based on actors' countries (Table mentions)

Part 3: Based on articles written in a country about another one (Table GKG)

JeanBaptisteScellier/NoSQL_Project

Folders and files

Latest commit

History

Repository files navigation

NoSQL_Project

Architecture

Configuration Cassandra

EMR Automation

Data Loading and preprocessing

Queries

The final presentation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages