tuminguyen/log_parser

Log Parser

This repository parses data from the Kaggle Global Terrorism Dataset (CSV file) and selected GDELT datasets into a suitable format, then imports it into Elasticsearch for further analysis and visualization.

Environment

The code has been tested on:

  • Ubuntu 20.04
  • Python 3.7.9 (compatible with Python 3+)

Installation

Install all the libraries listed in requirements.txt:

Using pip

pip install -r requirements.txt

Using conda

conda install --file requirements.txt

Usage

CSV

Params

'--path', '-p':
    type=str,
    description='path to data file'
'--dump', '-d':
    default=False,
    type=bool,
    description='whether to dump the log to a file for Beats to fetch (True: dump, False: do not dump)'
'--output', '-o':
    default='log.json',
    type=str,
    description='file to dump the log to; only used when --dump is True'
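One caveat about `--dump`: if the script declares it with argparse's `type=bool` as listed above, argparse passes the raw string to `bool()`, and any non-empty string, including `"False"`, is truthy. Below is a minimal sketch of the same three parameters with a hypothetical `str_to_bool` converter. The converter, the `build_parser` name and `required=True` on `--path` are assumptions for illustration, not the repository's code.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Convert strings such as 'True'/'False' into real booleans.

    argparse's type=bool calls bool() on the raw string, and any non-empty
    string (including "False") is truthy, so a small converter is needed.
    """
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the documented parameters; required=True on --path is assumed.
    parser = argparse.ArgumentParser(description="Parse a CSV file for Elasticsearch.")
    parser.add_argument("--path", "-p", type=str, required=True,
                        help="path to data file")
    parser.add_argument("--dump", "-d", type=str_to_bool, default=False,
                        help="dump the log for Beats to fetch (True/False)")
    parser.add_argument("--output", "-o", type=str, default="log.json",
                        help="file to dump the log to; only used when --dump is True")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-p", "terrorism.csv", "-d", "False"])
    print(args.dump)  # False (with type=bool this would be True)
```

With this converter, `-d False` yields `False`, whereas `type=bool` would yield `True`.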

Run

python csv_parser.py -p path_to_csv_file -o output_file -d True/False

# Original way (no dump):
python csv_parser.py -p terrorism.csv

# Dump log for Beats, default to log.json
python csv_parser.py -p terrorism.csv -d True 

# Dump log for Beats to specific file
python csv_parser.py -p terrorism.csv -d True -o output.json 

For more details on the parameters:

python csv_parser.py --help
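The dump file is meant to be picked up by Beats. Filebeat reads files line by line, so a common layout is newline-delimited JSON: one object per CSV row. The sketch below assumes that layout; the actual structure of the `log.json` written by csv_parser.py is not documented here, and `rows_to_ndjson` / `dump_csv` are hypothetical helpers.

```python
import csv
import io
import json
from typing import Iterable, TextIO


def rows_to_ndjson(rows: Iterable[dict], out: TextIO) -> int:
    """Write each row as one JSON object per line; return the count written."""
    count = 0
    for row in rows:
        # ensure_ascii=False keeps non-ASCII place names readable.
        out.write(json.dumps(row, ensure_ascii=False) + "\n")
        count += 1
    return count


def dump_csv(csv_path: str, output_path: str = "log.json",
             encoding: str = "utf-8") -> int:
    # newline="" is what the csv module documentation recommends for readers.
    # Append mode so lines already shipped are not rewritten.
    with open(csv_path, newline="", encoding=encoding) as src, \
         open(output_path, "a", encoding="utf-8") as dst:
        return rows_to_ndjson(csv.DictReader(src), dst)


if __name__ == "__main__":
    # Small in-memory sample instead of the real dataset.
    sample = io.StringIO("eventid,country_txt\n1,Mexico\n")
    buf = io.StringIO()
    rows_to_ndjson(csv.DictReader(sample), buf)
    print(buf.getvalue(), end="")
```

Opening the output in append mode, rather than truncating it, avoids rewriting lines a shipper may already have read. Adjust `encoding` to match the CSV file you downloaded.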

GDELT

TV News

python gdelt_parser.py -s startdate -e enddate --station station_list

# Example:
python gdelt_parser.py -s 20210407 -e 20210409 --station CNN BBCNEWS DW
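`--station` takes several space-separated values; with argparse this is typically expressed with `nargs="+"`. The following is a sketch of how these TV News arguments could be parsed, assuming argparse. The long option names `--start`/`--end` and the helpers `parse_date` and `build_tv_parser` are hypothetical, not taken from gdelt_parser.py.

```python
import argparse
from datetime import datetime


def parse_date(value: str) -> datetime:
    """Parse a YYYYMMDD string such as 20210407."""
    try:
        return datetime.strptime(value, "%Y%m%d")
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected YYYYMMDD, got {value!r}") from None


def build_tv_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="GDELT TV News parser (sketch)")
    parser.add_argument("-s", "--start", type=parse_date, required=True,
                        help="start date, YYYYMMDD")
    parser.add_argument("-e", "--end", type=parse_date, default=None,
                        help="end date, YYYYMMDD")
    # nargs="+" gathers one or more space-separated values into a list.
    parser.add_argument("--station", nargs="+", default=[],
                        help="station identifiers, e.g. CNN BBCNEWS DW")
    return parser


if __name__ == "__main__":
    args = build_tv_parser().parse_args(
        ["-s", "20210407", "-e", "20210409", "--station", "CNN", "BBCNEWS", "DW"]
    )
    print(args.station)  # ['CNN', 'BBCNEWS', 'DW']
```

Validating dates in the `type=` callable makes argparse report a malformed date as a normal usage error instead of failing later in the run.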

Events 2.0

python gdelt_parser.py -s startdate -e enddate

# Example 1: set both start and end dates
python gdelt_parser.py -s 20210412 -e 20210416

# Example 2: set only the start date; the end date defaults to now
python gdelt_parser.py -s 20210416
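GDELT 2.0 publishes a new Events export every 15 minutes, so a date range maps to a series of 15-minute slots. Here is a minimal sketch that enumerates export file URLs for a range. It assumes the file-naming pattern seen in GDELT's public listing (`<YYYYMMDDHHMMSS>.export.CSV.zip` under `data.gdeltproject.org/gdeltv2/`); verify it against `masterfilelist.txt` before relying on it. `export_urls` is hypothetical, and treating the end date as exclusive is a choice made here, not documented behaviour of gdelt_parser.py.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Optional

# Assumed layout of GDELT 2.0 export files; check masterfilelist.txt.
BASE_URL = "http://data.gdeltproject.org/gdeltv2/"
STEP = timedelta(minutes=15)  # GDELT 2.0 updates every 15 minutes


def export_urls(start: str, end: Optional[str] = None) -> Iterator[str]:
    """Yield export URLs for 15-minute slots in [start, end).

    start and end are YYYYMMDD strings; when end is omitted, the range
    runs up to the current UTC time (mirroring the documented default).
    The newest slot may not have been published yet.
    """
    begin = datetime.strptime(start, "%Y%m%d")
    if end:
        stop = datetime.strptime(end, "%Y%m%d")
    else:
        # Naive UTC timestamp, comparable with the naive begin value.
        stop = datetime.now(timezone.utc).replace(tzinfo=None)
    slot = begin
    while slot < stop:
        yield f"{BASE_URL}{slot:%Y%m%d%H%M%S}.export.CSV.zip"
        slot += STEP


if __name__ == "__main__":
    for url in list(export_urls("20210416", "20210417"))[:3]:
        print(url)
```

For `-s 20210412 -e 20210416` this yields 384 URLs (4 days of 96 slots each), starting at `20210412000000.export.CSV.zip`.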

For more details on the parameters:

python gdelt_parser.py --help

You can customize the Elasticsearch mapping body as needed, for example by setting an "analyzer" on specific fields.
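As an illustration of a mapping body with an analyzer, assuming Elasticsearch 7 or later (typeless mappings): the field names below are hypothetical, and `folded_text` is an example custom analyzer combining the built-in `standard` tokenizer with the `lowercase` and `asciifolding` token filters.

```python
import json

# Hypothetical field names; adapt them to the fields your parser emits.
mapping_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Custom analyzer: standard tokenizer + lowercase + ASCII folding.
                "folded_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "summary": {"type": "text", "analyzer": "folded_text"},
            "country_txt": {"type": "keyword"},
            "event_date": {"type": "date", "format": "yyyyMMdd"},
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(mapping_body, indent=2))
```

The resulting JSON can be sent as the body of a create-index request (`PUT /<index-name>`). Define analyzers before indexing documents, since the analyzer of an existing text field cannot be changed in place.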

References

GDELT 2.0 Events

GDELT TV News

About

Crawls logs from GDELT in real time and pushes them to Elasticsearch.
