This repository contains three Python scripts:
- archive.py: searches Twitter for a particular conference hashtag and writes all the resulting tweets to a JSON file.
- graphify.py: takes a JSON file containing Tweets and transforms them into graph-oriented objects that give another view of the timeline of Tweets, Retweets, Quotes, Hashtags, and Handles. For a Retweet/Quote/Reply, this script also drills down to the original tweet, even if it is outside the scope of the file.
- stream.py: searches Twitter for a particular conference hashtag and transforms the results into graph-oriented objects that give another view of the timeline of Tweets, Retweets, Quotes, Hashtags, and Handles. For a Retweet/Quote/Reply, this script also drills down to the original tweet, even if it is outside the scope of the search filter.
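As a rough illustration of the graph view described above, the sketch below derives relationships from tweet dictionaries and lists the original tweets that would still need to be fetched. It assumes the JSON follows the Twitter v1.1 tweet object (fields such as retweeted_status, quoted_status_id, in_reply_to_status_id and entities). The function names and relation labels are hypothetical and are not taken from graphify.py.

```python
def tweet_edges(tweet):
    """Return (relation, source_id, target) tuples for one tweet dict."""
    tweet_id = tweet["id"]
    edges = []
    # Relation labels below are illustrative, not the labels used by the scripts.
    if tweet.get("retweeted_status"):
        edges.append(("RETWEETS", tweet_id, tweet["retweeted_status"]["id"]))
    if tweet.get("quoted_status_id"):
        edges.append(("QUOTES", tweet_id, tweet["quoted_status_id"]))
    if tweet.get("in_reply_to_status_id"):
        edges.append(("REPLIES_TO", tweet_id, tweet["in_reply_to_status_id"]))
    entities = tweet.get("entities", {})
    for tag in entities.get("hashtags", []):
        edges.append(("TAGS", tweet_id, tag["text"]))
    for mention in entities.get("user_mentions", []):
        edges.append(("MENTIONS", tweet_id, mention["screen_name"]))
    return edges


def missing_originals(tweets):
    """IDs referenced by a Retweet/Quote/Reply but absent from the input."""
    known = {tweet["id"] for tweet in tweets}
    wanted = set()
    for tweet in tweets:
        for relation, _source, target in tweet_edges(tweet):
            if relation in ("RETWEETS", "QUOTES", "REPLIES_TO"):
                wanted.add(target)
    return wanted - known


sample = [
    {"id": 1, "entities": {"hashtags": [{"text": "Identiverse"}], "user_mentions": []}},
    {"id": 2, "retweeted_status": {"id": 1}},
    {"id": 3, "in_reply_to_status_id": 99,
     "entities": {"user_mentions": [{"screen_name": "Identiverse"}]}},
]
print(missing_originals(sample))  # {99}
```

Tweet 99 is outside the sample, so it is the one a script would have to look up separately.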
To use these scripts you need:
- Python
  - 2.7: for every script not listed under Python 3.X below
  - 3.7: for graphify.py and archive.py
- pip
- The following Python packages:
  - tweepy
  - json
  - time
  - configparser
  - argparse
  - py2neo
  - asyncio (requires Python 3.X)
- A Neo4j database installed, configured, and ready for connection. Setting it up is not covered here; there are plenty of good tutorials on the Web.
archive.py can be used as follows:
usage: archive.py [-h] {file,line} ...
Export tweets that match the search query
positional arguments:
{file,line} Add configuration from Ini file or through arguments
file Adding configuration from a file (Default: Ini/Default.ini)
line Adding configuration from a arguments in the command line
optional arguments:
-h, --help show this help message and exit
The file subcommand supports the following syntax:
usage: archive.py file [-h] [-i INI_FILE]
positional arguments:
cmd
optional arguments:
-h, --help show this help message and exit
-i INI_FILE, --ini_file INI_FILE
Path to the Ini file (Default: Ini/Default.ini)
The line subcommand supports the following syntax:
usage: archive.py line [-h] -s SEARCH -ck CONSUMER_KEY -cs CONSUMER_SECRET -ak
ACCESS_KEY -as ACCESS_SECRET -o OUTPUT_FILENAME
[-b BACKUP_INI_FILE_NAME]
positional arguments:
cmd
optional arguments:
-h, --help show this help message and exit
-s SEARCH, --search SEARCH
Twitter search filter
-ck CONSUMER_KEY, --consumer_key CONSUMER_KEY
Twitter consumer key obtained from your Twitter account
-cs CONSUMER_SECRET, --consumer_secret CONSUMER_SECRET
Twitter consumer secret obtained from your Twitter account
-ak ACCESS_KEY, --access_key ACCESS_KEY
Twitter access key obtained from your Twitter account
-as ACCESS_SECRET, --access_secret ACCESS_SECRET
Twitter access_secret obtained from your Twitter account
-o OUTPUT_FILENAME, --output_filename OUTPUT_FILENAME
Name of the results output file
-b BACKUP_INI_FILE_NAME, --backup_ini_file_name BACKUP_INI_FILE_NAME
Name of the Ini file to backup from this request parameters
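For readers who want to see how a file/line command layout like the one above can be built, here is a minimal argparse sketch. It is not the code of archive.py: the function name build_parser and the help strings are assumptions, and only the option names are copied from the help output. It relies on the required keyword of add_subparsers, which needs Python 3.7 or later.

```python
import argparse


def build_parser():
    """Build a parser with 'file' and 'line' subcommands (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="archive.py",
        description="Export tweets that match the search query",
    )
    subparsers = parser.add_subparsers(dest="cmd", required=True)

    # 'file': every setting comes from an Ini file.
    from_file = subparsers.add_parser("file", help="Read configuration from an Ini file")
    from_file.add_argument("-i", "--ini_file", default="Ini/Default.ini",
                           help="Path to the Ini file")

    # 'line': every setting is passed on the command line.
    from_line = subparsers.add_parser("line", help="Read configuration from arguments")
    from_line.add_argument("-s", "--search", required=True)
    from_line.add_argument("-ck", "--consumer_key", required=True)
    from_line.add_argument("-cs", "--consumer_secret", required=True)
    from_line.add_argument("-ak", "--access_key", required=True)
    from_line.add_argument("-as", "--access_secret", required=True)
    from_line.add_argument("-o", "--output_filename", required=True)
    from_line.add_argument("-b", "--backup_ini_file_name")
    return parser


args = build_parser().parse_args(["file"])
print(args.cmd, args.ini_file)  # file Ini/Default.ini
```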
graphify.py can be used as follows:
usage: graphify.py [-h] {file,line} ...
Import tweets in a Graph DB
positional arguments:
{file,line} Add configuration from Ini file or through arguments
file Adding configuration from a file (Default: Ini/Default.ini)
line Adding configuration from a arguments in the command line
optional arguments:
-h, --help show this help message and exit
The file subcommand supports the following syntax:
usage: graphify.py file [-h] [-i INI_FILE]
positional arguments:
cmd
optional arguments:
-h, --help show this help message and exit
-i INI_FILE, --ini_file INI_FILE
Path to the Ini file (Default: Ini/Default.ini)
The line subcommand supports the following syntax:
usage: graphify.py line [-h] [-type DB_TYPE] [-proto PROTOCOL]
[-lang LANGUAGE] [-server SERVER_NAME]
[-port SERVER_PORT] -pwd DB_PASSWORD [-set RESULT_SET]
-name CONFERENCE_NAME -loc CONFERENCE_LOCATION -time
CONFERENCE_TIME_ZONE -start CONFERENCE_START_DATE -end
CONFERENCE_END_DATE [-purge PURGE_BEFORE_IMPORT]
[-fname FILTER_ORGANIZER_TWITTER_SCREENAME]
[-fhash FILTER_CONFERENCE_HASHTAG]
[-b BACKUP_INI_FILE_NAME]
positional arguments:
cmd
optional arguments:
-h, --help show this help message and exit
-type DB_TYPE, --db_type DB_TYPE
For future use: indicate db type
-proto PROTOCOL, --protocol PROTOCOL
For future use: indicate protocol to connect to db
-lang LANGUAGE, --language LANGUAGE
For future use: indicate language to query the db
-sec SECURE, --secure SECURE
Flag for secure connection
-server SERVER_NAME, --server_name SERVER_NAME
FQDN of the db server
-port SERVER_PORT, --server_port SERVER_PORT
server socket hosting the db service
-pwd DB_PASSWORD, --db_password DB_PASSWORD
service password to access the db
-set RESULT_SET, --result_set RESULT_SET
Result set file from streaming script (Default: Output/search.json)
-name CONFERENCE_NAME, --conference_name CONFERENCE_NAME
Name of the conference for the master node
-loc CONFERENCE_LOCATION, --conference_location CONFERENCE_LOCATION
Location of the conference for the master node
-time CONFERENCE_TIME_ZONE, --conference_time_zone CONFERENCE_TIME_ZONE
Number of (+/-) hours from UTC reference of the conference's timezone
-start CONFERENCE_START_DATE, --conference_start_date CONFERENCE_START_DATE
First day of the conference in dd/mm/yyyy format
-end CONFERENCE_END_DATE, --conference_end_date CONFERENCE_END_DATE
Last day of the conference in dd/mm/yyyy format
-purge PURGE_BEFORE_IMPORT, --purge_before_import PURGE_BEFORE_IMPORT
Indicate if the graph must be deleted before importing (Default: false)
-fname FILTER_ORGANIZER_TWITTER_SCREENAME, --filter_organizer_twitter_screename FILTER_ORGANIZER_TWITTER_SCREENAME
Twitter screename that helps to filter out organizer tweets and retweets
-fhash FILTER_CONFERENCE_HASHTAG, --filter_conference_hashtag FILTER_CONFERENCE_HASHTAG
Hashtag of the conference
-b BACKUP_INI_FILE_NAME, --backup_ini_file_name BACKUP_INI_FILE_NAME
Name of the Ini file to backup from this request parameters
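The -time, -start and -end options together define the conference window. The sketch below shows one way such values could be interpreted: dates in dd/mm/yyyy format are combined with the UTC offset, and a tweet timestamp is classified as before, during, or after the conference. The function names and the pre/during/post labels are hypothetical; graphify.py may handle dates differently.

```python
from datetime import datetime, timedelta, timezone


def conference_window(start, end, utc_offset_hours):
    """Return (opens, closes) as timezone-aware datetimes.

    start and end are dd/mm/yyyy strings; the window covers the whole last day.
    """
    zone = timezone(timedelta(hours=utc_offset_hours))
    first_day = datetime.strptime(start, "%d/%m/%Y").replace(tzinfo=zone)
    last_day = datetime.strptime(end, "%d/%m/%Y").replace(tzinfo=zone)
    return first_day, last_day + timedelta(days=1)


def classify(created_at, window):
    """Label a timezone-aware timestamp relative to the conference window."""
    opens, closes = window
    if created_at < opens:
        return "pre"
    if created_at >= closes:
        return "post"
    return "during"


window = conference_window("24/06/2018", "27/06/2018", -4)
# 03:30 UTC on 24/06/2018 is still 23:30 on 23/06/2018 at UTC-4.
early = datetime(2018, 6, 24, 3, 30, tzinfo=timezone.utc)
print(classify(early, window))  # pre
```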
stream.py can be used as follows:
usage: stream.py [-h] {file,line} ...
Export tweets that match the search query
positional arguments:
{file,line} Add configuration from Ini file or through arguments
file Adding configuration from a file (Default: Ini/Default.ini)
line Adding configuration from a arguments in the command line
optional arguments:
-h, --help show this help message and exit
The file subcommand supports the following syntax:
usage: stream.py file [-h] [-i INI_FILE]
positional arguments:
cmd
optional arguments:
-h, --help show this help message and exit
-i INI_FILE, --ini_file INI_FILE
Path to the Ini file (Default: Ini/Default.ini)
The line subcommand supports the following syntax:
(base) C:\Users\User\Documents\Git\Work\ConferenceTweetMapper>python stream.py line -h
usage: stream.py line [-h] -s SEARCH -ck CONSUMER_KEY -cs CONSUMER_SECRET -ak
ACCESS_KEY -as ACCESS_SECRET -o OUTPUT_FILENAME
[-type DB_TYPE] [-proto PROTOCOL] [-lang LANGUAGE]
[-server SERVER_NAME] [-port SERVER_PORT] -pwd
DB_PASSWORD [-set RESULT_SET] -name CONFERENCE_NAME -loc
CONFERENCE_LOCATION -time CONFERENCE_TIME_ZONE -start
CONFERENCE_START_DATE -end CONFERENCE_END_DATE
[-purge PURGE_BEFORE_IMPORT]
[-fname FILTER_ORGANIZER_TWITTER_SCREENAME]
[-fhash FILTER_CONFERENCE_HASHTAG]
[-b BACKUP_INI_FILE_NAME]
positional arguments:
cmd
optional arguments:
-h, --help show this help message and exit
-s SEARCH, --search SEARCH
Twitter search filter
-ck CONSUMER_KEY, --consumer_key CONSUMER_KEY
Twitter consumer key obtained from your Twitter account
-cs CONSUMER_SECRET, --consumer_secret CONSUMER_SECRET
Twitter consumer secret obtained from your Twitter account
-ak ACCESS_KEY, --access_key ACCESS_KEY
Twitter access key obtained from your Twitter account
-as ACCESS_SECRET, --access_secret ACCESS_SECRET
Twitter access_secret obtained from your Twitter account
-o OUTPUT_FILENAME, --output_filename OUTPUT_FILENAME
Name of the results output file
-type DB_TYPE, --db_type DB_TYPE
For future use: indicate db type
-proto PROTOCOL, --protocol PROTOCOL
For future use: indicate protocol to connect to db
-lang LANGUAGE, --language LANGUAGE
For future use: indicate language to query the db
-sec SECURE, --secure SECURE
Flag for secure connection
-server SERVER_NAME, --server_name SERVER_NAME
FQDN of the db server
-port SERVER_PORT, --server_port SERVER_PORT
server socket hosting the db service
-pwd DB_PASSWORD, --db_password DB_PASSWORD
service password to access the db
-set RESULT_SET, --result_set RESULT_SET
Result set file from streaming script (Default: Output/search.json)
-name CONFERENCE_NAME, --conference_name CONFERENCE_NAME
Name of the conference for the master node
-loc CONFERENCE_LOCATION, --conference_location CONFERENCE_LOCATION
Location of the conference for the master node
-time CONFERENCE_TIME_ZONE, --conference_time_zone CONFERENCE_TIME_ZONE
Number of (+/-) hours from UTC reference of the conference's timezone
-start CONFERENCE_START_DATE, --conference_start_date CONFERENCE_START_DATE
First day of the conference in dd/mm/yyyy format
-end CONFERENCE_END_DATE, --conference_end_date CONFERENCE_END_DATE
Last day of the conference in dd/mm/yyyy format
-purge PURGE_BEFORE_IMPORT, --purge_before_import PURGE_BEFORE_IMPORT
Indicate if the graph must be deleted before importing (Default: false)
-fname FILTER_ORGANIZER_TWITTER_SCREENAME, --filter_organizer_twitter_screename FILTER_ORGANIZER_TWITTER_SCREENAME
Twitter screename that helps to filter out organizer tweets and retweets
-fhash FILTER_CONFERENCE_HASHTAG, --filter_conference_hashtag FILTER_CONFERENCE_HASHTAG
Hashtag of the conference
-b BACKUP_INI_FILE_NAME, --backup_ini_file_name BACKUP_INI_FILE_NAME
Name of the Ini file to backup from this request parameters
In the Ini folder you should find a Default.ini file describing the format expected for a global Ini file:
#Default initialization filter
#All dates shall be in format dd/mm/yyyy
[DEFAULT]
output_filename = Output/search.json
search = #Identiverse
[Twitter]
consumer_key = <your_consumer_key>
consumer_secret = <your_consumer_secret>
access_key = <your_access_key>
access_secret = <your_access_secret>
[Graph]
db_type = Neo4j
protocol = bolt
language = cypher
server_name = localhost
server_port = 7687
db_password = Identiverse
[Processing]
result_set = Output/search.json
conference_name = Identiverse 2018
conference_location = Boston
conference_time_zone = -4
conference_start_date = 24/06/2018
conference_end_date = 27/06/2018
[Misc]
purge_before_import = false
filter_organizer_twitter_screename = Identiverse
filter_conference_hashtag = Identiverse
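A file in this format can be read with the standard configparser module. The sketch below parses a shortened copy of the file from a string so that it runs on its own; how the scripts themselves load the file is not shown here and may differ. Two details are worth noting: keys under [DEFAULT] are visible from every other section, and with Python 3 defaults a "#" inside a value (as in the search filter) is kept rather than treated as a comment.

```python
import configparser

# Shortened copy of Ini/Default.ini, inlined so the example is self-contained.
SAMPLE_INI = """\
#All dates shall be in format dd/mm/yyyy
[DEFAULT]
output_filename = Output/search.json
search = #Identiverse
[Graph]
server_name = localhost
server_port = 7687
[Processing]
conference_time_zone = -4
[Misc]
purge_before_import = false
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE_INI)
# For the real file, use config.read("Ini/Default.ini") instead.

port = config.getint("Graph", "server_port")
purge = config.getboolean("Misc", "purge_before_import")
offset = config.getint("Processing", "conference_time_zone")
search = config.get("Graph", "search")  # inherited from [DEFAULT]
print(port, purge, offset, search)  # 7687 False -4 #Identiverse
```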
When using these scripts, keep in mind that:
- Having separate export and import scripts lets you run the two operations independently
- The scripts do not check whether a file already exists when exporting (results and configuration), so be careful if you don't want one to be overwritten
- The public Twitter search API does not return unindexed results, may omit results older than 7 days, and may not return everything you would get from the search in the Twitter UI
- The stream.py search filter was designed to target a conference hashtag, but it is a standard Twitter search filter supporting all the options Twitter allows
- graphify.py currently supports only Neo4j, the bolt protocol, and the Cypher language
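Since existing files are not protected, a wrapper of your own can guard against accidental overwrites. The sketch below is one possible approach and is not part of the scripts: the helper name write_json_no_clobber is hypothetical. It uses the "x" open mode, which fails if the target already exists.

```python
import json
import os
import tempfile


def write_json_no_clobber(path, payload):
    """Write payload as JSON, refusing to overwrite an existing file."""
    # Mode "x" creates the file and raises FileExistsError if it already exists.
    with open(path, "x", encoding="utf-8") as handle:
        json.dump(payload, handle)


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "search.json")
    write_json_no_clobber(target, [{"id": 1}])
    try:
        write_json_no_clobber(target, [])
    except FileExistsError:
        print("kept existing file")
```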
If the import succeeds, you should be able to use Neo4j tools to visualize and drill into your Tweet graph:
Example of drilling into a Retweet/Quote/Reply:
Here are some interesting Cypher request examples
- Generate statistics (1st-level Tweeters, 1st-level Tweets, engaged Tweeters, engaged Tweets, etc.)
- Follow RT, Reply, Quote up and down a la treeverse <- Partially solved, will need script Expand
- Script Redox: Merge similar RTs into a single RT-Tweet
- Script Expand: Import all the retweets by retweets_of_status_id and replies by in_reply_to_status_id Premium Search parameters
- Script Append: Continue an import or update an import with a list of tweets, checking first whether each tweet is already imported
- Switch script function to Async https://www.aeracode.org/2018/02/19/python-async-simplified/
- Think about KPIs: Tweet rate, Top for User/Tweet/Hashtag/Mention (see Generate statistics)
- WebUI to see Graph online
- Update logging to console to be more dynamic
- Better date management
- Change the Post and Pre conference period id to something specific to the conference upload to prevent cross mapping
- Change the Days of conference period id to something specific to the conference upload to prevent cross mapping
- Correct the name attribute of the Source object to remove href