Docker revamp #19

Open
wants to merge 58 commits into
base: master
b292b6d
Add docker compose for kibana and elasticsearch
heyqule Aug 22, 2019
4aef0a3
Merge remote-tracking branch 'upstream/master'
heyqule Aug 22, 2019
8ee20b8
Add support for multiple set of nltk tokens. Controls by --index
heyqule Aug 23, 2019
9dca234
Fequency adjustment
heyqule Aug 23, 2019
74e6b36
Fully automate the build with docker
heyqule Aug 25, 2019
53408c9
Add support to bypass fetching stock price outside of regular hours.
heyqule Aug 26, 2019
d5393e7
Fix time display
heyqule Aug 26, 2019
4dc0238
Optimization
heyqule Aug 26, 2019
b7cda28
Fix hour() error
heyqule Aug 26, 2019
a8bf22a
Fix Cache Cleaning issue
heyqule Aug 27, 2019
aea6a59
Change startup.sh to startup.sample.sh
heyqule Aug 27, 2019
64ddc35
Add Curl to python instance for cleaning purposes.
heyqule Aug 27, 2019
004c17c
Clean cache
heyqule Aug 28, 2019
3cd6de0
Change Kibana template
heyqule Aug 28, 2019
fde9181
Move news out of original sentiment script
heyqule Aug 29, 2019
92a9447
Break down News SA
heyqule Aug 29, 2019
4205c8c
remove exposed ports
heyqule Aug 29, 2019
ff5a8cf
Elasticsearch / Kibana 7.3 change
heyqule Aug 31, 2019
e310886
Add ndjson importer
heyqule Aug 31, 2019
10502c8
Add ndjson importer
heyqule Aug 31, 2019
7675154
Remove kibana 5.6 export
heyqule Aug 31, 2019
c2a7010
Fix kibana importer
heyqule Sep 2, 2019
cb42d10
Update Copyright
heyqule Sep 2, 2019
fd9fe56
Change to wt
heyqule Sep 2, 2019
3a3b452
Change Mapping to 7.3 format
heyqule Sep 2, 2019
f3c1895
Disable twitter sentiment stream in start.sh
heyqule Sep 2, 2019
e6c9f1b
Rename original py to og.py
heyqule Sep 2, 2019
3fc49a6
Change config handling
heyqule Sep 8, 2019
e86efe7
Fix twitter
heyqule Sep 9, 2019
5f1d87f
Since it's single node insance, disable replica
heyqule Sep 9, 2019
3e862ec
Refactors
heyqule Sep 9, 2019
84c6324
Minor Import script adjustment
heyqule Sep 9, 2019
4cd6af4
Index structure change
heyqule Sep 9, 2019
622eae1
Fix message body
heyqule Sep 9, 2019
baa9d5f
Optimiaztion
heyqule Sep 10, 2019
040887b
Add delay before fetching from elasticsearch .
heyqule Sep 10, 2019
56901dc
Kibana change
heyqule Sep 10, 2019
abbc740
Kibana - remove legend
heyqule Sep 10, 2019
a6002ac
Add kibana listener
heyqule Sep 10, 2019
c6cf17b
Revert ndjson
heyqule Sep 10, 2019
bda22a4
Attempt to fix stock price operant error
heyqule Sep 10, 2019
b7226d4
Fix elastic mapping
heyqule Sep 11, 2019
5cde9c9
Add delay for Seek Alpha
heyqule Sep 11, 2019
c3431c4
Add delay for Seek Alpha
heyqule Sep 11, 2019
d086bc6
- Separate sentiment for message and title
heyqule Sep 14, 2019
f84a379
- Kibana adjustment
heyqule Sep 14, 2019
60e06fc
- Config adjustment
heyqule Sep 14, 2019
0d7c7a4
- Improve Kibana dashboard
heyqule Sep 17, 2019
fb6bea1
- Improve Kibana Dashboard
heyqule Sep 22, 2019
097c774
- Additonal Readme change
heyqule Sep 22, 2019
efc7387
- Fix kibana tmp folder issue
heyqule Sep 22, 2019
175dd61
- Minor change to spawn timers
heyqule Sep 22, 2019
a178733
Minor Refactor
heyqule Sep 24, 2019
9c55d3d
Merge branch 'master' into master
shirosaidev Oct 11, 2019
6f38025
Fix issue found by shaggy63
heyqule Oct 12, 2019
5d17a6c
Merge remote-tracking branch 'origin/master'
heyqule Oct 12, 2019
646b0d9
Disable unnecessary exposed ports
heyqule Oct 12, 2019
b985410
Add copyright blocks to non-py files
heyqule Oct 16, 2019
1 change: 1 addition & 0 deletions .github/FUNDING.yml
@@ -1,2 +1,3 @@
patreon: shirosaidev
custom: https://www.paypal.me/shirosaidev
custom: https://www.paypal.me/heyqule
9 changes: 9 additions & 0 deletions .gitignore
@@ -97,6 +97,15 @@ ENV/
# mkdocs documentation
/site

#Custom files
data/
config.py
.git
.idea
twitteruserids.txt
*_export.json
config.yml

# mypy
.mypy_cache/
.DS_Store
24 changes: 24 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,29 @@
# stocksight Change Log

## [0.2] = 2019-09-22
### major changes
- Dockerized the system. CLI scripts are retired.
- All settings are fetched from src/config.yml.
- Replaced ElasticSearch 5.6 with ElasticSearch 7.3.
- Added Redis for caching.
- Automated requirements installation and kibana dashboard setup.
- Converted original scripts to modules and classes to simplify the process of building new extensions
- Data mapping has changed.
- Each symbol has its own set of indexes: one for sentiment and one for price.
- See src/Stocksight/EsMap for mapping details
- Sentiment and price crawlers are spawned concurrently based on your specified stock symbols.
- Reduced memory footprint by spawning Python instances only when they are needed.

### added
- Added SeekingAlpha crawler
- Added integration test cases
- Added support for generating random proxy and random user-agent.
- May not be effective against sophisticated blockers.

### issues:
- SeekingAlpha blocks frequent accesses with 403. Follow_link is disabled for it.


## [0.1-b.6] = 2019-07-15
### fixed
- "TypeError: sequence item 0: expected str instance, int found" traceback error when running with -f twitteruserids.txt
206 changes: 93 additions & 113 deletions README.md
@@ -5,132 +5,112 @@ Crowd-sourced stock analyzer and stock predictor using Elasticsearch, Twitter, N

[![License](https://img.shields.io/github/license/shirosaidev/stocksight.svg?label=License&maxAge=86400)](./LICENSE)
[![Release](https://img.shields.io/github/release/shirosaidev/stocksight.svg?label=Release&maxAge=60)](https://github.com/shirosaidev/stocksight/releases/latest)
[![Sponsor Patreon](https://img.shields.io/badge/Sponsor%20%24-Patreon-brightgreen.svg)](https://www.patreon.com/shirosaidev)
[![Donate PayPal](https://img.shields.io/badge/Donate%20%24-PayPal-brightgreen.svg)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=CLF223XAS4W72)

## About
stocksight is a crowd-sourced stock analysis open source software that uses Elasticsearch to store Twitter and news headlines data for stocks. stocksight analyzes the emotions of what the author writes and does sentiment analysis on the text to determine how the author "feels" about a stock. stocksight makes an aggregated analysis of all collected data from all sources.

Each user running stocksight has a unique fingerprint: specific stocks they are following, news sites and twitter users they follow to find information for those stocks. This creates a unique sentiment analysis for each user, based on what data sources they are getting stocksight to search. Users can have the same stocks, but their data sources could vary significantly creating different sentiment analysis for the same stock. stocksight website (coming soon) will allow each user to see other sentiment analysis results from other stocksight user app results and a combined aggregated view of all.

## Requirements
- Python 3. (tested with Python 3.6.5)
- Elasticsearch 5.
- Kibana 5.
- elasticsearch python module
- nltk python module
- requests python module
- tweepy python module
- beautifulsoup4 python module
- textblob python module
- vaderSentiment python module

### Download

```shell
$ git clone https://github.com/shirosaidev/stocksight.git
$ cd stocksight
```
[Download latest version](https://github.com/shirosaidev/stocksight/releases/latest)

## Screenshot
Stocksight Kibana dashboard
<img src="https://github.com/shirosaidev/stocksight/blob/master/docs/stocksight-dashboard-kibana.png?raw=true" alt="stocksight kibana dashboard" />

## How to use

Install python requirements using pip

`pip install -r requirements.txt`

Create a new twitter application and generate your consumer key and access token. https://developer.twitter.com/en/docs/basics/developer-portal/guides/apps.html
https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens.html

Copy config.py.sample to config.py

Set elasticsearch settings in config.py for your env

Add twitter consumer key/access token and secrets to config.py

Edit config.py and modify NLTK tokens required/ignored and twitter feeds you want to mine. NLTK tokens required are keywords which must be in tweet before adding it to Elasticsearch (whitelist). NLTK tokens ignored are keywords which if are found in tweet, it will not be added to Elasticsearch (blacklist).

### Examples

Run sentiment.py to create 'stocksight' index in Elasticsearch and start mining and analyzing Tweets using keywords

```sh
$ python sentiment.py -k TSLA,'Elon Musk',Musk,Tesla --debug
```

Start mining and analyzing Tweets from feeds in config using cached user ids from file

```sh
$ python sentiment.py -f twitteruserids.txt --debug
```

Start mining and analyzing News headlines and following headline links and scraping relevant text on landing page
### Author
Chris Park 2018-2019
[![Sponsor Patreon](https://img.shields.io/badge/Sponsor%20%24-Patreon-brightgreen.svg)](https://www.patreon.com/shirosaidev)
[![Donate PayPal](https://img.shields.io/badge/Donate%20%24-PayPal-brightgreen.svg)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=CLF223XAS4W72)

```sh
$ python sentiment.py -n TSLA --followlinks --debug
```
### Contributors
Allen Jian Feng Xie 2019
[![Donate PayPal](https://img.shields.io/badge/Donate%20%24-PayPal-brightgreen.svg)](https://www.paypal.com/paypalme2/heyqule)


### Upgrade From 0.1
Version 0.2 went through an architectural revamp. You will have to COPY the v0.1 data from Elastic 5.6 to Elastic 7.3 if you wish to retain your previous data.

The Elasticsearch index mappings also differ between the two versions. The new version records additional data for sentiment and stock prices. Please see "src/StockSight/EsMap" files for details.

Differences:
1. Each symbol has its own set of price and sentiment indexes.
2. Each symbol has its own dashboard in Kibana.
3. Each sentiment record has a sentiment value for its title and a sentiment value for its message.
- Title sentiment and message sentiment are no longer mixed together.
4. Stock price open and close values are also saved in the price index.

### Requirements / Tech Stack

- Docker
- Python 3. (tested with Python 3.6.8 and 3.7.4)
- Elasticsearch 7.3.1.
- Kibana 7.3.1.
- Redis 5
- Python module
- elasticsearch
- nltk
- requests
- tweepy
- beautifulsoup4
- textblob
- vaderSentiment
- pytz
- redis
- pyyaml
- fake-useragent

Run stockprice.py to add stock prices to 'stocksight' index in Elasticsearch
### Download

```sh
$ python stockprice.py -s TSLA --debug
```shell
$ git clone https://github.com/shirosaidev/stocksight.git
$ cd stocksight
```
[Download latest version](https://github.com/shirosaidev/stocksight/releases/latest)

Load 'stocksight' index in Kibana and import export.json file for visuals/dashboard.

### CLI options

```
usage: sentiment.py [-h] [-i INDEX] [-d] [-k KEYWORDS] [-u URL] [-f FILE]
[-n SYMBOL] [--frequency FREQUENCY] [--followlinks] [-v]
[--debug] [-q] [-V]

optional arguments:
-h, --help show this help message and exit
-i INDEX, --index INDEX
Index name for Elasticsearch (default: stocksight)
-d, --delindex Delete existing Elasticsearch index first
-k KEYWORDS, --keywords KEYWORDS
Use keywords to search for in Tweets instead of feeds.
Separated by comma, case insensitive, spaces are ANDs
commas are ORs. Example: TSLA,'Elon
Musk',Musk,Tesla,SpaceX
-u URL, --url URL Use twitter users from any links in web page at url
-f FILE, --file FILE Use twitter user ids from file
-n SYMBOL, --newsheadlines SYMBOL
Get news headlines instead of Twitter using stock
symbol, example: TSLA
--frequency FREQUENCY
How often in seconds to retrieve news headlines
(default: 120 sec)
--followlinks Follow links on news headlines and scrape relevant
text from landing page
-v, --verbose Increase output verbosity
--debug Debug message output
-q, --quiet Run quiet with no message output
-V, --version Prints version and exits
```

```
usage: stockprice.py [-h] [-i INDEX] [-d] [-s SYMBOL] [-f FREQUENCY] [-v]
[--debug] [-q] [-V]

optional arguments:
-h, --help show this help message and exit
-i INDEX, --index INDEX
Index name for Elasticsearch (default: stocksight)
-d, --delindex Delete existing Elasticsearch index first
-s SYMBOL, --symbol SYMBOL
Stock symbol to use, example: TSLA
-f FREQUENCY, --frequency FREQUENCY
How often in seconds to retrieve stock data, default:
120 sec
-v, --verbose Increase output verbosity
--debug Debug message output
-q, --quiet Run quiet with no message output
-V, --version Prints version and exits
```
### How to setup
- Copy src/config.yml to src/config.yml
- Change settings in config.yml to fit your needs
  - Change Elasticsearch credentials if needed
  - Change NLTK analyzer ignore words (see sentiment_analyzer:ignore_words:)
  - Add Twitter credentials and change the Twitter feed
    - Create a new Twitter application and generate your consumer key and access token.
      - https://developer.twitter.com/en/docs/basics/developer-portal/guides/apps.html
      - https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens.html
  - Add the desired stock symbol and its required words to the symbols section (see symbol: tsla)
- Change execution intervals in docker-compose.yml
  - Defaults: 120 seconds for the stock price listener, 3600 seconds for the news sentiment listeners
- Run "docker-compose up"
- ???
- Profit

### How to use
The following actions must be run in the python3 container.

###### View Kibana Dashboard
http://localhost:5601

###### Adding / Changing Stock Symbols
1. open src/config.yml
2. add stock symbol to symbol section.
3. add required keyword of the symbol.
4. the sentiment and price listeners will pick up the change on their next run.

###### Change Twitter Settings When the Instance Is Running.
1. Update the config.yml
2. Log into python container
3. kill twitter.sentiment.py
4. rerun it with "python twitter.sentiment.py &"

###### Adding a New News Sentiment Listener
1. See SeekAlphaListener and YahooFinanceListener as examples.
2. Add your class to news.sentiment.py
3. The sentiment runner will pick up the new listener on its next run.

###### Update Kibana Dashboard Template
1. Make changes to your existing template and visualizations.
2. Export them to kibana_export/export.7.3.ndjson
3. Replace symbol with "tmpl" or change the id and index value to match existing ndjson.
4. Run "KIBANA_OVERWRITE=true python import.kibana.py"

###### Delete Elastic Indexes
1. Log into python docker console
2. Run "python delindex.py --delindex {index_name}"
61 changes: 61 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,61 @@
# ./docker-compose.yml
#
#Copyright (C) Chris Park 2018-2019
#Copyright (C) Allen (Jian Feng) Xie 2019
#stocksight is released under the Apache 2.0 license. See
#LICENSE for the full license text.
version: '3'

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.3.1
    environment:
      - cluster.name=elasticsearch
      - node.name=stockdata
      - cluster.initial_master_nodes=stockdata

What does this do?

@heyqule (Author), Oct 17, 2019

It sets the cluster name and node name. Elasticsearch required me to specify them.
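
For context, and as far as I understand Elasticsearch 7.x: `cluster.initial_master_nodes` lists the master-eligible nodes used to bootstrap a brand-new cluster, so on a single-node setup its value has to match `node.name`. Below is a minimal sketch of that consistency check. `parse_environment` and `bootstrap_names_match` are hypothetical helpers written for illustration and are not part of this PR.

```python
def parse_environment(entries):
    """Turn docker-compose style 'key=value' entries into a dict."""
    settings = {}
    for entry in entries:
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = entry.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def bootstrap_names_match(settings):
    """Return True when node.name is listed in cluster.initial_master_nodes."""
    node_name = settings.get("node.name", "")
    raw_masters = settings.get("cluster.initial_master_nodes", "")
    masters = [name.strip() for name in raw_masters.split(",")]
    return bool(node_name) and node_name in masters


# Values copied from the environment block under review.
env = parse_environment([
    "cluster.name=elasticsearch",
    "node.name=stockdata",
    "cluster.initial_master_nodes=stockdata",
])
print(bootstrap_names_match(env))
```

If the two names drift apart (for example after renaming the node), a check like this would flag the mismatch before the container is started.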

      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - ./data:/usr/share/elasticsearch/data
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
      nproc:
        soft: 4096
        hard: 4096
    #expose this port for local dev only!
    #ports:
    #  - "9200:9200"
    restart: unless-stopped
  redis:

Why has Redis been added as a container?

Author:

It serves as an article cache. When news articles are fetched from a news page, articles that were already added are not added again.
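
To illustrate the idea, here is a rough sketch of a "seen before?" check built on set-if-absent semantics. It is not the code in this PR: the `article:` key scheme, the `is_new_article` helper, and the TTL are assumptions, and `InMemoryCache` is a stand-in that imitates only the `set(..., nx=True)` behaviour of a Redis client so the example runs without a server.

```python
import hashlib


class InMemoryCache:
    """Stand-in for a Redis client; imitates set(..., nx=True) semantics only."""

    def __init__(self):
        self._store = {}

    def set(self, name, value, nx=False, ex=None):
        # ex (expiry in seconds) is accepted but ignored by this stand-in.
        if nx and name in self._store:
            return None  # key already present, nothing written
        self._store[name] = value
        return True


def is_new_article(cache, url, ttl_seconds=86400):
    """Return True the first time a URL is seen and False on repeats."""
    # Hypothetical key scheme: hash the URL to keep keys short and uniform.
    key = "article:" + hashlib.sha1(url.encode("utf-8")).hexdigest()
    return bool(cache.set(key, 1, nx=True, ex=ttl_seconds))


cache = InMemoryCache()
first = is_new_article(cache, "https://example.com/news/1")
second = is_new_article(cache, "https://example.com/news/1")
print(first, second)
```

With a real Redis client the same call shape applies, and the expiry keeps the cache from growing without bound.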

    build:
      context: ./redis-docker
    #expose this port for local dev only!
    #ports:
    #  - "6379:6379"
    restart: unless-stopped
  kibana:
    image: docker.elastic.co/kibana/kibana:7.3.1
    depends_on:
      - elasticsearch
    ports:
      - "5601:5601"
    restart: unless-stopped
  python3:
    build:
      context: ./python-docker
    environment:
      #interval for getting stock price in seconds
      - stockprice_tick_time=120
      #interval for getting stock news in seconds
      - news_sentiment_tick_time=3600
    depends_on:
      - elasticsearch
      - redis
    volumes:
      - ./src:/usr/src/app
    restart: unless-stopped
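
The two interval variables set on the python3 service can be read inside the container along the following lines. This is a sketch only: `tick_time` is a hypothetical helper and the PR's own scripts may read these values differently. The fallback defaults mirror the values in the compose file above.

```python
import os


def tick_time(name, default):
    """Read a positive integer interval (seconds) from the environment."""
    raw = os.environ.get(name, "")
    try:
        value = int(raw)
    except ValueError:
        # Missing or non-numeric value: fall back to the default.
        return default
    return value if value > 0 else default


# Variable names as set in docker-compose.yml.
price_interval = tick_time("stockprice_tick_time", 120)
news_interval = tick_time("news_sentiment_tick_time", 3600)
print(price_interval, news_interval)
```

Falling back to a default instead of raising keeps a listener running even when a variable is unset or mistyped.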