Data Tools with Twitter Ingestion Server, Twitter GeoTagger and AsterixDB Ingestion Server. #807
base: master
…at any client can subscribe to get realtime ingestion stream
NOT ready to be merged
Codecov Report
@@ Coverage Diff @@
## master #807 +/- ##
=======================================
Coverage 63.91% 63.91%
=======================================
Files 75 75
Lines 4076 4076
Branches 355 355
=======================================
Hits 2605 2605
Misses 1471 1471
…will allow skipping printing those tweets that can not be geotagged; Move TwitterGeoTaggerTest to test folder;
…on output file rotation in TwitterIngestionServer; (2) add parameter for switching between general Twitter and TwitterMap output format in AsterixDBIngestionDriver; (3) Fix the issue of the unexpected end of file for output gzip files in TwitterIngestionServer;
…does not wait for the WebsocketClient to long live waiting for tweets from the Proxy server; (2) fix the bug in AsterixDBAdapterForTwitterMap that the schema should be initialized in the constructor;
… TwitterGeoTagger.
…nsafe issue in AsterixDBAdapterForTwitterMap and AsterixDBAdapterForTwitter.
Data Tools
Data Tools is a new module consisting of 3 components that prepare data for the TwitterMap application.
Twitter Ingestion Server
Twitter Ingestion Server is a daemon service that ingests real-time tweets from the Twitter Filter Stream API into local gzip files, rotating the output file daily.
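The daily rotation described above could look roughly like the following. This is a minimal sketch, not the code in this PR: the class name `DailyGzipWriter`, the `prefix_yyyy-MM-dd.gz` file naming, and passing the date in explicitly are all assumptions made for illustration. It closes the current gzip stream before opening the next file, since an unclosed `GZIPOutputStream` leaves a file without its trailer, which readers typically report as an unexpected end of file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDate;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper for illustration; not the class used in this PR. */
final class DailyGzipWriter implements AutoCloseable {
    private final Path dir;
    private final String prefix;
    private LocalDate currentDate; // date of the file currently open
    private Writer out;            // null until the first write

    DailyGzipWriter(Path dir, String prefix) {
        this.dir = dir;
        this.prefix = prefix;
    }

    /** Assumed naming scheme: prefix_yyyy-MM-dd.gz (LocalDate prints ISO dates). */
    static String fileNameFor(String prefix, LocalDate date) {
        return prefix + "_" + date + ".gz";
    }

    /** Writes one tweet per line, switching files when the date changes. */
    synchronized void write(String tweetJson, LocalDate date) throws IOException {
        if (out == null || !date.equals(currentDate)) {
            close(); // finish the previous gzip stream so its trailer is written
            Path file = dir.resolve(fileNameFor(prefix, date));
            out = new BufferedWriter(new OutputStreamWriter(
                    new GZIPOutputStream(Files.newOutputStream(file,
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND)),
                    StandardCharsets.UTF_8));
            currentDate = date;
        }
        out.write(tweetJson);
        out.write('\n');
    }

    @Override
    public synchronized void close() throws IOException {
        if (out != null) {
            out.close();
            out = null;
        }
    }
}
```

A caller would pass `LocalDate.now()` for each incoming tweet and close the writer on shutdown. Taking the date as a parameter rather than reading the clock inside `write` keeps the rotation logic easy to test.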
It is also a light-weight HTTP server with 3 endpoints:
/stats - HTTP GET endpoint that returns current ingestion status information in JSON format.
/proxy - WebSocket endpoint that pushes real-time tweets to any connected client.
/ - HTTP GET endpoint that returns an index.html as an example page demonstrating the usage of the above two endpoints.
Twitter GeoTagger
Twitter GeoTagger is a Java program that geotags Twitter JSON with {stateID, stateName, countyID, countyName, cityID, cityName}.
It has 2 modes, one of which, tagOneTweet, can be called from other programs.
AsterixDB Ingestion Server
TBD.
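For reference, the six-field tag listed in the Twitter GeoTagger description can be pictured as a small value object. The sketch below is only illustrative: the class `GeoTag`, the `attach` helper, the `geo_tag` key, and the integer ID types are assumptions, and only the six field names are taken from the description. It does not reproduce the signature of tagOneTweet.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical value class; only the six field names come from the description. */
final class GeoTag {
    final int stateID;
    final String stateName;
    final int countyID;
    final String countyName;
    final int cityID;
    final String cityName;

    GeoTag(int stateID, String stateName, int countyID, String countyName,
           int cityID, String cityName) {
        this.stateID = stateID;
        this.stateName = stateName;
        this.countyID = countyID;
        this.countyName = countyName;
        this.cityID = cityID;
        this.cityName = cityName;
    }

    /** Returns the fields in the order listed in the description. */
    Map<String, Object> toMap() {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("stateID", stateID);
        m.put("stateName", stateName);
        m.put("countyID", countyID);
        m.put("countyName", countyName);
        m.put("cityID", cityID);
        m.put("cityName", cityName);
        return m;
    }

    /** Returns a copy of the tweet with the tag stored under an assumed "geo_tag" key. */
    static Map<String, Object> attach(Map<String, Object> tweet, GeoTag tag) {
        Map<String, Object> copy = new LinkedHashMap<>(tweet);
        copy.put("geo_tag", tag.toMap());
        return copy;
    }
}
```

Returning a copy from `attach` leaves the input tweet unmodified, which keeps the helper safe to call from several threads as long as each caller owns its own input map.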