TwitterMap documentation
TwitterMap has three components:
- Resolves the geolocation of tweets from the raw JSON files in `twittermap/gnosis/src/main/resources/raw/`.
- Gets streamed tweets, parses the JSON result, resolves the geotag by asking Gnosis, and then ingests the tweets into AsterixDB.
- Following the Play Framework's requirements, the HTML source is located in `twittermap/app/views/index.scala.html`. It is a scripted HTML template (`.scala.html`) that the framework renders. Since Angular controls the main logic, few scripts are placed in this template; the main logic is implemented in JavaScript.
  The JavaScript code is located in the `twittermap/web/public/javascripts/` folder. `app.js` is the entry point of the JavaScript code. Each front-end component is implemented as an Angular directive. The purpose of each folder is described below.
- The `common` module defines an Angular service that communicates with the back-end server by sending JSON requests over a WebSocket connection. It defines:
  - a `query` function that can be called to send JSON requests to the `Neo` server;
  - a `ws.onmessage` function that receives JSON messages from the `Neo` server and updates the corresponding global values.
The examples below show real JSON requests to the `Neo` server.
{ dataset: "twitter.ds_tweet", global: { globalAggregate: { field: "*", apply: { name: "count" }, as: "count" } }, estimable: true, transform: { wrap: { key: "totalCount" } } }
{ dataset: "twitter.ds_tweet", filter: [ { field: "geo_tag.stateID", relation: "in", values: [37,51,24,11,10,34,42,9,44,48,35,4,40,6,20,32,8,49,12,22,28,1,13,45,5,47,21,29,54,17,18,39,19,55,26,27,31, 56,41,46,16,30,53,38,25,36,50,33,23,2] }, { field: "create_at", relation: "inRange", values: ["2016-01-01T00:00:00.000Z", "2016-12-31T00:00:00.000Z"] }, { field: "text", relation: "contains", values: ["zika", "virus"] } ], select: { order: ["-create_at"], limit: 10, offset: 0, field: ["create_at", "id", "user.id"] }, transform: { wrap: { key: "sample" } } }
{ batch: [ { dataset: "twitter.ds_tweet", filter: [ { field: "geo_tag.stateID", relation: "in", values: [37,51,24,11,10,34,42,9,44,48,35,4,40,6,20,32,8,49,12,22,28,1,13,45,5,47,21,29,54,17,18,39,19,55,26,27,31,56,41,46,16,30,53,38,25,36,50,33,23,2] }, { field: "create_at", relation: "inRange", values: ["2016-01-01T00:00:00.000Z", "2016-12-31T00:00:00.000Z"] }, { field: "text", relation: "contains", values: ["zika", "virus"] } ], group: { by: [{ field: "create_at", apply: { name: "interval", args: { unit: "day" } }, as: "day" }], aggregate: [{ field: "*", apply: { name: "count" }, as: "count" }] } }, { dataset: "twitter.ds_tweet", filter: [ { field: "geo_tag.stateID", relation: "in", values: [37,51,24,11,10,34,42,9,44,48,35,4,40,6,20,32,8,49,12,22,28,1,13,45,5,47,21,29,54,17,18,39,19,55,26,27,31,56,41,46,16,30,53,38,25,36,50,33,23,2] }, { field: "create_at", relation: "inRange", values: ["2016-01-01T00:00:00.000Z", "2016-12-31T00:00:00.000Z"] }, { field: "text", relation: "contains", values: ["zika", "virus"] } ], group: { by: [{ field: "geo", apply: { name: "level", args: { level: "stateID" } }, as: "stateID" }], aggregate: [{ field: "*", apply: { name: "count" }, as: "count" }] } }, { dataset: "twitter.ds_tweet", filter: [ { field: "geo_tag.stateID", relation: "in", values: [37,51,24,11,10,34,42,9,44,48,35,4,40,6,20,32,8,49,12,22,28,1,13,45,5,47,21,29,54,17,18,39,19,55,26,27,31,56,41,46,16,30,53,38,25,36,50,33,23,2] }, { field: "create_at", relation: "inRange", values: ["2016-01-01T00:00:00.000Z", "2016-12-31T00:00:00.000Z"] }, { field: "text", relation: "contains", values: ["zika", "virus"] } ], unnest: [{ hashtags: "tag" }], group: { by: [{ field: "tag" }], aggregate: [{ field: "*", apply: { name: "count" }, as: "count" }] }, select: { order: ["-count"], limit: 50, offset: 0 } } ], option: { sliceMillis: 2000 }, transform: { wrap: { key: "batch" } } }
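The three queries in the batch request above repeat the same filter. The following is a minimal sketch of how such a request could be assembled from one shared filter. The helper name `buildBatchRequest` and its parameters are hypothetical and are not taken from the TwitterMap source; only the field names and values mirror the example requests.

```javascript
// Hypothetical helper (not from the TwitterMap source): assembles a batch
// request shaped like the example above from one shared filter definition.
function buildBatchRequest(stateIds, timeRange, keywords) {
  const dataset = "twitter.ds_tweet";
  // Build a fresh filter array per query so the queries do not share state.
  const filter = () => [
    { field: "geo_tag.stateID", relation: "in", values: [...stateIds] },
    { field: "create_at", relation: "inRange", values: [...timeRange] },
    { field: "text", relation: "contains", values: [...keywords] }
  ];
  const count = () => [{ field: "*", apply: { name: "count" }, as: "count" }];

  return {
    batch: [
      // Tweets per day.
      { dataset, filter: filter(),
        group: { by: [{ field: "create_at", apply: { name: "interval", args: { unit: "day" } }, as: "day" }],
                 aggregate: count() } },
      // Tweets per state.
      { dataset, filter: filter(),
        group: { by: [{ field: "geo", apply: { name: "level", args: { level: "stateID" } }, as: "stateID" }],
                 aggregate: count() } },
      // Top 50 hashtags.
      { dataset, filter: filter(), unnest: [{ hashtags: "tag" }],
        group: { by: [{ field: "tag" }], aggregate: count() },
        select: { order: ["-count"], limit: 50, offset: 0 } }
    ],
    option: { sliceMillis: 2000 },
    transform: { wrap: { key: "batch" } }
  };
}

const request = buildBatchRequest(
  [6, 48],
  ["2016-01-01T00:00:00.000Z", "2016-12-31T00:00:00.000Z"],
  ["zika", "virus"]
);
console.log(JSON.stringify(request, null, 2));
```

Keeping the filter in one place means a change to the keywords, time range, or selected states is applied to all three queries at once.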
It also defines several global values (e.g. `mapResults`, `timeResults`, etc.) to store the results. The UI of the dependent modules can be bound to specific values by using Angular's watch function (`$watch`).
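The interplay between `query`, `ws.onmessage`, and the stored result values can be sketched in plain JavaScript as follows. This is not the actual service code: `createCommonService`, the `watch` helper (a simple stand-in for Angular's `$watch`), and the `{ key, value }` message shape are assumptions made for illustration, and a fake socket replaces the real WebSocket so the sketch runs on its own.

```javascript
// Illustrative stand-in for the `common` service. Every name except `query`,
// `onmessage`, `mapResults` and `timeResults` is an assumption.
function createCommonService(ws) {
  const results = { mapResults: [], timeResults: [] };
  const watchers = {}; // key -> callbacks; a simple stand-in for Angular's $watch

  // Send one JSON request to the server over the socket.
  function query(request) {
    ws.send(JSON.stringify(request));
  }

  // Register a callback that fires whenever results[key] is replaced.
  function watch(key, callback) {
    (watchers[key] = watchers[key] || []).push(callback);
  }

  // Store each incoming message under its key and notify the watchers.
  // The { key, value } message shape is assumed, not taken from the source.
  ws.onmessage = function (event) {
    const message = JSON.parse(event.data);
    if (!Object.prototype.hasOwnProperty.call(results, message.key)) return;
    results[message.key] = message.value;
    (watchers[message.key] || []).forEach((callback) => callback(message.value));
  };

  return { query, watch, results };
}

// Usage with a fake socket that records what is sent.
const sent = [];
const fakeSocket = { send: (text) => sent.push(text), onmessage: null };
const common = createCommonService(fakeSocket);

let drawn = null;
common.watch("mapResults", (value) => { drawn = value; }); // e.g. redraw the map
common.query({ dataset: "twitter.ds_tweet" });
fakeSocket.onmessage({
  data: JSON.stringify({ key: "mapResults", value: [{ stateID: 6, count: 3 }] })
});
console.log(drawn);
```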
- The `map` directive is implemented by extending the existing Angular `leaflet-directive`. Initially, it loads the state and county shapes by requesting the resource files from the `Neo` server. Then, whenever a `zoom-in`, `zoom-out`, or `drag` action occurs on the map, it calls the `query` function in the `common` module. It also watches the `mapResults` value, so that the `draw` function is called whenever the results change.
- The directive that controls the search box.
- The directive that shows the time series chart, which is implemented using dc.js.
- The directive that controls the hashtag and sample tweet panels.
- `cache` is an Angular service that supplies city polygon data to the `map` directive. It caches the city polygons requested by users. When a user requests data that is already in the `cache`, the response is served from the `cache` rather than by sending an `http` request to the middleware. If the requested data is not in the cache, the cache requests data for the requested area plus some surrounding region (`pre-fetching`) from the middleware and stores it. If the user then requests a nearby region, it is already in the cache. This reduces the number of requests to the middleware and speeds up rendering when a user's requests are concentrated on a particular area.
  The GeoJSON data is stored in an R-tree. When the cache becomes full, it is emptied completely and starts over. For cache replacement, both temporal and spatial information are considered before a region is removed.
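The hit, miss, and pre-fetch behaviour described above can be sketched as follows. This is a simplified illustration, not the actual `cache` service: a plain array stands in for the R-tree, city polygons are reduced to points, the fetch is synchronous, and the names `RegionCache`, `margin`, and `maxRegions` are hypothetical.

```javascript
// Sketch of a pre-fetching region cache. A plain array stands in for the
// R-tree and city polygons are reduced to points to keep the example short.
class RegionCache {
  constructor(fetchRegion, { margin = 0.5, maxRegions = 4 } = {}) {
    this.fetchRegion = fetchRegion; // (bounds) => items; stands in for the middleware request
    this.margin = margin;           // extra fraction of the view fetched on each side
    this.maxRegions = maxRegions;   // capacity before the cache is emptied
    this.regions = [];              // [{ bounds, items }]
  }

  // True when `inner` lies completely inside `outer`.
  static contains(outer, inner) {
    return outer.minX <= inner.minX && outer.minY <= inner.minY &&
           outer.maxX >= inner.maxX && outer.maxY >= inner.maxY;
  }

  // Grow the requested bounds by `margin` times their size on every side.
  expand(b) {
    const dx = (b.maxX - b.minX) * this.margin;
    const dy = (b.maxY - b.minY) * this.margin;
    return { minX: b.minX - dx, minY: b.minY - dy, maxX: b.maxX + dx, maxY: b.maxY + dy };
  }

  get(bounds) {
    let region = this.regions.find((r) => RegionCache.contains(r.bounds, bounds));
    if (!region) {
      // Miss: empty the cache if it is full, then fetch the enlarged area.
      if (this.regions.length >= this.maxRegions) this.regions = [];
      const fetched = this.expand(bounds);
      region = { bounds: fetched, items: this.fetchRegion(fetched) };
      this.regions.push(region);
    }
    // Return only the items that fall inside the requested bounds.
    return region.items.filter((p) =>
      p.x >= bounds.minX && p.x <= bounds.maxX && p.y >= bounds.minY && p.y <= bounds.maxY);
  }
}

// Usage: the second, nearby request is answered without another fetch.
const cities = [{ name: "A", x: 1, y: 1 }, { name: "B", x: 2.4, y: 1 }, { name: "C", x: 9, y: 9 }];
let requests = 0;
const cache = new RegionCache((b) => {
  requests += 1;
  return cities.filter((c) => c.x >= b.minX && c.x <= b.maxX && c.y >= b.minY && c.y <= b.maxY);
});
const first = cache.get({ minX: 0, minY: 0, maxX: 2, maxY: 2 });  // miss, fetches -1..3
const second = cache.get({ minX: 1, minY: 0, maxX: 3, maxY: 2 }); // hit, inside -1..3
console.log(first.map((c) => c.name), second.map((c) => c.name), requests);
```

The sketch only implements the "empty the cache when full" policy; a replacement policy that weighs temporal and spatial information would need per-region access times and is left out here.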
The Play Framework documentation may be helpful.
- Visit WebJars and search for the WebJars version of the library you want to use.
- Copy the line in the "Build Tool" column (the build tool we use is sbt) to "cloudberry/examples/twittermap/project/dependencies.scala".
- Shut down the twittermap server and restart it. The build tool will download the new library.
- Add the required .js file to the head tag of "cloudberry/examples/twittermap/web/app/views/twittermap/main.scala.html". If you don't know where the .js file is located, check the folder "cloudberry/examples/twittermap/web/target/web/web-modules/main/webjars/lib".
- After that, you can use the library as you want.
An experimental demo that makes each state clickable.
To use AsterixDB’s data feed, we need to open a socket using AQL to listen for connections. For an example AQL script, see cloudberry/noah/src/main/resources/aql/feed.aql. Then create a socketAdapterClient that connects to AsterixDB’s socket and sends records to AsterixDB through it.
FeedSocketAdapterClient initializes a socket connection with AsterixDB and sends records to it. It contains three important functions:
- initialize(): should be called after creating a new FeedSocketAdapterClient object. It sets up the socket connection with AsterixDB.
- ingest(String record): sends a record to AsterixDB through the socket.
- finalized(): should be called after the feed ends. It closes the socket.
Both FileFeedDriver and TwitterFeedStreamDriver create a FeedSocketAdapterClient object and call its ingest function to send records to AsterixDB.
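The three-call lifecycle described above can be sketched in JavaScript as follows. The real FeedSocketAdapterClient is not shown in this document and may differ; the newline framing of records, the injectable `connect` parameter, and the host and port in the usage example are assumptions made only so that the sketch can run without an AsterixDB instance.

```javascript
const net = require("net");

// JavaScript sketch of the client lifecycle: initialize, ingest, finalized.
// The newline-delimited framing is an assumption, not taken from the source.
class FeedSocketAdapterClient {
  constructor(host, port, connect = net.createConnection) {
    this.host = host;
    this.port = port;
    this.connect = connect; // injectable so the sketch can run without a server
    this.socket = null;
  }

  // Opens the socket connection; must be called before ingest().
  initialize() {
    this.socket = this.connect(this.port, this.host);
  }

  // Sends one record through the socket.
  ingest(record) {
    if (!this.socket) throw new Error("call initialize() before ingest()");
    this.socket.write(record + "\n");
  }

  // Closes the socket once the feed has ended.
  finalized() {
    if (this.socket) this.socket.end();
    this.socket = null;
  }
}

// Usage with a fake connection that records what is written to it.
// The host and port are placeholders.
const written = [];
let closed = false;
const fakeConnect = () => ({
  write: (data) => written.push(data),
  end: () => { closed = true; }
});

const client = new FeedSocketAdapterClient("localhost", 10001, fakeConnect);
client.initialize();
for (const record of ['{ "id": 1 }', '{ "id": 2 }']) {
  client.ingest(record);
}
client.finalized();
console.log(written.length, closed);
```

A driver follows the same pattern: create the client, call `initialize()` once, call `ingest` for every record, and call `finalized()` when the feed ends.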
FileFeedDriver feeds data from an ADM file to AsterixDB. First, it initializes a FeedSocketAdapterClient. Then, it reads records from the file line by line and calls FeedSocketAdapterClient.ingest to send each record to AsterixDB.
To use the FileFeedDriver, run fileFeed.sh.
TwitterFeedStreamDriver is the current pipeline that fetches real-time Twitter data and feeds it to AsterixDB. The procedure is:
- Use the Twitter Streaming API to fetch real-time Twitter data.
- For every tweet, geotag it and convert it from JSON format to ADM format.
- Call FeedSocketAdapterClient.ingest to send the record to AsterixDB.
To use TwitterFeedStreamDriver, modify and run streamFeed.sh.
Twitter driver documentation: https://docs.google.com/document/d/1j2vXRL8WeSoqzUKb2Kv4sebKHA0rQIvJZviUSH5cAo4/edit