Home
geo-tag is a module that tags each tweet JSON with geo-information. The geo-information includes stateID, stateName, countyID, countyName, cityID, cityName, coordinate (longitude, latitude), and the source from which the location was inferred. (We currently consider only locations within the U.S.)
Cloudberry and other clients need the geo-information in the tweets to implement some of their functions.
To infer the geo-information for each tweet, we use the following strategy:
- Extract the coordinate in two steps. First, check the `coordinates` field and use it if it is present; in that case we set `coordinate_source` to `coordinates`. If it is none, take the second step: read the coordinates from the `bounding_box` field and pick a random point from the polygon (a rectangle); in that case we set `coordinate_source` to `bounding_box`. There are three modes to choose from, and the default is UNIFORM_DISTRIBUTION_RANDOM.
- To infer the city, county, and state, we first check the `place` field of the tweet to get the full cityName, and look up the remaining information in `city.json`. In this case the `source` is `place`.
- If the `place` field is none, we check whether we have a coordinate. If so, we use STRTREE and `bounding_box` to infer the location from the longitude and latitude. In this case the `source` is `coordinate`.
- If we do not have a coordinate, we check the `location` field inside the `user` field and infer the remaining information, including an inferred coordinate, from `city.json`. In this case the `source` is `user` and `coordinate_source` is `user_location`.
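
The fallback order above can be sketched roughly as follows. This is a minimal illustration and not the module's actual code: the function names `extract_coordinate` and `choose_source` are hypothetical, the tweet layout is assumed to follow Twitter's v1.1 JSON (a GeoJSON point under `coordinates`, a rectangle under `place.bounding_box`, a city name under `place.full_name`), and the uniform sampling shown here may or may not match what UNIFORM_DISTRIBUTION_RANDOM does. The `city.json` and STRTREE lookups are left out.

```python
import random


def extract_coordinate(tweet, rng=random):
    """Return ((longitude, latitude), coordinate_source), or (None, None).

    Hypothetical helper; assumes Twitter v1.1 tweet JSON parsed into a dict.
    """
    # Step 1: an exact point in the tweet's own `coordinates` field.
    point = tweet.get("coordinates")
    if point and point.get("coordinates"):
        lon, lat = point["coordinates"]
        return (lon, lat), "coordinates"

    # Step 2: a random point inside the rectangular bounding box.
    box = (tweet.get("place") or {}).get("bounding_box")
    if box and box.get("coordinates"):
        ring = box["coordinates"][0]  # list of [lon, lat] corners
        lons = [corner[0] for corner in ring]
        lats = [corner[1] for corner in ring]
        lon = rng.uniform(min(lons), max(lons))
        lat = rng.uniform(min(lats), max(lats))
        return (lon, lat), "bounding_box"

    return None, None


def choose_source(tweet, coordinate):
    """Return which input the city/county/state lookup would be based on."""
    if (tweet.get("place") or {}).get("full_name"):
        return "place"
    if coordinate is not None:
        return "coordinate"
    if (tweet.get("user") or {}).get("location"):
        return "user"
    return None  # nothing to infer a location from
```

Passing a seeded `random.Random` instance as `rng` makes the bounding-box sampling reproducible, which is convenient in tests.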
The module provides a class named `TwitterJSONTagger`, and its `tag_one_tweet` function is the interface for using this module. It takes a tweet (in JSON format) as its input parameter.
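
The import path, constructor arguments, and return value of `TwitterJSONTagger` are not documented on this page, so the snippet below uses a hypothetical stand-in class to show only the call shape. The `geo_tag` key, the assumption that the tweet is passed as a JSON string, and the flat layout of the result are all illustrative guesses; the field names are taken from the list at the top of this page.

```python
import json


class StandInTagger:
    """Hypothetical stand-in for TwitterJSONTagger; not the module's real code."""

    GEO_KEYS = ("stateID", "stateName", "countyID", "countyName",
                "cityID", "cityName", "coordinate", "source")

    def tag_one_tweet(self, tweet_json):
        # Assumption: the tweet arrives as a JSON string.
        tweet = json.loads(tweet_json)
        # A real tagger would fill these values in from the place,
        # coordinates, or user location; here they are left as None.
        tweet["geo_tag"] = dict.fromkeys(self.GEO_KEYS)
        return tweet


tagger = StandInTagger()
tagged = tagger.tag_one_tweet('{"id": 1, "text": "hello"}')
print(sorted(tagged["geo_tag"]))
```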