Reporting on tweets captured per "regex" #31
This is actually already in place. For every retweet, both the full tweet data and the match info are appended as JSON to a log file. Here is an example line from a log file, run through a JSON formatter:

```json
{
  "tweet": {
    "created_at": "Tue Nov 17 05:31:03 +0000 2015",
    "id": 666488701837414400,
    "id_str": "666488701837414400",
    "text": "I really want to go vegan. I know it is better for me if I do. It's just the transitioning is the problem. Its gonna be awhile.😐😉🍌🍌🍊🍓🍉",
    "source": "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>",
    "truncated": false,
    "in_reply_to_status_id": null,
    "in_reply_to_status_id_str": null,
    "in_reply_to_user_id": null,
    "in_reply_to_user_id_str": null,
    "in_reply_to_screen_name": null,
    "user": {
      // removed user data
    },
    "geo": null,
    "coordinates": null,
    "place": {
      // removed location data
    },
    "contributors": null,
    "is_quote_status": false,
    "retweet_count": 0,
    "favorite_count": 0,
    "entities": {
      "hashtags": [],
      "urls": [],
      "user_mentions": [],
      "symbols": []
    },
    "favorited": false,
    "retweeted": false,
    "filter_level": "low",
    "lang": "en",
    "timestamp_ms": "1447738263936"
  },
  "matches": [
    {
      "match": "I really want to go vegan",
      "index": 0,
      "filter": "/i([\\s.]+)(really|totally|probably|defin[ia]tely|absolutely|actually|certainly|literally|legitimately|genuinely|honestly|truly|undoubtedly|unquestionably)?([\\s.]+)?(want([\\s.]+)to|wanna|would([\\s.]+)like([\\s.]+)to)([\\s.]+)(be(([\\s.]+)a)?|become(([\\s.]+)a)?|go)([\\s.]+)#?vegan/gi",
      "filterList": "english"
    }
  ]
}
```

Relevant code: https://github.com/plorry/VegAssist/blob/master/vegassist.js#L28-L36
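For readers unfamiliar with the log format, here is a minimal sketch of how a line like the one above could be produced. It is not the code at the link above. The names `buildLogEntry`, `appendLogEntry` and `filterLists`, the shape of `filterLists` (list name mapped to an array of regexes), and the one-JSON-object-per-line layout are all assumptions for illustration. The single rule is the one from the example's `filter` field, with the JSON escaping removed.

```javascript
const fs = require("fs");

// Hypothetical filter lists: list name -> array of regexes.
// The one rule here is copied from the "filter" field of the example log line.
const filterLists = {
  english: [
    /i([\s.]+)(really|totally|probably|defin[ia]tely|absolutely|actually|certainly|literally|legitimately|genuinely|honestly|truly|undoubtedly|unquestionably)?([\s.]+)?(want([\s.]+)to|wanna|would([\s.]+)like([\s.]+)to)([\s.]+)(be(([\s.]+)a)?|become(([\s.]+)a)?|go)([\s.]+)#?vegan/gi,
  ],
};

// Run every rule against the tweet text and collect one entry per match,
// mirroring the "matches" array in the example log line.
function buildLogEntry(tweet, lists) {
  const matches = [];
  for (const [listName, rules] of Object.entries(lists)) {
    for (const rule of rules) {
      // matchAll requires the g flag, which these rules carry; it works on a
      // copy of the regex, so the rule's own lastIndex is left untouched.
      for (const m of tweet.text.matchAll(rule)) {
        matches.push({
          match: m[0],
          index: m.index,
          filter: rule.toString(), // "/.../gi", the same form as the log
          filterList: listName,
        });
      }
    }
  }
  return { tweet, matches };
}

// Append the entry as a single JSON line (newline-delimited JSON).
function appendLogEntry(logPath, entry) {
  fs.appendFileSync(logPath, JSON.stringify(entry) + "\n");
}
```

With the example tweet, `buildLogEntry` yields one match, `"I really want to go vegan"`, at index 0. This agrees with the logged entry.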
Re-opening, because I realized this issue is probably more about doing the actual analysis than about having the capability for it. To do the analysis, we'd need to pull the log from the VegAssist instance and then write a script to parse the JSON and analyze it.
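A minimal sketch of such a script follows. It assumes the log holds one JSON object per line in the format shown above. The helper names and the `vegassist.log` path are placeholders and are not part of VegAssist.

```javascript
const fs = require("fs");

// Parse newline-delimited JSON, skipping blank lines (e.g. a trailing newline).
function parseLog(contents) {
  return contents
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Read and parse a log file from disk (path is up to the caller).
function loadLog(path) {
  return parseLog(fs.readFileSync(path, "utf8"));
}

// Count how many matches each filter (regex string) produced,
// sorted from most to least productive.
function countByFilter(entries) {
  const counts = new Map();
  for (const entry of entries) {
    for (const { filter } of entry.matches) {
      counts.set(filter, (counts.get(filter) || 0) + 1);
    }
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Usage (hypothetical path):
// for (const [filter, n] of countByFilter(loadLog("vegassist.log"))) {
//   console.log(n, filter);
// }
```

This counts matches, so a tweet that one rule matches twice is counted twice. To count tweets per filter instead, deduplicate each entry's filters with a `Set` before incrementing.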
OK, I did it. Match counts by filter since logging started:
And, for fun, the specific matched text ranked by the number of times it matched (only texts that matched at least 5 times):
EDIT: Made the matching-text counts case-insensitive.
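A case-insensitive phrase count with a minimum threshold could look like the sketch below. It assumes `entries` are parsed log lines of the form `{ tweet, matches: [{ match, ... }] }`, and `countMatchedText` is a hypothetical name. Lower-casing is the only normalization applied. Phrases that differ only in whitespace or periods (which the rules' `[\s.]+` allows) are still counted separately.

```javascript
// Count each matched phrase ignoring case, keep only phrases that matched
// at least `minCount` times, and sort by count (ties broken alphabetically).
function countMatchedText(entries, minCount = 5) {
  const counts = new Map();
  for (const entry of entries) {
    for (const { match } of entry.matches) {
      const key = match.toLowerCase();
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .filter(([, n]) => n >= minCount)
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}
```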
I re-ran my script on the latest log just for fun. Results by filter and rule:
Matched phrases with at least 100 matches:
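One reading of "by filter and rule" is the `filterList` name (such as `"english"`) followed by the individual regex in that list. This is an interpretation, since the results themselves are not reproduced here. Under that assumption, the breakdown is a nested grouping, sketched below with the hypothetical name `countByListAndRule`.

```javascript
// Group match counts first by filter list, then by rule (regex string).
// Returns a plain object: { [filterList]: { [filter]: count } }.
function countByListAndRule(entries) {
  const result = {};
  for (const entry of entries) {
    for (const { filterList, filter } of entry.matches) {
      const rules = result[filterList] || (result[filterList] = {});
      rules[filter] = (rules[filter] || 0) + 1;
    }
  }
  return result;
}

// Usage: console.log(JSON.stringify(countByListAndRule(entries), null, 2));
```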
I think this is an opportunity to tune the bot.
Would it be possible to create an analysis of the regexes and the number of tweets each one captures? With this information we can identify the regexes that are not returning many tweets, and improve them where possible.
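The log only records rules that matched something, so a rule that never fires is invisible in a plain count. The sketch below finds underperforming rules by joining the configured rules with observed counts, so zero-match rules show up too. It assumes the rules are available as regex objects and that `counts` is a `Map` from the log's `filter` string to a match count. `findWeakRules` and the `minMatches` threshold are hypothetical.

```javascript
// List configured rules with fewer than `minMatches` matches, including
// rules that never matched and therefore never appear in the log.
function findWeakRules(rules, counts, minMatches = 1) {
  return rules
    .map((rule) => {
      const key = rule.toString(); // same "/.../flags" form as the log's "filter"
      return { filter: key, count: counts.get(key) || 0 };
    })
    .filter(({ count }) => count < minMatches)
    .sort((a, b) => a.count - b.count); // weakest first
}
```

Sorting weakest-first puts the rules that most need rework, or removal, at the top of the list.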