Adding a twitter followers-you-know command
Yomguithereal committed Jan 16, 2024
1 parent be4554f commit ef3e288
Showing 9 changed files with 278 additions and 4 deletions.
108 changes: 108 additions & 0 deletions docs/cli.md
Original file line number Diff line number Diff line change
@@ -73,6 +73,7 @@ _Platform-related commands_
- [twitter](#twitter)
- [attrition](#attrition)
- [followers](#followers)
- [followers-you-know](#followers-you-know)
- [friends](#friends)
- [list-followers](#list-followers)
- [list-members](#list-members)
@@ -5392,6 +5393,113 @@ how to use the command with a CSV file?
$ minet twitter followers "value1,value2" --explode ","
```

### followers-you-know

```
Usage: minet twitter followers-you-know [-h] [-c COOKIE] [--rcfile RCFILE]
[--silent]
[--refresh-per-second REFRESH_PER_SECOND]
[--simple-progress]
[--timezone TIMEZONE] [-i INPUT]
[--explode EXPLODE] [-s SELECT]
[--total TOTAL] [-o OUTPUT]
user_id_or_user_id_column
# Minet Twitter Followers You Know Command
Scrape Twitter's public-facing "followers you know" lists such
as the one shown on the website, e.g.:
https://twitter.com/DEFACTO_UE/followers_you_follow
Note that this command only works when you provide user ids;
providing screen names will not work.
Be aware that Twitter's follower lists are currently known to
be inconsistent once the actual number of users exceeds
roughly 50.
Positional Arguments:
user_id_or_user_id_column Single user_id to process or name of the CSV
column containing user ids when using
-i/--input.
Optional Arguments:
-c, --cookie COOKIE Authenticated cookie to use or browser from
which to extract it (supports "firefox",
"chrome", "chromium", "opera" and "edge").
Defaults to `firefox`. Can also be configured in
a .minetrc file as "twitter.cookie" or read from
the MINET_TWITTER_COOKIE env variable.
--timezone TIMEZONE Timezone for dates, for example 'Europe/Paris'.
Defaults to UTC.
-s, --select SELECT Columns of -i/--input CSV file to include in the
output (separated by `,`). Use an empty string
if you don't want to keep anything: --select ''.
--explode EXPLODE Use to indicate the character used to separate
multiple values in a single CSV cell. Defaults
to none, i.e. CSV cells are expected to contain
a single value, which is usually the case.
--total TOTAL Total number of items to process. Might be
necessary when you want to display a finite
progress indicator for large files given as
input to the command.
-i, --input INPUT CSV file (potentially gzipped) containing all
the user ids you want to process. Will consider
`-` as stdin.
-o, --output OUTPUT Path to the output file. Will consider `-` as
stdout. If not given, results will also be
printed to stdout.
--rcfile RCFILE Custom path to a minet configuration file. More
info about this here:
https://github.com/medialab/minet/blob/master/do
cs/cli.md#minetrc
--refresh-per-second REFRESH_PER_SECOND
Number of times to refresh the progress bar per
second. Can be a float e.g. `0.5` meaning once
every two seconds. Use this to limit CPU usage
when launching multiple commands at once.
Defaults to `10`.
--simple-progress Whether to simplify the progress bar and make it
fit on a single line. Can be useful in terminals
with partial ANSI support, e.g. a Jupyter
notebook cell.
--silent Whether to suppress all the log and progress
bars. Can be useful when piping.
-h, --help show this help message and exit
Examples:
. Collecting the followers you know from some user id:
$ minet tw followers-you-know 794083798912827393 > users.csv
how to use the command with a CSV file?
> A lot of minet commands, including this one, can be given
> either a single value to process or a whole batch of them
> through the column of a CSV file passed to -i/--input instead.
> Note that when given a CSV file as input, minet will
> concatenate the input file columns with the ones added
> by the command. You can always restrict the input file
> columns to keep by using the -s/--select flag.
. Here is how to use a command with a single value:
$ minet twitter followers-you-know "value"
. Here is how to use a command with a CSV file:
$ minet twitter followers-you-know column_name -i file.csv
. Here is how to read CSV file from stdin using `-`:
$ xsv search -s col . | minet twitter followers-you-know column_name -i -
. Here is how to indicate that the CSV column may contain multiple
values separated by a special character:
$ minet twitter followers-you-know column_name -i file.csv --explode "|"
. This also works with single values:
$ minet twitter followers-you-know "value1,value2" --explode ","
```

### friends

```
5 changes: 5 additions & 0 deletions docs/cli.template.md
@@ -73,6 +73,7 @@ _Platform-related commands_
- [twitter](#twitter)
- [attrition](#attrition)
- [followers](#followers)
- [followers-you-know](#followers-you-know)
- [friends](#friends)
- [list-followers](#list-followers)
- [list-members](#list-members)
@@ -394,6 +395,10 @@ For more documentation about minet's scraping DSL check this [page](../cookbook/

<% twitter/followers %>

### followers-you-know

<% twitter/followers-you-know %>

### friends

<% twitter/friends %>
6 changes: 6 additions & 0 deletions ftest/twitter_scraping.py
@@ -0,0 +1,6 @@
from minet.twitter import TwitterAPIScraper

scraper = TwitterAPIScraper("firefox")

for user in scraper.followers_you_know("794083798912827393"):
print(user)
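The smoke test above prints normalized users as Python dicts, whereas the CLI command flushes each user as a CSV row. A minimal, dependency-free sketch of that flushing pattern using only the standard library — the field list and user dict below are made up for illustration; the real command relies on twitwi's `USER_FIELDS` and `format_user_as_csv_row`:

```python
import csv
import io

# Hypothetical stand-in for twitwi's USER_FIELDS
FIELDS = ["id", "screen_name", "followers"]


def write_users_csv(users, out):
    # Write one header row, then one row per scraped user
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for user in users:
        writer.writerow(user)


buffer = io.StringIO()
write_users_csv(
    [{"id": "1", "screen_name": "alice", "followers": 10}],
    buffer,
)
print(buffer.getvalue().splitlines()[0])  # header row
```

In the actual command this role is played by the enricher, which additionally concatenates the input CSV's columns before the user fields.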
1 change: 1 addition & 0 deletions minet/cli/crawl/__init__.py
@@ -158,6 +158,7 @@
},
]


def delete(o, k):
try:
del o[k]
2 changes: 1 addition & 1 deletion minet/cli/scrape/__init__.py
@@ -115,7 +115,7 @@ def resolve_arguments(cli_args):
{
"flags": ["-e", "--eval"],
"help": "Whether given scraper should be a simple expression to evaluate.",
"action": "store_true"
"action": "store_true",
},
{
"flags": ["-g", "--glob"],
36 changes: 36 additions & 0 deletions minet/cli/twitter/__init__.py
@@ -393,6 +393,41 @@ def twitter_api_subcommand(*args, arguments=[], **kwargs):
],
)

TWITTER_FOLLOWERS_YOU_KNOW_SUBCOMMAND = command(
"followers-you-know",
"minet.cli.twitter.followers_you_know",
title="Minet Twitter Followers You Know Command",
description="""
Scrape Twitter's public-facing "followers you know" lists such
as the one shown on the website, e.g.:
https://twitter.com/DEFACTO_UE/followers_you_follow
Note that this command only works when you provide user ids;
providing screen names will not work.
Be aware that Twitter's follower lists are currently known to
be inconsistent once the actual number of users exceeds
roughly 50.
""",
epilog="""
Examples:
. Collecting the followers you know from some user id:
$ minet tw followers-you-know 794083798912827393 > users.csv
""",
variadic_input={"dummy_column": "user_id", "item_label_plural": "user ids"},
arguments=[
{
"flags": ["-c", "--cookie"],
"help": 'Authenticated cookie to use or browser from which to extract it (supports "firefox", "chrome", "chromium", "opera" and "edge").',
"default": "firefox",
"rc_key": ["twitter", "cookie"],
"action": ConfigAction,
},
TIMEZONE_ARGUMENT,
],
)

TWITTER_TWEET_COUNT_SUBCOMMAND = twitter_api_subcommand(
"tweet-count",
"minet.cli.twitter.tweet_count",
@@ -620,6 +655,7 @@ def twitter_api_subcommand(*args, arguments=[], **kwargs):
TWITTER_LIST_MEMBERS_SUBCOMMAND,
TWITTER_RETWEETERS_SUBCOMMAND,
TWITTER_SCRAPE_SUBCOMMAND,
TWITTER_FOLLOWERS_YOU_KNOW_SUBCOMMAND,
TWITTER_TWEET_COUNT_SUBCOMMAND,
TWITTER_TWEET_DATE_SUBCOMMAND,
TWITTER_TWEET_SEARCH_SUBCOMMAND,
51 changes: 51 additions & 0 deletions minet/cli/twitter/followers_you_know.py
@@ -0,0 +1,51 @@
# =============================================================================
# Minet Twitter Followers You Know CLI Action
# =============================================================================
#
# Logic of the `tw followers-you-know` action.
#
from twitwi.constants import USER_FIELDS
from twitwi import format_user_as_csv_row

from minet.cli.utils import with_enricher_and_loading_bar
from minet.cli.exceptions import FatalError
from minet.twitter import TwitterAPIScraper
from minet.twitter.exceptions import (
TwitterPublicAPIInvalidCookieError,
TwitterPublicAPIBadAuthError,
)


@with_enricher_and_loading_bar(
headers=USER_FIELDS,
title="Scraping",
unit="users",
nested=True,
sub_unit="followers",
)
def action(cli_args, enricher, loading_bar):
try:
scraper = TwitterAPIScraper(cli_args.cookie)
except TwitterPublicAPIInvalidCookieError:
raise FatalError(
[
"Invalid Twitter cookie!",
"Try giving another browser to --cookie and make sure you are correctly logged in.",
]
)

for row, user_id in enricher.cells(cli_args.column, with_rows=True):
with loading_bar.step(user_id):
iterator = scraper.followers_you_know(user_id, locale=cli_args.timezone)

try:
for user in iterator:
addendum = format_user_as_csv_row(user)
enricher.writerow(row, addendum)
loading_bar.nested_advance()

except TwitterPublicAPIBadAuthError as error:
raise FatalError(
"Bad authentication (%i). Double check your --cookie and make sure you are logged in."
% error.status
)
2 changes: 1 addition & 1 deletion minet/scrape/classes/function.py
@@ -64,6 +64,6 @@ def __call__(self, html: AnyScrapableTarget, context: Optional[Dict] = None):
soup = cast(WonderfulSoup, ensure_soup(html, strainer=self.strainer))

if isinstance(self.fn, str):
return eval(self.fn, {"row": row, 'soup': soup}, None)
return eval(self.fn, {"row": row, "soup": soup}, None)

return self.fn(row, soup)
71 changes: 69 additions & 2 deletions minet/twitter/api_scraper.py
Expand Up @@ -7,7 +7,7 @@
import time
import datetime
from urllib.parse import urlencode, quote
from twitwi import normalize_tweet
from twitwi import normalize_tweet, normalize_user
from ebbe import with_is_first, getpath, pathgetter
from json import JSONDecodeError
from tenacity import RetryCallState
@@ -254,6 +254,10 @@ def forge_search_params(query, target="tweets", count=DEFAULT_COUNT, cursor=None
]
)

ENTRIES_FOLLOWER_YOU_KNOW_PATH_GETTER = pathgetter(
["data", "user", "result", "timeline", "timeline", "instructions", -1, "entries"]
)


def extract_cursor_from_tweets_payload(payload):
found_cursor = CURSOR_FIRST_POSSIBLE_PATH_GETTER(payload)
@@ -274,6 +278,27 @@ def extract_cursor_from_users_payload(payload):
return CURSOR_USER_PATH_GETTER(payload)


def extract_data_from_followers_you_know_payload(payload):
entries = ENTRIES_FOLLOWER_YOU_KNOW_PATH_GETTER(payload)

if entries is None:
return None, []

bottom_cursor_node = next(
(entry for entry in entries if entry["entryId"].startswith("cursor-bottom")),
None,
)

if bottom_cursor_node is None:
return None, []

return bottom_cursor_node["content"]["value"], [
entry["content"]["itemContent"]["user_results"]["result"]
for entry in entries
if entry["entryId"].startswith("user-")
]
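The extraction helper added here walks the GraphQL timeline entries, keeping the `cursor-bottom` entry's value and the `user-` nodes. Below is a dependency-free sketch of the same walk (avoiding ebbe's `pathgetter`), run on a minimal made-up fixture that mirrors the entry shape — the real payload is far deeper and richer:

```python
def extract(entries):
    """Return (next_cursor, user_results) from GraphQL timeline entries."""
    # The pagination cursor lives in a dedicated "cursor-bottom-*" entry
    cursor = next(
        (e for e in entries if e["entryId"].startswith("cursor-bottom")),
        None,
    )
    if cursor is None:
        return None, []
    # Actual users live in "user-*" entries, nested under user_results
    users = [
        e["content"]["itemContent"]["user_results"]["result"]
        for e in entries
        if e["entryId"].startswith("user-")
    ]
    return cursor["content"]["value"], users


# Minimal made-up fixture mirroring the timeline entry shape
entries = [
    {
        "entryId": "user-123",
        "content": {"itemContent": {"user_results": {"result": {"rest_id": "123"}}}},
    },
    {"entryId": "cursor-bottom-0", "content": {"value": "NEXT"}},
]

cursor, users = extract(entries)
print(cursor, users[0]["rest_id"])  # NEXT 123
```

Returning `(None, [])` when no bottom cursor is found matches the commit's behavior and lets the caller treat an empty or malformed page as the end of pagination.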


def process_single_tweet(tweet_id, tweet_index, user_index):
try:
tweet = tweet_index[tweet_id]
@@ -644,12 +669,37 @@ def request_user_search(self, query, locale, cursor=None, dump=False):
# return data

# users = [
# normalize_user(user, locale=locale)
# (user, locale=locale)
# for user in data["globalObjects"]["users"].values()
# ]

# return next_cursor, users

@retrying_method()
def request_followers_you_know(self, user_id, cursor=None):
url = "https://twitter.com/i/api/graphql/sT_K0qB2bqpWfL3-cq4zaQ/FollowersYouKnow?variables=%7B%22userId%22%3A%22{}%22%2C%22count%22%3A20%2C%22includePromotedContent%22%3Afalse%7D&features=%7B%22responsive_web_graphql_exclude_directive_enabled%22%3Atrue%2C%22verified_phone_label_enabled%22%3Afalse%2C%22creator_subscriptions_tweet_preview_api_enabled%22%3Atrue%2C%22responsive_web_graphql_timeline_navigation_enabled%22%3Atrue%2C%22responsive_web_graphql_skip_user_profile_image_extensions_enabled%22%3Afalse%2C%22c9s_tweet_anatomy_moderator_badge_enabled%22%3Atrue%2C%22tweetypie_unmention_optimization_enabled%22%3Atrue%2C%22responsive_web_edit_tweet_api_enabled%22%3Atrue%2C%22graphql_is_translatable_rweb_tweet_is_translatable_enabled%22%3Atrue%2C%22view_counts_everywhere_api_enabled%22%3Atrue%2C%22longform_notetweets_consumption_enabled%22%3Atrue%2C%22responsive_web_twitter_article_tweet_consumption_enabled%22%3Afalse%2C%22tweet_awards_web_tipping_enabled%22%3Afalse%2C%22freedom_of_speech_not_reach_fetch_enabled%22%3Atrue%2C%22standardized_nudges_misinfo%22%3Atrue%2C%22tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled%22%3Atrue%2C%22rweb_video_timestamps_enabled%22%3Atrue%2C%22longform_notetweets_rich_text_read_enabled%22%3Atrue%2C%22longform_notetweets_inline_media_enabled%22%3Atrue%2C%22responsive_web_media_download_video_enabled%22%3Afalse%2C%22responsive_web_enhance_cards_enabled%22%3Afalse%7D".format(
quote(user_id)
)

if cursor is not None:
url = "https://twitter.com/i/api/graphql/sT_K0qB2bqpWfL3-cq4zaQ/FollowersYouKnow?variables=%7B%22userId%22%3A%22{}%22%2C%22count%22%3A20%2C%22cursor%22%3A%22{}%22%2C%22includePromotedContent%22%3Afalse%7D&features=%7B%22responsive_web_graphql_exclude_directive_enabled%22%3Atrue%2C%22verified_phone_label_enabled%22%3Afalse%2C%22creator_subscriptions_tweet_preview_api_enabled%22%3Atrue%2C%22responsive_web_graphql_timeline_navigation_enabled%22%3Atrue%2C%22responsive_web_graphql_skip_user_profile_image_extensions_enabled%22%3Afalse%2C%22c9s_tweet_anatomy_moderator_badge_enabled%22%3Atrue%2C%22tweetypie_unmention_optimization_enabled%22%3Atrue%2C%22responsive_web_edit_tweet_api_enabled%22%3Atrue%2C%22graphql_is_translatable_rweb_tweet_is_translatable_enabled%22%3Atrue%2C%22view_counts_everywhere_api_enabled%22%3Atrue%2C%22longform_notetweets_consumption_enabled%22%3Atrue%2C%22responsive_web_twitter_article_tweet_consumption_enabled%22%3Afalse%2C%22tweet_awards_web_tipping_enabled%22%3Afalse%2C%22freedom_of_speech_not_reach_fetch_enabled%22%3Atrue%2C%22standardized_nudges_misinfo%22%3Atrue%2C%22tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled%22%3Atrue%2C%22rweb_video_timestamps_enabled%22%3Atrue%2C%22longform_notetweets_rich_text_read_enabled%22%3Atrue%2C%22longform_notetweets_inline_media_enabled%22%3Atrue%2C%22responsive_web_media_download_video_enabled%22%3Afalse%2C%22responsive_web_enhance_cards_enabled%22%3Afalse%7D".format(
quote(user_id), quote(cursor)
)

data = self.request_search(url)

next_cursor, user_entries = extract_data_from_followers_you_know_payload(data)

users = []

for user_entry in user_entries:
user_data = user_entry["legacy"]
user_data["id_str"] = user_entry["rest_id"]
user_data["protected"] = user_data.get("protected", False)
users.append(user_data)

return next_cursor, users
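The two hardcoded percent-encoded URLs above can equivalently be forged from plain dicts, which makes the `variables` payload much easier to read and tweak. This is a sketch under the assumption that the endpoint and variable names stay exactly as in the commit; the long `features` dict of boolean flags is elided here for brevity, whereas the real request must send it:

```python
import json
from urllib.parse import urlencode

# Endpoint as used by the commit's hardcoded URLs (query id included)
GRAPHQL_URL = "https://twitter.com/i/api/graphql/sT_K0qB2bqpWfL3-cq4zaQ/FollowersYouKnow"


def forge_url(user_id, cursor=None):
    variables = {"userId": user_id, "count": 20, "includePromotedContent": False}
    if cursor is not None:
        variables["cursor"] = cursor
    # NOTE: the "features" flag dict is elided in this sketch
    query = urlencode({"variables": json.dumps(variables, separators=(",", ":"))})
    return GRAPHQL_URL + "?" + query


print(forge_url("794083798912827393"))
```

Building the query string this way also guarantees correct percent-encoding of cursors, which can contain arbitrary characters.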

def search_tweets(
self,
query,
@@ -716,3 +766,20 @@ def search_users(self, query, locale=None, limit=None):
return

cursor = new_cursor

def followers_you_know(self, user_id, locale=None):
cursor = None

while True:
next_cursor, users = self.request_followers_you_know(user_id, cursor=cursor)

if not users:
return

for user in users:
yield normalize_user(user, locale=locale)

if next_cursor is None or next_cursor == cursor:
return

cursor = next_cursor
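`followers_you_know` implements the classic cursor-pagination loop: fetch a page, yield its items, and stop when the page comes back empty or the cursor stops advancing (the latter guards against the API returning the same cursor forever). The same pattern, sketched against a fake paginated source:

```python
# Fake paginated source: cursor -> (next_cursor, items)
PAGES = {
    None: ("c1", [1, 2]),
    "c1": ("c2", [3]),
    "c2": (None, [4]),
}


def paginate(fetch):
    cursor = None
    while True:
        next_cursor, items = fetch(cursor)
        # An empty page means we are done
        if not items:
            return
        yield from items
        # A missing or non-advancing cursor also ends pagination
        if next_cursor is None or next_cursor == cursor:
            return
        cursor = next_cursor


print(list(paginate(lambda c: PAGES[c])))  # [1, 2, 3, 4]
```

Yielding users as they arrive (rather than accumulating pages) is what lets the CLI stream rows into the output file while the loading bar advances.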
