Refactoring fb user-infos
Yomguithereal committed Feb 23, 2024
1 parent 5533e94 commit 33357fe
Showing 7 changed files with 138 additions and 46 deletions.
103 changes: 100 additions & 3 deletions docs/cli.md
@@ -43,6 +43,7 @@ _Platform-related commands_
 - [posts](#posts-1)
 - [post-authors](#post-authors)
 - [url-likes](#url-likes)
+- [user-infos](#user-infos)
 - [google](#google)
 - [sheets](#sheets)
 - [hyphe](#hyphe)
@@ -58,7 +59,7 @@ _Platform-related commands_
 - [post-infos](#post-infos)
 - [user-followers](#user-followers)
 - [user-following](#user-following)
-- [user-infos](#user-infos)
+- [user-infos](#user-infos-1)
 - [user-posts](#user-posts)
 - [mediacloud (mc)](#mediacloud)
 - [medias](#medias)
@@ -2736,7 +2737,7 @@ how to use the command with a CSV file?
 ```
 Usage: minet facebook [-h]
-{comments,experimental-comments,post-authors,post-stats,post,posts,url-likes}
+{comments,experimental-comments,post-authors,post-stats,post,posts,url-likes,user-infos}
 ...

 # Minet Facebook Command
@@ -2747,7 +2748,7 @@ Optional Arguments:
 -h, --help show this help message and exit

 Subcommands:
-{comments,experimental-comments,post-authors,post-stats,post,posts,url-likes}
+{comments,experimental-comments,post-authors,post-stats,post,posts,url-likes,user-infos}
 Subcommand to use.
 ```
@@ -3315,6 +3316,102 @@ how to use the command with a CSV file?
 $ minet facebook url-likes "value1,value2" --explode ","
 ```
+### user-infos
+```
+Usage: minet facebook user-infos [-h] [-c COOKIE] [--rcfile RCFILE] [--silent]
+[--refresh-per-second REFRESH_PER_SECOND]
+[--simple-progress] [--throttle THROTTLE]
+[-i INPUT] [--explode EXPLODE] [-s SELECT]
+[--total TOTAL] [-o OUTPUT]
+user_url_or_user_url_column
+
+# Minet Facebook User Infos Command
+
+Retrieve various information about Facebook users like their name, hometown,
+current city, gender etc.
+
+Positional Arguments:
+user_url_or_user_url_column Single user to process or name of the CSV column
+containing users when using -i/--input.
+
+Optional Arguments:
+-c, --cookie COOKIE Authenticated cookie to use or browser from
+which to extract it (supports "firefox",
+"chrome", "chromium", "opera" and "edge").
+Defaults to `firefox`. Can also be configured in
+a .minetrc file as "facebook.cookie" or read
+from the MINET_FACEBOOK_COOKIE env variable.
+--throttle THROTTLE Throttling time, in seconds, to wait between
+each request. Defaults to `2.0`.
+-s, --select SELECT Columns of -i/--input CSV file to include in the
+output (separated by `,`). Use an empty string
+if you don't want to keep anything: --select ''.
+--explode EXPLODE Use to indicate the character used to separate
+multiple values in a single CSV cell. Defaults
+to none, i.e. CSV cells having a single values,
+which is usually the case.
+--total TOTAL Total number of items to process. Might be
+necessary when you want to display a finite
+progress indicator for large files given as
+input to the command.
+-i, --input INPUT CSV file (potentially gzipped) containing all
+the users you want to process. Will consider `-`
+as stdin.
+-o, --output OUTPUT Path to the output file. Will consider `-` as
+stdout. If not given, results will also be
+printed to stdout.
+--rcfile RCFILE Custom path to a minet configuration file. More
+info about this here:
+https://github.com/medialab/minet/blob/master/do
+cs/cli.md#minetrc
+--refresh-per-second REFRESH_PER_SECOND
+Number of times to refresh the progress bar per
+second. Can be a float e.g. `0.5` meaning once
+every two seconds. Use this to limit CPU usage
+when launching multiple commands at once.
+Defaults to `10`.
+--simple-progress Whether to simplify the progress bar and make it
+fit on a single line. Can be useful in terminals
+with partial ANSI support, e.g. a Jupyter
+notebook cell.
+--silent Whether to suppress all the log and progress
+bars. Can be useful when piping.
+-h, --help show this help message and exit
+
+Examples:
+
+. Fetching user infos of a series of users in a CSV file:
+$ minet fb user-infos user_url -i fb-users.csv > user-infos.csv
+
+how to use the command with a CSV file?
+
+> A lot of minet commands, including this one, can both be
+> given a single value to process or a bunch of them if
+> given the column of a CSV file passed to -i/--input instead.
+> Note that when given a CSV file as input, minet will
+> concatenate the input file columns with the ones added
+> by the command. You can always restrict the input file
+> columns to keep by using the -s/--select flag.
+. Here is how to use a command with a single value:
+$ minet facebook user-infos "value"
+
+. Here is how to use a command with a CSV file:
+$ minet facebook user-infos column_name -i file.csv
+
+. Here is how to read CSV file from stdin using `-`:
+$ xan search -s col . | minet facebook user-infos column_name -i -
+
+. Here is how to indicate that the CSV column may contain multiple
+values separated by a special character:
+$ minet facebook user-infos column_name -i file.csv --explode "|"
+
+. This also works with single values:
+$ minet facebook user-infos "value1,value2" --explode ","
+```
 ## Google
 ```
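
The help text added in this hunk says that, with `-i/--input`, the command concatenates the input columns (optionally restricted by `-s/--select`) with the columns it adds, and that `--explode` splits one cell into several values. A minimal Python sketch of that behaviour follows. It is not minet's implementation: `enrich`, `fake_fetch` and the choice to repeat the original cell for every exploded value are assumptions made purely for illustration.

```python
import csv
import io


def enrich(lines, column, fetch, select=None, explode=None):
    """Yield kept input columns followed by the columns returned by `fetch`.

    Hypothetical helper: `select` mimics -s/--select and `explode` mimics
    --explode as described in the help text, nothing more.
    """
    reader = csv.DictReader(lines)

    # No selection keeps every input column; an empty string keeps none.
    if select is None:
        kept = list(reader.fieldnames)
    else:
        kept = [name for name in select.split(",") if name]

    for row in reader:
        cell = row[column]
        # Without a separator, the cell holds a single value.
        values = cell.split(explode) if explode else [cell]

        for value in values:
            yield [row[name] for name in kept] + list(fetch(value))


def fake_fetch(user_url):
    # Stand-in for the scraper call; returns one added column.
    return ["name-for-" + user_url]


data = io.StringIO("id,user_url\n1,a|b\n2,c\n")
result = list(enrich(data, "user_url", fake_fetch, explode="|"))
```

Each exploded value produces its own output row, so a two-row input can yield more than two rows.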
7 changes: 6 additions & 1 deletion docs/cli.template.md
@@ -43,6 +43,7 @@ _Platform-related commands_
 - [posts](#posts-1)
 - [post-authors](#post-authors)
 - [url-likes](#url-likes)
+- [user-infos](#user-infos)
 - [google](#google)
 - [sheets](#sheets)
 - [hyphe](#hyphe)
@@ -58,7 +59,7 @@ _Platform-related commands_
 - [post-infos](#post-infos)
 - [user-followers](#user-followers)
 - [user-following](#user-following)
-- [user-infos](#user-infos)
+- [user-infos](#user-infos-1)
 - [user-posts](#user-posts)
 - [mediacloud (mc)](#mediacloud)
 - [medias](#medias)
@@ -285,6 +286,10 @@ For more documentation about minet's scraping DSL check this [page](../cookbook/

 <% fb/url-likes %>

+### user-infos
+
+<% fb/user-infos %>
+
 ## Google

 <% google %>
10 changes: 3 additions & 7 deletions ftest/facebook_user_infos.py
@@ -1,13 +1,9 @@
-import csv
-import sys
-from tqdm import tqdm
-import time
 from minet.facebook import FacebookMobileScraper
+from minet.cli.console import console

 scraper = FacebookMobileScraper(cookie="firefox")

-USERS_URL = [
-]
+USERS_URL = ["https://www.facebook.com/guillaume.plique.9/"]

 for url in USERS_URL:
-    print(scraper.user_infos(url))
+    console.print(scraper.user_infos(url), highlight=True)
6 changes: 3 additions & 3 deletions minet/cli/facebook/__init__.py
@@ -247,15 +247,15 @@
             $ minet fb url-likes url -i url.csv > url_likes.csv
     """,
     variadic_input={"dummy_column": "url", "item_label": "url"},
-
 )

 FACEBOOK_USER_INFOS_SUBCOMMAND = command(
     "user-infos",
     "minet.cli.facebook.user_infos",
     title="Minet Facebook User Infos Command",
     description="""
-        Retrieve the name, hometow, current city and gender of a given Facebook user..
+        Retrieve various information about Facebook users like their name, hometown,
+        current city, gender etc.
     """,
     epilog="""
         Examples:
@@ -283,6 +283,6 @@
         FACEBOOK_POST_SUBCOMMAND,
         FACEBOOK_POSTS_SUBCOMMAND,
         FACEBOOK_URL_LIKES_SUBCOMMAND,
-        FACEBOOK_USER_INFOS_SUBCOMMAND
+        FACEBOOK_USER_INFOS_SUBCOMMAND,
     ],
 )
17 changes: 8 additions & 9 deletions minet/cli/facebook/user_infos.py
@@ -1,25 +1,24 @@
 # =============================================================================
-# Minet Facebook User Places Lived CLI Action
+# Minet Facebook User Infos CLI Action
 # =============================================================================
 #
-# Logic of the `fb user-places-lived` action.
+# Logic of the `fb user-infos` action.
 #
 from minet.cli.utils import with_enricher_and_loading_bar
+from minet.cli.loading_bar import LoadingBar
 from minet.cli.facebook.utils import with_facebook_fatal_errors
 from minet.facebook import FacebookMobileScraper
 from minet.facebook.types import MobileFacebookUserInfo

+
 @with_facebook_fatal_errors
 @with_enricher_and_loading_bar(
-    headers=MobileFacebookUserInfo, title="Finding user profile infos", unit="users"
+    headers=MobileFacebookUserInfo, title="Scraping user infos", unit="users"
 )
-def action(cli_args, enricher, loading_bar):
+def action(cli_args, enricher, loading_bar: LoadingBar):
     scraper = FacebookMobileScraper(cli_args.cookie, throttle=cli_args.throttle)

-    for i, row, user_url in enricher.enumerate_cells(
-        cli_args.column, with_rows=True, start=1
-    ):
+    for row, user_url in enricher.cells(cli_args.column, with_rows=True):
         with loading_bar.step():
             user_infos = scraper.user_infos(user_url)
-            print(row)
-            enricher.writerow(row, user_infos.as_csv_row() if user_infos is not None else None)
+            enricher.writerow(row, user_infos)
39 changes: 16 additions & 23 deletions minet/facebook/mobile_scraper.py
@@ -772,40 +772,33 @@ def post_author(self, url):
         else:
             raise TypeError

-    def user_infos(self, url) :
-
+    def user_infos(self, url):
         url = convert_url_to_mobile(url)

         html = self.request_page(url)
         soup = BeautifulSoupWithoutXHTMLWarnings(html, "lxml")

-        name = soup.find('title').text
-        if name == 'Content Not Found' :
+        name = soup.find("title").get_text().strip()
+
+        if name == "Content Not Found":
             name = None

-        hometown_field = soup.find('span', string='Hometown')
-        if hometown_field is not None :
+        hometown = None
+        hometown_field = soup.find("span", string="Hometown")
+
+        if hometown_field is not None:
             hometown = hometown_field.parent.parent.next_sibling.text
-        else :
-            hometown = None

-        current_city_field = soup.find('span', string='Current city')
-        if current_city_field is not None :
+        current_city = None
+        current_city_field = soup.find("span", string="Current city")
+
+        if current_city_field is not None:
             current_city = current_city_field.parent.parent.next_sibling.text
-        else :
-            current_city = None

-        gender_field = soup.find('span', string='Gender')
-        if gender_field is not None :
+        gender = None
+        gender_field = soup.find("span", string="Gender")
+
+        if gender_field is not None:
             gender = gender_field.parent.parent.next_sibling.text
-        else :
-            gender = None

         return MobileFacebookUserInfo(name, hometown, current_city, gender)
-
-
-
-
-
-
-
2 changes: 2 additions & 0 deletions minet/facebook/types.py
@@ -137,13 +137,15 @@ class MobileFacebookUser(TabularRecord):
     handle: Optional[str]
     url: str

+
 @dataclass
 class MobileFacebookUserInfo(TabularRecord):
     name: Optional[str]
     hometown: Optional[str]
     current_city: Optional[str]
     gender: Optional[str]

+
 @dataclass
 class MobileFacebookPost(TabularRecord):
     url: str
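
Since the CLI action now hands the `MobileFacebookUserInfo` record straight to `enricher.writerow`, it may help to picture how a dataclass of optional fields can map onto CSV columns. The sketch below is a self-contained stand-in built only on the standard library. `UserInfoSketch`, `headers`, `as_row` and `to_csv` are hypothetical names, and nothing here claims to reproduce how `TabularRecord` actually serializes.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields
from typing import List, Optional


@dataclass
class UserInfoSketch:
    # Same four optional fields as the record in the diff above.
    name: Optional[str]
    hometown: Optional[str]
    current_city: Optional[str]
    gender: Optional[str]

    @classmethod
    def headers(cls) -> List[str]:
        # Column names follow the field declaration order.
        return [field.name for field in fields(cls)]

    def as_row(self) -> List[str]:
        # Assumption: a missing value becomes an empty CSV cell.
        return ["" if value is None else value for value in astuple(self)]


def to_csv(records: List[UserInfoSketch]) -> str:
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(UserInfoSketch.headers())

    for record in records:
        writer.writerow(record.as_row())

    return buffer.getvalue()


output = to_csv([UserInfoSketch("Jane", None, "Paris", None)])
```

Keeping every field optional means a profile that hides its hometown or gender still yields a full-width row.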