module suggestion: firefox history/bookmarks/etc. #118

Open
redthing1 opened this issue Dec 24, 2020 · 7 comments
Labels: enhancement (New feature or request), module

Comments

@redthing1

I think a module for accessing Firefox data would be very useful.
This documentation on the Mozilla website details how the data is stored.

I am thinking that a separate script could be used to walk through that database and generate a JSON dump (similarly to how rexport works), and then an HPI module could provide access to that data.
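As a rough illustration of the "walk the database and dump JSON" idea, here is a minimal sketch. It assumes only the two Firefox tables `moz_places` and `moz_historyvisits`, where `visit_date` is stored as microseconds since the Unix epoch. The helper names `dump_history` and `make_demo_db` are hypothetical, and the demo builds a tiny in-memory stand-in rather than reading a real `places.sqlite`.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Join each visit to its page; visit_date is microseconds since the Unix epoch.
QUERY = """
SELECT p.url, p.title, v.visit_date
FROM moz_historyvisits AS v
JOIN moz_places AS p ON p.id = v.place_id
ORDER BY v.visit_date
"""


def dump_history(conn: sqlite3.Connection) -> str:
    """Return every visit in the database as a JSON array (hypothetical helper)."""
    visits = []
    for url, title, visit_date in conn.execute(QUERY):
        dt = datetime.fromtimestamp(visit_date / 1_000_000, tz=timezone.utc)
        visits.append({"url": url, "title": title, "dt": dt.isoformat()})
    return json.dumps(visits, indent=2)


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for places.sqlite (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT, title TEXT);
        CREATE TABLE moz_historyvisits (
            id INTEGER PRIMARY KEY, place_id INTEGER, visit_date INTEGER
        );
        INSERT INTO moz_places VALUES (1, 'https://github.com/', 'GitHub');
        INSERT INTO moz_historyvisits VALUES (1, 1, 1608768000000000);
        """
    )
    return conn


if __name__ == "__main__":
    print(dump_history(make_demo_db()))
```

Against a real profile, the same function would be pointed at a copy of `places.sqlite` opened with `sqlite3.connect(path)`; the real tables carry more columns than this sketch reads.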

I'm new to this project so I am not yet really familiar with how modules work, but when I have time I will attempt it and submit a PR.

@redthing1
Author

Oh, it looks like one of the contributors to this repo @seanbreckenridge has already done this!

I wonder if there is already a corresponding HPI module or does it still need to be written?

@seanbreckenridge
Contributor

seanbreckenridge commented Dec 24, 2020

I personally don't use bookmarks in the browser (I just have a text file with a script to open stuff), so I haven't written anything to parse those yet. Feel free to open an issue on ffexport if that's something you're interested in.

Otherwise yeah, ffexport lets you export history; I have a script here that saves my history sqlite file every couple of weeks.

The my.browsing file on my branch uses parts of ffexport to load the data in; it also copies the live history database when computing history, so the result includes both the backups and the current history.

As a demo:

>>> from collections import Counter
>>> from urllib.parse import urlparse
>>> from my.browsing import history
>>> Counter(map(lambda v: urlparse(v.url).netloc, history())).most_common(5)
[('github.com', 39666), ('discord.com', 21064), ('www.youtube.com', 19497), ('duckduckgo.com', 19152), ('www.google.com', 9598)]

No need to export it to JSON (though ffexport can do that); it merges and deduplicates visits directly from copies of the sqlite files.
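To make the merge-and-deduplicate step concrete, here is a minimal sketch. It is not ffexport's actual implementation: the `Visit` type, the `merge_visits` helper, and the choice of `(url, dt)` as the identity of a visit are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator, Set, Tuple


@dataclass(frozen=True)
class Visit:
    """Simplified stand-in for a history entry (hypothetical type)."""
    url: str
    dt: datetime


def merge_visits(*sources: Iterable[Visit]) -> Iterator[Visit]:
    """Yield each (url, dt) pair once, in first-seen order."""
    seen: Set[Tuple[str, datetime]] = set()
    for source in sources:
        for visit in source:
            key = (visit.url, visit.dt)
            if key in seen:
                continue  # same visit already emitted from an earlier backup
            seen.add(key)
            yield visit


if __name__ == "__main__":
    t = datetime(2020, 7, 21, tzinfo=timezone.utc)
    backup = [Visit("https://github.com/", t)]
    live = [Visit("https://github.com/", t), Visit("https://duckduckgo.com/", t)]
    for v in merge_visits(backup, live):
        print(v.url, v.dt.isoformat())
```

Because backups taken a few weeks apart overlap heavily, most rows in each newer copy are duplicates of rows already seen, which is why a set-based filter like this is enough for the sketch.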

I know karlicoss uses promnesia, so that may be why that hasn't been incorporated into HPI.

@karlicoss added the enhancement and module labels on Feb 18, 2021
@seanbreckenridge
Contributor

seanbreckenridge commented May 23, 2021

Just as an update, I've since converted that into browserexport, which supports reading history from:

  • Firefox (and Waterfox)
  • Chrome (and Chromium, Brave, Vivaldi)
  • Safari
  • Palemoon

If you wanted to use this, you could install my HPI modules alongside this repository (see here)

Run hpi module install my.browsing to install dependencies

Set up a config block in your config file like:

# uses browserexport https://github.com/seanbreckenridge/browserexport
from my.core import Paths  # path type alias used in HPI configs

class browsing:
    # folder which contains your backed up databases
    export_path: Paths = "~/data/browsing"

    # additionally, read history from my active firefox database
    from browserexport.browsers.firefox import Firefox

    live_databases: Paths = Firefox.locate_database()

Then use the history function:

[ ~ ] $ ipython

In [1]: from my.browsing import history

In [2]: visits = list(history())

In [3]: len(visits)
Out[3]: 390621

[ ~ ] $ hpi query --limit 1 my.browsing.history
[{"url":"https://duckduckgo.com/?q=Brave+Verified+sites&t=brave","dt":"2020-07-21T00:11:23.544069+00:00","metadata":{"title":"Brave Verified sites at DuckDuckGo","description":null,"preview_image":null,"duration":null}}]

No support for bookmarks (yet; I just use this). I may add it in the future if someone is interested.

@karlicoss
Owner

That's great, thanks!
I'll experiment with hooking it up to cachew, and definitely would be up for using it in Promnesia!

@seanbreckenridge
Contributor

seanbreckenridge commented May 24, 2021

Sounds good - I think I already have it hooked up to cachew, unless you mean something different. Corresponding promnesia Source for now

The only thing missing before a PR is the FirefoxMobile browser logic; I need to export a db from my (now rooted) phone and look at the browser source file in promnesia.

@karlicoss
Owner

Ah -- by cachew support, I meant 'incremental' caching, so if you add a new database, you'd ideally just 'merge' it in with the previously cached results, kind of what the madness here was achieving, but without the madness :) https://github.com/karlicoss/promnesia/blob/ea9d9ef8e654c9daee7f7fb1ac458d586f8d4393/src/promnesia/sources/browser.py#L50-L51
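A rough sketch of what that 'incremental' merge could look like is below. It is not cachew's API and not the promnesia code linked above: `IncrementalHistory`, its `update` method, and the `read_db` callback are hypothetical names, and the state lives only in memory, whereas a real cache would persist it between runs.

```python
from typing import Callable, Iterable, List, Set, Tuple

Visit = Tuple[str, str]  # (url, ISO timestamp); a simplified stand-in


class IncrementalHistory:
    """Remembers which databases were already read and only parses new ones."""

    def __init__(self, read_db: Callable[[str], Iterable[Visit]]) -> None:
        self._read_db = read_db            # hypothetical parser for one database
        self._processed: Set[str] = set()  # paths merged so far
        self._seen: Set[Visit] = set()     # visits already in the result
        self.visits: List[Visit] = []

    def update(self, db_paths: Iterable[str]) -> int:
        """Merge in any databases not seen before; return how many were parsed."""
        parsed = 0
        for path in db_paths:
            if path in self._processed:
                continue  # results for this file are already merged
            for visit in self._read_db(path):
                if visit not in self._seen:
                    self._seen.add(visit)
                    self.visits.append(visit)
            self._processed.add(path)
            parsed += 1
        return parsed


if __name__ == "__main__":
    fake = {
        "a.sqlite": [("https://github.com/", "2020-07-21T00:11:23+00:00")],
        "b.sqlite": [("https://duckduckgo.com/", "2021-05-24T00:00:00+00:00")],
    }
    history = IncrementalHistory(lambda path: fake[path])
    print(history.update(["a.sqlite"]))              # parses one database
    print(history.update(["a.sqlite", "b.sqlite"]))  # only the new one is parsed
```

Keying on the path alone assumes a backup file never changes once written; the live database would need different handling, for example always re-reading it.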

@seanbreckenridge
Contributor

seanbreckenridge commented Feb 14, 2022

@redthing1 browser history has a module here now; see here to set it up

If bookmarks from the databases are something you're still interested in, feel free to create an issue here.
