[Feature Request] Ability to slurp multiple pages using [] range format #51
Comments
could you provide a few examples of sites you would use this on?
Well, currently found one:
Hmm, slurp seems to have grabbed it, but loading the markdown crashes obsidian for me on Android. Anyway, the reason I wanted to try it: Slurp/Readability is built to parse news articles, blog posts, and things like that, not tabular data. If you're mostly thinking of using this feature for that kind of page, I suspect it will disappoint you.
Worked perfectly for me, buddy, but of course the volume was too large for Obsidian. But going through the numbers one by one is tedious. Any way to hook your plugin to a Templater script, maybe?
yeah, when i was back at my PC, i noticed that the mobile client was able to parse, save, and even sync the file. kind of surprised that it crashed the app though; it's only ~2MB. big for a plaintext file, sure, but not excessive or complex. 🤷
i'm not sure about templater scripting, haven't really used it. i do see the use case for this, there's just going to be some landmines to avoid. eg: if you assume it takes an average of 3s to download, parse, and save a page like that, then 200 of those pages is going to take ~10 minutes, and i'm not sure how obsidian or the various OSs it's running on will react. hammering a server like that could also get slurp's user agent or the user's IP (or both) banned for bot scraping. i'll have a look into it at some point and see what's possible. in the short term, i'd probably recommend running a simple script. you should be able to do something like this:

```bash
#!/bin/bash

URL="https://...your url here.../"
START=1
END=200

for i in $(seq $START $END); do
    # use slurp's obsidian:// URL integration
    # on MacOS *i think* you can use "open" instead of "xdg-open"
    xdg-open "obsidian://slurp?url=${URL}${i}"

    # give slurp time to do the thing
    sleep 3
done
```

save that somewhere with whatever name you like, eg:
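for instance (the filename below is just a placeholder i'm making up, use whatever you called it), make it executable and run it:

```bash
# placeholder filename -- substitute whatever you actually saved it as
chmod +x slurp-pages.sh
./slurp-pages.sh
```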
i'm sure the same could be accomplished with PowerShell on Windows as well, but i'm not really sure how. i do know that running the bash script in WSL on Windows will not work, though.
Good (and I daresay "un-inhuman") of you to take the time to provide this information.
This method would suffice for me, surely, so I'd say only implement something if the FR racks up a dozen likes or so. Cheers, mate! All the best,
I've used this in the past with the browser extension DownThemAll.
There, the syntax to extract PDF pages was:
https://adt.arcanum.com/check-access-save/MNYTESZ_Hun_1/?pg=[0:1149]
Now, you understand I don't want PDFs with this plugin; I just wanted to show the syntax.
So the syntax would be something like
`[0:9]`
or `[0-9]`.
The plugin creator would not be responsible for knowing how many digits or items there are in the range (some sites use 1, 01, or even 001 as the first item, or sometimes the range you'd want is only from 22 to 45) -- that's for the user to feel out.
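Just to make the idea concrete, here is a rough sketch in plain bash of how a `[start:end]` pattern could be expanded into individual URLs. The example URL and the helper function are made up for illustration only; this is not plugin code.

```bash
#!/bin/bash
# rough sketch of the requested [start:end] expansion -- the URL and the
# helper name are invented for illustration, not how the plugin works today
expand_range() {
    local pattern="$1"
    local start end
    # pull the two numbers out of the [start:end] part
    start=$(echo "$pattern" | sed -E 's/.*\[([0-9]+):([0-9]+)\].*/\1/')
    end=$(echo "$pattern" | sed -E 's/.*\[([0-9]+):([0-9]+)\].*/\2/')
    # emit one concrete URL per number in the range
    for i in $(seq "$start" "$end"); do
        echo "${pattern/\[${start}:${end}\]/$i}"
    done
}

expand_range "https://example.com/archive/?pg=[1:5]"
# prints:
#   https://example.com/archive/?pg=1
#   ...
#   https://example.com/archive/?pg=5
```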
Great plugin,
Cheers
Z.