Questions, Feedback, and Suggestions #4 #5262

mikf · 2024-03-01T19:58:48Z

Continuation of the previous issue as a central place for any sort of question or suggestion that doesn't deserve its own separate issue.

Links to older issues: #11, #74, #146.

BakedCookie · 2024-03-01T21:51:46Z

For most sites I'm able to sort files into year/month folders like this:

"directory": ["{category}", "{search_tags}", "{date:%Y}", "{date:%m}"]

However, for redgifs it doesn't look like there's a date keyword available for directory. There's only a date keyword available for filename. Is this an oversight?

mikf · 2024-03-02T14:17:17Z

Yep, that's a mistake that happened when adding support for galleries in 5a6fd80.
Will be fixed with the next git push.

edit: 82c73c7


taskhawk · 2024-03-06T00:54:10Z

There's a typo in extractor.reddit.client-id & .user-agent:

"I'm not a rebot"


the-blank-x · 2024-03-06T05:04:39Z

There's another typo in extractor.reddit.client-id & .user-agent: "reCATCHA"

biggestsonicfan · 2024-03-06T18:55:37Z

Can you grab all the media from quoted tweets? Example.

#5262 (comment) It's implemented as a search for 'quoted_tweet_id:…' on Twitter.

#5262 (comment) This one was on the same line as the previous one... (9fd851c)

mikf · 2024-03-07T15:56:47Z

Regarding typos, thanks for pointing them out.
I would be surprised if there aren't at least 10 more somewhere in this file.

@biggestsonicfan
This is implemented as a search for quoted_tweet_id:… on Twitter's end.
I've added an extractor for it similar to the hashtags one (40c0553), but it only does said search under the hood.

BakedCookie · 2024-03-07T18:46:22Z

~~Normally %-encoded characters in the URL get converted nicely when running gallery-dl, eg.~~

~~https://gelbooru.com/index.php?page=post&s=list&tags=nighthawk_%28circle%29~~
~~gives me a nighthawk_(circle) folder~~

~~but for this url:~~
~~https://gelbooru.com/index.php?page=post&s=list&tags=shin%26%23039%3Bya_%28shin%26%23039%3Byanchi%29~~

~~I'm getting a shin&#039;ya_(shin&#039;yanchi) folder. Shouldn't I be getting a shin'ya_(shin'yanchi) folder instead?~~

EDIT: Actually, I think there's just something wrong with that URL. I had it saved for a long time and searching that tag normally gives a different URL (https://gelbooru.com/index.php?page=post&s=list&tags=shin%27ya_%28shin%27yanchi%29). I still got valid posts from the weird URL so I didn't think much of it.

mikf · 2024-03-07T19:16:43Z

%28 and so on are URL-escaped values, which do get resolved.
&#039; is the HTML-escaped value for '.

You could use {search_tags!U} to convert them.
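The two escaping layers can be peeled apart with Python's standard library. This is a minimal sketch, not gallery-dl's code: URL decoding the tag from the saved URL leaves the HTML entity behind, and HTML unescaping (the job {search_tags!U} is meant to do) turns it into an apostrophe.

```python
from html import unescape
from urllib.parse import unquote

# Tag value as it appears (percent-encoded) in the saved Gelbooru URL
raw = "shin%26%23039%3Bya_%28shin%26%23039%3Byanchi%29"

# %26 -> &, %23 -> #, %3B -> ;, %28/%29 -> ( and )
url_decoded = unquote(raw)
# &#039; -> '
html_decoded = unescape(url_decoded)

print(url_decoded)   # shin&#039;ya_(shin&#039;yanchi)
print(html_decoded)  # shin'ya_(shin'yanchi)
```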

taskhawk · 2024-03-08T08:46:17Z

Is there a way to remove metadata like this?

gallery-dl -K https://www.reddit.com/r/carporn/comments/axo236/mean_ctsv/

...
preview['images'][N]['resolutions'][N]['height']
  144
preview['images'][N]['resolutions'][N]['url']
  https://preview.redd.it/mcerovafack21.jpg?width=108&crop=smart&auto=webp&s=f8516c60ad7fa17c84143d549c070738b8bcc989
preview['images'][N]['resolutions'][N]['width']
  108
...

Post-processor:

"filter-metadata":
    {
      "name": "metadata",
      "mode": "delete",
      "event": "prepare",
      "fields": ["preview[images][0][resolutions]"]
    }

I've tried a few variations but no dice.

"fields": ["preview[images][][resolutions]"]

"fields": ["preview[images][N][resolutions]"]

"fields": ["preview['images'][0]['resolutions']"]

YuanGYao · 2024-03-08T15:23:23Z

Hello, I left a comment in #4168. Does the _pagination method of the WeiboExtractor class in weibo.py return when data["list"] is an empty list?
When I used gallery-dl to batch download Weibo album pages, the downloads were also incomplete.
Testing on the web page, I found that Weibo's getImageWall API sometimes returns an empty list when the images have not finished loading. I think this may be what causes gallery-dl to stop the download early.

mikf · 2024-03-08T21:35:56Z

@taskhawk
fields selectors are quite limited and can't really handle lists.
You might want to use a python post processor (example) and write some code that does this.

def remove_resolutions(metadata):
    for image in metadata["preview"]["images"]:
        del image["resolutions"]

(untested, might need some check whether preview and/or images exists)
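A slightly more defensive variant of the function above, as a sketch: it assumes the function receives the post's metadata dict (as the code above does) and simply skips posts that lack preview or images data. If it's wired up through the python post processor, that processor's "function" option takes a "file.py:function"-style reference; check the post processor docs for the exact form and a suitable "event".

```python
def remove_resolutions(metadata):
    """Delete preview["images"][*]["resolutions"] in place, if present."""
    preview = metadata.get("preview")
    if not isinstance(preview, dict):
        return  # post without preview data: nothing to remove
    for image in preview.get("images") or ():
        if isinstance(image, dict):
            # pop() with a default avoids KeyError for images without resolutions
            image.pop("resolutions", None)
```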

@YuanGYao
Yes, the code currently stops when Weibo's API returns no more results (empty list).
This is probably not ideal, as I've hinted at in #4168 (comment)

YuanGYao · 2024-03-09T02:48:13Z

@mikf
Well, I think for Weibo's album page, since_id should be used to determine whether all images have been loaded.
I updated my comment in #4168 (comment) and attached the response returned by Weibo's getImageWall API.
I think this should help solve this problem.
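The idea can be sketched as a cursor loop that treats a missing since_id, rather than an empty list, as the end of the album. Everything here is an assumption for illustration: fetch_page is a hypothetical callable, and the response shape ({"list": [...], "since_id": ...}) is a guess based on the discussion, not a documented Weibo format.

```python
def paginate(fetch_page):
    """Yield items until the API stops returning a since_id cursor.

    fetch_page(since_id) is a hypothetical callable returning something like
    {"list": [...], "since_id": "..."}; the real response may differ.
    """
    since_id = ""
    seen = set()
    while True:
        data = fetch_page(since_id)
        # An empty "list" is not treated as the end; later pages may still have images
        yield from data.get("list") or ()
        since_id = data.get("since_id")
        # Stop when there is no cursor, or when a cursor repeats (avoids looping forever)
        if not since_id or since_id in seen:
            return
        seen.add(since_id)
```

With this shape, a page whose "list" comes back empty but still carries a cursor no longer ends the download.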

BakedCookie · 2024-03-11T00:45:02Z

Not sure if I'm missing something, but are directory-specific configurations exclusive to running gallery-dl via the executable?

Basically, I have a directory for regular tags, and a directory for artist tags. For regular tags I use "directory": ["{category}", "{search_tags}", "{date:%Y}", "{date:%m}"] since the tag number is manageable. For artist tags though, there's way more of them so this "directory": ["{category}", "{search_tags[0]!u}", "{search_tags}", "{date:%Y}", "{date:%m}"] makes more sense.

So right now, the only way I know to get this per-directory configuration to work is to copy the gallery-dl executable everywhere I want to override my master configuration. Am I missing something? It feels like there should be a better way.

Hrxn · 2024-03-11T03:31:28Z

Huh? No, the configuration always works the same way. Are you simply using different configuration files?

BakedCookie · 2024-03-11T03:42:40Z

@Hrxn

From the readme:

When run as executable, gallery-dl will also look for a gallery-dl.conf file in the same directory as said executable.

It is possible to use more than one configuration file at a time. In this case, any values from files after the first will get merged into the already loaded settings and potentially override previous ones.

I want to override my master configuration %APPDATA%\gallery-dl\config.json in specific directories with a local gallery-dl.conf but it seems like that's only possible with the standalone executable.

taskhawk · 2024-03-11T04:00:08Z

You can load additional configuration files from the console with:

-c, --config FILE           Additional configuration files

You just need to specify the path to the file and any options there will overwrite your main configuration file.
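Roughly, -c FILE layers the extra file on top of what is already loaded. The sketch below models that as a recursive merge in which nested objects are combined key by key and anything else (including lists such as directory) is replaced outright; that is a reading of the README wording quoted above, not a copy of gallery-dl's implementation.

```python
import copy

def merge_config(base, override):
    """Return a new dict with override merged into base, recursing into nested objects."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are objects: merge them key by key
            result[key] = merge_config(result[key], value)
        else:
            # Scalars and lists from the later file win
            result[key] = copy.deepcopy(value)
    return result
```

Under this model, a local file that only sets extractor.gelbooru.directory keeps every other setting from the main config.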

Edit: From my understanding, yes, automatic loading of local config files in each directory is only possible by having the standalone executable in each directory. Are different directory options the only thing you need?

BakedCookie · 2024-03-11T04:20:03Z

@taskhawk

Thanks, that's exactly what I was looking for! Guess I didn't read the documentation thoroughly enough.

For now the only thing I'd want to override is the directory structure for artist tags. I don't think it's possible to determine from the metadata alone if a given tag is the name of an artist or not, so I thought the best way to go about it is to just have a separate directory for artists, and use a configuration override. So yeah, loading that override with the -c flag works great for that purpose, thanks again!

taskhawk · 2024-03-11T04:57:23Z

You kinda can, but you need to enable tags for Gelbooru in your configuration to get them, which will require an additional request:

    "gelbooru": {
      "directory": {
        "search_tags in tags_artists": ["{category}", "{search_tags[0]!u}", "{search_tags}", "{date:%Y}", "{date:%m}"],
        ""                           : ["{category}", "{search_tags}", "{date:%Y}", "{date:%m}"]
      },
      "tags": true
    },

Set "tags": true in your config and run a test with gallery-dl -K "https://gelbooru.com/index.php?page=post&s=list&tags=TAG" so you can see the tags_* keywords.

Of course, this depends on the artists being correctly tagged. Not sure if it happens on Gelbooru, but at least on other boorus and booru-like sites I've come across posts with the artist tagged as a general tag instead of an artist tag. Another limitation is that your search can only include one artist at a time; including more requires a more complex expression that checks that all tags are present in tags_artists.
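To reason about which entry of a conditional "directory" object wins, it can help to emulate the selection in plain Python. This sketch assumes the expressions are tried in order against the post's metadata, with the empty-string key as the fallback; the sample values are made up, and tags_artists is assumed to be a space-separated string.

```python
def pick_directory(conditions, kwdict):
    """Return the format list for the first expression that is true for kwdict.

    The empty-string key is used as the fallback when nothing else matches.
    """
    for expr, fmt in conditions.items():
        # Metadata keys act as variable names inside the expression
        if expr and eval(expr, {}, kwdict):
            return fmt
    return conditions.get("")

conditions = {
    "search_tags in tags_artists": ["{category}", "{search_tags[0]!u}", "{search_tags}", "{date:%Y}", "{date:%m}"],
    "": ["{category}", "{search_tags}", "{date:%Y}", "{date:%m}"],
}
```

Because tags_artists is treated as a string here, in is a substring test ("doe" would match "john_doe"); search_tags in tags_artists.split() is a stricter alternative, under the same assumption about the field's format.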

What I do instead is inject a keyword to influence where files will be saved, like this:

gallery-dl -o keywords='{"search_tags_type":"artists"}' "https://gelbooru.com/index.php?page=post&s=list&tags=ARTIST"

And in my config I have

    "gelbooru": {
      "directory": ["boorus", "{search_tags_type}", "{search_tags}"]
    },

You can have:

    "gelbooru": {
      "directory": {
        "search_tags_type == 'artists'": ["{category}", "{search_tags[0]!u}", "{search_tags}", "{date:%Y}", "{date:%m}"],
        ""                             : ["{category}", "{search_tags}", "{date:%Y}", "{date:%m}"]
      }
    },

You can do this for other tag types, like general, copyright, characters, etc.

Because it's a chore to type that option every time, I made a wrapper script. Since artists is my default, I just call it like this:

~/script.sh "TAG"

For other tag types I can do:

~/script.sh --copyright "TAG"
~/script.sh --characters "TAG"
~/script.sh --general "TAG"
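The wrapper script itself isn't shown, so here is a rough Python stand-in for the same idea: a --TYPE flag selects the injected search_tags_type keyword, with artists as the default. The function names and usage message are assumptions; the gallery-dl options are the ones from the command above.

```python
import json
import subprocess
import sys
from urllib.parse import quote

TAG_TYPES = ("artists", "copyright", "characters", "general")
SEARCH_URL = "https://gelbooru.com/index.php?page=post&s=list&tags="

def build_command(args):
    """Map [--TYPE] TAG to a gallery-dl argument list with an injected keyword."""
    tag_type = "artists"  # default when no --TYPE flag is given
    if args and args[0].startswith("--") and args[0][2:] in TAG_TYPES:
        tag_type, args = args[0][2:], args[1:]
    if len(args) != 1:
        raise SystemExit("usage: script [--copyright|--characters|--general] TAG")
    keywords = json.dumps({"search_tags_type": tag_type})
    return ["gallery-dl", "-o", "keywords=" + keywords, SEARCH_URL + quote(args[0])]

def main():
    # Run gallery-dl with the arguments given to this script
    subprocess.run(build_command(sys.argv[1:]), check=True)
```

Passing an argument list to subprocess.run avoids having to shell-quote the JSON for -o keywords=....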

BakedCookie · 2024-03-11T06:08:52Z

Thanks for pointing out there's a tags option available for the gelbooru extractor. I already used it in the kemono extractor to get the name of the artist, but it didn't occur to me that gelbooru might also have such an option (and just accepted that the tags aren't categorized).

For artists, I store all the URLs in their respective gelbooru.txt, rule34.txt, etc. files like so:

https://gelbooru.com/index.php?page=post&s=list&tags=john_doe
https://gelbooru.com/index.php?page=post&s=list&tags=blue-senpai
https://gelbooru.com/index.php?page=post&s=list&tags=kaneru
.
.
.

And then just run gallery-dl -c gallery-dl.conf -i gelbooru.txt. Since search_tags ends up being the artist anyway, getting tags_artists is probably not worth the extra request. The same goes for general tags and copyright tags in their respective directories. With this workflow I can't immediately see where I'd be able to use keyword injection, but it's definitely a useful feature that I'll keep in mind.

Wiiplay123 · 2024-03-18T03:15:32Z

When I'm making an extractor, what do I do if the site doesn't have different URL patterns for different page types? Every single page is just a numerical ID that could be a forum post, image, blog post, or something completely different.

mikf · 2024-03-19T12:39:55Z

@Wiiplay123 You handle everything with a single extractor and decide what type of result to return on the fly. The gofile code is a good example for this I think, or aryion.
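A minimal, framework-agnostic sketch of that idea: one pattern matches every numeric ID, then the fetched page is inspected to choose a handler. The example.org URL, the HTML markers and the handler names are all hypothetical, and gallery-dl's real Extractor classes and message types are not reproduced here; the gofile or aryion code shows the actual API.

```python
import re

# Every page on the (hypothetical) site is just /<numeric id>
URL_PATTERN = re.compile(r"^https?://example\.org/(\d+)/?$")

def handle_image(item_id, html):
    return [("file", item_id)]  # a real handler would parse file URLs out of html

def handle_post(item_id, html):
    return [("text", item_id)]  # a real handler would extract the post body

# Marker found in the page -> handler; the markers are made up for illustration
DISPATCH = [
    ('class="image-view"', handle_image),
    ('class="forum-post"', handle_post),
    ('class="blog-entry"', handle_post),
]

def extract(url, fetch):
    """Match any numeric ID, fetch the page, then pick a handler based on its content."""
    match = URL_PATTERN.match(url)
    if match is None:
        raise ValueError("unsupported URL: " + url)
    item_id = match.group(1)
    html = fetch(url)
    for marker, handler in DISPATCH:
        if marker in html:
            return handler(item_id, html)
    return []  # unknown page type: return nothing rather than guess
```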

I-seah · 2024-03-20T03:58:08Z

Hi, what options should I use in my config file to change the format of dates in metadata files? I would like to use "%Y-%m-%dT%H:%M:%S%z" for the values of "date" and "published" (from coomer/kemono downloads).

And would it also be possible to do this for json files that ytdl creates? I downloaded some videos with gallery-dl but the dates got saved as "upload_date": "20230910" and "timestamp": 1694344011, so I think it might be better to convert the timestamp to a date to get a more precise upload time, but I'm not sure if it's possible to do that either.
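For the second part, converting the timestamp does give the more precise upload time. A minimal Python sketch, assuming timestamp is seconds since the Unix epoch (as the 10-digit value suggests); attaching UTC explicitly matters, because strftime renders %z as an empty string for naive datetimes.

```python
from datetime import datetime, timezone

def format_timestamp(ts, fmt="%Y-%m-%dT%H:%M:%S%z"):
    """Format a Unix timestamp (seconds) as a timezone-aware UTC string."""
    # tz=timezone.utc makes the datetime aware, so %z renders as +0000 instead of ""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(fmt)

print(format_timestamp(1694344011))  # 2023-09-10T11:06:51+0000
```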

fireattack · 2024-10-20T06:15:37Z

>python3 -m pip install -U -I --force-reinstall --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz
Collecting https://github.com/mikf/gallery-dl/archive/master.tar.gz
  Downloading https://github.com/mikf/gallery-dl/archive/master.tar.gz
     - 731.8 kB 643.2 kB/s 0:00:01
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: gallery_dl
  Building wheel for gallery_dl (pyproject.toml) ... done
  Created wheel for gallery_dl: filename=gallery_dl-1.27.7.dev0-py3-none-any.whl size=568515 sha256=9fb25b7eefd00dcd729bc26fa937ad531a0741f26d9c7fc7fbdb86bc578611e5
  Stored in directory: C:\Users\ikena\AppData\Local\Temp\pip-ephem-wheel-cache-be3ub4hf\wheels\2a\d6\3c\0feca23fdac400e011e04ee81cac7ad83f629ec3e2cc8dc1ed
Successfully built gallery_dl
Installing collected packages: gallery_dl
Successfully installed gallery_dl-1.26.9.dev0

Now it says 1.26.9.dev0, even though the built wheel is clearly gallery_dl-1.27.7.dev0-py3-none-any.whl. Did pip just make up these version numbers on its own?

biggestsonicfan · 2024-10-20T06:19:49Z

I feel like something has gone awry for sure. Try creating a fresh venv and installing in that, just in case?

fireattack · 2024-10-20T06:33:19Z

Ah, thanks, I figured it out. Apparently I have billions of gallery_dl installations (some are even called gallery-dl) on my system.

And running pip uninstall gallery_dl only uninstalls one of them; the others happily continue to exist (pip list only lists one of them, too).

So I have to run pip uninstall gallery_dl multiple times until pip list reports none, and then reinstall.
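Since pip list shows only one entry per project name, leftover copies are easy to miss. A rough diagnostic sketch using the standard library's importlib.metadata, which should report every *.dist-info directory it finds on sys.path, shadowed duplicates included; run it with the same interpreter pip belongs to.

```python
import re
from importlib import metadata

def normalize(name):
    """PEP 503 name normalization: gallery_dl, gallery-dl and Gallery.DL compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()

def find_installs(project="gallery-dl"):
    """List (version, location) for every installed copy of a project."""
    wanted = normalize(project)
    found = []
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta else None
        if name and normalize(name) == wanted:
            # locate_file("") points at the directory that contains the dist-info
            found.append((dist.version, str(dist.locate_file(""))))
    return found
```

More than one entry from find_installs() would indicate the kind of stacked installations described above.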

I suspect this is caused by the -I argument in the command given in the README:

-I, --ignore-installed
Ignore the installed packages, overwriting them. This can break your system if the existing package is of a different version or was installed with a different package manager!

(environment variable: PIP_IGNORE_INSTALLED)

Maybe we shouldn't tell users to use it unless it's really needed, @mikf? (Or recommend --force-reinstall instead.)

Log if interested

Microsoft Windows [Version 10.0.19045.5011]
(c) Microsoft Corporation. All rights reserved.

D:\3>pip uninstall gallery-dl
Found existing installation: gallery-dl 1.26.9.dev0
Uninstalling gallery-dl-1.26.9.dev0:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.26.9.dev0.dist-info\*
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\*
    c:\users\ikena\appdata\local\programs\python\python311\scripts\gallery-dl.exe
  Would not remove (might be manually added):
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\archive.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\agnph.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\ao3.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\boosty.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cien.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\civitai.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cohost.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\hentainexus.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\koharu.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\hash.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\rename.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\update.py
Proceed (Y/n)? y
  Successfully uninstalled gallery-dl-1.26.9.dev0

D:\3>pip list | findstr gallery
gallery_dl                     1.27.0.dev0

D:\3>pip uninstall gallery_dl
Found existing installation: gallery_dl 1.27.0.dev0
Uninstalling gallery_dl-1.27.0.dev0:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.27.0.dev0.dist-info\*
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\archive.py
Proceed (Y/n)? y
  Successfully uninstalled gallery_dl-1.27.0.dev0

D:\3>python3 -m pip install -U -I --force-reinstall --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/m
Collecting https://github.com/mikf/gallery-dl/archive/m
  ERROR: HTTP error 404 while getting https://github.com/mikf/gallery-dl/archive/m
ERROR: Could not install requirement https://github.com/mikf/gallery-dl/archive/m because of HTTP error 404 Client Error: Not Found for url: https://github.com/mikf/gallery-dl/archive/m for URL https://github.com/mikf/gallery-dl/archive/m

D:\3>python -m pip install -U -I --force-reinstall --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz
Collecting https://github.com/mikf/gallery-dl/archive/master.tar.gz
  Downloading https://github.com/mikf/gallery-dl/archive/master.tar.gz (731 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.8/731.8 kB 2.0 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: gallery_dl
  Building wheel for gallery_dl (pyproject.toml) ... done
  Created wheel for gallery_dl: filename=gallery_dl-1.27.7.dev0-py3-none-any.whl size=568515 sha256=9695c9ea1c21c83ed4dfa7d9c9ad91a1692c7adb7936fe7ea17dbbbdf28a1485
  Stored in directory: C:\Users\ikena\AppData\Local\Temp\pip-ephem-wheel-cache-_uiz8zay\wheels\2a\d6\3c\0feca23fdac400e011e04ee81cac7ad83f629ec3e2cc8dc1ed
Successfully built gallery_dl
Installing collected packages: gallery_dl
Successfully installed gallery_dl-1.27.2.dev0

D:\3>pip list | findstr gallery
gallery_dl                     1.27.2.dev0

D:\3>pip uninstall gallery-dl gallery_dl
Found existing installation: gallery_dl 1.27.2.dev0
Uninstalling gallery_dl-1.27.2.dev0:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.27.2.dev0.dist-info\*
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\*
    c:\users\ikena\appdata\local\programs\python\python311\scripts\gallery-dl.exe
  Would not remove (might be manually added):
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\ao3.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\boosty.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\civitai.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cohost.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\hash.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\rename.py
Proceed (Y/n)? y
  Successfully uninstalled gallery_dl-1.27.2.dev0

D:\3>pip list | findstr gallery
gallery_dl                     1.27.2

D:\3>pip uninstall gallery_dl
Found existing installation: gallery_dl 1.27.2
Uninstalling gallery_dl-1.27.2:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.27.2.dist-info\*
Proceed (Y/n)? y
  Successfully uninstalled gallery_dl-1.27.2

D:\3>pip uninstall gallery_dl
Found existing installation: gallery_dl 1.27.7.dev0
Uninstalling gallery_dl-1.27.7.dev0:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.27.7.dev0.dist-info\*
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\ao3.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\boosty.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\civitai.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cohost.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\hash.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\rename.py
Proceed (Y/n)? n

D:\3>pip list | findstr gallery
gallery_dl                     1.27.7.dev0

D:\3>pip uninstall gallery_dl
Found existing installation: gallery_dl 1.27.7.dev0
Uninstalling gallery_dl-1.27.7.dev0:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.27.7.dev0.dist-info\*
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\ao3.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\boosty.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\civitai.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cohost.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\hash.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\rename.py
Proceed (Y/n)? y
  Successfully uninstalled gallery_dl-1.27.7.dev0

D:\3>pip list | findstr gallery

D:\3>python -m pip install -U --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz
Collecting https://github.com/mikf/gallery-dl/archive/master.tar.gz
  Downloading https://github.com/mikf/gallery-dl/archive/master.tar.gz (731 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.8/731.8 kB 6.0 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: gallery_dl
  Building wheel for gallery_dl (pyproject.toml) ... done
  Created wheel for gallery_dl: filename=gallery_dl-1.27.7.dev0-py3-none-any.whl size=568515 sha256=8c3fef44c47c3c0f5b123a50a76a60798804bdfc81509d242dc745c3b561e186
  Stored in directory: C:\Users\ikena\AppData\Local\Temp\pip-ephem-wheel-cache-b8nlpxh6\wheels\2a\d6\3c\0feca23fdac400e011e04ee81cac7ad83f629ec3e2cc8dc1ed
Successfully built gallery_dl
Installing collected packages: gallery_dl
Successfully installed gallery_dl-1.27.7.dev0

501stRookie · 2024-10-27T03:57:00Z

Is there a way to download specifically the revisions on an artist's page on kemono.su? For example, one artist has had many of their posts updated with a revision that removed the content, while the original revision retains them. There are hundreds of posts on their page like that, so I was wondering if there was a way to set it to download the original revisions for all of them automatically.

topchaser · 2024-10-27T10:44:52Z

I am getting the error pixiv: Unable to download work 59915441 ('sanity_level' warning) when I try to download this link (NSFW, but you cannot see it unless logged in): https://www.pixiv.net/en/artworks/59915441

I see many mentions of this error: https://github.com/mikf/gallery-dl/issues?q=sanity+level+warning

but I read through many of them trying to understand what to do, and I cannot figure it out. Will someone please tell me how to fix this?

Also, just to vent, I had no idea how long this had been happening, or if any of my attempts to download pixiv profiles prior had been subject to this. I can't retroactively check any logs, since I think I used to have logs, but it would cause redownloading profiles to skip media it already downloaded, which annoyed me. I didn't know if I could disable that specifically, so I just gave up on having logs. So, I potentially am missing media when I intended to get everything. I am a bit sad about it. Also, the "logs" I am describing might actually be something entirely different, and might not have told me of this error anyway. I don't know. I barely manage to get gallery-dl working for myself, so it working at all is essentially where my knowledge on the program ends.

I just noticed that the latest gallery-dl release made this "just werk":
https://github.com/mikf/gallery-dl/releases/tag/v1.27.7

Improvements
[pixiv] implement sanity_level workaround for user artworks results (#4327, #5435, #6339)

I still don't know whether it was possible to download such artwork using gallery-dl before (I thought it was, so I was just asking for someone to explain to me in simple terms how to do it), but, again, it "just werks" now, so, much appreciated.

biggestsonicfan · 2024-10-30T22:11:04Z

So Seiga is now region-locked. Can I proxy/wireguard just that extractor?

EDIT: I've managed to get Wireguard locally to proxy via a port using wireproxy, but I just need a post(pre)processor to launch it as a daemon and close it when it's done.

EDIT2: Figured it out:

            "actions": {
                "*": "exec wireproxy -c ~/.config/wireproxy/wp-config.conf -d"
            },
            "postprocessors": [
                "json_metadata",
                {
                    "name": "exec",
                    "command": "pkill wireproxy",
                    "event": "finalize"
                }
            ]

biggestsonicfan · 2024-11-01T04:41:13Z

I hate posting so frequently here but I hate making new issues more. This is once again an issue for me.

I've just supported a user who has a preview image and download URLs in their post. I normally parse the JSON files with a Python script; however, this preview image had been downloaded previously and I don't overwrite JSON data anymore. So I will re-run the user with skip set to false, but I really need a way to separate data by whether or not I'm a supporter, and by which support tier.

EDIT: I also don't get how the metadata archive works either. Will the metadata entry be the same as the one for the extractor?

mikf · 2024-11-03T17:37:27Z

@biggestsonicfan
It should be possible to use the feeRequired value and/or isRestricted in a filter statement to determine whether you can access a post or not. You can also use the metadata option to get plan data ("metadata": "plan") and potentially use that in a filter as well.

however this preview image had been downloaded previously and I don't overwrite json data anymore

A metadata post processor by default runs only when a file gets downloaded ("event": "file"), but you can also have it run for skipped downloads ("event": "file,skip") or just for the post itself ("event": "post") which is never "skipped", although that requires a custom filename.

Will the metadata entry be the same as the one for the extractor?

metadata archive IDs are by default different from the actual file's archive ID, but you can always change that with archive-format and archive-prefix.

baodrate · 2024-11-07T13:49:31Z

Could the default config be allowed to be in TOML, so the user does not have to specify --config-toml FILE on the command line every time?

I.e., add to gallery_dl.config._default_configs:

${XDG_CONFIG_HOME}/gallery-dl/config.toml
~/.config/gallery-dl/config.toml

(And it would probably make sense to also add the equivalent yaml paths)
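A sketch of what such a lookup could look like. The function names, the search order (JSON before TOML before YAML in each directory) and the extension-to-parser mapping are all hypothetical, not gallery-dl's current code.

```python
import os
from pathlib import Path

# Hypothetical mapping from file extension to the parser that would load it
LOADERS = {".json": "json", ".conf": "json", ".toml": "toml", ".yaml": "yaml", ".yml": "yaml"}

def default_config_candidates(env=None, home=None):
    """Return candidate config paths in (hypothetical) lookup order."""
    env = os.environ if env is None else env
    home = Path(home) if home is not None else Path.home()
    bases = []
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:
        bases.append(Path(xdg) / "gallery-dl")
    bases.append(home / ".config" / "gallery-dl")
    # JSON first, so existing setups keep working; TOML and YAML act as alternatives
    return [base / ("config" + ext) for base in bases for ext in (".json", ".toml", ".yaml")]

def loader_for(path):
    """Pick a parser name from the file extension, defaulting to JSON."""
    return LOADERS.get(Path(path).suffix.lower(), "json")

def find_config(candidates):
    """Return the first candidate that exists as a file, or None."""
    return next((path for path in candidates if path.is_file()), None)
```

Putting JSON first is one possible choice; the opposite order would let a new config.toml take precedence over an old config.json.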

topchaser · 2024-11-09T14:18:58Z

Trying to download this: https://misskey.gg/notes/9yp3zt35c3

using: gallery-dl misskey:https://misskey.gg/notes/9yp3zt35c3

produces this error: [downloader.http][warning] ('Connection broken: IncompleteRead(0 bytes read, 58762 more expected)', IncompleteRead(0 bytes read, 58762 more expected)) (1/5)

until it hits 5/5 then fails. It happens for all misskey.gg links. In contrast, misskey.io links work without even needing to preface the link with "misskey:". For example: https://misskey.io/notes/9ru7yqi5u4j6070a

Is there anything I can do to make misskey.gg links work?

The example link I provided no longer seems to be online, but I just noticed when downloading a profile on misskey.gg that the 5/5 timeout error no longer happened. But, it also doesn't appear that it added any older media that I assumed was being skipped. I didn't actually look at what was causing the 5/5 timeout error to see if it was media, but, since it appears to "just werk" at this point, I assume what was timing out simply was not media at all. I don't know. Either way, I am saying that I just noticed this is no longer reproducible.

biggestsonicfan · 2024-11-09T19:23:03Z

@mikf
After banging my head against my metadata issues again, I found that I'm running into this issue. If I find a restricted post, it doesn't always have a preview image, so no metadata is written. However, if I use "event": "post", my filename convention isn't honored, as that metadata is gone.

God-damnit-all · 2024-11-11T17:20:09Z

--retries -1 is apparently considered invalid now, maybe due to some new argument parsing that interprets -1 as an option?

#5262 (comment) fixes regression introduced in 9e72968. 'argparse' sets a flag and changes its behavior when something that looks like a negative number is used as an option string, '-4' and '-6' in this case.
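A small reproduction with Python's argparse shows the behavior the commit describes. This is not gallery-dl's actual parser; the option names are stand-ins. Once any option string looks like a negative number, -1 is no longer accepted as a separate value, while the --retries=-1 form still works.

```python
import argparse

class NoExitParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of printing usage and exiting."""
    def error(self, message):
        raise ValueError(message)

def parse_retries(argv, numeric_flags):
    parser = NoExitParser(prog="demo")
    parser.add_argument("--retries", type=int)
    if numeric_flags:
        # Option strings that look like negative numbers flip argparse's behavior
        parser.add_argument("-4", dest="ipv4", action="store_true")
        parser.add_argument("-6", dest="ipv6", action="store_true")
    return parser.parse_args(argv).retries

print(parse_retries(["--retries", "-1"], numeric_flags=False))  # -1
print(parse_retries(["--retries=-1"], numeric_flags=True))      # -1
```

With numeric_flags=True, parse_retries(["--retries", "-1"], ...) fails with "expected one argument", because "-1" is now read as an unknown option rather than a value.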

biggestsonicfan · 2024-11-14T17:35:34Z

Can the postprocessor use multiple filters? I'm trying but I'm getting TypeError: compile() arg 1 must be a string, bytes or AST object

mikf · 2024-11-14T17:45:44Z

@God-damnit-all
Fixed in cd6d6ea. Thanks for reporting, I wouldn't have caught this otherwise.

@biggestsonicfan
Post processor filter expressions can currently only be specified as a simple string, not as a list as is possible for image-filter. You can manually combine conditional expressions with ( cond1 ) and ( cond2 ) and ... though.
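A sketch of that manual combination. Passing a list straight to compile() is what produces the TypeError quoted above; joining the parts into one parenthesized string avoids it. The helper name is hypothetical, and the metadata dict is assumed to be passed to eval() as its locals mapping.

```python
def combine_filters(filters):
    """Join filter expressions into one string: (f1) and (f2) and ..."""
    if isinstance(filters, str):
        return filters
    # Parentheses keep each filter's own 'or'/'not' from binding to its neighbours
    return " and ".join("(" + f + ")" for f in filters)

expression = combine_filters([
    "not locals().get('isRestricted')",
    "locals().get('feeRequired') == 0",
])
code = compile(expression, "<filter>", "eval")
print(expression)
# (not locals().get('isRestricted')) and (locals().get('feeRequired') == 0)
```

Without the parentheses, joining "a or b" and "c" would give "a or b and c", which Python reads as "a or (b and c)".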

biggestsonicfan · 2024-11-14T17:58:06Z

@mikf

Post processors filter expressions can currently only be specified as a simple string and not as a list as is possible for image-filter.

Gotcha. It might be nice to clarify that in the post-processor docs, as that's where I got the idea to use it as a list.

My idea is to use filters to run specific postprocessors in order if:

The post is not restricted, has a paid plan, has a filename (run as prepare in /paid/plan-cost directory)
The post is not restricted, has a paid plan, has no filenames (run as post in /paid/plan-cost directory)
The post is not restricted and has a filename (run as prepare in /free directory)
The post is not restricted, has a paid plan of 0, and no filename (run as post in /free directory)
The post is restricted (run as post in /not-paid/plan-cost directory)

Which I think would resolve to:

not locals().get('isRestricted') and 'filename' in locals()
not locals().get('isRestricted') and 'filename' not in locals()
not locals().get('isRestricted') and locals().get('feeRequired') == 0
not locals().get('isRestricted') and locals().get('feeRequired') == 0 and 'filename' not in locals()
locals().get('isRestricted')
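Expressions like these can be sanity-checked outside gallery-dl by evaluating them against a hand-made metadata dict. This sketch assumes the metadata is passed as the locals mapping of eval(), so locals() returns that dict; the sample posts are invented. It also shows why keys must be called: 'filename' in locals().keys without parentheses tests membership in a bound method and raises TypeError, whereas 'filename' in locals() works.

```python
def matches(expr, kwdict):
    """Evaluate one filter expression with kwdict as the local namespace."""
    return bool(eval(compile(expr, "<filter>", "eval"), {}, kwdict))

# Invented sample posts, containing only the keys the filters use
free_with_file = {"isRestricted": False, "feeRequired": 0, "filename": "cover"}
restricted = {"isRestricted": True, "feeRequired": 500}

print(matches("not locals().get('isRestricted') and 'filename' in locals()", free_with_file))  # True
print(matches("locals().get('isRestricted')", restricted))  # True

try:
    # Missing parentheses: membership test against a bound method
    matches("'filename' in locals().keys", free_with_file)
except TypeError as exc:
    print("TypeError:", exc)
```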

#5262 (comment) allow (theoretically*) all filter expression statements to be a list of individual filters (*) except for 'filename' and 'directory' conditionals, as dict keys cannot be lists

mikf · 2024-11-15T17:37:12Z

@biggestsonicfan
All filters can now consist of multiple statements, including post processor filters: 5bc3657

biggestsonicfan · 2024-11-19T19:22:03Z

I am gradually moving away from PixivUtil2 for my pixiv downloads and would like to configure gallery-dl to match its settings as closely as possible. However, I am unable to actually encode ugoira files.

Pixiv config settings:

        "pixiv":
        {
            "refresh-token": "",
            "avatar": true,
            "sleep": 2.0,
            "sleep-request": 2.0,
            "sleep-extractor": 2.0,
            "base-directory": "/run/media/rob/v/pixiv/",
            "directory": ["{category}", "{user['id']}"],
            "filename": {
                "type=='ugoira'": "{filename} - {title}.{extension}",
                "id=='avatar'": "folder.{extension}",
                "id=='background'": "bg_folder.{extension}",
                "": "{id}_p{num} - {title}.{extension}"
            },
            "include": ["avatar", "background", "artworks"],
            "cookies": ["firefox", "", null, null, ".pixiv.net"],
            "tags": "japanese",
            "ugoira": true,
            "metadata": true,
            "postprocessors": ["json_metadata","mtime","process_ugoira"]
        },
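The `filename` setting in this config maps filter expressions to format strings. The sketch below models how one format might be selected, under two assumptions: conditions are tried in order with the first match winning, and the `""` key acts as the fallback. `select_filename` and the sample metadata are hypothetical, and plain `str.format_map` stands in for gallery-dl's own formatter, which supports more syntax.

```python
FILENAME = {
    "type=='ugoira'": "{filename} - {title}.{extension}",
    "id=='avatar'": "folder.{extension}",
    "id=='background'": "bg_folder.{extension}",
    "": "{id}_p{num} - {title}.{extension}",
}


def select_filename(conditions, metadata):
    """Format metadata with the first matching condition's format string."""
    for expr, fmt in conditions.items():
        if not expr:
            continue  # the "" key is only used as the fallback
        try:
            matched = eval(compile(expr, "<filename>", "eval"), {}, dict(metadata))
        except NameError:
            matched = False  # sketch choice: an unknown name counts as no match
        if matched:
            return fmt.format_map(metadata)
    return conditions[""].format_map(metadata)


ugoira = {"type": "ugoira", "id": 112353287, "num": 0, "title": "Holoctober 2023",
          "filename": "112353287_ugoira1920x1080", "extension": "zip"}
avatar = {"id": "avatar", "extension": "jpg"}
illust = {"type": "illust", "id": 123, "num": 1, "title": "Sketch", "extension": "png"}

print(select_filename(FILENAME, ugoira))  # 112353287_ugoira1920x1080 - Holoctober 2023.zip
print(select_filename(FILENAME, avatar))  # folder.jpg
print(select_filename(FILENAME, illust))  # 123_p1 - Sketch.png
```

Note that `type` and `id` are also Python builtins: in this sketch, when the `avatar` dict has no `type` key, `type=='ugoira'` compares the builtin `type` and quietly evaluates to False instead of raising an error.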

Post processor settings:

        "process_ugoira":
        {
            "name": "ugoira",
            "extension": "webm",
            "ffmpeg-twopass": true,
            "ffmpeg-args": ["-lossless", "0", "-crf", "15", "-b", "0", "-vsync", "0", "-pix_fmt", "yuv420p"],
            "ffmpeg-demuxer": "archive",
            "keep-files": true,
            "metadata": true,
            "mtime": true,
            "repeat-last-frame": true,
            "skip": true
        },

ffmpeg-location is omitted because its default is ffmpeg, which should run my installed ffmpeg; that works as a valid command from the terminal. Setting ffmpeg-location to /usr/bin/ffmpeg results in "postprocessor.ugoira: 112353287_ugoira1920x1080 - Holoctober 2023 🗿 🚔.webm" being appended to the zip archive download for this work, but no file is produced.
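To confirm that the default `ffmpeg` and an explicit `/usr/bin/ffmpeg` point at the same binary, both can be resolved the way a PATH lookup would. This is a generic diagnostic sketch using Python's standard library, not part of gallery-dl; `resolve_ffmpeg` is a hypothetical helper.

```python
import os
import shutil


def resolve_ffmpeg(location="ffmpeg"):
    """Resolve an ffmpeg location to its real path, or None if not found."""
    # shutil.which searches PATH for bare names and only checks the given
    # path when the name already contains a directory component.
    path = shutil.which(location)
    return os.path.realpath(path) if path else None


default = resolve_ffmpeg()
explicit = resolve_ffmpeg("/usr/bin/ffmpeg")
print("default: ", default)
print("explicit:", explicit)
print("same binary:", default is not None and default == explicit)
```

If both resolve to the same path, the executable location is not what differs between the two runs.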

mikf · 2024-11-19T20:31:05Z

@biggestsonicfan
"ffmpeg-demuxer": "archive" is meant to combine the individual ugoira frames downloaded with "ugoira": "original" into a new .zip archive. When applied to a downloaded .zip archive from "ugoira": true, it doesn't do much except maybe write frame timecode metadata into the archive.

Set your pixiv "ugoira" settings to "original" and use and adapt these postprocessor settings to generate an animation as well as an archive from higher-quality, "original" ugoira frames.

I'd also recommend installing mkvmerge when generating animations to get exact animation time codes for all ugoira.
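Ugoira frames can each have their own delay, which is why exact timecodes matter. The sketch below turns a frame list into an mkvmerge "v2" timecode file (a header line, then one start time in milliseconds per frame). It assumes frames are dicts with `file` and `delay` (milliseconds) keys, as in pixiv's ugoira metadata; `timecodes_v2` is a hypothetical helper, not gallery-dl's implementation.

```python
def timecodes_v2(frames):
    """Build a v2 timecode file for mkvmerge from per-frame delays (ms)."""
    lines = ["# timecode format v2"]
    start = 0
    for frame in frames:
        lines.append(str(start))  # time at which this frame is first shown
        start += frame["delay"]
    return "\n".join(lines) + "\n"


frames = [
    {"file": "000000.jpg", "delay": 100},
    {"file": "000001.jpg", "delay": 50},
    {"file": "000002.jpg", "delay": 200},
]
print(timecodes_v2(frames), end="")
# # timecode format v2
# 0
# 100
# 150
```

The total of 350 ms does not appear in the file: start times alone say nothing about how long the final frame stays on screen.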

fireattack · 2024-11-21T02:51:16Z

I know questions about Instagram's anti-crawling measures have been asked billions of times, but I've never seen it this strict.

I download newly added posts/stories from merely 8 accounts once a day.
Yet my account/cookies (extracted from Firefox) start returning HTTP 400 within 2 days, until I click through a few warnings in the browser. My account hasn't been banned or suspended (yet).

Given that I barely download anything most of the time (these 8 accounts don't update that frequently), I wonder how they even detect it. Is the way we request their API endpoints too easy to spot? Anyway, any insight into avoiding or mitigating this would be very helpful.

Hrxn · 2024-11-21T03:04:18Z

I already notice a slight delay when loading this page, so I'd suggest closing this issue and opening Questions, Feedback, and Suggestions #5.

God-damnit-all · 2024-11-21T17:58:49Z

The page is quite laggy now, yes.

Infinitay · 2024-11-30T07:34:08Z

Would you please consider uploading nightly builds to WinGet, Scoop, Chocolatey, or some other package manager? It would be great to get the latest changes while letting a package manager keep gallery-dl updated automatically, especially given how frequently it changes.

For example, the latest release is v1.27.7 (October 25th, 2024), but it is currently 123 commits behind master. The nightly build's latest version is 1.28.0-dev (https://github.com/gdl-org/builds/releases/tag/2024.11.28, November 28th, 2024), which as of now is not behind master at all.

Having the latest commits would be especially helpful when broken modules for constantly changing sites get fixed, or when support for something new is added.

mikf · 2024-12-01T18:53:05Z

Closing this as suggested by Hrxn (#5262 (comment)).
New issue: #6582.

mikf · 2024-12-01T19:01:11Z

@fireattack
Spotting requests to their API not coming from the website or mobile app is quite easy, I'd guess. Even more so since the API endpoints used by gallery-dl are not at all the ones used by the IG website.

@Infinitay
I have no control over what versions third-party package managers distribute. You'd have to ask them to package the binaries from https://github.com/gdl-org/builds/releases. Or use those yourself and update with gallery-dl -U.

And if you really want the latest commits, you can always git clone and run from source.

mikf added Questions Meta labels Mar 1, 2024

mikf mentioned this issue Mar 1, 2024

Questions, Feedback and Suggestions #3 #146

Closed

mikf pinned this issue Mar 1, 2024

mikf added a commit that referenced this issue Mar 2, 2024

[redgifs] make 'date' available for directories (#5262)

82c73c7


mikf added a commit that referenced this issue Mar 6, 2024

[docs] fix typo: rebot -> robot (#5262)

9fd851c


mikf added a commit that referenced this issue Mar 7, 2024

[twitter] add 'quotes' extractor (#5262)

40c0553

#5262 (comment) It's implemented as a search for 'quoted_tweet_id:…' on Twitter.

mikf added a commit that referenced this issue Mar 7, 2024

[docs] fix another typo (#5262)

052811b

#5262 (comment) This one was on the same line as the previous one ... (9fd851c)

fireattack mentioned this issue Nov 18, 2024

Remove -I for installing from the source in readme #6493

Merged

mikf mentioned this issue Dec 1, 2024

Questions, Feedback, Suggestions #5 #6582

Open

mikf closed this as completed Dec 1, 2024

mikf unpinned this issue Dec 1, 2024
