zimdump fails on long URLs #213

rgaudin · 2021-01-13T09:41:07Z

Here's the tail of the output of zimdump dump --dir /data/mqc somezim.zim:

Error writing file to errors dir. /data/mqc/_exceptions/H%2fs0.wp.com%2f_static%2f??-eJylUu1uwyAMfKERr9VaLT+mPQsEj9HwJTCL8vZzEk3NGi2KtD+Iw3fmfABDEl0MhIHAV5FcNTYUGFIXvSjeOhwfUNOV8gQrmXLR3IUxa6kLGBeVdMe4NHBp5Lr4Om8UK1PO9ljghpRk14sZbeg%2fXFMZKsyGKxmhba7NGVS1Tk8eZrnKMo9QaHT4%2fzb0if5Im1m1GkKOsZIw2erDTh5aZEk2mPKHfBXfNAGf+yRp~3
Exception: Error writing file to errors dir. /data/mqc/_exceptions/H%2fs0.wp.com%2f_static%2f??-eJylUu1uwyAMfKERr9VaLT+mPQsEj9HwJTCL8vZzEk3NGi2KtD+Iw3fmfABDEl0MhIHAV5FcNTYUGFIXvSjeOhwfUNOV8gQrmXLR3IUxa6kLGBeVdMe4NHBp5Lr4Om8UK1PO9ljghpRk14sZbeg%2fXFMZKsyGKxmhba7NGVS1Tk8eZrnKMo9QaHT4%2fzb0if5Im1m1GkKOsZIw2erDTh5aZEk2mPKHfBXfNAGf+yRp~3

Touching that file fails as well:

touch: /data/mqc/_exceptions/H%2fs0.wp.com%2f_static%2f??-eJylUu1uwyAMfKERr9VaLT+mPQsEj9HwJTCL8vZzEk3NGi2KtD+Iw3fmfABDEl0MhIHAV5FcNTYUGFIXvSjeOhwfUNOV8gQrmXLR3IUxa6kLGBeVdMe4NHBp5Lr4Om8UK1PO9ljghpRk14sZbeg%2fXFMZKsyGKxmhba7NGVS1Tk8eZrnKMo9QaHT4%2fzb0if5Im1m1GkKOsZIw2erDTh5aZEk2mPKHfBXfNAGf+yRp~3: File name too long

Very long URLs seem like a common use case, and I believe this calls for a design change in the way those files are written to disk.

Might be related to #190

Note: this is zimdump 2.1.0

kelson42 · 2021-03-02T06:48:15Z

I propose to:

Truncate exception files/directories if their names are too long for the filesystem.
Introduce an exceptions.log file in JSON to track them. For each exception, this file would give the original path, the exception path, and the reason for the exception (see Machine readable exception log for zimdump (for dumping all content) #108).
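
To make the two points above concrete, here is a minimal Python sketch, not zimdump's actual code (zimdump is C++). It assumes a 255-byte limit per path component, a hash-based suffix for truncated names, and hypothetical JSON key names (original_path, exception_path, reason); none of these are decided in this issue.

```python
import hashlib
import json

NAME_MAX = 255  # assumed per-component byte limit (typical for ext4)


def truncate_component(name: str, limit: int = NAME_MAX) -> str:
    """Shorten one path component to at most `limit` bytes of UTF-8."""
    raw = name.encode("utf-8")
    if len(raw) <= limit:
        return name
    # A suffix derived from the full name keeps distinct long names distinct
    # (assumption: an 8-hex-digit SHA-1 prefix is unique enough here).
    suffix = "~" + hashlib.sha1(raw).hexdigest()[:8]
    keep = limit - len(suffix)
    # errors="ignore" drops a multi-byte character that was cut in half.
    head = raw[:keep].decode("utf-8", errors="ignore")
    return head + suffix


def log_entry(original_path: str, exception_path: str, reason: str) -> str:
    """Build one JSON line for exceptions.log (key names are hypothetical)."""
    return json.dumps({
        "original_path": original_path,
        "exception_path": exception_path,
        "reason": reason,
    })
```

Writing one JSON object per line would let the log be appended to while dumping and parsed line by line afterwards.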

mgautierfr · 2021-03-02T09:57:33Z

truncate exception files/directory if they are too long for the filesystem

Care must be taken when truncating directories.
We may have three entries:

"long_directory_xxxxxxxxxxyyyyyy/foo.html"
"long_directory_xxxxxxxxxxyyyyyy/bar.html"
"long_directory_xxxxxxxxxxzzzzzz/foo.html"

The first two entries share a directory, which must be truncated to the same short name ("long_directory~1") for both, while the third entry's directory must get a different one ("long_directory~2").
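
One way to satisfy this constraint is to remember the short name assigned to each long component, so the same long directory always maps to the same short name. The Python sketch below is only an illustration: the ShortNamer class, the 8-byte suffix reserve, and the counter scheme are assumptions, and it does not guard against a real entry that is already named like a generated short name.

```python
class ShortNamer:
    """Assign stable short names to over-long path components."""

    SUFFIX_RESERVE = 8  # bytes kept free for "~N" (assumed to be enough)

    def __init__(self, limit: int = 255):
        self.limit = limit
        self._by_long = {}   # long component -> short name already assigned
        self._counters = {}  # truncated stem -> last counter used

    def short(self, name: str) -> str:
        raw = name.encode("utf-8")
        if len(raw) <= self.limit:
            return name
        if name in self._by_long:
            # Same long name seen before: reuse its short name.
            return self._by_long[name]
        stem = raw[: self.limit - self.SUFFIX_RESERVE].decode(
            "utf-8", errors="ignore"
        )
        count = self._counters.get(stem, 0) + 1
        self._counters[stem] = count
        result = f"{stem}~{count}"
        self._by_long[name] = result
        return result

    def short_path(self, path: str) -> str:
        """Apply the mapping to every component of a '/'-separated path."""
        return "/".join(self.short(part) for part in path.split("/"))
```

Note that the numbering depends on the order in which entries are visited, so the mapping is only stable within a single dump unless it is also written to a log.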

adamlamar · 2022-12-21T00:37:58Z

Is there an example zim file available that has this problem?

2600box · 2023-10-11T09:36:40Z

Is there an example zim file available that has this problem?

Please go ahead with this one that is 3GB: https://www.transfernow.net/dl/20231009UhHnE3Sy

benoit74 · 2024-04-04T07:52:33Z

We should probably also consider, at the same time, ignoring or replacing all characters that are not allowed or are interpreted differently on the target filesystem. This is causing many files not to be dumped.
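
As a rough illustration of the replacement idea, the Python sketch below percent-encodes the characters that Windows filesystems reject in file names (< > : " / \ | ? * and control characters). The function name and the choice of lowercase percent-encoding are assumptions made for this example; it does not handle other Windows rules such as reserved device names or trailing dots and spaces.

```python
import re

# Characters rejected in Windows file names, plus ASCII control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_component(name: str) -> str:
    """Replace each forbidden character with %xx (lowercase hex)."""
    return _FORBIDDEN.sub(lambda m: "%{:02x}".format(ord(m.group())), name)
```

Because "%" itself is left untouched, two different ZIM paths could in principle map to the same file name, so a real implementation would need to detect such collisions or log the mapping.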

benoit74 · 2024-04-04T07:53:46Z

Building a very small ZIM with many "strange" ZIM paths is probably the way to go; this is quite easy to do with python-libzim or python-scraperlib. It would make testing the change on many filesystems much easier.

rgaudin · 2024-04-04T07:58:44Z

Indeed, but I'd like to mention that filesystem limitations are all properly documented. The change should be designed with those limitations in mind, as testing on various filesystems is cumbersome.
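
Designing against documented limits could look like the small table below, a Python sketch with a hypothetical fits helper. The three entries reflect commonly documented per-component limits (255 bytes on ext4, 255 UTF-16 code units on NTFS and on FAT32 with long file names); the list is deliberately incomplete and should be checked against each filesystem's documentation.

```python
# Per-component name limits: (unit, maximum length). Illustrative subset.
LIMITS = {
    "ext4": ("bytes", 255),
    "ntfs": ("utf16", 255),
    "fat32": ("utf16", 255),  # with long file name (VFAT) support
}


def component_length(name: str, unit: str) -> int:
    """Length of a name in the unit the filesystem counts in."""
    if unit == "bytes":
        return len(name.encode("utf-8"))
    # Each UTF-16 code unit is 2 bytes in the little-endian encoding.
    return len(name.encode("utf-16-le")) // 2


def fits(name: str, filesystem: str) -> bool:
    """True if `name` is short enough for one path component."""
    unit, limit = LIMITS[filesystem]
    return component_length(name, unit) <= limit
```

The unit matters: a name made of 200 "é" characters is 400 bytes in UTF-8, so it exceeds the ext4 limit while still fitting within 255 UTF-16 code units.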

rgaudin added bug zimdump labels Jan 13, 2021

kelson42 pinned this issue Jan 22, 2021

kelson42 mentioned this issue Feb 7, 2021

zimdump stops if cannot create redirect because of invalid filename #190

Open

kelson42 added the good first issue label Mar 2, 2021

kelson42 added this to the 3.2.0 milestone Sep 25, 2022

kelson42 modified the milestones: 3.2.0, 3.3.0 Mar 22, 2023

kelson42 changed the title ~~dump fails on long URLs~~ zimdump fails on long URLs Aug 13, 2023

kelson42 modified the milestones: 3.3.0, 3.4.0 Sep 26, 2023

mgautierfr mentioned this issue Oct 9, 2023

zimdump - invalid characters and long names cause error and incomplete extraction #373

Closed

2600box mentioned this issue Oct 11, 2023

Don't stop for errors dumping invalid names or long names #375

Closed

mgautierfr mentioned this issue Nov 1, 2023

zimdump fails due to filename too long kiwix/kiwix-tools#644

Closed

This was referenced Mar 31, 2024

zimdump - Bypassing excessively long filenames. #318

Closed

zimdump 3.3.0 fails to extract many .zim files?! #390

Open

kelson42 modified the milestones: 3.4.1, 3.5.0 May 16, 2024

kelson42 modified the milestones: 3.4.2, 3.5.0 Jul 8, 2024

kelson42 modified the milestones: 3.5.0, 3.6.0 Aug 29, 2024
