zimdump - Bypassing excessively long filenames. #318

Closed
ISJ-439 opened this issue Aug 23, 2022 · 13 comments

Comments

@ISJ-439

ISJ-439 commented Aug 23, 2022

Hello,

The filenames in a ZIM I'm trying to recover are too long. Is it possible to add a flag or similar to bypass this or truncate them?

The error is "Exception: Error writing file to errors dir. " with an excessively long file name following.

Thanks

@mgautierfr
Collaborator

That should already be the case: we truncate filenames longer than 255 characters.
What is the file name you want to extract?
Which command are you using?

@ISJ-439
Author

ISJ-439 commented Aug 23, 2022

I'm going to replace the site details with an equal number of x's for privacy; I hope that's okay.

ZIMtools Version: 3.1.1-2

Command
/home/localadmin/Downloads/zim-tools_linux-x86_64-3.1.1-2/zimdump dump --dir='/mnt/2TB' ./xxxxxxxxxxxxx.com-May2022_2022-05.zim

Filename
Wrote /mnt/2TB/H/xxxxxxxxxxxxx.com/category/xxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxx/page/43/ to /mnt/2TB/_exceptions/H%2fxxxxxxxxxxxxx.com%2fxxxxxxxx%2fxxxxxxxxxxxxxxxxx%2fxxxxxxxxxxxxxxx%2fpage%2f43%2f
Wrote /mnt/2TB/H/www.bitchute.com/embed/xxxxxxxxxxx/view/ to /mnt/2TB/_exceptions/H%2fwww.bitchute.com%2fembed%xxxxxxxxxxx%2fview%2f
Error writing file to errors dir. /mnt/2TB/_exceptions/A%2fhtml5-player.libsyn.com%2fembed%2fepisode%2xxx%2xxxxxxxx%2fheight%2f90%2fwidth%2f640%2ftheme%2fcustom%2fautonext%2fno%2fthumbnail%2fyes%2fautoplay%2fno%2fpreload%2fno%2fno_addthis%2fno%2fdirection%2fbackward%2frender-playlist%2fno%2fcustom-color%2f0009aa%2f
Exception: Error writing file to errors dir. /mnt/2TB/_exceptions/A%2fhtml5-player.libsyn.com%2fembed%2fepisode%2xxx%2xxxxxxxx%2fheight%2f90%2fwidth%2f640%2ftheme%2fcustom%2fautonext%2fno%2fthumbnail%2fyes%2fautoplay%2fno%2fpreload%2fno%2fno_addthis%2fno%2fdirection%2fbackward%2frender-playlist%2fno%2fcustom-color%2f0009aa%2f

{end of program output}

@ISJ-439
Author

ISJ-439 commented Aug 23, 2022

This is 262 characters, which is over the 255-character limit for file names.
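
As a rough check of that count: on common Linux filesystems the limit applies to each path component and is measured in bytes, typically 255. The sketch below is a Python illustration, not zimdump code; `NAME_MAX` and `component_too_long` are hypothetical names, and the 255-byte value is an assumption about the target filesystem.

```python
import os

NAME_MAX = 255  # assumed per-component limit in bytes; the real value depends on the filesystem


def component_too_long(path: str, limit: int = NAME_MAX) -> bool:
    """Return True if the last component of `path` exceeds `limit` bytes."""
    name = os.path.basename(path)
    # The limit counts bytes, not characters, so encode before measuring.
    return len(os.fsencode(name)) > limit
```

On POSIX systems, `os.pathconf(directory, "PC_NAME_MAX")` reports the actual limit for a given directory, which could replace the hard-coded constant.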

@kelson42 kelson42 modified the milestones: 3.2.0, 3.3.0 Aug 23, 2022
@mgautierfr
Collaborator

What is failing is the writing of the errored file.

zimdump tries to write the file in your output directory (/mnt/2TB), but if that fails for some reason, it writes the file in the exception directory (/mnt/2TB/_exceptions). When doing so, it replaces every / with %2f (so there is no subdirectory) and does not try to truncate the filename.

The question is why it fails to write the file in the first place. (Sadly, zimdump doesn't report the error information.)

  • What is the content of mnt/2TB/A/html5-player.libsyn.com/embed/episode/xx/xxxxxxx/height/90/width/640/theme/custom/autonext/no/thumbnail/yes/autoplay/no/preload/no/no_addthis/no/direction/backward/render-playlist/no/custom-color/0009aa/ ?
  • Is your directory full? Or subject to a quota?
  • ...
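
The fallback naming described in this comment can be sketched as follows. This is a Python illustration of the described behaviour, not zimdump's actual C++ code, and `flatten_entry_path` is a hypothetical name. It shows why a path whose individual components are all short can still yield an over-long name once flattened: every `/` becomes three characters, and the whole path becomes a single component.

```python
def flatten_entry_path(entry_path: str) -> str:
    """Replace every '/' with '%2f' so the whole path becomes one file name."""
    return entry_path.replace("/", "%2f")


# Made-up entry path, similar in shape to the ones in the log above.
entry = "A/example.com/embed/episode/height/90/width/640/"
flat = flatten_entry_path(entry)

print(flat)                   # a single component, no subdirectories
print(len(entry), len(flat))  # each '/' adds two characters
```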

@ISJ-439
Author

ISJ-439 commented Sep 1, 2022

Hello sir,

Firstly, thank you for your time.

"What is the content of mnt/2T"
The content is likely related to an MP3 player applet. Yes, I know this is not supported, and I'm sorry to use this outside its scope.

"Is your directory full ? Or with some quota ?"
That was the first thing I checked. No sir, there is almost a TB free, and I ran 'chmod -R 777 ./*' on the directory, so it's not permissions.

My proposed solution: provide a switch to bypass errors and just continue extracting. This would likely be simple and easy, to my limited knowledge.

Thank you.
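
A minimal sketch of what such a switch could do, written in Python for illustration only. Nothing in this thread shows that zimdump has such an option; `dump_entries`, `write_entry`, and `skip_errors` are hypothetical names.

```python
from typing import Callable, Iterable, List, Tuple


def dump_entries(
    paths: Iterable[str],
    write_entry: Callable[[str], None],
    skip_errors: bool = False,
) -> List[Tuple[str, str]]:
    """Write every entry; with skip_errors=True, record failures and keep going."""
    failures: List[Tuple[str, str]] = []
    for path in paths:
        try:
            write_entry(path)
        except OSError as exc:
            if not skip_errors:
                raise  # default: stop at the first failure, as reported in this issue
            failures.append((path, str(exc)))  # keep the reason so it can be reported
    return failures
```

Returning the list of failures, rather than discarding them, keeps the information needed to investigate why a write failed.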

@mgautierfr
Collaborator

I agree with your proposal.
But it would be even better to also know why it fails. If you can provide me with the ZIM file (even privately, if you don't want to publish it), I could investigate the root cause and fix it.

@kelson42
Contributor

kelson42 commented Sep 2, 2022

No way to build anyt

"My proposed solution: provide a switch to bypass errors and just continue extracting. This would likely be simple and easy, to my limited knowledge."

We should first 100% understand the root cause.

@LTVA1

LTVA1 commented Dec 2, 2022

I'm also having this issue and can provide a ZIM file for tests. It is around 3 GB in size, though, and I'm not sure where to upload it.

@mgautierfr
Collaborator

You can use any file-sharing service, for example wetransfer.com or file.io.
They are limited to 2 GB files, but you can split the ZIM file.

@LTVA1

LTVA1 commented Dec 7, 2022

I will try to limit the crawl scope as soon as possible, so that the file is less than 2 GB in size.

@2600box

2600box commented Oct 11, 2023

I made some simple modifications to ignore errors: #375

It now creates a more complete extraction. This could be incorporated into improved feedback to the user, options to ignore the errors, or hash the invalid names...

#373
In my case, invalid characters and long names cause the dump to error out and stop; with my modifications it keeps going and ignores the invalid and long names.

For example:

❯ ./zim-tools/build/src/zimdump dump --dir=./dump3 archive.zim
Error writing file to errors dir. ./dump3/_exceptions/H%2fplay.google.com%2flog?format=json&hasfast=true&authuser=0&__wb_method=POST&[[1,null,null,null,null,null,null,null,null,null,[null,null,null,null,"en",null,"17",null,null,[1,0,0,0,0]]],1654,[["1696854400954",null,[],null,null,null,null,"[[[\"%2fclient_streamz%2fpo%2fw%2fel\",null,[\"en\",\"rk\"],[[[[\"c\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1]],[[[\"c\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,0]],[[[\"q\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,0.8000030517578125]],[[[\"S\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,3.5999984741210938]],[[[\"b\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,199.0999984741211]],[[[\"i\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1.5]],[[[\"r\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,3440.6000061035156]],[[[\"C\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,2.4000015258789062]],[[[\"x\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,0]],[[[\"m\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,3.100006103515625]]],null,[]],[\"%2fclient_streamz%2fpo%2fw%2frl\",null,[\"mn\",\"ac\",\"sc\",\"rk\"],[[[[\"c\"],[null,1],[null,0],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1887.900001525879]],[[[\"g\"],[null,1],[null,0],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1331.400001525879]]],null,[]],[\"%2fclient_streamz%2fpo%2fw%2fcsc\",null,[\"cs\",\"rk\"],[[[[null,3],[\"O43z0dpjhgX20SCx4KAo\"]],[1]]],null,[]]]]",null,null,null,null,null,null,0,[null,[],null,"[[],[],[],[]]"],null,null,null,[],1,null,null,null,null,null,[]]],"1696854400955",[]]
Error writing file to errors dir. ./dump3/_exceptions/A%2fplay.google.com%2flog?format=json&hasfast=true&authuser=0&__wb_method=POST&[[1,null,null,null,null,null,null,null,null,null,[null,null,null,null,"en",null,"17",null,null,[1,0,0,0,0]]],1654,[["1696854400954",null,[],null,null,null,null,"[[[\"%2fclient_streamz%2fpo%2fw%2fel\",null,[\"en\",\"rk\"],[[[[\"c\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1]],[[[\"c\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,0]],[[[\"q\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,0.8000030517578125]],[[[\"S\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,3.5999984741210938]],[[[\"b\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,199.0999984741211]],[[[\"i\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1.5]],[[[\"r\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,3440.6000061035156]],[[[\"C\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,2.4000015258789062]],[[[\"x\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,0]],[[[\"m\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,3.100006103515625]]],null,[]],[\"%2fclient_streamz%2fpo%2fw%2frl\",null,[\"mn\",\"ac\",\"sc\",\"rk\"],[[[[\"c\"],[null,1],[null,0],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1887.900001525879]],[[[\"g\"],[null,1],[null,0],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1331.400001525879]]],null,[]],[\"%2fclient_streamz%2fpo%2fw%2fcsc\",null,[\"cs\",\"rk\"],[[[[null,3],[\"O43z0dpjhgX20SCx4KAo\"]],[1]]],null,[]]]]",null,null,null,null,null,null,0,[null,[],null,"[[],[],[],[]]"],null,null,null,[],1,null,null,null,null,null,[]]],"1696854400955",[]]
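
One way to realise the "hash the invalid names" idea mentioned above is to keep a readable prefix and append a digest of the full name, so that the result fits the limit and distinct entries very likely still get distinct names. This is a Python sketch, not the change made in #375; `safe_name`, the 255-byte limit, and the choice of SHA-256 are assumptions, and it addresses only length, not invalid characters.

```python
import hashlib

NAME_MAX = 255  # assumed per-component limit in bytes


def safe_name(name: str, limit: int = NAME_MAX) -> str:
    """Return `name` unchanged if it fits, else a truncated prefix plus a hash suffix."""
    raw = name.encode("utf-8")
    if len(raw) <= limit:
        return name
    # Hash the full original name so two long names with a shared prefix stay distinct.
    suffix = "~" + hashlib.sha256(raw).hexdigest()[:16]
    # Cut on bytes, then drop any partial multi-byte character left at the cut point.
    prefix = raw[: limit - len(suffix)].decode("utf-8", errors="ignore")
    return prefix + suffix
```

Because the suffix is derived from the complete name, the mapping is deterministic: dumping the same archive twice produces the same file names.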

@benoit74
Contributor

benoit74 commented Nov 1, 2023

@kelson42
Contributor

This is basically a duplicate of #213
