-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Select OEM/ANSI code page according to system locale setting #36
base: main
Are you sure you want to change the base?
Conversation
Hello, thank you for this. I've seen your progress across multiple issue trackers regarding Linux and 7zip path encoding. void MultiByteToUnicodeString3_iconv(UString &res, const AString &s)
{
res.Empty();
if (s.IsEmpty())
return;
iconv_t cd;
if ((cd = iconv_open("UTF-8", "CP932")) != (iconv_t)-1) {
AString sUtf8;
unsigned slen = s.Len();
char* src = s.Ptr_non_const();
unsigned dlen = slen * 4 + 1; // (source length * 4) + null termination
char* dst = sUtf8.GetBuf_SetEnd(dlen);
const char* dstStart = dst;
memset(dst, 0, dlen);
size_t slen_size_t = static_cast<size_t>(slen);
size_t dlen_size_t = static_cast<size_t>(dlen);
size_t done = iconv(cd, &src, &slen_size_t, &dst, &dlen_size_t);
if (done == (size_t)-1) {
iconv_close(cd);
// iconv failed. Falling back to default behavior
MultiByteToUnicodeString2(res, s, 932);
return;
}
// Null-terminate the result
*dst = '\0';
iconv_close(cd);
AString sUtf8CorrectLength;
size_t dstCorrectLength = dst - dstStart;
sUtf8CorrectLength.SetFrom(sUtf8, static_cast<unsigned>(dstCorrectLength));
if (ConvertUTF8ToUnicode(sUtf8CorrectLength, res) /*|| ignore_Utf8_Errors*/)
return;
}
} and then in LzhHandler's GetProperty: UString dst;
MultiByteToUnicodeString3_iconv(dst, item.GetName());
UString s = NItemName::WinPathToOsPath(dst); If you'd like to try implementing it for LZH, here's a sample file and the current / expected output of
(I noticed the Debian package maintainer for 7zz is Japanese, so he may have some experience) |
Unfortunately, I also don't understand the 7-zip codebase well enough. I'm afraid we need Igor Pavlov to tell us how to implement this. |
@ip7z: What do you think about this PR? |
This and the patch applied to the debian package completely wrecks shift_jis encoded filenames. Difference is that the first one can be fixed with |
Will look at this, thanks! |
Can you please attach a sample file with such name? |
Absolutely, here you go. without patch: Also fwiw the current output with this patch is the same I get in Ark (KDE's archive GUI), no matter if I have an unpatched 7zip. But that's probably them just relying on how p7zip worked. Unzipping with pure 7z or specifying encoding with unzip is currently the only way to get these filenames without corrupting them. |
@kattjevfel please try to replace
to
after applying this patch, then check you archive once again. Make sure your system locale is set to ja_JP as 7zip should somehow "guess" what legacy code page to use, so its using system locale for that. |
Your patch does work after adding At that point I still prefer having them not be converted to UTF-8 by 7-zip at all and instead have an external tool identify and convert them, as that doesn't even require the locale to be installed. |
After change above it should also work without locale installed. |
You are right, when using |
Use |
Fixes
https://sourceforge.net/p/sevenzip/bugs/2473/
https://sourceforge.net/p/sevenzip/bugs/1060/