iconv always silent, -c broken #471
Comments
My devuan balseraph man page says "-s This option is ignored; it is provided only for compatibility", and providing it doesn't change the behavior here? (Do you want it to say "Illegal input sequence at position 1" like the other one, whether -s is provided or not, or to produce its current output but exit with an error if some chars couldn't be converted?)

I'm not getting a $ with -c, I'm getting "ab" from toybox iconv built against musl. (I think the binary in my $PATH should be the same one at https://landley.net/bin/toybox/latest/toybox-x86_64 ?)

Generally when an option flag in an optstr A) isn't documented, B) doesn't do anything, it's just there for compatibility. I don't have a way to mark these, nor do I have an entirely standard policy on them. (patch lists -u and says it's ignored, but that's largely because "patch -u" was kind of a big deal for ages. I have a local help text redo that yanks the -u line but adds "(-u is assumed and ignored)" to the second paragraph; I haven't checked it in yet because... well it's a judgement call, innit?)

I expect a non-zero number of people to learn to use a command from the toybox help text. (And presumably go on to read bigger docs eventually, but I still don't want to rule OUT people being able to do that.) And thus explicitly telling them "you don't need to know this, ignore it", while trying to be terse seems counterproductive.
Try commit 53d8a67 maybe?
oh, that's sad. i'd believed https://pubs.opengroup.org/onlinepubs/009604499/utilities/iconv.html which has a much more useful description. but, yes, my linux box has the breakage you describe:
how about macOS?
so also not what i'd expected from POSIX, but different to the gnu one.
that was my prompt :-) here's toybox on macOS with your latest patch:
that's certainly better (and i think the '?'s are reasonable and defensible). i'm less tempted by the error message now than i was when i believed that you could disable it with POSIX certainly allows not having a message ("When -s is not used, the results of encountering invalid characters in the input stream (either those that are not valid characters in the codeset of the input file or that have no corresponding character in the codeset of the output file) shall be specified in the system documentation."), though both |
My commit changed the error return in case anybody's testing it (so a script CAN see it wasn't a full conversion), but I didn't output something that would stomp the conversion. I can output the data to stdout and an error message to stderr, but for humans those would interleave and arguably be net worse? The question is whether partial results are useful. The error message kind of implies refusing to produce any useful output because one thing went wrong.
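which is basically the macOS (bsd?) behavior. i suspect their argument would be "if you wanted us to either ignore transliteration failures, or use a replacement character, you should have suffixed the encoding name with //IGNORE or whatever".

although it surprised me (and if we go this way, the --help output -- if not the error message! -- should definitely mention how you get out of this), i'm coming round to it (modulo quality of docs/errors) possibly being the least-worst of the options.

that said ... i've personally only ever used iconv(1) to test bionic's iconv(3). a quick code search of all the code i have access to turned up 80% of call sites being from one UTF to another (8 to 16, say, or 16 to 32). there were a few lossy conversions to ascii, but they were mostly -t ascii//TRANSLIT anyway. i saw one -t UTF-8//TRANSLIT that was presumably someone who had actually seen invalid byte sequences in the input they were dealing with.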
On 12/16/23 14:11, enh-google wrote:
>> The error message kind of implies refusing to produce any useful output
>> because one thing went wrong.
>
> which is basically the macOS (bsd?) behavior. i suspect their argument would be
> "if you wanted us to either ignore transliteration failures, or use a
> replacement character, you should have suffixed the encoding name with //IGNORE
> or whatever".

My problem is I'm not a big user of this command, so I've tried to be flexible
and waited for people to complain.

> although it surprised me (and if we go this way, the |--help| output -- if not
> the error message! -- should definitely mention how you get out of this), i'm
> coming round to it (modulo quality of docs/errors) possibly being the
> least-worst of the options.

Unless you're buffering the data until it's complete, you're producing partial
output anyway.

The difference between partial and full output is essentially "exit early on
partial input instead of hanging if the input source hangs", because the caller
has to notice the error and discard what they got anyway.

> that said ... i've personally only ever used iconv(1) to test bionic's iconv(3).
> a quick code search of all the code i have access to turned up 80% of call sites
> being from one UTF to another (8 to 16, say, or 16 to 32). there were a few
> lossy conversions to ascii, but they were mostly |-t ascii//TRANSLIT| anyway. i
> saw one |-t UTF-8//TRANSLIT| that was presumably someone who had actually seen
> invalid byte sequences in the input they were dealing with.

I don't think we're checking for //TRANSLIT or //IGNORE. All that stuff is
passed on to libc and handled by libc, and is presumably thus libc's problem?

Posix doesn't mention these in the iconv command nor in iconv_open() (which just
says "Settings of fromcode and tocode and their permitted combinations are
implementation-defined.")

If we _should_ be handling this, then basically "our default is translit and
their default is ignore", except we do not currently implement the error message
for ignore at all...

And //IGNORE says that characters that cannot be converted are discarded, it
does NOT say to stop early...

We can add //IGNORE and //TRANSLIT if you like? (And mention them in the help
text...)

Rob
correct. and my guess is that's why the other iconv(1)s say "not our problem --- tell iconv_open(3) what you want via our -f and -t arguments".
no, that was my point --- i think the macOS (bsd?) iconv(1) author said "look, i'm just going to stop early because if you don't want that, you can just tell iconv_open(3) via (whereas POSIX doesn't have those, which is probably why it talks about i'm a bit wary of not having a diagnostic (because everyone else does, and they may know something we don't), but i think the current behavior is probably good enough. i'd be tempted to mention |
If //IGNORE says to print an error message, how is glibc supposed to implement that?
no, |
Looking at this one again, whose court is this ball currently in? Right now the iconv.c command loop is ignoring the return value of iconv() and instead looking at the adjusted in/inlen out/outlen values, advancing past leading bad characters in the input to try to get through it (skipping them for -c and passing them through verbatim otherwise). But if the library function is printing error messages, then presumably we need to care about -1 return values, if nothing else to avoid spamming multiple error messages? Is there a value that says "an error message was produced"? Do we abort on any -1? (Which might render the skipping behavior moot...?) Should I check for //TRANSLIT or //IGNORE myself?
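For readers following along, the loop described in that comment can be sketched roughly as below. This is not the toybox iconv.c source: convert_lossy(), its drop_bad flag (standing in for -c) and its bad counter are hypothetical names made up for the illustration, and it assumes the output buffer is large enough that iconv() never stops for lack of room.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

// Hypothetical helper (not the toybox source): convert inlen bytes of in
// into out, which holds outlen bytes, and return how many bytes were written.
// The iconv() return value is ignored on purpose; progress is judged only by
// how far in/inlen and out/outlen moved. *bad counts unconvertible bytes.
size_t convert_lossy(iconv_t cd, char *in, size_t inlen,
                     char *out, size_t outlen, int drop_bad, int *bad)
{
  char *start = out;

  assert(in && out && bad);
  *bad = 0;
  while (inlen) {
    iconv(cd, &in, &inlen, &out, &outlen);
    if (!inlen) break;          // everything was consumed

    // iconv() stopped early, so in points at the byte it choked on.
    // (Assumes the output buffer is big enough, so this is not E2BIG.)
    if (!drop_bad) {
      if (!outlen) break;       // no room left to pass the byte through
      *out++ = *in;
      outlen--;
    }
    in++;
    inlen--;
    ++*bad;
  }

  return out - start;
}
```

A nonzero bad count is the sort of thing a caller could turn into a nonzero exit status while still emitting the partial conversion, which is the trade-off discussed earlier in the thread.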
the library function does not print error messages in any implementation i'm aware of. if there's an EILSEQ from a conversion inside iconv(3):
no, that would undermine the purpose of those modifiers. (and no other iconv(1) seems to do this.) i think if toybox iconv(1) mentions the iconv(3) //TRANSLIT or //IGNORE modifiers at all, it should be in the help text, but you probably don't want to do that since they're not POSIX so musl of course doesn't implement them. (which is unfortunate because, as we've seen, everyone else's iconv(3) helps you paper over iconv(1)'s options being shit.)
any -1. EINVAL vs EILSEQ gives you a bit of detail about what was wrong with the input sequence. (there's also E2BIG for running out of output buffer, but toybox should never see that. in any case, i think perror_exit() is the way to go rather than actually looking at errno.) and i think that's all there is for toybox to do here? if iconv(3) returns -1 (and there's no point implementing |
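As a rough illustration of the "abort on any -1" approach, here is a minimal sketch. convert_strict() is a hypothetical helper, not anything in toybox, and it only wraps a single iconv(3) call; the errno values it reports are the ones named in the comment above.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

// Hypothetical helper: run one iconv() call and treat any -1 as fatal.
// Returns 0 on success, or the errno value iconv() set:
//   EILSEQ  invalid input sequence
//   EINVAL  input ends in the middle of a multibyte sequence
//   E2BIG   output buffer too small
// *written is the number of bytes produced before the stop, if any.
int convert_strict(iconv_t cd, char *in, size_t inlen,
                   char *out, size_t outlen, size_t *written)
{
  char *start = out;
  int err = 0;

  assert(written);
  if (iconv(cd, &in, &inlen, &out, &outlen) == (size_t)-1) err = errno;
  *written = out - start;

  return err;
}
```

A caller does not need to inspect the value beyond zero versus nonzero: it can print the standard error string and exit with a failure status, which in toybox would presumably be the job of the perror_exit() mentioned above.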
going through help text, i noticed that iconv's -s isn't documented because it's always on, which seems like an unfortunate default.

more than that, the -c behavior isn't right either:

(i'm trying not to get distracted from my more trivial goal, so i'll just file bugs for any weirdness i spot!)