2325 4480 pwd newline #4804

egmontkob · 2025-10-20T11:04:37Z

Proposed changes

Checklist

👉 Our coding style can be found here: https://midnight-commander.org/coding-style/ 👈

I have referenced the issue(s) resolved by this PR (if any)
I have signed-off my contribution with git commit --amend -s
Lint and unit tests pass locally with my changes (make indent && make check)
I have added tests that prove my fix is effective or that my feature works
I have added the necessary documentation (if appropriate)

mc-worker · 2025-10-20T16:42:48Z

src/subshell/common.c

+     * see ticket #4480. Make sure to wrap in time. See how large we can grow so that an
+     * additional escaped byte and the closing string still fit. */
+    const int max_length =
+        COOKED_MODE_BUFFER_SIZE - MAX (strlen (before_wrap), strlen (quote_cmd_end));


I think it's better to save the string lengths in intermediate variables and then pass them to g_string_append_len() instead of using g_string_append(), to avoid calling strlen() twice.

Do we care more about shaving off a few CPU cycles at a non-critical path, rather than preferring cleaner, easier to read code?

Moreover, the code also would become inconsistent with itself. E.g. quote_cmd_end is used twice, so it speeds up things by a few nanoseconds to store its length. before_wrap is likely used exactly once, with a small chance more than once. But after_wrap is likely not used at all. So in order to shave off a few CPU cycles we shouldn't precompute its length?

I'd find it absolutely pointless to go into such microoptimizations somewhere where it really doesn't matter.

I agree that optimizations are pointless, but I would still extract constants for readability. This would also make the line shorter and make it fit on one line.

I've pushed an additional commit (to be squashed!!!) to show what the code would look like with precomputed lengths.

I honestly don't find it any more readable than the previous version. Duplicating (kind of) the variable names, as in g_string_append_len (ret, foobar, foobar_len), is a source of errors if you pass the wrong length, makes it harder to verify that you didn't make this mistake, and adds visual clutter compared to just seeing foobar once.

IMHO.

If you guys both prefer this variant then I'll squash this commit with the previous one.

i prefer the "optimized" version. having the lengths pre-calculated should make it easier to write code that doesn't buffer-overflow.

mc-worker · 2025-10-20T16:44:08Z

src/subshell/common.c

+                g_string_append (ret, before_wrap);
+                g_string_append_c (ret, '\n');
+                g_string_append (ret, after_wrap);
+                line_length = strlen (after_wrap);


It's better to swap these two lines:

    line_length = strlen (after_wrap);
    g_string_append_len (ret, after_wrap, line_length);

Why is it better? Currently the string append operations are next to each other. By swapping them, you'd break that flow.

Or is it because you could then use that as the length parameter to g_string_append_len? Again, that results in visibly more inconsistent code for us humans, and an absolutely unmeasurable speedup. Let's go for more readable, more consistent code please. Please let me know why you think your proposal would be more readable.

I actually have no preference one way or the other here...

mc-worker · 2025-10-20T16:45:07Z

src/subshell/common.c

+                    g_string_append (ret, before_wrap);
+                    g_string_append_c (ret, '\n');
+                    g_string_append (ret, after_wrap);
+                    line_length = strlen (after_wrap);


The same:

    line_length = strlen (after_wrap);
    g_string_append_len (ret, after_wrap, line_length);

egmontkob · 2025-10-20T17:27:31Z

Oops, I've accidentally pushed the branch into the main mc repo; I didn't mean that. Sorry for that! You can remove that branch if you wish.

egmontkob · 2025-10-20T17:45:22Z

Never mind, I've deleted it. Sorry for the noise.

egmontkob · 2025-10-20T17:53:35Z

There's a regression with tcsh:

Previously it could enter directories with a non-alphanumeric UTF-8 symbol (e.g. heart ❤) in their name, now it cannot. (It can still enter alphanumeric UTF-8 characters.)

Let me play around a little bit with tcsh to see if I can fix this.

Let's hold off this PR for now.

ossilator

3rd commit msg: typo in 'platforms'

ossilator · 2025-10-20T18:51:20Z

src/subshell/common.c

- *   cd "`printf '%b' 'ABC\0nnnDEF\0nnnXYZ'`"
+ * Enter any directory safely, no matter what special bytes its name contains (special shell
+ * characters, control characters, non-printable characters, invalid UTF-8 etc.).
+ * NOTE: Treat directory name an untrusted data, don't allow it to cause executing any commands in


"as untrusted"

I just moved this text away by a few lines :) anyway, fixing.

ossilator · 2025-10-20T18:56:48Z

src/subshell/common.c

+         *
+         * Wrapping to new line with a trailing backslash outside of the innermost single quotes.
+         */
+        quote_cmd_start = " _mc_newdir_=\"`printf '%b_' '";


the assigned value does not need to be quoted, and for legacy shells it even should not be.

Thanks, fixing.

ossilator · 2025-10-20T19:01:11Z

src/subshell/common.c

        g_string_append (ret, "./");

    // Copy the beginning of the command to the buffer
    g_string_append (ret, quote_cmd_start);


side note: this happening after the ./ append is plain bogus.
(fix should be in a separate commit.)

I have also noticed this, but I just don't want to begin fixing every minor detail that I don't fully like :) But now that you've pointed it out too, I'll fix it.

But, sorry, I won't create a new commit for every tiny change I make, I just find that nonsense and super counterproductive. It's a somewhat bigger rework of that method, including everything that goes into it.

Oh, no... wait... now that I've moved the cd string prefix into this method (which I had to do because of tcsh's variable construction), I have to swap the order, the one in my change was straight wrong.

ossilator · 2025-10-20T19:11:20Z

src/subshell/common.c

+     * see ticket #4480. Make sure to wrap in time. See how large we can grow so that an
+     * additional escaped byte and the closing string still fit. */
+    const int max_length =
+        COOKED_MODE_BUFFER_SIZE - MAX (strlen (before_wrap), strlen (quote_cmd_end));


i prefer the "optimized" version. having the lengths pre-calculated should make it easier to write code that doesn't buffer-overflow.

ossilator · 2025-10-20T19:18:40Z

src/subshell/common.c

+     * see ticket #4480. Make sure to wrap in time. See how large we can grow so that an
+     * additional line wrapping or closing string still fits. */
+    const int max_length =
+        COOKED_MODE_BUFFER_SIZE - MAX (strlen (before_wrap), strlen (quote_cmd_end));


it would be kinda more elegant to wrap the trailer separately if the situation occurs, but the extra code is probably not worth it ...

i prefer the "optimized" version. having the lengths pre-calculated should make it easier to write code that doesn't buffer-overflow.

Looks like it's 3:1 (all 3 of you reviewers against me), so I'll change the code (squash the followup commit), but I'm really curious: Could you please provide an example where a pre-computed length would save you from a buffer overrun?

I have shown you that it becomes easy to use the wrong length, e.g.

    g_string_append_len (ret, foobar1, foobar1_len);
    g_string_append_len (ret, quux2, quux1_len);
    g_string_append_len (ret, loremipsum3, lorempisum3_len);

Not only is it harder to read, but would you spot the bug if you weren't looking for it? How is it any safer than writing

    g_string_append (ret, foobar1);
    g_string_append (ret, quux2);
    g_string_append (ret, loremipsum3);

?

it would be kinda more elegant to wrap the trailer separately if the situation occurs, but the extra code is probably not worth it ...

Could you please elaborate? I don't get what you're thinking of here. Even though you're not asking me to change the code, I'm curious :)

if you assign the lengths and actually use them to pre-calculate and check the buffer, it's easier to make sure that you're actually working with the same numbers. the mismatch risk is real, but i personally worry about it less, because i have annoyingly good visual pattern recognition.

the length of the final cd command is currently subtracted from the buffer size, so it always fits. but it would be possible to ensure that only the line termination fits, and put the cd on its own line if necessary.
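To make the two points above concrete, here is a minimal sketch in plain C (no GLib) of how precomputed lengths can feed both the size check and the append calls, so the same numbers are used in both places. It is not the code from this PR: `LINE_LIMIT`, `append_bytes`, `build_wrapped` and `longest_physical_line` are hypothetical names, the value 250 is taken from the commit message's "no longer than 250 bytes", and the sketch assumes the caller passes already escaped chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit for one physical line, newline included. */
#define LINE_LIMIT 250

/* Append src_len bytes to dst at *pos, keeping dst NUL-terminated.
 * Returns 1 on success, 0 if dst is too small. */
static int
append_bytes (char *dst, size_t dst_size, size_t *pos, const char *src, size_t src_len)
{
    if (*pos + src_len + 1 > dst_size)
        return 0;
    memcpy (dst + *pos, src, src_len);
    *pos += src_len;
    dst[*pos] = '\0';
    return 1;
}

/* Join pre-escaped chunks, wrapping with before_wrap + '\n' + after_wrap so that
 * either another wrap or the trailer still fits on the current physical line. */
static int
build_wrapped (char *dst, size_t dst_size, const char *const *chunks, size_t count,
               const char *before_wrap, const char *after_wrap, const char *trailer)
{
    /* Each length is computed once and reused for both the check and the append. */
    const size_t before_wrap_len = strlen (before_wrap);
    const size_t after_wrap_len = strlen (after_wrap);
    const size_t trailer_len = strlen (trailer);
    const size_t reserve = before_wrap_len > trailer_len ? before_wrap_len : trailer_len;
    size_t max_length, pos = 0, line_length = 0, i;

    assert (dst_size > 0);
    assert (reserve + 1 < LINE_LIMIT);
    max_length = LINE_LIMIT - reserve - 1;      /* 1 byte kept for the newline */
    dst[0] = '\0';

    for (i = 0; i < count; i++)
    {
        const size_t chunk_len = strlen (chunks[i]);

        if (line_length + chunk_len > max_length)
        {
            if (!append_bytes (dst, dst_size, &pos, before_wrap, before_wrap_len)
                || !append_bytes (dst, dst_size, &pos, "\n", 1)
                || !append_bytes (dst, dst_size, &pos, after_wrap, after_wrap_len))
                return 0;
            line_length = after_wrap_len;
        }
        if (!append_bytes (dst, dst_size, &pos, chunks[i], chunk_len))
            return 0;
        line_length += chunk_len;
    }
    return append_bytes (dst, dst_size, &pos, trailer, trailer_len);
}

/* Verification helper: length of the longest physical line, newline included. */
static size_t
longest_physical_line (const char *s)
{
    size_t longest = 0, current = 0;

    for (; *s != '\0'; s++)
    {
        current++;
        if (current > longest)
            longest = current;
        if (*s == '\n')
            current = 0;
    }
    return longest;
}
```

Reserving the longer of `before_wrap` and `trailer` mirrors the `MAX (...)` expression in the hunk above. The alternative mentioned in the last comment would reserve only `before_wrap_len` and emit one extra wrap before the trailer when the trailer no longer fits.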


egmontkob · 2025-10-20T20:59:14Z

Back to tcsh:

I think I can do either of these two things:

1. Use the $'...' string constant notation. Strings aren't byte-safe, they're forced to the locale (practically UTF-8), so in a UTF-8 environment directories with invalid UTF-8 in their name won't work. But this method handles newline characters just fine.
2. Use `printf ...` command substitution. This method breaks newlines: in the case of tcsh not just the trailing ones but any internal newlines too. On the other hand it's binary-safe, we can enter any directory, even one with invalid UTF-8 in its name. Note though that it doesn't work backwards: if you hide the panels and cd to a directory with an invalid UTF-8 name, the subshell reporting it back to mc using echo $cwd:q will also mangle it and mc won't be able to cd there (this is already buggy in current mc). A workaround for that could be to invoke the external pwd -L utility.

So it's either-or: invalid UTF-8 but no newlines, or newlines but not invalid UTF-8.

And no, I'm not going to implement both and pick runtime so that we only fail if a path contains both :)

Which one do you guys vote for?

egmontkob · 2025-10-21T07:01:59Z

tcsh: I was wrong, command substitution isn't binary safe either.

So, without terrible hacks, I can get newline working but not invalid UTF-8.

To get invalid UTF-8 working, I think this would do it:

save the value of LC_ALL
setenv LC_ALL C
cd target_directory
restore LC_ALL (or most likely: the lack thereof)
set cwd=target_directory # to fix the accents shown in the prompt

and when a command completes and tcsh sends its working directory to mc, invoke the external pwd -L.

I'm not gonna do this, it's just not worth it.

zyv · 2025-10-21T07:19:20Z

Which one do you guys vote for?

My thinking is that if I had to choose, I'd take newlines.

…ng a directory with special characters Handle trailing '\n' character in the directory name. Make sure to construct the cd command in physical lines no longer than 250 bytes so that we don't hit the small limit of the kernel's cooked mode tty buffer size on some platforms. tcsh still has problems entering directories with special characters (including invalid UTF-8) in their name. Other shells are now believed to handle any directory name properly. Signed-off-by: Egmont Koblinger <[email protected]>

…ibyte UTF-8 Don't escape safe shell characters commonly used in paths, such as '/', '.', '-' and '_'. Don't escape multibyte UTF-8 characters. Escaping each byte separately in string assignments doesn't work in tcsh. The previous commit introduces a regression here: tcsh cannot enter directories whose name is valid UTF-8 but contains non-alphanumeric UTF-8 characters. It used to work because printf would glue them together correctly, but we no longer use printf and command substitution because that breaks newlines. Signed-off-by: Egmont Koblinger <[email protected]>
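The escaping rule described in this commit message can be sketched roughly as follows. This is an illustration only, not the PR's implementation: `is_passed_through` and `escape_name` are hypothetical names, the `\0nnn` octal output follows the form shown in the removed `printf '%b'` comment, and the sketch passes every byte at or above 0x80 through unchanged without checking that it belongs to a valid UTF-8 sequence, which is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Bytes copied verbatim: ASCII alphanumerics, the path characters named in the
 * commit message, and (as a simplification) every byte >= 0x80. */
static int
is_passed_through (unsigned char c)
{
    if (c >= 0x80)
        return 1;
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
        return 1;
    return c == '/' || c == '.' || c == '-' || c == '_';
}

/* Escape name into out as a NUL-terminated string; every other byte becomes \0nnn.
 * Returns the number of bytes written (without the NUL), or (size_t) -1 if out is
 * too small. */
static size_t
escape_name (const char *name, size_t name_len, char *out, size_t out_size)
{
    size_t pos = 0, i;

    assert (out_size > 0);
    for (i = 0; i < name_len; i++)
    {
        const unsigned char c = (unsigned char) name[i];

        if (is_passed_through (c))
        {
            if (pos + 1 >= out_size)
                return (size_t) -1;
            out[pos++] = (char) c;
        }
        else
        {
            /* Backslash, '0' and three octal digits: 5 bytes plus the NUL. */
            if (pos + 5 >= out_size)
                return (size_t) -1;
            snprintf (out + pos, out_size - pos, "\\0%03o", (unsigned int) c);
            pos += 5;
        }
    }
    out[pos] = '\0';
    return pos;
}
```

With this rule a name such as `a b` becomes `a\0040b`, while a multibyte character like ❤ is copied byte for byte, which is the behaviour the commit message describes for tcsh.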

egmontkob · 2025-10-21T07:42:55Z

Yup I'm going with newlines.

New commit pushed to fix regression with tcsh.

How confident are we that placing unquoted 128..255 bytes in the command line is safe in every shell, i.e. that they don't have any special meaning in any of them?

egmontkob requested review from aborodin, ossilator and zyv October 20, 2025 11:04

github-actions bot added the needs triage (Needs triage by maintainers) and prio: medium (Has the potential to affect progress) labels Oct 20, 2025

github-actions bot added this to the Future Releases milestone Oct 20, 2025

zyv added the area: core (Issues not related to a specific subsystem) label and removed the needs triage (Needs triage by maintainers) label Oct 20, 2025

zyv modified the milestones: Future Releases, 4.8.34 Oct 20, 2025

mc-worker requested review from mc-worker and removed request for aborodin October 20, 2025 16:38

mc-worker reviewed Oct 20, 2025

View reviewed changes

egmontkob force-pushed the 2325_4480_pwd_newline branch from ceeb70e to d6b260d Compare October 20, 2025 17:26

ossilator reviewed Oct 20, 2025

View reviewed changes

egmontkob added 2 commits October 20, 2025 21:53

Ticket MidnightCommander#4480: Read the entire working directory from…

4960287

… the subshell If the subshell writes the working directory slowly, previously we could read its beginning and stop there. Signed-off-by: Egmont Koblinger <[email protected]>

Remove a commented and outdated busybox code

a148b9a

This piece of code was never live in mc. It would work around a BusyBox bug that was fixed in 2012. Signed-off-by: Egmont Koblinger <[email protected]>

egmontkob force-pushed the 2325_4480_pwd_newline branch 2 times, most recently from 57e9ac6 to d267252 Compare October 20, 2025 20:36

egmontkob added 2 commits October 21, 2025 09:37

egmontkob force-pushed the 2325_4480_pwd_newline branch from d267252 to eefcf8c Compare October 21, 2025 07:39
