Symlink resolutions: limits and return modes #1751

Closed

migue commented Jun 15, 2024

It can be useful to limit the number of symlink resolutions performed while looking for a tree entry. The goal is to provide the ability to resolve up to a particular depth, instead of reaching the end of the link chain.

In addition, I would like to extend the symlink resolution process and provide the ability to return the object found at the designated depth instead of returning an error.

The current code already provides a limit to the maximum number of resolutions that can be performed, and something similar to this is returned to the caller:

loop SP <size> LF
<object> LF

With these patches, we are looking to return the actual information of the object where the resolution stopped. Something similar to:

<oid> blob <size>\ndata\n
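The difference between the two return modes can be sketched in C. This is a minimal, self-contained model under stated assumptions: the tree is a hypothetical in-memory array of entries, a symlink is represented only by the name of its target, and `struct entry`, `resolve()` and the `RESOLVE_*` names are invented for illustration, not git's API. In the error mode the caller only learns that the limit was hit (and would print the `loop` record); in best-effort mode it also receives the entry where resolution stopped, which may still be a symlink.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tree entry: a regular file when target is NULL,
 * otherwise a symlink pointing at another entry by name. */
struct entry {
	const char *name;
	const char *target;
};

enum resolve_mode {
	RESOLVE_ERROR_ON_LIMIT, /* current behaviour: caller reports "loop" */
	RESOLVE_BEST_EFFORT     /* proposed: hand back where we stopped */
};

enum resolve_status { RESOLVED, STOPPED_AT_LIMIT, NOT_FOUND };

static const struct entry *lookup(const struct entry *tree, size_t n,
				  const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(tree[i].name, name))
			return &tree[i];
	return NULL;
}

/*
 * Follow symlinks from `name`, performing at most `max_hops`
 * resolutions. On RESOLVED, *out is the final non-symlink entry.
 * On STOPPED_AT_LIMIT, *out is the entry we stopped at in
 * best-effort mode and NULL in error mode.
 */
static enum resolve_status resolve(const struct entry *tree, size_t n,
				   const char *name, int max_hops,
				   enum resolve_mode mode,
				   const struct entry **out)
{
	const struct entry *cur = lookup(tree, n, name);

	assert(max_hops >= 0);
	*out = NULL;
	if (!cur)
		return NOT_FOUND;
	while (cur->target) {
		if (max_hops-- == 0) {
			if (mode == RESOLVE_BEST_EFFORT)
				*out = cur; /* still a symlink */
			return STOPPED_AT_LIMIT;
		}
		cur = lookup(tree, n, cur->target);
		if (!cur)
			return NOT_FOUND; /* dangling link */
	}
	*out = cur;
	return RESOLVED;
}
```

For a chain `link-3 -> link-2 -> link-1 -> file`, resolving `link-3` with a limit of 1 stops at `link-2` in both modes, but only best-effort mode hands `link-2` back to the caller.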

gitgitgadget bot commented Jun 15, 2024

Welcome to GitGitGadget

Hi @migue, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.

Please make sure that either:

  • Your Pull Request has a good description, if it consists of multiple commits, as it will be used as cover letter.
  • Your Pull Request description is empty, if it consists of a single commit, as the commit message should be descriptive enough by itself.

You can CC potential reviewers by adding a footer to the PR description with the following syntax:

CC: Revi Ewer <[email protected]>, Ill Takalook <[email protected]>

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:

  • the lines should not exceed 76 columns,
  • the first line should be like a header and typically start with a prefix like "tests:" or "revisions:" to state which subsystem the change is about, and
  • the commit messages' body should be describing the "why?" of the change.
  • Finally, the commit messages should end in a Signed-off-by: line matching the commits' author.

It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form /allow. A good way to find other contributors is to locate recent pull requests where someone has been /allowed:

Both the person who commented /allow and the PR author are able to /allow you.

An alternative is the channel #git-devel on the Libera Chat IRC network:

<newcontributor> I've just created my first PR, could someone please /allow me? https://github.com/gitgitgadget/git/pull/12345
<veteran> newcontributor: it is done
<newcontributor> thanks!

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit.

If you want to see what email(s) would be sent for a /submit request, add a PR comment /preview to have the email(s) sent to you. You must have a public GitHub email address for this. Note that any reviewers CC'd via the list in the PR description will not actually be sent emails.

After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the (raw) link), then import it into your mail program. If you use GMail, you can do this via:

curl -g --user "<EMailAddress>:<Password>" \
    --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):

Changes since v1:
- Fixed a typo in the commit message (found by ...)
- Added a code comment to ... as suggested by ...
...

To send a new iteration, just add another PR comment with the contents: /submit.

Need help?

New contributors who want advice are encouraged to join [email protected], where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join.

You may also be able to find help in real time in the developer IRC channel, #git-devel on Libera Chat. Remember that IRC does not support offline messaging, so if you send someone a private message and log out, they cannot respond to you. The scrollback of #git-devel is archived, though.

gitgitgadget bot commented Jun 15, 2024

There are issues in commit 5285b47:
Make the number of symlink resolutions configurable
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt
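The wrap rule quoted above can be approximated in C. This is a simplified sketch, not GitGitGadget's actual checker: the function names are hypothetical, everything after the first empty line is treated as the body, lengths are counted in bytes rather than characters, and only the 76-column upper bound is enforced (the 60-column lower bound is not modelled).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Indented lines and lines without any whitespace are exempt. */
static bool line_is_exempt(const char *line, size_t len)
{
	if (len > 0 && (line[0] == ' ' || line[0] == '\t'))
		return true;
	for (size_t i = 0; i < len; i++)
		if (line[i] == ' ' || line[i] == '\t')
			return false;
	return true; /* e.g. a long URL */
}

/*
 * Count body lines (everything after the first empty line) that are
 * longer than 76 bytes and not exempt. Only the upper bound is checked.
 */
static int count_overlong_body_lines(const char *msg)
{
	const char *p = msg;
	bool in_body = false;
	int bad = 0;

	assert(msg != NULL);
	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t len = eol ? (size_t)(eol - p) : strlen(p);

		if (!in_body) {
			if (len == 0)
				in_body = true; /* empty line ends the subject */
		} else if (len > 76 && !line_is_exempt(p, len)) {
			bad++;
		}
		if (!eol)
			break;
		p = eol + 1;
	}
	return bad;
}
```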

gitgitgadget bot commented Jun 15, 2024

There are issues in commit cc23c8f:
Make the results of symlink resolution configurable
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

migue force-pushed the migue/follow-symlinks-max-depth branch from cc23c8f to 46431e2 June 17, 2024 05:45
Member

dscho commented Jun 17, 2024

/allow

gitgitgadget bot commented Jun 17, 2024

User migue is now allowed to use GitGitGadget.

Member

dscho left a comment

Great work, @migue!

I'd like to suggest using the prefix cat-file: in the commit message (and continuing in lower-case) because the reviewers on the Git mailing list tend to focus a lot on such things.

Also, maybe you want to describe in the second commit message that you want to introduce a kind of "best effort mode" here, seeing as the symlink resolution does have data to return, it just so happens to still be a symlink even after following <n> hops?

Sometimes, it can be useful to limit the number of symlink resolutions
performed while looking for a tree entry.

The goal is to provide the ability to resolve up to a particular depth,
instead of reaching the end of the link chain.

The current code already provides a limit to the maximum number of
resolutions that can be performed
(GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS). This patch introduces a new
config setting to make that limit configurable. No logical changes are
introduced in this patch.

Signed-off-by: Miguel Ángel Pastor Olivar <[email protected]>
migue force-pushed the migue/follow-symlinks-max-depth branch from 46431e2 to ab8c677 June 17, 2024 07:31
Author

migue commented Jun 17, 2024

@dscho Thanks a lot for taking the time to review this 🙇 !

I have included the cat-file in both commit messages and added some additional comments in the second one. I would be happy to rephrase the second one if you think it's not clear enough.

On a different topic, I saw one Windows-related build failing in my previous push. I don't think it's related to my changes, but I don't really know 😅

Again, thanks a lot for taking a look at this!!

Member

dscho commented Jun 17, 2024

I have included the cat-file in both commit messages and added some additional comments in the second one. I would be happy to rephrase the second one if you think it's not clear enough.

Thank you @migue! Personally, I would lead with the "best-effort" paragraph in the commit message, and then explain the more technical mechanics of the patch. But it's your contribution (not mine 😁), and if you want to mention it at the end, that's fine, too.

On a different topic, I saw one Windows-related build failing in my previous push. I don't think it's related to my changes, but I don't really know 😅

Yes, it failed with a cryptic "No plan found in TAP output" message. These happen in the Windows tests from time to time, I still haven't found out what is going on exactly, my working theory is that file/directory creation/deletion fails due to Defender doing its job. I saw that failure and re-ran the failed jobs so that your imminent force-push would result in a quick CI run due to the tree-same check finding an already-successful run for a tree-same commit.

Sorry for not telling you that I did that!

migue force-pushed the migue/follow-symlinks-max-depth branch from ab8c677 to 3eae47c June 17, 2024 07:46
This patch introduces a new "best effort mode" where the object found at
resolution step N is returned. If we reach the end of the chain, the
returned object is the file at the end of the chain; if we have not
reached the end of the chain after N resolutions, the returned object
represents a symlink.

The goal is to extend the symlink resolution process and provide the ability
to return the object found at the designated depth instead of returning an
error.

The current code already provides a limit to the maximum number of
resolutions that can be performed, and something similar to this is
returned to the caller:

loop SP <size> LF <object> LF

With the new config setting we are looking to return the actual information
of the object where the resolution stopped. Something similar to:

<oid> blob <size>\ndata\n

Signed-off-by: Miguel Ángel Pastor Olivar <[email protected]>
migue force-pushed the migue/follow-symlinks-max-depth branch from 3eae47c to 5de72c4 June 17, 2024 07:49
Author

migue commented Jun 17, 2024

Yes, it failed with a cryptic "No plan found in TAP output" message. These happen in the Windows tests from time to time, I still haven't found out what is going on exactly, my working theory is that file/directory creation/deletion fails due to Defender doing its job. I saw that failure and re-ran the failed jobs so that your imminent force-push would result in a quick CI run due to the tree-same check finding an already-successful run for a tree-same commit.

Thanks a lot for the context!!

Thank you @migue! Personally, I would lead with the "best-effort" paragraph in the commit message, and then explain the more technical mechanics of the patch. But it's your contribution (not mine 😁), and if you want to mention it at the end, that's fine, too.

Reordered the message a bit, let me know if you think it's better now

Member

dscho commented Jun 17, 2024

Reordered the message a bit, let me know if you think it's better now

Thank you @migue. I think it is better now!

Author

migue commented Jun 17, 2024

/preview

gitgitgadget bot commented Jun 17, 2024

Preview email sent as [email protected]

Author

migue commented Jun 17, 2024

/submit

gitgitgadget bot commented Jun 17, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1751/migue/migue/follow-symlinks-max-depth-v1

To fetch this version to local tag pr-1751/migue/migue/follow-symlinks-max-depth-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1751/migue/migue/follow-symlinks-max-depth-v1

gitgitgadget bot commented Jun 17, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Miguel Ángel Pastor Olivar via GitGitGadget"
<[email protected]> writes:

> The current code already provides a limit to the maximum number of
> resolutions that can be performed, and something similar to this is returned
> to the caller:
>
> loop SP <size> LF
> <object> LF
>
>
> With these patches, we are looking to return the actual information of the
> object where the resolution stopped. Something similar to:
>
> <oid> blob <size>\nndata\n

Just a random and idle thought, but is it all that interesting to
learn only about the object at the horizon?

If recursive resolutions are limited to say 3 levels, I wonder if it
is beneficial to give full record from each iteration without losing
information, e.g., saying "A points at B which in turn points at C,
and I stopped there but C is still not the final thing", instead of
saying "I followed links and C was the last one I saw after I
repeated for the maximum number of times the configuration allows me
to".
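Junio's idea amounts to recording every hop instead of only the last one. A minimal sketch, assuming a hypothetical in-memory table of links (`struct link`, `struct chain` and `follow_recording()` are invented for illustration and are not part of git), keeps the whole chain plus a flag saying whether it ended on something that is not a symlink:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry: a regular file when target is NULL. */
struct link {
	const char *name;
	const char *target;
};

struct chain {
	const char *names[16]; /* every entry visited, in order */
	size_t nr;
	bool complete;         /* true if the last entry is not a symlink */
};

static const struct link *find(const struct link *t, size_t n,
			       const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(t[i].name, name))
			return &t[i];
	return NULL;
}

/*
 * Follow at most max_hops links from `start`, remembering every hop
 * rather than only the entry at the horizon. Returns false if a name
 * along the chain does not exist.
 */
static bool follow_recording(const struct link *t, size_t n,
			     const char *start, int max_hops,
			     struct chain *out)
{
	const struct link *cur = find(t, n, start);

	/* names[] must hold the start entry plus one entry per hop */
	assert(max_hops >= 0 &&
	       (size_t)max_hops < sizeof(out->names) / sizeof(out->names[0]));
	out->nr = 0;
	out->complete = false;
	if (!cur)
		return false;
	out->names[out->nr++] = cur->name;
	while (cur->target && max_hops-- > 0) {
		cur = find(t, n, cur->target);
		if (!cur)
			return false;
		out->names[out->nr++] = cur->name;
	}
	out->complete = !cur->target;
	return true;
}
```

With `A -> B -> C -> D` and a limit of 2, the result is `A, B, C` with `complete == false`, i.e. "A points at B which in turn points at C, and I stopped there but C is still not the final thing".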

@@ -757,3 +757,8 @@ core.maxTreeDepth::
tree (e.g., "a/b/cde/f" has a depth of 4). This is a fail-safe

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Miguel Ángel Pastor Olivar via GitGitGadget"
<[email protected]> writes:

> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index 93d65e1dfd2..ca2d1eede52 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -757,3 +757,8 @@ core.maxTreeDepth::
>  	tree (e.g., "a/b/cde/f" has a depth of 4). This is a fail-safe
>  	to allow Git to abort cleanly, and should not generally need to
>  	be adjusted. The default is 4096.
> +
> +core.maxSymlinkDepth::
> +	The maximum number of symlinks Git is willing to resolve while
> +	looking for a tree entry.
> +	The default is GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS.
> \ No newline at end of file

Style: please do not end our text files with an incomplete line.

Regarding the patch contents, this is an end-user facing document.
How would they learn what the actual value is?

Is there a "valid range" the users are allowed to set this value to?
If so, what is the range?  What do users get when they set it outside
the allowed range?  Do they get warned?  Do they get die()?  Is the
value silently ignored?

If there is no upper limit for the "valid range", how does a user
set it to "infinity", and what's the downside of doing so?  What
happens when the user sets it to 0, or a negative value, if there is
no lower limit for the "valid range"?  The questions in this
paragraph your updated documentation text do not have to answer if
your "valid range" does have both upper and lower limit, but the
documentation must answer questions in the previous paragraph.
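One way to give those questions concrete answers is to validate the value where it is parsed. The sketch below is hypothetical: `parse_symlink_depth()` and `SKETCH_MAX_SYMLINK_DEPTH` are invented names, the upper bound of 40 is an assumption standing in for GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS, and unlike git_config_int() it does not accept unit suffixes such as `k`. It accepts 0 through the maximum and reports everything else as an error, leaving the warn-or-die decision to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Assumed upper bound for this sketch only. */
#define SKETCH_MAX_SYMLINK_DEPTH 40

/*
 * Parse a core.maxSymlinkDepth-style value. Returns 0 and stores the
 * value on success; returns -1 and leaves *out untouched if the value
 * is not an integer or lies outside 0..SKETCH_MAX_SYMLINK_DEPTH.
 */
static int parse_symlink_depth(const char *value, int *out)
{
	char *end;
	long v;

	assert(value && out);
	errno = 0;
	v = strtol(value, &end, 10);
	if (errno || end == value || *end != '\0')
		return -1; /* not a plain integer */
	if (v < 0 || v > SKETCH_MAX_SYMLINK_DEPTH)
		return -1; /* outside the documented range */
	*out = (int)v;
	return 0;
}
```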

> diff --git a/config.c b/config.c
> index abce05b7744..d69e9a3ae6b 100644
> --- a/config.c
> +++ b/config.c
> @@ -1682,6 +1682,11 @@ static int git_default_core_config(const char *var, const char *value,
>  		return 0;
>  	}
>  
> +	if (!strcmp(var, "core.maxsymlinkdepth")) {
> +		max_symlink_depth = git_config_int(var, value, ctx->kvi);
> +		return 0;
> +	}
> +
>  	/* Add other config variables here and to Documentation/config.txt. */
>  	return platform_core_config(var, value, ctx, cb);
>  }
> diff --git a/environment.c b/environment.c
> index 701d5151354..6d7a5001eb1 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -95,6 +95,7 @@ int max_allowed_tree_depth =
>  #else
>  	2048;
>  #endif
> +int max_symlink_depth = -1;

Why set it to -1 here, instead of initializing it to the
GET_TREE_ENTRY_FOLLOW_SYMLINKS?  By introducing a configuration
variable (which by the way I am not convinced is necessarily a good
idea to begin with), you are surfacing that built-in default value
as a more prominent thing, not hidden away in a little corner of
tree-walk.c implementation detail.  If you do define a "valid range
of values", the code that parses core.maxsymlinkdepth in config.c
may want to learn what the value of GET_TREE_ENTRY_FOLLOW_SYMLINKS
is, which means the symbol may need to be visible in some common
header file anyway.

By the way, this is not a new problem this patch introduces, as the
default GET_TREE_ENTRY_FOLLOW_SYMLINKS came from 275721c2
(tree-walk: learn get_tree_entry_follow_symlinks, 2015-05-20), but I
wonder if the default number should somehow be aligned with the
other upper limit, SYMREF_MAXDEPTH for a symbolic ref pointing at
another symbolic ref pointing at yet another ...

> +test_expect_success 'git cat-file --batch --follow-symlink stop resolving symlinks' '
> +	printf "loop 22\nHEAD:link-to-symlink-3\n">expect &&
> +	printf 'HEAD:link-to-symlink-3' | git -c core.maxsymlinkdepth=1 cat-file --batch="%(objectname) %(objecttype) %(objectsize)" --follow-symlinks > actual &&

Style: a redirection operator needs a single SP before it and no SP
between it and its target, i.e.

	printf "loop 22..." >expect &&
	printf "HEAD:link ..." |
	git ... cat-file ... >actual &&

Also fold overly long line after "|" pipeline.

> diff --git a/tree-walk.c b/tree-walk.c
> index 6565d9ad993..3ec2302309e 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -664,7 +664,12 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r,
>  	struct object_id current_tree_oid;
>  	struct strbuf namebuf = STRBUF_INIT;
>  	struct tree_desc t;
> -	int follows_remaining = GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS;
> +	int follows_remaining =
> +		max_symlink_depth > -1 &&
> +				max_symlink_depth <=
> +					GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS ?
> +			max_symlink_depth :
> +			GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS;

Strange indentation.

If you range-limit at the place the configuration was parsed, you do
not have to do any of this here, but if you insist hiding
GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS from others (yet still use
it in the end-user facing documentation???), then

	int follows_remaining =
		(-1 < max_symlink_depth &&
		 max_symlink_depth <= GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS)
		? max_symlink_depth
		: GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS;

or perhaps a lot easier to read form, i.e.

	int follows_remaining = max_symlink_depth;

	if (follows_remaining < 0 ||
	    GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS < follows_remaining)
		follows_remaining = GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS;

>  	init_tree_desc(&t, NULL, NULL, 0UL);
>  	strbuf_addstr(&namebuf, name);


Thanks.
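The two forms in the review are meant to be equivalent. A small C harness can check that, assuming a value of 40 for GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS (an assumption; `MAX_LINKS` and the `clamp_*` names are invented for this sketch). It also shows why the if-form needs `< 0`: written with `< -1`, the `-1` "unset" default passes through unchanged instead of falling back to the built-in limit.

```c
#include <assert.h>

/* Assumed value of GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS. */
#define MAX_LINKS 40

/* The ternary form from the review. */
static int clamp_ternary(int max_symlink_depth)
{
	return (-1 < max_symlink_depth && max_symlink_depth <= MAX_LINKS)
		? max_symlink_depth
		: MAX_LINKS;
}

/* The if-statement form, rejecting every negative value. */
static int clamp_if(int max_symlink_depth)
{
	int follows_remaining = max_symlink_depth;

	if (follows_remaining < 0 || MAX_LINKS < follows_remaining)
		follows_remaining = MAX_LINKS;
	return follows_remaining;
}

/* Same as clamp_if() but with "< -1": the -1 "unset" value slips through. */
static int clamp_if_off_by_one(int max_symlink_depth)
{
	int follows_remaining = max_symlink_depth;

	if (follows_remaining < -1 || MAX_LINKS < follows_remaining)
		follows_remaining = MAX_LINKS;
	return follows_remaining;
}

/* Returns 1 if clamp_ternary() and clamp_if() agree on all of [lo, hi]. */
static int forms_agree(int lo, int hi)
{
	assert(lo <= hi);
	for (int v = lo; v <= hi; v++)
		if (clamp_ternary(v) != clamp_if(v))
			return 0;
	return 1;
}
```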

@@ -757,3 +757,22 @@ core.maxTreeDepth::
tree (e.g., "a/b/cde/f" has a depth of 4). This is a fail-safe
to allow Git to abort cleanly, and should not generally need to
be adjusted. The default is 4096.

core.maxSymlinkDepth::

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Miguel Ángel Pastor Olivar via GitGitGadget"
<[email protected]> writes:

> From: Miguel Ángel Pastor Olivar <[email protected]>
>
> This patch introduces a new "best effort mode" where the object found at
> resolution step N is returned. If we've reached the end of the chain, the
> returned object will be the file at the end of the chain, however, if, after
> n resolutions we haven't reached the end of the chain, the returned object
> will represent a symlink
>
> The goal is to extend the symlink resolution process and provide the ability
> to return the object found at the designated depth instead of returning an
> error.
>
> The current code already provides a limit to the maximum number of
> resolutions that can be performed and something similar to this is returned
> back to the caller:
>
> loop SP <size> LF <object> LF
>
> With the new config setting we are looking to return the actual information
> of the object where the resolution stopped. Something similar to:
>
> <oid> blob <size>\ndata\n

I do not think this should be a configuration variable at all.
Either a command line option, or even better yet would be an
in-stream instruction ("flip into the 'tell me the last symlink
you saw before you gave up' mode"), is understandable though, given
that this is strictly for the "batch" mode.

For that matter, it is dubious that the previous one that added a
configuration variable to lower the symlink recursion limit is a
good idea.  It does not affect anything but "cat-file --batch" and
an in-stream instruction, e.g. "in this session, do not resolve more
than 3 levels", sounds like a much better fit to what this wants to
do.  That way, it will be a lot better isolated from unrelated code
paths.  It might even make sense not to introduce the new
max_symlink_depth global variable, but pass it through as a new
member in "struct object_context" given to get_oid_with_context(),
which in turn is passed as a new parameter to
get_tree_entry_follow_symlinks() function.
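The plumbing described above can be sketched without touching git's real types. Everything here is hypothetical: `struct sketch_object_context` stands in for a new member of `struct object_context`, `links_followed()` stands in for `get_tree_entry_follow_symlinks()` receiving the limit as a parameter, and the default of 40 is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed default, standing in for GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS. */
#define SKETCH_DEFAULT_MAX_LINKS 40

/* Hypothetical new member of something like struct object_context. */
struct sketch_object_context {
	int max_symlink_depth; /* negative means "use the built-in default" */
};

/* Turn whatever the caller asked for into the limit actually used. */
static int effective_follow_limit(int requested)
{
	if (requested < 0 || requested > SKETCH_DEFAULT_MAX_LINKS)
		return SKETCH_DEFAULT_MAX_LINKS;
	return requested;
}

/*
 * Hypothetical resolver that receives its limit from the caller's
 * context instead of a global. Given a chain of `chain_len` symlinks,
 * report how many would be followed. A NULL context behaves like
 * today's callers and gets the default.
 */
static int links_followed(const struct sketch_object_context *ctx,
			  int chain_len)
{
	int limit = effective_follow_limit(ctx ? ctx->max_symlink_depth : -1);

	assert(chain_len >= 0);
	return chain_len < limit ? chain_len : limit;
}
```

Because the limit travels with each request, one `cat-file --batch` session lowering it cannot change what unrelated callers of the resolver see, which is the isolation a global variable lacks.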

So, I am supportive to solving the problem this series attempts to
solve, but I am not on board with the design this series took.

Thanks.

Author

migue commented Jul 18, 2024

I am not going to pursue this

migue closed this Jul 18, 2024