Replace atoi() with strtoi_with_tail() #1646

Open: mohit-marathe wants to merge 2 commits into base: master

@mohit-marathe mohit-marathe commented Jan 22, 2024

Hello,

This patch series replaces atoi() with an updated version of
strtol_i() called strtoi_with_tail (Credits: Junio C Hamano). The
reasoning behind this is to improve error handling by not allowing
non-numerical characters in the hunk header (which might happen
in case of a corrupt patch, although rarely).

There is still a change to be made, as Junio says:
"A corrupt patch may be getting a nonsense patch-ID with the current
code and hopefully is not matching other patches that are not
corrupt, but with such a change, a corrupt patch may not be getting
any patch-ID and a loop that computes patch-ID for many files and
try to match them up might need to be rewritten to take the new
failure case into account."
I'm not sure where this change needs to be made (maybe
get_one_patchid()?). It would be great if anyone could point me to
the correct place.

Thanks,
Mohit Marathe
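
To make the motivation concrete, here is a minimal sketch (not Git's code) of the difference between atoi() and a strtol()-based parser that reports failures. The helper name `parse_int_strict` is hypothetical and exists only for this illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of Git): parse a decimal int from s.
 * Returns 0 on success, -1 when no digits were consumed or when the
 * value does not fit in an int. On success *end points at the first
 * byte that was not parsed.
 */
static int parse_int_strict(const char *s, int *out, const char **end)
{
	char *tail;
	long v;

	errno = 0;
	v = strtol(s, &tail, 10);
	if (errno || tail == s || v < INT_MIN || v > INT_MAX)
		return -1;
	*out = (int)v;
	*end = tail;
	return 0;
}
```

With this sketch, `atoi("x4")` yields 0, which a caller cannot tell apart from a genuine "0", whereas `parse_int_strict("x4", ...)` returns -1. For a well-formed count such as "14 +30,18 @@" it stores 14 and leaves `*end` at the space that follows.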

gitgitgadget bot commented Jan 22, 2024

Welcome to GitGitGadget

Hi @mohit-marathe, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.

Please make sure that your Pull Request has a good description, as it will be used as cover letter. You can CC potential reviewers by adding a footer to the PR description with the following syntax:

CC: Revi Ewer <[email protected]>, Ill Takalook <[email protected]>

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:

  • the lines should not exceed 76 columns,
  • the first line should be like a header and typically start with a prefix like "tests:" or "revisions:" to state which subsystem the change is about, and
  • the commit messages' body should be describing the "why?" of the change.
  • Finally, the commit messages should end in a Signed-off-by: line matching the commits' author.

It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form /allow. A good way to find other contributors is to locate recent pull requests where someone has been /allowed:

Both the person who commented /allow and the PR author are able to /allow you.

An alternative is the channel #git-devel on the Libera Chat IRC network:

<newcontributor> I've just created my first PR, could someone please /allow me? https://github.com/gitgitgadget/git/pull/12345
<veteran> newcontributor: it is done
<newcontributor> thanks!

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit.

If you want to see what email(s) would be sent for a /submit request, add a PR comment /preview to have the email(s) sent to you. You must have a public GitHub email address for this. Note that any reviewers CC'd via the list in the PR description will not actually be sent emails.

After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the (raw) link), then import it into your mail program. If you use GMail, you can do this via:

curl -g --user "<EMailAddress>:<Password>" \
    --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):

Changes since v1:
- Fixed a typo in the commit message (found by ...)
- Added a code comment to ... as suggested by ...
...

To send a new iteration, just add another PR comment with the contents: /submit.

Need help?

New contributors who want advice are encouraged to join [email protected], where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join.

You may also be able to find help in real time in the developer IRC channel, #git-devel on Libera Chat. Remember that IRC does not support offline messaging, so if you send someone a private message and log out, they cannot respond to you. The scrollback of #git-devel is archived, though.

dscho commented Jan 22, 2024

/allow

gitgitgadget bot commented Jan 22, 2024

User mohit-marathe is now allowed to use GitGitGadget.

WARNING: mohit-marathe has no public email address set on GitHub;
GitGitGadget needs an email address to Cc: you on your contribution, so that you receive any feedback on the Git mailing list. Go to https://github.com/settings/profile to make your preferred email public to let GitGitGadget know which email address to use.

@mohit-marathe (author) commented

/preview

gitgitgadget bot commented Jan 22, 2024

Preview email sent as [email protected]

@mohit-marathe changed the title from "[PATCH 0/2] Replace atoi() with strtol_i2()" to "Replace atoi() with strtol_i2()" on Jan 22, 2024

@mohit-marathe (author) commented

/submit

gitgitgadget bot commented Jan 22, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1646/mohit-marathe/update-strtol_i-v1

To fetch this version to local tag pr-1646/mohit-marathe/update-strtol_i-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1646/mohit-marathe/update-strtol_i-v1

@@ -1,3 +1,4 @@
#include "git-compat-util.h"

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Mohit Marathe via GitGitGadget" <[email protected]> writes:

>  	static const char digits[] = "0123456789";
>  	const char *q, *r;
> +	char *endp;
>  	int n;
>  
>  	q = p + 4;
>  	n = strspn(q, digits);
>  	if (q[n] == ',') {
>  		q += n + 1;
> -		*p_before = atoi(q);
> +		if (strtol_i2(q, 10, p_before, &endp) != 0)
> +			return 0;
>  		n = strspn(q, digits);
>  	} else {
>  		*p_before = 1;
>  	}

Looking at this code again, because we upfront run strspn() to make
sure q[] begins with a run of digits *and* followed by a comma
(which is not a digit), I think it is safe to use atoi() and assume
it would slurp all the digits.  So the lack of another check the use
of new helper allows us to do, namely

		if (endp != q + n)
			return 0;

is probably OK, but that is one of the two reasons why you would
favor the use of new helper over atoi(), so the upside of this
change is not all that great as I originally hoped for X-<.

Not your fault, of course.  We would still catch when the digit
string that starts q[] is too large to fit in an int, which is an
upside.

> -	if (n == 0 || q[n] != ' ' || q[n+1] != '+')
> +	if (q[n] != ' ' || q[n+1] != '+')
>  		return 0;

When we saw q[] that begins with ',' upon entry to this function, we
used to say *p_before = 1 and then saw n==0 and realized it is not a
good input and returned 0 from the function.

Now we instead peek q[0] and the check says q[0] is not SP so we
will return 0 the same way so there is no behaviour change from the
upper hunk?  The conversion may be correct, but it wasn't explained
in the proposed commit log message.

How are the change to stop caring about n==0 here ...

>  	r = q + n + 2;
>  	n = strspn(r, digits);
>  	if (r[n] == ',') {
>  		r += n + 1;
> -		*p_after = atoi(r);
> -		n = strspn(r, digits);
> +		if (strtol_i2(r, 10, p_after, &endp) != 0)
> +			return 0;
>  	} else {
>  		*p_after = 1;
>  	}
> -	if (n == 0)
> -		return 0;

... and this change here, linked to the switch from atoi() to
strtol_i2()[*]?

It looks like an unrelated behaviour change that is left
unexplained.

>  	return 1;
>  }

Thanks for working on this one.


[Footnote]

 * by the way, what a horrible name for a public function.  Yuck.
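
One reason the `n == 0` checks may still matter after the switch: strtol(), which the new helper wraps, skips leading white space and accepts an optional sign, so a successful conversion does not by itself prove that the input began with a digit. The sketch below is only an illustration; both function names are hypothetical and neither is Git code.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical illustration (not Git code): returns 1 when s starts
 * with at least one decimal digit. This is the property that the
 * "n == 0" tests in the quoted hunks check via strspn().
 */
static int starts_with_digit_run(const char *s)
{
	return strspn(s, "0123456789") > 0;
}

/*
 * strtol() alone is more permissive: it skips leading white space
 * and accepts an optional '+' or '-' before the digits. *consumed
 * receives the number of bytes strtol() walked over.
 */
static long permissive_parse(const char *s, size_t *consumed)
{
	char *end;
	long v = strtol(s, &end, 10);

	*consumed = (size_t)(end - s);
	return v;
}
```

For the text " +30,18 @@" (what would follow the comma in a header with a missing count, such as "@@ -29, +30,18 @@"), `permissive_parse()` returns 30 after consuming four bytes, while `starts_with_digit_run()` returns 0. Judging only from the quoted v1 hunks, dropping the `n == 0` tests therefore does not look equivalent to the old behaviour.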

On the Git mailing list, Mohit Marathe wrote (reply to this):

On Tuesday, January 23rd, 2024 at 1:02 AM, Junio C Hamano <[email protected]> wrote:

> "Mohit Marathe via GitGitGadget" [email protected] writes:
> 
> >  	static const char digits[] = "0123456789";
> >  	const char *q, *r;
> > +	char *endp;
> >  	int n;
> >
> >  	q = p + 4;
> >  	n = strspn(q, digits);
> >  	if (q[n] == ',') {
> >  		q += n + 1;
> > -		*p_before = atoi(q);
> > +		if (strtol_i2(q, 10, p_before, &endp) != 0)
> > +			return 0;
> >  		n = strspn(q, digits);
> >  	} else {
> >  		*p_before = 1;
> >  	}
> 
> 
> Looking at this code again, because we upfront run strspn() to make
> sure q[] begins with a run of digits and followed by a comma
> (which is not a digit), I think it is safe to use atoi() and assume
> it would slurp all the digits. So the lack of another check the use
> of new helper allows us to do, namely
> 
> 		if (endp != q + n)
> 			return 0;
> 
> is probably OK, but that is one of the two reasons why you would
> favor the use of new helper over atoi(), so the upside of this
> change is not all that great as I originally hoped for X-<.
> 
> Not your fault, of course. We would still catch when the digit
> string that starts q[] is too large to fit in an int, which is an
> upside.
> 
> > -	if (n == 0 || q[n] != ' ' || q[n+1] != '+')
> > +	if (q[n] != ' ' || q[n+1] != '+')
> >  		return 0;
> 
> 
> When we saw q[] that begins with ',' upon entry to this function, we
> used to say *p_before = 1 and then saw n==0 and realized it is not a
> good input and returned 0 from the function.

Uh oh, I just looked at the `if` block and concluded that it was just
to check if it has numbers after the ',', which `strtol_i2()` already
does. But I totally missed this one.

> Now we instead peek q[0] and the check says q[0] is not SP so we
> will return 0 the same way so there is no behaviour change from the
> upper hunk? The conversion may be correct, but it wasn't explained
> in the proposed commit log message.
> 
> How are the change to stop caring about n==0 here ...
> 
> >  	r = q + n + 2;
> >  	n = strspn(r, digits);
> >  	if (r[n] == ',') {
> >  		r += n + 1;
> > -		*p_after = atoi(r);
> > -		n = strspn(r, digits);
> > +		if (strtol_i2(r, 10, p_after, &endp) != 0)
> > +			return 0;
> >  	} else {
> >  		*p_after = 1;
> >  	}
> > -	if (n == 0)
> > -		return 0;
> 
> 
> ... and this change here, linked to the switch from atoi() to
> strtol_i2()[*]?
> 
> It looks like an unrelated behaviour change that is left
> unexplained.
> 
> >  	return 1;
> >  }
> 
> 
> Thanks for working on this one.
> 
> 
> [Footnote]
> 
> * by the way, what a horrible name for a public function. Yuck.

Yeah, I thought so too /:D How does `strtol_i_updated` sound?

Thanks for your feedback! I will send v2 with the corrections soon.

@mohit-marathe force-pushed the update-strtol_i branch 2 times, most recently from 2dddb73 to c3b202a on January 24, 2024 05:51

@mohit-marathe (author) commented

/preview

gitgitgadget bot commented Jan 24, 2024

Preview email sent as [email protected]

@mohit-marathe changed the title from "Replace atoi() with strtol_i2()" to "Replace atoi() with strtol_i_updated()" on Jan 24, 2024

@mohit-marathe force-pushed the update-strtol_i branch 2 times, most recently from 8d32119 to f3a03d6 on January 24, 2024 06:31

@mohit-marathe (author) commented

/submit

gitgitgadget bot commented Jan 24, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1646/mohit-marathe/update-strtol_i-v2

To fetch this version to local tag pr-1646/mohit-marathe/update-strtol_i-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1646/mohit-marathe/update-strtol_i-v2

@mohit-marathe (author) commented

/submit

gitgitgadget bot commented Jan 24, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1646/mohit-marathe/update-strtol_i-v3

To fetch this version to local tag pr-1646/mohit-marathe/update-strtol_i-v3:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1646/mohit-marathe/update-strtol_i-v3

@mohit-marathe (author) commented

/submit

gitgitgadget bot commented Jan 24, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1646/mohit-marathe/update-strtol_i-v4

To fetch this version to local tag pr-1646/mohit-marathe/update-strtol_i-v4:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1646/mohit-marathe/update-strtol_i-v4

@@ -1309,6 +1309,29 @@ static inline int strtol_i(char const *s, int base, int *result)
return 0;

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Mohit Marathe via GitGitGadget" <[email protected]> writes:

> From: Mohit Marathe <[email protected]>
>
> This function is an updated version of strtol_i() function. It will
> give more control to handle parsing of the characters after the
> integer and better error handling while parsing numbers.

i2 was horrible but this is worse.  What would you call an even
newer variant when you need to add one?  strtol_i_updated_twice?

To readers who are reading the code in 6 months, it is totally
uninteresting that strtol_i() is an older function and the new thing
was invented later as its update.  What they want to learn is how
these two are different, what additional things this new one lets
them do compared to the old one, namely: we can optionally learn
where the run of the digits has ended.

Perhaps call it "strtoi_with_tail" or something, unless others
suggest even better names?

Thanks.

@@ -1,3 +1,4 @@
#include "git-compat-util.h"

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Mohit Marathe via GitGitGadget" <[email protected]> writes:

>  	q = p + 4;
>  	n = strspn(q, digits);
>  	if (q[n] == ',') {
>  		q += n + 1;

So, we saw "@@ -" and skipped over these four bytes, skipped the
digits from there, and found a comma.  

For "@@ -29,14 +30,18 @@", for example, our q is now "14 +30,18 @@"
as we have skipped over that comma after 29.

> -		*p_before = atoi(q);
> +		if (strtol_i_updated(q, 10, p_before, &endp) != 0)
> +			return 0;

We parse out 14 and store it to *p_before.  endp points at " +30..."
now.

>  		n = strspn(q, digits);
> +		if (endp != q + n)
> +			return 0;

Is this necessary?  By asking strtol_i_updated() where the number ended,
we already know endp without skipping the digits in q with strspn().
Shouldn't these three lines become more like

		n = endp - q;

instead?  

After all, we are not trying to find a bug in strtol_i_updated(),
which would be the only reason how this "return 0" would trigger.

>  	} else {
>  		*p_before = 1;
>  	}
> @@ -48,8 +53,11 @@ static int scan_hunk_header(const char *p, int *p_before, int *p_after)
>  	n = strspn(r, digits);
>  	if (r[n] == ',') {
>  		r += n + 1;
> -		*p_after = atoi(r);
> +		if (strtol_i_updated(r, 10, p_after, &endp) != 0)
> +			return 0;
>  		n = strspn(r, digits);
> +		if (endp != r + n)
> +			return 0;

Likewise.

>  	} else {
>  		*p_after = 1;
>  	}
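
The suggestion above, using the end pointer instead of a second strspn() pass, could look roughly like the following standalone sketch. It is not the code from the patch: `parse_with_tail` is a hypothetical stand-in for the helper under review, the leading-digit guard inside it and the `strncmp()` guard at the top are extra assumptions added so that the example is safe to run on its own, and the rest of the control flow is reconstructed from the hunks quoted in this thread.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the helper under review: parse a decimal
 * int at s and report where the digits ended. Returns 0 on success.
 * The leading-digit test is an assumption of this sketch; it rejects
 * the white space and sign that strtol() would otherwise accept.
 */
static int parse_with_tail(const char *s, int *result, const char **endp)
{
	char *tail;
	long v;

	if (!isdigit((unsigned char)*s))
		return -1;
	errno = 0;
	v = strtol(s, &tail, 10);
	if (errno || tail == s || v < INT_MIN || v > INT_MAX)
		return -1;
	*result = (int)v;
	*endp = tail;
	return 0;
}

/*
 * Sketch of the hunk-header scan using "n = endp - q".
 * Returns 1 and fills in both line counts on success, 0 otherwise.
 */
static int scan_hunk_header_sketch(const char *p, int *p_before, int *p_after)
{
	static const char digits[] = "0123456789";
	const char *q, *r, *endp;
	size_t n;

	if (strncmp(p, "@@ -", 4))	/* extra guard for standalone use */
		return 0;
	q = p + 4;			/* skip "@@ -" */
	n = strspn(q, digits);		/* digits of the old start line */
	if (q[n] == ',') {
		q += n + 1;
		if (parse_with_tail(q, p_before, &endp))
			return 0;
		n = (size_t)(endp - q);	/* length of the count just parsed */
	} else {
		*p_before = 1;
	}
	if (n == 0 || q[n] != ' ' || q[n + 1] != '+')
		return 0;

	r = q + n + 2;			/* skip " +" */
	n = strspn(r, digits);		/* digits of the new start line */
	if (r[n] == ',') {
		r += n + 1;
		if (parse_with_tail(r, p_after, &endp))
			return 0;
		n = (size_t)(endp - r);
	} else {
		*p_after = 1;
	}
	if (n == 0)
		return 0;
	return 1;
}
```

For "@@ -29,14 +30,18 @@" this sketch reports 14 and 18. A header whose count is not a plain digit run, or is too large for an int, makes it return 0.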

On the Git mailing list, Mohit Marathe wrote (reply to this):

On Thursday, January 25th, 2024 at 2:32 AM, Junio C Hamano <[email protected]> wrote:

> "Mohit Marathe via GitGitGadget" [email protected] writes:
> 
> >  	q = p + 4;
> >  	n = strspn(q, digits);
> >  	if (q[n] == ',') {
> >  		q += n + 1;
> 
> 
> So, we saw "@@ -" and skipped over these four bytes, skipped the
> digits from there, and found a comma.
> 
> For "@@ -29,14 +30,18 @@", for example, our q is now "14 +30,18 @@"
> as we have skipped over that comma after 29.
> 
> > -		*p_before = atoi(q);
> > +		if (strtol_i_updated(q, 10, p_before, &endp) != 0)
> > +			return 0;
> 
> 
> We parse out 14 and store it to *p_before. endp points at " +30..."
> now.
> 
> >  		n = strspn(q, digits);
> > +		if (endp != q + n)
> > +			return 0;
> 
> 
> Is this necessary? By asking strtol_i_updated() where the number ended,
> we already know endp without skipping the digits in q with strspn().
> Shouldn't these three lines become more like
> 
> 		n = endp - q;
> 
> instead?
> 
> After all, we are not trying to find a bug in strtol_i_updated(),
> which would be the only reason how this "return 0" would trigger.
> 

I was confused about what an invalid hunk header of a corrupted patch
would look like. This was just an attempt at a sanity check. But after
taking another look, I agree that it's unnecessary.

> >  	} else {
> >  		*p_before = 1;
> >  	}
> > @@ -48,8 +53,11 @@ static int scan_hunk_header(const char *p, int *p_before, int *p_after)
> >  	n = strspn(r, digits);
> >  	if (r[n] == ',') {
> >  		r += n + 1;
> > -		*p_after = atoi(r);
> > +		if (strtol_i_updated(r, 10, p_after, &endp) != 0)
> > +			return 0;
> >  		n = strspn(r, digits);
> > +		if (endp != r + n)
> > +			return 0;
> 
> 
> Likewise.
> 
> >  	} else {
> >  		*p_after = 1;
> >  	}

@mohit-marathe changed the title from "Replace atoi() with strtol_i_updated()" to "Replace atoi() with strtoi_with_tail()" on Jan 28, 2024

@mohit-marathe (author) commented

/submit

gitgitgadget bot commented Jan 28, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1646/mohit-marathe/update-strtol_i-v5

To fetch this version to local tag pr-1646/mohit-marathe/update-strtol_i-v5:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1646/mohit-marathe/update-strtol_i-v5

@@ -1309,6 +1309,29 @@ static inline int strtol_i(char const *s, int base, int *result)
return 0;

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Mohit Marathe via GitGitGadget" <[email protected]> writes:

> From: Mohit Marathe <[email protected]>
>
> This function is an updated version of strtol_i() function. It will
> give more control to handle parsing of the characters after the
> numbers and better error handling while parsing numbers.
>
> Signed-off-by: Mohit Marathe <[email protected]>
> ---
>  git-compat-util.h | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
>
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 7c2a6538e5a..c576b1b104f 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -1309,6 +1309,29 @@ static inline int strtol_i(char const *s, int base, int *result)
>  	return 0;
>  }

Are we leaving the original one above?  Shouldn't this step instead
remove it, as strtol_i() is now a C preprocessor macro as seen below?

> +#define strtol_i(s,b,r) strtoi_with_tail((s), (b), (r), NULL)
> +static inline int strtoi_with_tail(char const *s, int base, int *result, char **endp)
> +{
> +	long ul;
> +	char *dummy = NULL;
> +
> +	if (!endp)
> +		endp = &dummy;
> +	errno = 0;
> +	ul = strtol(s, endp, base);
> +	if (errno ||
> +	    /*
> +	     * if we are told to parse to the end of the string by
> +	     * passing NULL to endp, it is an error to have any
> +	     * remaining character after the digits.
> +	     */
> +	   (dummy && *dummy) ||
> +	    *endp == s || (int) ul != ul)
> +		return -1;
> +	*result = ul;
> +	return 0;
> +}
> +
>  void git_stable_qsort(void *base, size_t nmemb, size_t size,
>  		      int(*compar)(const void *, const void *));
>  #ifdef INTERNAL_QSORT
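
To show how the two entry points in the quoted hunk behave, the sketch below copies the helper and the macro from the patch into a standalone file, with the old inline strtol_i() left out as the review suggests. Only the comments are new; everything outside this block (callers, surrounding header) is omitted, so treat it as an illustration and not as the final form of git-compat-util.h.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Whole-string variant: any byte left after the number is an error. */
#define strtol_i(s,b,r) strtoi_with_tail((s), (b), (r), NULL)

/*
 * Copied from the quoted patch. With a non-NULL endp the caller
 * learns where the number ended; with NULL the whole string must be
 * consumed. Returns 0 on success, -1 on failure.
 */
static inline int strtoi_with_tail(char const *s, int base, int *result, char **endp)
{
	long ul;
	char *dummy = NULL;

	if (!endp)
		endp = &dummy;
	errno = 0;
	ul = strtol(s, endp, base);
	if (errno ||
	    /* dummy is only set when the caller passed NULL for endp */
	    (dummy && *dummy) ||
	    *endp == s || (int) ul != ul)
		return -1;
	*result = ul;
	return 0;
}
```

Called as `strtol_i("14 +30", 10, &v)`, this fails because of the trailing " +30". Called as `strtoi_with_tail("14 +30", 10, &v, &end)`, it succeeds, stores 14, and leaves `end` pointing at " +30" for the caller to inspect.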

This function is an updated version of strtol_i() function. It will
give more control to handle parsing of the characters after the
numbers and better error handling while parsing numbers.

Signed-off-by: Mohit Marathe <[email protected]>

This change improves error handling when converting strings to
integers. Unlike `atoi`, `strtoi_with_tail` lets the caller
distinguish a valid conversion from an invalid one, which makes
the code more resilient to problems such as a malformed hunk
header in a corrupted patch.

Signed-off-by: Mohit Marathe <[email protected]>

@mohit-marathe (author) commented

/submit

gitgitgadget bot commented Feb 4, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1646/mohit-marathe/update-strtol_i-v6

To fetch this version to local tag pr-1646/mohit-marathe/update-strtol_i-v6:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1646/mohit-marathe/update-strtol_i-v6
