Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Belt.Int.fromString silently throws away part of input #3732

Closed
RasmusKlett opened this issue Aug 7, 2019 · 9 comments
Closed

Belt.Int.fromString silently throws away part of input #3732

RasmusKlett opened this issue Aug 7, 2019 · 9 comments
Labels
stale Old issues that went stale

Comments

@RasmusKlett
Copy link

I encountered some very surprising behaviour using Belt.Int.fromString. If given the string representation of a floating point number, it will silently ignore anything beyond the decimal point.

Looking at the source, it turns out fromString is simply wrapping the Javascript parseInt. This is a very quirky Javascript-esque function, and in my opinion does not deserve a place in Belt.

Examples of surprising behaviour:

"15.9" -> 15      // The ".9" is ignored
"15e99" -> 15     // The "e99" is ignored
"   15" -> 15     // The whitespace is ignored - maybe fine?

I suggest that at least the first two should result in None, since the full string could not be parsed. I find it convenient that leading whitespace is automatically stripped, but some might prefer strict None behaviour in this case as well.

Please - let us not repeat the mistakes of Javascript just because it is convenient to implement.

@bobzhang
Copy link
Member

bobzhang commented Aug 8, 2019

@RasmusKlett looks reasonable, do you have a clean alternative in mind?

@RasmusKlett
Copy link
Author

I've tried a few versions, but haven't been able to maintain the performance of the builtin parseInt. I'm not sure how much we care about performance, but maybe someone smarter than me can help?

I've tried simply wrapping the OCaml int_of_string, this has a 3x performance penalty on my machine (Excuse the reason syntax):

let intFromString = str =>
  switch (int_of_string(str)) {
  | number => Some(number)
  | exception (Failure(_msg)) => None
};

Note this does not allow leading whitespace - I'm not sure what the correct behavour here is?

MDN suggests simply doing a regex validation. This performs slightly better than int_of_string, and can easily be made to allow leading whitespace.

function filterInt(value) {
  if (/^[-+]?(\d+)$/.test(value)) {
    return Number(value);
  } else {
    return undefined;
  }
}

I've also tried implementing it in terms of an explicit for-loop over code points. This performs terribly for some reason. If this was native code, I would expect this to perform best, since it skips a lot of checks not necessary with static types, but that was definitely not the case.

function codePointInt(value) {
  var result = 0;
  var length = value.length;
  for (var i=0; i < length; i++) {
    var code = value.codePointAt(i);
    result += (code - 48) * 10**(length - 1 - i);
  }
  return result;
}

So there are two questions here: Do we allow leading whitespace? And can we live with a 2-3x performance hit? If not, I think we need someone smarter than me to do some tuning.

Here's the stuff I've benched, if anyone wants to try.

@bobzhang
Copy link
Member

bobzhang commented Aug 9, 2019

I don't think performance is critical here, but not introducing additional deps is quite important.
Would you mind send a PR?

@RasmusKlett
Copy link
Author

It will take me some time, I haven't tried committing to the project before. I will give it a shot next week.

Unless someone disagrees, I will go ahead and use the Mozilla-suggested version of validating the input using a regex. This has the lowest performance hit, and it can allow leading whitespace.

@bobzhang
Copy link
Member

@RasmusKlett There is one thing annoying that in latest js, 3_23 is legal, how shall we handle it properly

@TheSpyder
Copy link
Contributor

I agree the default should be more reliable than parseInt - and performance probably doesn't matter - but I'd like parseInt to still remain somewhere. In some circumstances I want the quirky behaviour.

Perhaps parseFloat + Math.round would be a better basis for Int.fromString?

@amiralies
Copy link
Contributor

another case happening to me:

let x = Int.fromString("22 foo bar") // 22

rcherrueau added a commit to rcherrueau/rescript-compiler that referenced this issue Jul 3, 2021
The actual implementation of Belt.Int.fromString relies on the
Javascript parseInt function [0].  That function parses Strings in a
permissive manner.  For instance, if the function encounters an
alphabetical character, it returns the value up to that character:

  parseInt("123a456") // output: 123

The same applies for Belt.Float.fromString that relies on
parseFloat [1] and have a similar interpretation.

This commit proposes a stricter implementation of Int and Float parsing
using the Javascript Number constructor [2].  The Belt.Int.fromString
uses the more specific Math.trunc function [3], that parses string
similarly to Number and then returns the Int part of that number.

Relying only on the Number constructor keeps the implementation simple.
However, there exists one last quirk while using it with an empty
string.  It returns the 0 value:

  Number("") // ouptput: 0

Consequently, Belt.Int.fromString("") will return 0.  Tests for
Belt.(Int/Float).fromString have been updated in order to reflect that.

Fix rescript-lang#3732

[0] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseInt
[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseFloat
[2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/Number
[3] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/trunc
rcherrueau added a commit to rcherrueau/rescript-compiler that referenced this issue Jul 3, 2021
The actual implementation of Belt.Int.fromString relies on the
Javascript `parseInt` function [0].  That function parses strings in a
permissive manner.  For instance, if the function encounters an
alphabetical character, it returns the value up to that character:

    parseInt("123a456") // output: 123

The same applies for Belt.Float.fromString that relies on
`parseFloat` [1] and have a similar interpretation.

This commit proposes a stricter implementation of Int and Float parsing
using the Javascript `Number` constructor [2].  The Belt.Int.fromString
uses the more specific `Math.trunc` function [3], that parses string
similarly to `Number` and then returns the Int part of that number.

Relying only on the `Number` constructor keeps the implementation
simple.  However, there exists one last quirk while using it with an
empty string.  It returns the `0` value:

    Number("") // ouptput: 0

Consequently, Belt.Int.fromString("") will return `0`.  Tests for
Belt.(Int/Float).fromString have been updated in order to reflect that.

Fix rescript-lang#3732

[0] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseInt
[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseFloat
[2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/Number
[3] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/trunc

Signed-Off-By: Ronan-A. Cherrueau <[email protected]>
@stale
Copy link

stale bot commented Mar 11, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Old issues that went stale label Mar 11, 2023
@stale stale bot closed this as completed Mar 24, 2023
@glennsl
Copy link
Contributor

glennsl commented Mar 25, 2023

See discussion about this issue in rescript-lang/rescript-core#86 and proposed solution in rescript-lang/rescript-core#93

Probably better to fix this in rescript-core now than in the standard library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Old issues that went stale
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants