Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Belt] Stricter parse for Int and Float #5209

Closed
wants to merge 1 commit into from

Conversation

rcherrueau
Copy link

The actual implementation of Belt.Int.fromString relies on the
Javascript parseInt function [0]. That function parses Strings in a
permissive manner. For instance, if the function encounters an
alphabetical character, it returns the value up to that character:

parseInt("123a456") // output: 123

The same applies for Belt.Float.fromString that relies on
parseFloat [1] and have a similar interpretation.

This commit proposes a stricter implementation of Int and Float parsing
using the Javascript Number constructor [2]. The Belt.Int.fromString
uses the more specific Math.trunc function [3], that parses string
similarly to Number and then returns the Int part of that number.

Relying only on the Number constructor keeps the implementation simple.
However, there exists one last quirk while using it with an empty
string. It returns the 0 value:

Number("") // ouptput: 0

Consequently, Belt.Int.fromString("") will return 0. Tests for
Belt.(Int/Float).fromString have been updated in order to reflect that.

Fix #3732

[0] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseInt
[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseFloat
[2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/Number
[3] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/trunc

The actual implementation of Belt.Int.fromString relies on the
Javascript `parseInt` function [0].  That function parses strings in a
permissive manner.  For instance, if the function encounters an
alphabetical character, it returns the value up to that character:

    parseInt("123a456") // output: 123

The same applies for Belt.Float.fromString that relies on
`parseFloat` [1] and have a similar interpretation.

This commit proposes a stricter implementation of Int and Float parsing
using the Javascript `Number` constructor [2].  The Belt.Int.fromString
uses the more specific `Math.trunc` function [3], that parses string
similarly to `Number` and then returns the Int part of that number.

Relying only on the `Number` constructor keeps the implementation
simple.  However, there exists one last quirk while using it with an
empty string.  It returns the `0` value:

    Number("") // ouptput: 0

Consequently, Belt.Int.fromString("") will return `0`.  Tests for
Belt.(Int/Float).fromString have been updated in order to reflect that.

Fix rescript-lang#3732

[0] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseInt
[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseFloat
[2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/Number
[3] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/trunc

Signed-Off-By: Ronan-A. Cherrueau <[email protected]>
@bobzhang
Copy link
Member

relevant #3732, I think it is okay to introduce a breaking change in v10.
Number seems not safe:

Number("90071992547409922")
90071992547409920

I would suggest here we just use a handwritten one which has clear specification, reference implementation (need some error/overflow checking):

function codePointInt(value) {
  var result = 0;
  var length = value.length;
  for (var i=0; i < length; i++) {
    var code = value.codePointAt(i);
    result = 10 * result + (code - 48)  | 0
  }
  return result;
}

@rcherrueau
Copy link
Author

rcherrueau commented Jul 12, 2021

I understand.

In that case, may I suggest to follow an implementation similar to Elm [0]. I tested its performance against the implementations in #3732 (comment), and the result is really good [1]. However, even with that implementation the parsing of 90071992547409922 fails, because that number is higher than Number.MAX_SAFE_INTEGER and thus should be represented as a BigInt [2]. I don't know how this should be handle in rescript.

References:

[0] https://github.com/elm/core/blob/65cea00afa0de03d7dda0487d964a305fc3d58e3/src/Elm/Kernel/String.js#L278-L297

[1] https://rebench.github.io/?rebench-data=NoIhBplA2CQKwM5wPQCoAEAHArgFw0TxwDMSMALAUwCcqM0UAdAO1ZJxYGM8BLAexYYqAGwC2AfQDKeGrxYBzCXn4BJFngAURGgEpWAb1ZM8ANwCGNDCrzmRGALwYADAG5jZyxi78AJlWdHQlkAOi4KSwBhPyoAQS1nXXc2TysiSwInH39AhydnAA8AJgAhDHQMAGoGFAwAHzrvGNz84oARcswAWhqMAH4MAEYMAC4XZI8SfitNCyteIPSaPFcMBYAeYJoQkSpFPApVysrefRSjFJM5pv9F0PComPjNU4m8E15yTWz6TcKAZkCDRcBX+AE4MJsfmcTCYLrD3rIqMQaEJOP4SPIqL43iYAL4eEw2OxBQaBTDE+zVH4YHoAtweAlsFJ0FFCBZ5YIZQl4AboqiYljYnljb7NRytIodAZdSmjaz8WwiJKsJmiSQyOSKZRqDSaJggMHOZwAdkGYLBRQArAAWE025wWopFA26EAAXUgICtcCQqEwAFlzCwcCSaVh+PICLw8LRzHxBDV2JweAIhD8AAqRjTqLQWEQ4Ki6DAXDAYa50RA4ESZcasMvXXb7ChBfOFnZ7BQHZJlqYza68BxuNaQjBNruHNbHYulsvlrw0pxtqhhGJZqPPV71ueV6sESpOMW3Ho2gAcxcwZLQaE044OtKGD63QgwTLLrJwqIwu5rySZmezPBc31Q1jTNJ1bXtR1LRdEA3U9UAbV9ZBwBACoAzaAA5DApBwBQFCoIhsXLWhEDTJMWA4bgEyETEa1oYDlxnbdPgwTQUAAPWALpKndPp9SYXxKl0AASFAQljIhZjsQtdGYl932RT8hEwnAxAAI1oaSCyLHtX2EEREHoWdFLZDB+UFbE9KZJk6NjGhgINI1TXNS1IIdJ1YKSD0vX+ZD-QwSJPzoDQMBKUQCFMUjyJ6HwxCwXhdl8DB+HwXACEYZNqPIkgaH4MRNXkBQXnkhsvF4EThicLBLCM4DeHAIZEj01iXkQTDzEwl5Krkktt1M5TzJYDEsRxbc8QMoy+oU78lK-CrBms1VjEovKCtkIqQOc8C3LtDyYNdVwfNAQY4DoAKZF8EReHUjAAHcaHMLAEsUCjdmjDQADE1sK16nB0RwAD5t0QO6Y3CNiowkfgSAkHRNp0XrZ0aEMNNoIGcPyqhNFRzS9G3RoqAKLgqCwGi2M+8xEs-bGJDERAFF6hxAYwTDBCoVUJhYKNvvy37iqcsDXOtPboOdQ6PXdd0gA

[2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt

@cristianoc
Copy link
Collaborator

cristianoc commented Jun 25, 2022

I think it is okay to introduce a breaking change in v10.

Since v10 is coming soon, perhaps this can be revisited.

@cristianoc cristianoc added this to the v10.1 milestone Jun 25, 2022
@cristianoc
Copy link
Collaborator

Feel free to reopen if still relevant.

@cristianoc cristianoc closed this Jul 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Belt.Int.fromString silently throws away part of input
3 participants