Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to generate random booleans with stablehlo.rng? #1313

Open
wecing opened this issue Mar 10, 2023 · 9 comments
Open

How to generate random booleans with stablehlo.rng? #1313

wecing opened this issue Mar 10, 2023 · 9 comments
Labels

Comments

@wecing
Copy link

wecing commented Mar 10, 2023

What happened?

According to the spec:

If rng_distribution == UNIFORM, then the random numbers are generated following the uniform distribution over the interval [a, b)
If a >= b, the behavior is undefined.

(C1) a, b, and result have the same element type.

Here if we want the result to be i1, a and b must also be i1; because a < b, we have a == 0 and b == 1. According to the spec, we will only generate values in the range of [0, 1), i.e. constant 0.

Does stablehlo.rng ignore a and b when result type is i1?

Steps to reproduce your issue

No response

Version information

No response

@loganchien
Copy link

I think the problem is not limited to i1.

For example, if the output element type is i8, we won't be able to specify [-128, 127] (inclusive).

For example, if the output element type is ui8, we won't be able to specify [0, 255] (inclusive).

I think we should either switch to close-close interval or allow a/b to be larger integer type.

@sdasgup3
Copy link
Member

sdasgup3 commented Mar 10, 2023

Thanks @wecing and @loganchien for the findings and analysis.
We should fix the problem if StableHLO needs (#597) to support the op.

@wecing
Copy link
Author

wecing commented Mar 10, 2023

I think we should either switch to close-close interval

I would say this should be controlled with a flag. stablehlo.rng could be used to generate floats under uniform distribution; in that case it makes sense to use close-open intervals, e.g. [0, 1).

@sdasgup3 sdasgup3 added the Spec label Mar 10, 2023
@soraros
Copy link

soraros commented Mar 10, 2023

IMO, it's really that rng should treat exact and inexact types differently.
Or even better, having separate ops for them. As a hint, it already doesn't support non-float types when rng_distribution = NORMAL.

@sdasgup3
Copy link
Member

I think we should either switch to close-close interval

For int/bools, that makes sense given std::uniform_int_distribution follows that.

be used to generate floats under uniform distribution; in that case it makes sense to use close-open intervals,

That is right: we can keep [closed, open) here given std::uniform_real_distribution.

@sdasgup3
Copy link
Member

Thanks for all the inputs!

@wecing
Copy link
Author

wecing commented Mar 10, 2023

we can keep [closed, open) here given std::uniform_real_distribution.

That page also mentions:

It is difficult to create a distribution over the closed interval [a, b] from this distribution

I do wonder how useful would rng(UNIFORM) : f32 be, though. There are other corner cases like begin=0, end=+inf where it's really unclear what the desired behavior should be. In practice it would be a lot easier to just do:

a = rng(begin = 0, end = (2^n)-1) : i32
b = 2^n
return a / b

@loganchien
Copy link

loganchien commented Mar 10, 2023

I do wonder how useful would rng(UNIFORM) : f32 be, though.

I think the uniform distribution can only be achieved when both a and b are finite and both a and nextafter(b, -inf) have the same exponent and same signbit. And then, rng(a, b, UNIFORM) generates uniform integer between [significand(a), significand(nextafter(b, -inf))] (inclusive) and returns floating point with {signbit, exponent, random_significand}. Other combination is unlikely to be uniform (given that IEEE 754 binary float32 is not continuous in mathematical sense and values are always rounded if there are too many significand bits).

rng(a, b, UNIFORM) :f32 can deal with the value range that is too large to be kept in int64/uint64, even though its precision is still limit to the number of significand bits.

I think these two limitations are known issues of C++ (or Boost Random Library) uniform_real_distribution. And it seems that there is no simple fix. [1] [2]

To some degree, I agree that serious users should not use rng(a, b, UNIFORM) : f32 if uniform distribution is essential to the correctness of the algorithm (e.g. encryption algorithms, key generation algorithms, etc). It is easy to mess up even if rng generates uniform distributed float32. For example, assume that x is a uniform distributed random value in [1, 2) , 5 * x is not uniform distributed within the range [5, 10) due to the rounding.

But I think most use cases just expect some convenient access to floating point random values that look like uniform distributed in the close-open interval [a, b) and can accept a little uneven probability (e.g. Monte Carlo simulation).

@sdasgup3
Copy link
Member

sdasgup3 commented Mar 24, 2023

Based on my conversation with @wecing, there is no immediate rush to deliver on the feature and can wait for sometime in Q2. With that keeping this ticket unassigned for now to be addressed in Q2'23.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: No status
Development

No branches or pull requests

4 participants