Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sequence() fails as instance method #370

Closed
moocow-the-bovine opened this issue Jan 13, 2022 · 7 comments
Closed

sequence() fails as instance method #370

moocow-the-bovine opened this issue Jan 13, 2022 · 7 comments

Comments

@moocow-the-bovine
Copy link
Contributor

Since commit f49ac06 (v2.063_03), sequence() called as an instance method returns a copy of the calling object whenever its type is not double.

Examples:

$vals = [3,4,5];
$good = pdl(double, $vals)->sequence;  # ok: [0 1 2]
$bad = pdl(float, $vals)->sequence; # bad: [3 4 5]
$bad = pdl(indx, $vals)->sequence; # bad: [3 4 5]
# etc.

This leads to downstream failures in PDL::CCS (-> moocow-the-bovine/PDL-CCS#1), and appears to have resulted from changes made to PDL::sequence() and PDL::axisvals2() in the aforementioned commit. I'm guessing that the changed behavior is related to the new $type_given lexical in PDL::sequence() (which is false when called as an instance method), but I don't have a fix.

PDL::axisvals2() appears to be generating the correct values (in both $bar and $dummy), albeit always as type double (which seems rather inefficient). Perhaps the inplace() on the sequence() lexical $bar isn't propagating correctly through axisvals2() when there's a type-mismatch?

@mohawk2
Copy link
Member

mohawk2 commented Jan 13, 2022

Thank you for figuring out where PDL::CCS's expectation was being broken! I'll add this to the unit test and then see about fixing it.

@mohawk2
Copy link
Member

mohawk2 commented Jan 13, 2022

In fact, thank you for the unit tests as well! That saves time :-D

@mohawk2
Copy link
Member

mohawk2 commented Jan 13, 2022

FYI I've moved the tests from t/core.t to t/basic.t, because despite how fundamental sequence is in PDL programming, it's implemented in PDL::Basic rather than PDL::Core. Yes, those packages want a good tidy-up in due course.

@mohawk2
Copy link
Member

mohawk2 commented Jan 13, 2022

By the way, inplace for different types is semantically a bit confusing. I saw in CCS that you've got a convert that works inplace, which is something I'm looking at enabling in main PDL. The main problem it creates is when there are badvalues - if you change the data in-place (which is itself a problem - what happens when you convert byte to double?), what should the new ndarray's (or supposedly, due to inplace, the same one) badvalue be? If you have insights into how to make this work well, I'm very interested!

Another issue is that in the hopefully-near-future loop-fusion change, the whole concept of inplace will become a bit moot, and might actually get ignored. Ideas on this are extremely welcome!

@mohawk2
Copy link
Member

mohawk2 commented Jan 13, 2022

I am closing this, because #371 has been merged and released with 2.067. But I do still want to see your thoughts on the above point :-)

@mohawk2 mohawk2 closed this as completed Jan 13, 2022
@moocow-the-bovine
Copy link
Contributor Author

moocow-the-bovine commented Jan 14, 2022

By the way, inplace for different types is semantically a bit confusing.

Yep.

I saw in CCS that you've got a convert that works inplace, which is something I'm looking at enabling in main PDL. The main problem it creates is when there are badvalues

I think my policy on bad-balue-conversion to date has been mostly "hide my head in the sand and hope it doesn't bite me". CCS has some special handling for treating the "missing" value as BAD or NaN ... but it's basically just setting explicit administrative flags (analogous to $pdl->badflag).

if you change the data in-place (which is itself a problem - what happens when you convert byte to double?), what should the new ndarray's (or supposedly, due to inplace, the same one) badvalue be?

I guess the principle of least surprise dictates that BAD should always be BAD. My understanding has been that the actual BAD value is dependent only on the datatype, so that would suggest that the conversion target type's badvalue() replaces the source's one (maybe in a 2nd pass scan of the input if badflag is set?), so -2**31 would replace 255 for a byte->long conversion (as is the case for the current convert()). That said, I also think that anyone who needs inplace conversion and bad values should be savvy enough to figure out what those values are and treat them according to the needs of their application ... maybe they actually want 255-BAD to stay 255, and it would be fairly easy to do bad-mapping on the perl side with $old->inplace->convert($new_type)->inplace->setvaltobad($old_type->badvalue)

If you have insights into how to make this work well, I'm very interested!

CCS inplace-conversion actually uses PDL::convert() for the values ($ccs->[$VALS]), but it's a no-op for the non-missing indices ($ccs->[$WHICH]): the entire justification for allowing $ccs->inplace()->convert() is to save the memory & CPU cycles that would be needed to copy $ccs->[$WHICH].

Another issue is that in the hopefully-near-future loop-fusion change, the whole concept of inplace will become a bit moot, and might actually get ignored. Ideas on this are extremely welcome!

sounds interesting!

@mohawk2
Copy link
Member

mohawk2 commented Jan 15, 2022

I saw in CCS that you've got a convert that works inplace, which is something I'm looking at enabling in main PDL. The main problem it creates is when there are badvalues

I think my policy on bad-balue-conversion to date has been mostly "hide my head in the sand and hope it doesn't bite me". CCS has some special handling for treating the "missing" value as BAD or NaN ... but it's basically just setting explicit administrative flags (analogous to $pdl->badflag).

My feeling is you'd be better off using explicit values for BAD rather than out-of-band information, if only for performance reasons (in terms of data locality, it's right there with the non-BAD data). But I don't know CCS well at all!

And the not-dealing is tempting; until the last month or so, qsortvec[i] mishandled BAD values as simply their numerical values.

if you change the data in-place (which is itself a problem - what happens when you convert byte to double?), what should the new ndarray's (or supposedly, due to inplace, the same one) badvalue be?

I guess the principle of least surprise dictates that BAD should always be BAD. My understanding has been that the actual BAD value is dependent only on the datatype, so that would suggest that the conversion target type's badvalue() replaces the source's one (maybe in a 2nd pass scan of the input if badflag is set?), so -2**31 would replace 255 for a byte->long conversion (as is the case for the current convert()). That said, I also think that anyone who needs inplace conversion and bad values should be savvy enough to figure out what those values are and treat them according to the needs of their application ... maybe they actually want 255-BAD to stay 255, and it would be fairly easy to do bad-mapping on the perl side with $old->inplace->convert($new_type)->inplace->setvaltobad($old_type->badvalue)

Depending on how it was compiled, PDL has had per-ndarray bad values for a long time. Badvalues and per-ndarray values became enabled by default last year, together with being able to use NaN as that if you wanted. But if you don't set it, you get the type's badvalue, which you can change (though I suspect it's a bad idea to do so, partly because it's action at a distance). Hopefully you'd agree that saying "BAD should stay BAD" isn't helpful here - when the type changes, a choice must be made as to what numerical value to set the bad value to, since that's how bad values are implemented. If they want badvalues (when the ndarray and/or type-default badvalue is 255) to stay 255, all they'd need to do is turn off the badflag, then do as you say. But the inplace part before the convert still doesn't help, since the datatypes might be different sizes. Saying "inplace" doesn't make it so here ;-)

If you have insights into how to make this work well, I'm very interested!

CCS inplace-conversion actually uses PDL::convert() for the values ($ccs->[$VALS]), but it's a no-op for the non-missing indices ($ccs->[$WHICH]): the entire justification for allowing $ccs->inplace()->convert() is to save the memory & CPU cycles that would be needed to copy $ccs->[$WHICH].

I suppose I was imagining for CCS that you could "simply" convert the underlying dense storage, since no dimensional stuff would change? As I said, I don't know the library well :-)

Another issue is that in the hopefully-near-future loop-fusion change, the whole concept of inplace will become a bit moot, and might actually get ignored. Ideas on this are extremely welcome!

sounds interesting!

Please take a look at #349 and comment if you like :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants