I suggest we implement some kind of data checker that runs during the data transfer step (is that the right place?).
Every chunk and block of data needs to be tested for the presence of zeros, NaNs and Infs. The per-block results then need to be accumulated in the loop over blocks to produce the statistics for a chunk. The result for every block can go into the debug file.
The user, however, can get a warning about the percentage of NaNs/Infs/zeros with respect to all values in the chunk. This warning should go to the user's log. If there are no NaNs/Infs, no message appears in the user's log. The zeros alert should only appear if zeros reach a certain percentage (e.g. 25% of the data are zeros).
Let me know if this sounds sensible. It shouldn't take much time to calculate, but it could be of great use to users and developers.
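To make the proposal concrete, here is a rough sketch of the per-block counting and per-chunk accumulation, assuming blocks arrive as NumPy arrays. The names `DataCheckCounts`, `add_block` and `user_warning` are hypothetical and not part of the existing code; the 25% default is just the example threshold mentioned above.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class DataCheckCounts:
    """Running totals for one chunk, accumulated block by block (hypothetical helper)."""

    total: int = 0
    nans: int = 0
    infs: int = 0
    zeros: int = 0

    def add_block(self, block: np.ndarray) -> "DataCheckCounts":
        """Count suspicious values in one block, add them to the chunk totals,
        and return the per-block counts (e.g. for the debug file)."""
        per_block = DataCheckCounts(
            total=int(block.size),
            nans=int(np.count_nonzero(np.isnan(block))),
            infs=int(np.count_nonzero(np.isinf(block))),
            zeros=int(np.count_nonzero(block == 0)),
        )
        self.total += per_block.total
        self.nans += per_block.nans
        self.infs += per_block.infs
        self.zeros += per_block.zeros
        return per_block

    def user_warning(self, zero_threshold: float = 25.0) -> Optional[str]:
        """Return a message for the user's log, or None if nothing needs reporting."""
        if self.total == 0:
            return None
        parts = []
        if self.nans:
            parts.append(f"{100.0 * self.nans / self.total:.2f}% NaN")
        if self.infs:
            parts.append(f"{100.0 * self.infs / self.total:.2f}% Inf")
        zero_pct = 100.0 * self.zeros / self.total
        # Zeros are only reported once they reach the threshold.
        if zero_pct >= zero_threshold:
            parts.append(f"{zero_pct:.2f}% zeros")
        return "Data check: " + ", ".join(parts) if parts else None
```

The loop over blocks would call `add_block` once per block and `user_warning` once at the end of the chunk, so the user's log stays quiet for clean data.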
Yep, I'm all for having some basic checks on the data and feeding it back to the users and logfile, sounds good.
> I suggest we implement some kind of data checker that is performed during the data transfer step (is it the right place?).
I think putting it in the data transfer method (i.e. GenericMethodWrapper._transfer_data()) might not be quite the right place, because the check has nothing to do with transferring data, but putting this data check right next to the data transfer method call in execute() seems sensible.
The general idea of accumulating information across blocks sounds similar to what the stats calculation wrapper does. Maybe that can serve as a guide for how to do this if we need to modify a wrapper.
It'd most likely be GenericMethodWrapper that we'd modify I guess, since we'd want all method wrappers to perform the data check. But edge cases like the stats calculation wrapper might be worth considering before we jump into modifying the base generic method wrapper that everything inherits from. For example, the stats calculation wrapper inherits from the generic wrapper; should the stats calculation wrapper also be doing these data checks, if it's already doing stats calculations?
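As a rough illustration of the placement and the opt-out question, here is a toy sketch that does not use the real classes. `GenericWrapperSketch`, `StatsWrapperSketch`, `_check_data` and `perform_data_check` are all hypothetical names, and blocks are plain lists of floats to keep it self-contained. It shows only one possible answer, where a subclass can switch the check off.

```python
import logging
import math

log = logging.getLogger("data_check_sketch")


class GenericWrapperSketch:
    """Toy stand-in for a method wrapper; not the real GenericMethodWrapper."""

    # Hypothetical class-level switch that subclasses can turn off.
    perform_data_check = True

    def __init__(self, method):
        self.method = method
        self.nonfinite_seen = 0  # accumulated across blocks

    def _transfer_data(self, block):
        # Placeholder: here the block is simply passed through unchanged.
        return block

    def _check_data(self, block):
        # Hypothetical hook: count NaN/Inf values in a flat list of floats.
        bad = sum(1 for v in block if math.isnan(v) or math.isinf(v))
        self.nonfinite_seen += bad
        log.debug("block check: %d non-finite of %d values", bad, len(block))

    def execute(self, block):
        block = self._transfer_data(block)
        if self.perform_data_check:
            # The check sits next to the transfer call, not inside it.
            self._check_data(block)
        return self.method(block)


class StatsWrapperSketch(GenericWrapperSketch):
    """Toy stats wrapper that opts out, assuming it already scans the data."""

    perform_data_check = False
```

With this layout every wrapper inherits the check by default, and a special case only has to override one attribute rather than `execute()` itself.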
Discussion started in #267 and continued in #369.