Data transfer checker #372

Open
dkazanc opened this issue Jun 24, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@dkazanc
Collaborator

dkazanc commented Jun 24, 2024

Discussion started in #267 and continued in #369.

I suggest we implement some kind of data checker that runs during the data transfer step (is that the right place?).

Every chunk and block of data needs to be tested for the presence of zeros, NaNs and infs. The per-block results then need to be accumulated in the loop over blocks to produce statistics for the whole chunk. The result for every block can go into the debug file.

The user, however, can get a warning about the percentage of NaNs/infs/zeros with respect to all values in the chunk. This warning should go to the user's log. If there are no NaNs/infs, no message appears in the user's log. The zeros alert should appear only if zeros reach a certain percentage (e.g. 25% of the data are zeros).
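The per-block counting, accumulation and chunk-level warning described above could look roughly like the sketch below. Nothing in it is existing project code: `Counts`, `count_block`, `chunk_warnings`, `check_chunk` and the two logger names are made up for illustration, and it assumes each block is a NumPy array.

```python
import logging
from dataclasses import dataclass

import numpy as np

# Hypothetical logger names; the project's real user/debug loggers may differ.
user_log = logging.getLogger("user")
debug_log = logging.getLogger("debug")


@dataclass
class Counts:
    """Running totals for one block or one chunk (illustrative helper)."""
    total: int = 0
    zeros: int = 0
    nans: int = 0
    infs: int = 0

    def __add__(self, other: "Counts") -> "Counts":
        return Counts(
            self.total + other.total,
            self.zeros + other.zeros,
            self.nans + other.nans,
            self.infs + other.infs,
        )


def count_block(block: np.ndarray) -> Counts:
    """Count zeros, NaNs and infs (of either sign) in a single block."""
    return Counts(
        total=int(block.size),
        zeros=int(np.count_nonzero(block == 0)),
        nans=int(np.count_nonzero(np.isnan(block))),
        infs=int(np.count_nonzero(np.isinf(block))),
    )


def chunk_warnings(counts: Counts, zero_threshold: float = 0.25) -> list[str]:
    """Build user-facing messages; an empty list means nothing to report."""
    msgs: list[str] = []
    if counts.total == 0:
        return msgs
    if counts.nans:
        msgs.append(f"{100 * counts.nans / counts.total:.2f}% of values are NaN")
    if counts.infs:
        msgs.append(f"{100 * counts.infs / counts.total:.2f}% of values are inf")
    # Zeros are only reported once they reach the threshold (25% by default).
    if counts.zeros / counts.total >= zero_threshold:
        msgs.append(f"{100 * counts.zeros / counts.total:.2f}% of values are zero")
    return msgs


def check_chunk(blocks, zero_threshold: float = 0.25) -> Counts:
    """Accumulate per-block counts over a chunk and warn the user if needed."""
    acc = Counts()
    for i, block in enumerate(blocks):
        counts = count_block(block)
        debug_log.debug("block %d: %s", i, counts)  # per-block detail
        acc = acc + counts
    for msg in chunk_warnings(acc, zero_threshold):
        user_log.warning(msg)  # chunk-level summary
    return acc
```

Each block costs three elementwise passes, which should be small next to the processing itself, and the 25% zero threshold is just the example figure from above exposed as a parameter.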

Let me know if this sounds sensible. It shouldn't take much time to calculate, but it could be of great use to users and developers.

@yousefmoazzam
Collaborator

yousefmoazzam commented Jul 9, 2024

Yep, I'm all for having some basic checks on the data and feeding it back to the users and logfile, sounds good.

> I suggest we implement some kind of data checker that is performed during the data transfer step (is it the right place?).

I think putting it in the data transfer method (i.e., GenericMethodWrapper._transfer_data()) might not be quite the right place, simply because the check has nothing to do with transferring data. Putting the data check right next to the data transfer method call in execute() seems sensible, though.

The general idea of accumulating information across blocks sounds similar to what the stats calculation wrapper does, so that could serve as a guide for how to do this if we need to modify a wrapper.

It'd most likely be GenericMethodWrapper that we'd modify I guess, since we'd want all method wrappers to perform the data check. But edge cases like the stats calculation wrapper might be worth considering before we jump into modifying the base generic method wrapper that everything inherits from. For example, the stats calculation wrapper inherits from the generic wrapper; should the stats calculation wrapper also be doing these data checks, if it's already doing stats calculations?
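To make the placement question concrete, here is one possible shape, purely as a sketch. The classes below are stand-ins written for this comment, not the real GenericMethodWrapper or stats calculation wrapper; `run_data_check`, `_check_data`, `_run_method` and `last_check` are hypothetical names, and the real `execute()` and `_transfer_data()` signatures may well differ.

```python
import numpy as np


class GenericMethodWrapperSketch:
    """Stand-in for the base wrapper; illustration only."""

    # Hypothetical class-level switch that subclasses can override.
    run_data_check = True

    def __init__(self):
        self.last_check = None

    def _transfer_data(self, block: np.ndarray) -> np.ndarray:
        # Placeholder for the real transfer step.
        return block

    def _check_data(self, block: np.ndarray) -> None:
        # The check lives next to the transfer call, not inside it.
        self.last_check = {
            "zeros": int(np.count_nonzero(block == 0)),
            "nans": int(np.count_nonzero(np.isnan(block))),
            "infs": int(np.count_nonzero(np.isinf(block))),
        }

    def _run_method(self, block: np.ndarray) -> np.ndarray:
        # Placeholder for running the wrapped method.
        return block

    def execute(self, block: np.ndarray) -> np.ndarray:
        block = self._transfer_data(block)
        if self.run_data_check:
            self._check_data(block)
        return self._run_method(block)


class StatsCalcWrapperSketch(GenericMethodWrapperSketch):
    """Stand-in for a wrapper that opts out of the check."""

    run_data_check = False
```

With a switch like this, every wrapper gets the check by default through the base class, and an edge case such as the stats calculation wrapper can opt out in one line if we decide it shouldn't duplicate work. Whether it should opt out is still the open question.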
