Boundary conditions - e.g. per-shell R and CC values #29
Boundary and advisory limits are used during deposition and annotation to ensure that the values that we receive are valid. Take pH as an example: We are looking at adding a boundary limit to Rmerge to prevent users entering values of 10 instead of 0.10. It has taken the biocurators several months to clean up values which were incorrect. The OneDep system does parse values from uploaded files - so this problem is less pronounced for categories which are typically in uploaded coordinate files - such as refinement statistics from _refine.
It makes complete sense to have a "Boundary condition" of 0 <= pH <= 14 because the definition of a pH value does not allow it to be negative or above 14. Going back to reflection data, both upper boundary conditions for _reflns_shell.Rmerge_I_obs make sense. My question concerning per-shell merging R values is why they have a lower "Boundary condition" of 0.0 when there is no reason that they should always be above 0.0. It's as if the pH boundary conditions were 1.0 <= pH <= 14. The case of _reflns_shell.pdbx_CC_half is even more pronounced. Anyway, if I understand your pH example correctly then the "Boundary condition" describes the theoretical limits of a particular value and the "Advisory boundary condition" provides some sanity checking during deposition. Therefore the above examples should most likely be modified to become
(the lower "Boundary condition" for the R-values is maybe debatable, although I would prefer allowing for a certain negative range). If there is the (unfortunate and unenviable) need to distinguish between values given in % or not at deposition, this should probably be part of the "Advisory boundary condition" (which looks like a PDBx extension, _pdbx_item_range) and maybe not placed into the "Boundary condition" (_item_range), which seems to be an original mmCIF item. Life would probably be easier if no values were ever given in percentages (which sometimes gives the impression a value could only ever be between 0 and 100%)... but storing integers was just so much cheaper back in the day, I guess ;-)
Anyway, boundary conditions should be for impossible values. The advisory limits are, from a model/data interpretation point of view, much more important.
While some of the limits may seem odd, they are based partially on the data in the archive. For instance Rmerge on I for reflns_shell - we have a value of 116 in 6QT6. This agrees with the publication, in which "sensible" Rmerge is reported for other datasets - and this one involves radiation-induced damage to the crystal. Knowing the PI - I do not believe this was an error of percentage and off by 100. Inclusion of zero boundary limits might be in error - but we need to ensure that we do not have zeros hidden away in the archive - instead of ? for missing data - so they need to be checked one by one. Clearly more cleanup can and will continue into the future. With regards to advisory limits, these were based on statistical analysis of data in the archive. They are used to flag unusual values that might be errors in manual data entry - but we did not wish to block someone who does have that unusual case.
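To make the distinction discussed above concrete, here is a minimal sketch of a two-tier check: a hard boundary rejects impossible values, while an advisory limit only warns and never blocks. The `Limits` class, the `check` function and all advisory numbers are hypothetical illustrations, not the OneDep implementation or the actual dictionary values; only the hard pH range 0 <= pH <= 14 is taken from this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    """Hypothetical container; None means 'no limit on that side'."""
    hard_min: Optional[float]
    hard_max: Optional[float]
    advisory_min: Optional[float]
    advisory_max: Optional[float]


def check(value: float, limits: Limits) -> str:
    """Return 'reject' (outside hard limits), 'warn' (outside advisory) or 'ok'."""
    if limits.hard_min is not None and value < limits.hard_min:
        return "reject"
    if limits.hard_max is not None and value > limits.hard_max:
        return "reject"
    if limits.advisory_min is not None and value < limits.advisory_min:
        return "warn"
    if limits.advisory_max is not None and value > limits.advisory_max:
        return "warn"
    return "ok"


# Hard pH range as quoted in the thread; the advisory range 4..10 is made up.
PH = Limits(hard_min=0.0, hard_max=14.0, advisory_min=4.0, advisory_max=10.0)

# Assumption for illustration: no hard limits on a per-shell merging R value,
# and a made-up advisory range of 0..1 to catch percent-vs-fraction slips.
R_MERGE = Limits(hard_min=None, hard_max=None, advisory_min=0.0, advisory_max=1.0)

print(check(7.0, PH))        # inside both ranges
print(check(15.0, PH))       # impossible value
print(check(10.0, R_MERGE))  # suspicious (10 instead of 0.10?) but not blocked
```

With this split, an unusual but genuine value is flagged for a second look without being refused, while the percent-vs-fraction check lives entirely in the advisory tier.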
Does the "observed criterion" of *This is making an assumption about valid definitions of the observed criterion. I don't know if such assumptions are safe. Likewise, I agree that |
As far as I understand, the following boundary conditions are given in the current dictionary
Apart from the fact that there seems to be an inconsistency regarding the lower limit (inclusive/exclusive), where do these boundary conditions come from? They are not intrinsic to the formulae for computing these metrics: a poorly populated shell or one with a lot of negative intensity values (high anisotropy of significant data while computing these metrics using all data before data cut-off) could easily lead to perfectly valid negative values.
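As a small worked illustration of that last point, the sketch below evaluates the usual definition Rmerge = sum |I_i - <I>| / sum I_i over a toy shell. The function name `r_merge` and the intensity values are invented for this example; the point is only that the numerator is never negative while the denominator can be, so a weak shell dominated by negative measured intensities yields a negative ratio.

```python
def r_merge(observations):
    """Rmerge over a shell.

    observations maps a reflection index to its list of measured
    intensities (toy data structure, assumed for this sketch).
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    if denominator == 0.0:
        raise ValueError("sum of intensities is zero; Rmerge is undefined")
    return numerator / denominator


# Well-measured shell: all intensities positive.
strong_shell = {(1, 0, 0): [100.0, 110.0], (0, 1, 0): [50.0, 60.0]}

# Weak shell: measured intensities scatter around zero, mostly negative.
weak_shell = {(1, 0, 0): [-3.0, 1.0], (0, 1, 0): [-2.0, -4.0]}

print(r_merge(strong_shell))  # 0.0625
print(r_merge(weak_shell))    # -0.75
```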
Another example is
which seems rather odd. A correlation coefficient should have a boundary condition of -1 <= x <= 1 (intrinsic to the way it is computed).
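For reference, a minimal sketch of a Pearson correlation between two half-dataset intensity lists, which is how a CC1/2-style value is obtained. The function name `pearson` and the numbers are made up for illustration; the code only shows that negative values occur naturally and that the result stays within -1 <= x <= 1.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Perfectly anti-correlated halves reach the lower theoretical limit.
print(pearson([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]))  # -1.0

# Noise-dominated shell (toy numbers): the halves happen to anti-correlate.
half_1 = [1.0, -2.0, 0.5, 3.0]
half_2 = [-0.5, 1.0, 2.0, -1.0]
cc_half = pearson(half_1, half_2)
print(cc_half < 0.0, -1.0 <= cc_half <= 1.0)
```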
Maybe there is some confusion about the use of "Boundary condition" and the "Advisory Boundary condition". I would have thought that the "Boundary condition" describes what range of values a particular item can take due to the formula (see CC above) or the nature of the item (a "number of reflections" cannot be negative etc). As a complement to this, the "Advisory Boundary condition" can provide narrower boundaries in order to give more guidance on what is commonly seen as a sensible range (for whatever reason). So we have for example
Questions: