-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
90% - Prevent null values being added to ExtendedGraph #82
90% - Prevent null values being added to ExtendedGraph #82
Conversation
* | ||
* @return bool | ||
*/ | ||
protected function isValueValid($value){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change name to isValidLiteralValue()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't we want to check resource values as well though? isValidTripleValue
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From Tripod's perspective, don't we just need to ensure that subject, predicate, and object are all strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-empty strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, but enforcing strings as opposed to null could be very painful for any graphs which are already storing non-string values (eg. integers). This current PR to ignore null values could well cause enough headaches without taking out integers at the same time. I do think we should ultimately enforce strings-only, but maybe one step at a time, once we are comfortable with no nulls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or arrays of strings and integers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess what I'm saying is that we should be doing type checking here so we're not just allowing some other bad data that is equally as incompatible as a null value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we deal with legacy data, though? We have no idea what crazy things have ended up in the graph. The php data types are:
String.
Integer.
Float (floating point numbers - also called double)
Boolean.
Array.
Object.
NULL.
Resource.
Integer, floats and booleans could all have ended up in the graph. Is it possible to add an array via the public ExtendedGraph methods?
Agree that really we should be ignoring anything that isn't a string, but I guess the question is how much we are prepared to break anything currently using tripod to enforce this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tripod only supports plain literals. Full-stop. If other data types are in the documents they would produce invalid RDF, anyway, because we do not support typed literals at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed this with @rsinger Agreed to continue with the current approach. I have just tested to see if add_to_literal can accept arrays, and it can, so we can't simply ignore any non-scalar.
* @return bool | ||
*/ | ||
protected function isValidTripleValue($value){ | ||
if(!is_string($value) && !is_int($value) && !is_array($value) && !is_float($value) && !is_bool($value)){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not is_scalar
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh yep, I meant to use that, no idea why I didn't.... one moment.
@rsinger @RobotRobot Something I just noticed is https://github.com/talis/tripod-php/blob/master/src/classes/ExtendedGraph.class.php#L141 - What should be return of |
@rsinger @RobotRobot also at the moment we are allowing scalars for both literals and resources. Should we restrict resources to only strings? I really can't imagine how having anything other than a string can be legitimate for a uri. |
resources should definitely always be strings. So I think we should validate them as such, rather than scalars. |
* @param array $mongoValueObject | ||
* @return array | ||
* @return array| false an array of values or false if the value is not valid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
array| false
isn't a valid phpdoc, it needs to be array|bool
This looks good. Should we validate |
@rsinger I feel like maybe that should be raised as a separate issue so we can get this branch down. |
Why can't you just run $s and $p through isValidResourceProperty? |
I mean, my real question is, can you even add invalid values there? I don't see why wouldn't fix it here, if so. |
@rsinger my only real reservation to fixing that in this branch is that I'll have to add a whole bunch of tests to cover that, and this needs to get shipped today, ideally. It's currently passing as it is so I think the right strategy is to merge this down and fix up the $s and $p in another branch... |
Can you just confirm whether or not it's even an issue? There is no point in making a release to prevent us from adding duff data if we've still left it open to add duff data. |
Added checks for subjects and predicates |
This will now also throw an exception if trying to add a triple with a blank subject |
* | ||
* @return bool | ||
*/ | ||
protected function isValidTripleValue($value){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Triple value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that would be a good generic name to cover subject, predicate and value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it made more sense when it was named resourcevalue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used for more than just validating a resource value though - the only thing it doesn't validate now is a literal value. isValidTripleComponentValue
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I be breaking this out into isValidSubject
, isValidPredicate
, and isValidVaue
? It might make sense since we are now ensuring that the subject is not an empty string, as we can just add the check into there...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, they're all just uris.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isTripleValue()
really doesn't make a lot of sense.
The graph is made up of subjects ($s) predicates ($p) and objects ($o).
Subjects and predicates are always resources (another name for URIs), objects can be either a resource or a literal.
So it makes more sense to have two methods that reflect that.
isValidResource()
- this covers subjects, predicates or objects that are of type URI. Those are all resources - a different term for URI.
isValidLiteral()
covers objects that are literals only.
You are currently doing validation in two places - once in _add_triples()
and again in the individual methods - pick one or the other - don't do both - that just wastes CPU cycles unnecessarily.
Finally you're null checking outside these methods. Do the null checking inside the isValidResource()
. If the validity check fails for resources or predicates (i.e. returns false) I think the calling method might validly throw an exception, but for objects we are saying it is allowed but ignored (to keep the peace).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, in my head I was separating subjects and resource values unnecessarily here, will fix up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't spotted that duplicate validation - that was a leftover from a previous iteration. Removing.
if(!isset($value['r']) || !$this->isValidTripleValue($value['r'])){ | ||
return; | ||
} | ||
if($value['r'] === ""){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again - why is your validity checking not inside the method validating the value? Instead if isValidWhatever
returns false then throw your exception.
@RobotRobot @rsinger Fixed up from PR comments:
|
* @param string $value | ||
* @return bool | ||
*/ | ||
protected function isValidLiteralValue($value){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too verbose, rename isValidLiteral()
👍 just rename those methods, then merge |
…d-to-extendedgraph 90% - Prevent null values being added to ExtendedGraph
To address issue #81
This will ensure that subjects, predicates, and resource values are always strings. Literal values can be any scalar - we would like to enforce strings for literal values too, but doing so may break legacy databases which use non-strings as values.