-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Resolve Schema Location for xsi:SchemaLocation in Config files #1335
Fix Resolve Schema Location for xsi:SchemaLocation in Config files #1335
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like a Xercs bug?
// we want to attempt to resolve the URI whether the non-jar uri has a scheme or not, | ||
// this is relevant for when we are validating with Xerces, and it calls resolvesEntity | ||
// we get URIs that look like file:/path/to/not/absolute/path ex: file:/org/apache/daffodil/xsd/dafext.xsd | ||
// that fail to be found in the above case, so we have to look them up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify, if we have something like this:
xsi:schemaLocation="urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:ext /org/apache/daffodil/xsd/dafext.xsd"
Then Xerces changes that to file:/org/apache/daffodil/xsd/dafext.xsd
and sends that to the EntityResolver? Feels like a bug in Xerces. It doesn't do that for include/import schemaLocation, not sure why it would do that for xsi:schemaLocation.
Note this also means means that if we ever have something legit like this:
<xs:include schemaLocation="file:/foo/bar/baz.dfdl.xsd">
Then we will look on the class path for /foo/bar/baz.dfdl.xsd
, which feels wrong, we should only ever look at the filesystem for that kind of schemaLocation. Anything else feels like a bug. And I'm not sure I want to trade a Xerces bug for a Daffodil bug.
I'm wondering if we should that xsi:schemaLocation won't resolve anything to a classpath, unless there is someway we can get Xerces to not munge the xsi:schemaLocation. Or maybe we have a bug somewhere else where we are doing that munging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I looked into this a bit more, and the conversion to a URI is definitely coming from Xerces, though I'm not sure that's a bug. According to this link, the location is supposed to be a URI...so it might be a Daffodil Bug to not properly interprete it. Thoughts? https://www.w3.org/TR/xmlschema-1/#schema-loc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/foo/bar/baz.dfdl.xsd
is a valid URI. I'm not sure Xerces should assume the URI references a file on the filesystem. Feels like it should be up to the entity resolver to determine how to interpret the URI.
if (uri.isAbsolute) { | ||
// an absolute URI is one with a scheme. In this case, we expect to be able to resolve | ||
// the URI and do not try anything else (e.g. filesystem, classpath). Since this function | ||
if (uri.isAbsolute && uri.getScheme.contains("jar")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be uri.getScheme == "jar"
. Avoids ever accidentally matching some other scheme like "foojar" if that ever happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 but needs a comment why this workaround is okay
@@ -221,7 +221,7 @@ class DFDLCatalogResolver private () | |||
resolvedSystem | |||
} else if ( | |||
resolvedUri != null && ((systemId == null) || (systemId != null && resolvedUri.endsWith( | |||
systemId | |||
(new URI(systemId)).getPath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a detailed comment that we do this to get around Xerces and why this is okay. Maybe even an example of schema in the code so it's very clear. At face value it looks wrong since if someone includes a schemaLocation of file:/foo/bar/baz
we will ignore the fact that this is an absolute URI as long as the path matches the resolveUri.
We should probably also only do this if the URI is an absolute file:
URI, to make it clear it is only trying to work around a very specific Xercses bug and wont' accidentally break other cases. For example, if the URI is a jar: uri then we probably don't want to do this logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a 2nd +1 since this went in a different direction from when I gave my first approval.
- currently, due to a bug in Xerces xsi:schemaLocation URIs are absolutized, which causes resolveCommon to not set the resolvedUri to the path it resolves from namespace, since the absolute URI doesn't match the end of the resolvedURI. This fix converts systemId to a URI and uses its path when comparing with resolvedURI, therefore resulting in a successful resolution. - add comment explaining systemIdPath check change - add schema location to config to prove it doesn't cause failure DAFFODIL-2339
16dcd8c
to
0efcc26
Compare
DAFFODIL-2339