Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Resolve Schema Location for xsi:SchemaLocation in Config files #1335

Merged

Conversation

olabusayoT
Copy link
Contributor

@olabusayoT olabusayoT commented Oct 10, 2024

  • currently we check resolvedUri from namespaces endWith the passed in systemId. With the Xerces bug, the absolutized systemId will never match, so we convert the systemId to a URI and grab its path, and use that for the comparison
  • add comment about specific validation error from xerces that requires workaround
  • add schema location to config to prove it doesn't cause failure

DAFFODIL-2339

Copy link
Contributor

@jadams-tresys jadams-tresys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like a Xercs bug?

// we want to attempt to resolve the URI whether the non-jar uri has a scheme or not,
// this is relevant for when we are validating with Xerces, and it calls resolvesEntity
// we get URIs that look like file:/path/to/not/absolute/path ex: file:/org/apache/daffodil/xsd/dafext.xsd
// that fail to be found in the above case, so we have to look them up
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, if we have something like this:

xsi:schemaLocation="urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:ext /org/apache/daffodil/xsd/dafext.xsd"

Then Xerces changes that to file:/org/apache/daffodil/xsd/dafext.xsd and sends that to the EntityResolver? Feels like a bug in Xerces. It doesn't do that for include/import schemaLocation, not sure why it would do that for xsi:schemaLocation.

Note this also means means that if we ever have something legit like this:

<xs:include schemaLocation="file:/foo/bar/baz.dfdl.xsd">

Then we will look on the class path for /foo/bar/baz.dfdl.xsd, which feels wrong, we should only ever look at the filesystem for that kind of schemaLocation. Anything else feels like a bug. And I'm not sure I want to trade a Xerces bug for a Daffodil bug.

I'm wondering if we should that xsi:schemaLocation won't resolve anything to a classpath, unless there is someway we can get Xerces to not munge the xsi:schemaLocation. Or maybe we have a bug somewhere else where we are doing that munging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I looked into this a bit more, and the conversion to a URI is definitely coming from Xerces, though I'm not sure that's a bug. According to this link, the location is supposed to be a URI...so it might be a Daffodil Bug to not properly interprete it. Thoughts? https://www.w3.org/TR/xmlschema-1/#schema-loc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/foo/bar/baz.dfdl.xsd is a valid URI. I'm not sure Xerces should assume the URI references a file on the filesystem. Feels like it should be up to the entity resolver to determine how to interpret the URI.

if (uri.isAbsolute) {
// an absolute URI is one with a scheme. In this case, we expect to be able to resolve
// the URI and do not try anything else (e.g. filesystem, classpath). Since this function
if (uri.isAbsolute && uri.getScheme.contains("jar")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be uri.getScheme == "jar". Avoids ever accidentally matching some other scheme like "foojar" if that ever happens.

Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 but needs a comment why this workaround is okay

@@ -221,7 +221,7 @@ class DFDLCatalogResolver private ()
resolvedSystem
} else if (
resolvedUri != null && ((systemId == null) || (systemId != null && resolvedUri.endsWith(
systemId
(new URI(systemId)).getPath
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a detailed comment that we do this to get around Xerces and why this is okay. Maybe even an example of schema in the code so it's very clear. At face value it looks wrong since if someone includes a schemaLocation of file:/foo/bar/baz we will ignore the fact that this is an absolute URI as long as the path matches the resolveUri.

We should probably also only do this if the URI is an absolute file: URI, to make it clear it is only trying to work around a very specific Xercses bug and wont' accidentally break other cases. For example, if the URI is a jar: uri then we probably don't want to do this logic.

Copy link
Contributor

@jadams-tresys jadams-tresys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a 2nd +1 since this went in a different direction from when I gave my first approval.

- currently, due to a bug in Xerces xsi:schemaLocation URIs are absolutized, which causes resolveCommon to not set the resolvedUri to the path it resolves from namespace, since the absolute URI doesn't match the end of the resolvedURI. This fix converts systemId to a URI and uses its path when comparing with resolvedURI, therefore resulting in a successful resolution.
- add comment explaining systemIdPath check change
- add schema location to config to prove it doesn't cause failure

DAFFODIL-2339
@olabusayoT olabusayoT force-pushed the daf-2339-xsiSchemaLocationAddedToConfigFile branch from 16dcd8c to 0efcc26 Compare October 17, 2024 00:36
@olabusayoT olabusayoT merged commit e5851a7 into apache:main Oct 29, 2024
12 checks passed
@olabusayoT olabusayoT deleted the daf-2339-xsiSchemaLocationAddedToConfigFile branch October 29, 2024 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants