-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add use of DAFFODIL_TDML_API_INFOSETS enviroment variable #1365
base: main
Are you sure you want to change the base?
Add use of DAFFODIL_TDML_API_INFOSETS enviroment variable #1365
Conversation
e2f2646
to
0ee6b9a
Compare
- currently during testing we parse/unparse using both the sax and non-sax API, which leads to issues like trace running outputting twice for the same test which is confusing. We also run the parse for all out infoset outputters. With this tunable, we default to the more efficient single infoset outputter (scalaxml) and single API (non-sax) parse/unparse. - we convert TDMLInfosetOutputter to a trait so the Full and Limited subclasses can extend it as well as TeeInfosetOutputter - the tunable has 2 options: limited and full. with limited being the default and full being our current 2 API, all infoset outputters mode - get rid of unused and inaccessible parse function - add test showing use of full mode DAFFODIL-2904
0ee6b9a
to
ec3b567
Compare
@@ -579,6 +579,15 @@ | |||
</xs:documentation> | |||
</xs:annotation> | |||
</xs:element> | |||
<xs:element name="testingParseUnparseAPIMode" type="daf:TunableTestingParseUnparseAPIMode" default="limited" minOccurs="0"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if a tunable isn't the right thing for this?
This will apimode flag is really only going to be used for regression testing, and when we do regression testing we want it enabled for all tests. We don't really want to have to modify config files or something for all tests to enable this, which is how tunables are generally set.
I'm thinking we either want this to be a Java system property (e.g. daffodil.tdml.apimode
) or an environment variable (e.g. DAFFODIL_TDML_APIMODE
). And I think an environment variable has advantages over a Java system property. For one, using system properties is different depending on how you spawn the JVM. For example, the sbt and daffodil CLI specify system properties differently:
sbt -Ddaffodil.tdml.apimode=full test
DAFFODIL_JAVA_OPTS="-Ddaffodil.tdml.apimode=full" daffodil test foo.tdml
So with sbt you can use -D
directly, but with the CLI you have to specify it in DAFFODIL_JAVA_OPTS. I also am not sure if SBT propagates system properties if it forks. So if it forks a process (like some of our integration tests do), I imagine it won't get the system property and the forked processes won't run with full apimode.
But if we go with an environment variable instead, it's the same for both:
export DAFFODIL_TDML_APIMODE=full
sbt test
daffodil test foo.tdml
This is the same regardless if we are testing via sbt or via the daffodil CLI. And if something else ever runs the TDML runner (e.g. a test harness that directly calls the TDML runner), it's also the same. I also believe if sbt or an integration test forks, it copies the existing environment variables, so forked processes should also enable full api mode.
Also, for CI we can just add something like this to our GitHub actions config:
env:
DAFFODIL_TDML_APIMODE=full
And all CI will run with all APIs, which would be good for regression testing.
<xs:simpleType name="TunableTestingParseUnparseAPIMode"> | ||
<xs:restriction base="xs:token"> | ||
<xs:enumeration value="limited" /> | ||
<xs:enumeration value="full" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should consider something other than limited/full. I think we determined those probably weren't the best names for our different kinds of validation, so we should learn from that.
Maybe instead of "full", we do "all"? And maybe instead of "limited" we do "scala" (implying scala infoset?). If we ever wanted, this could then be extended to be a list of infosets/api's to use. For example, we could in theory support something like DAFFODIL_TDML_APIMODE="scala json sax"
, which would only enable those three kinds of infosets instead of all of them. I'm not sure the extra complexity is wroth the flexibility for now, but it leaves the option open. scala/all is probably sufficient for now.
Also, makes me wonder if the env variable wants to be something like DAFFODIL_TDML_INFOSETS
or DAFFODIL_TDML_API_INFOSETS
? Not sure I like those, but something that it expects an infoset type might be an improvement?
doParseWithBothApis(dpInputStream, saxInputStream, lengthLimitInBits) | ||
} else { | ||
doParseWithNonSaxAPI(dpInputStream, lengthLimitInBits) | ||
} | ||
} | ||
|
||
override def parse(is: java.io.InputStream, lengthLimitInBits: Long): TDMLParseResult = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below this we call IOUtils.toByteArray
, which reads the input stream into a byte array. But then it calls the above parse function which converts the byte array back to an input stream. We need to do that when parsing with both APIs since each API needs it's own input stream, but if we are only using the non-sax API we don't really need to do that.
I'm wondering if it would be possible and not too messy to only do the byte array stuff iwe are are doing both APIs. Otherwise we just pass in the original input stream to doParseWithNonSaxAPI? Avoids unnecessarily duplicating the input stream.
} | ||
} | ||
|
||
def doParseWithNonSaxAPI( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like there's some duplicate code in these doParseWith*API functions since they both implement the non-sax API. Can they be combined into one function, e.g.
private def doParse(...) {
// always do stuff with non-sax API
if (apiMode = full) {
// do sax API
// verify SAX results match non-sax results
}
}
Same with the unparse functions?
- currently during testing we parse/unparse using both the sax and non-sax API, which leads to issues like trace running outputting twice for the same test which is confusing. We also run the parse for all our infoset outputters. With this enviromental variable, we default to the more efficient single infoset outputter (scalaxml) and single API (non-sax) parse/unparse. - the DAFFODIL_TDML_API_INFOSETS env has 2 options: 'scala' and 'all'. with scala being the default and 'all' being the current implementation of running both APIs and all infoset outputters - refactor code to reduce duplication and remove tunable - add cli test showing use of scala and all mode - set CI mode to all for regression testing DAFFODIL-2904
- move cli tests to daffodil-test-integration since they require stage to run successfully DAFFODIL-2953
56be768
to
4876133
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 👍
|
||
val envs = Map("DAFFODIL_TDML_API_INFOSETS" -> "scala") | ||
|
||
runCLI(args"test -i -t $tdml byte_entities_6_08", envs = envs) { cli => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note that this doesn't really check that we are running only the scala API. Unfortunately which APIs are used are way behind the scenes so there isn't really a good way to verify it, and there's no good way for the expect library to expect that a string doesn't exist. This test is probably still worthwhile since I imagine it gives us coverage and at least confirms that "scala" is legal, but if we accidentally break things so "scala" runs both APIs, this test probably won't fail.
DAFFODIL-2904