Add use of DAFFODIL_TDML_API_INFOSETS enviroment variable #1365

olabusayoT · 2024-11-06T02:45:50Z

currently during testing we parse/unparse using both the sax and non-sax API, which leads to issues like trace running outputting twice for the same test which is confusing. We also run the parse for all our infoset outputters. With this enviromental variable, we default to the more efficient single infoset outputter (scalaxml) and single API (non-sax) parse/unparse.
the DAFFODIL_TDML_API_INFOSETS env has 2 options: 'scala' and 'all'. with scala being the default and 'all' being the current implementation of running both APIs and all infoset outputters
we convert TDMLInfosetOutputter to a trait so the All and Scala subclasses can extend it as well as TeeInfosetOutputter
get rid of unused and inaccessible parse function
set CI mode to all for regression testing
add cli test showing use of scala and all mode

- currently during testing we parse/unparse using both the sax and non-sax API, which leads to issues like trace running outputting twice for the same test which is confusing. We also run the parse for all out infoset outputters. With this tunable, we default to the more efficient single infoset outputter (scalaxml) and single API (non-sax) parse/unparse. - we convert TDMLInfosetOutputter to a trait so the Full and Limited subclasses can extend it as well as TeeInfosetOutputter - the tunable has 2 options: limited and full. with limited being the default and full being our current 2 API, all infoset outputters mode - get rid of unused and inaccessible parse function - add test showing use of full mode DAFFODIL-2904

stevedlawrence · 2024-11-06T10:40:09Z

daffodil-propgen/src/main/resources/org/apache/daffodil/xsd/dafext.xsd

@@ -579,6 +579,15 @@
            </xs:documentation>
          </xs:annotation>
        </xs:element>
+        <xs:element name="testingParseUnparseAPIMode" type="daf:TunableTestingParseUnparseAPIMode" default="limited"  minOccurs="0">


I'm wondering if a tunable isn't the right thing for this?

This will apimode flag is really only going to be used for regression testing, and when we do regression testing we want it enabled for all tests. We don't really want to have to modify config files or something for all tests to enable this, which is how tunables are generally set.

I'm thinking we either want this to be a Java system property (e.g. daffodil.tdml.apimode) or an environment variable (e.g. DAFFODIL_TDML_APIMODE). And I think an environment variable has advantages over a Java system property. For one, using system properties is different depending on how you spawn the JVM. For example, the sbt and daffodil CLI specify system properties differently:

sbt -Ddaffodil.tdml.apimode=full test DAFFODIL_JAVA_OPTS="-Ddaffodil.tdml.apimode=full" daffodil test foo.tdml

So with sbt you can use -D directly, but with the CLI you have to specify it in DAFFODIL_JAVA_OPTS. I also am not sure if SBT propagates system properties if it forks. So if it forks a process (like some of our integration tests do), I imagine it won't get the system property and the forked processes won't run with full apimode.

But if we go with an environment variable instead, it's the same for both:

export DAFFODIL_TDML_APIMODE=full sbt test daffodil test foo.tdml

This is the same regardless if we are testing via sbt or via the daffodil CLI. And if something else ever runs the TDML runner (e.g. a test harness that directly calls the TDML runner), it's also the same. I also believe if sbt or an integration test forks, it copies the existing environment variables, so forked processes should also enable full api mode.

Also, for CI we can just add something like this to our GitHub actions config:

env: DAFFODIL_TDML_APIMODE=full

And all CI will run with all APIs, which would be good for regression testing.

stevedlawrence · 2024-11-06T10:49:21Z

daffodil-propgen/src/main/resources/org/apache/daffodil/xsd/dafext.xsd

+  <xs:simpleType name="TunableTestingParseUnparseAPIMode">
+    <xs:restriction base="xs:token">
+      <xs:enumeration value="limited" />
+      <xs:enumeration value="full" />


I wonder if we should consider something other than limited/full. I think we determined those probably weren't the best names for our different kinds of validation, so we should learn from that.

Maybe instead of "full", we do "all"? And maybe instead of "limited" we do "scala" (implying scala infoset?). If we ever wanted, this could then be extended to be a list of infosets/api's to use. For example, we could in theory support something like DAFFODIL_TDML_APIMODE="scala json sax", which would only enable those three kinds of infosets instead of all of them. I'm not sure the extra complexity is wroth the flexibility for now, but it leaves the option open. scala/all is probably sufficient for now.

Also, makes me wonder if the env variable wants to be something like DAFFODIL_TDML_INFOSETS or DAFFODIL_TDML_API_INFOSETS? Not sure I like those, but something that it expects an infoset type might be an improvement?

stevedlawrence · 2024-11-06T10:57:08Z

...-processor/src/main/scala/org/apache/daffodil/processor/tdml/DaffodilTDMLDFDLProcessor.scala

+      doParseWithBothApis(dpInputStream, saxInputStream, lengthLimitInBits)
+    } else {
+      doParseWithNonSaxAPI(dpInputStream, lengthLimitInBits)
+    }
  }

  override def parse(is: java.io.InputStream, lengthLimitInBits: Long): TDMLParseResult = {


Below this we call IOUtils.toByteArray, which reads the input stream into a byte array. But then it calls the above parse function which converts the byte array back to an input stream. We need to do that when parsing with both APIs since each API needs it's own input stream, but if we are only using the non-sax API we don't really need to do that.

I'm wondering if it would be possible and not too messy to only do the byte array stuff iwe are are doing both APIs. Otherwise we just pass in the original input stream to doParseWithNonSaxAPI? Avoids unnecessarily duplicating the input stream.

stevedlawrence · 2024-11-06T11:03:45Z

...-processor/src/main/scala/org/apache/daffodil/processor/tdml/DaffodilTDMLDFDLProcessor.scala

+    }
+  }
+
+  def doParseWithNonSaxAPI(


Feels like there's some duplicate code in these doParseWith*API functions since they both implement the non-sax API. Can they be combined into one function, e.g.

private def doParse(...) { // always do stuff with non-sax API if (apiMode = full) { // do sax API // verify SAX results match non-sax results } }

Same with the unparse functions?

- currently during testing we parse/unparse using both the sax and non-sax API, which leads to issues like trace running outputting twice for the same test which is confusing. We also run the parse for all our infoset outputters. With this enviromental variable, we default to the more efficient single infoset outputter (scalaxml) and single API (non-sax) parse/unparse. - the DAFFODIL_TDML_API_INFOSETS env has 2 options: 'scala' and 'all'. with scala being the default and 'all' being the current implementation of running both APIs and all infoset outputters - refactor code to reduce duplication and remove tunable - add cli test showing use of scala and all mode - set CI mode to all for regression testing DAFFODIL-2904

- move cli tests to daffodil-test-integration since they require stage to run successfully DAFFODIL-2953

stevedlawrence

+1 👍

stevedlawrence · 2024-11-06T19:26:04Z

daffodil-test-integration/src/test/scala/org/apache/daffodil/cliTest/TestCLITdml.scala

+
+    val envs = Map("DAFFODIL_TDML_API_INFOSETS" -> "scala")
+
+    runCLI(args"test -i -t $tdml byte_entities_6_08", envs = envs) { cli =>


Just a note that this doesn't really check that we are running only the scala API. Unfortunately which APIs are used are way behind the scenes so there isn't really a good way to verify it, and there's no good way for the expect library to expect that a string doesn't exist. This test is probably still worthwhile since I imagine it gives us coverage and at least confirms that "scala" is legal, but if we accidentally break things so "scala" runs both APIs, this test probably won't fail.

olabusayoT requested review from mbeckerle, stevedlawrence and jadams-tresys November 6, 2024 02:45

olabusayoT force-pushed the daf-2904-testingParseUnparseModeTunable branch from e2f2646 to 0ee6b9a Compare November 6, 2024 04:20

olabusayoT force-pushed the daf-2904-testingParseUnparseModeTunable branch from 0ee6b9a to ec3b567 Compare November 6, 2024 04:34

olabusayoT changed the title ~~Add testingPareUnpareAPIMode Tunable~~ Add testingParseUnparseAPIMode Tunable Nov 6, 2024

stevedlawrence reviewed Nov 6, 2024

View reviewed changes

olabusayoT changed the title ~~Add testingParseUnparseAPIMode Tunable~~ Add use of DAFFODIL_TDML_API_INFOSETS enviroment variable Nov 6, 2024

fixup! fixup! Add testingParseUnparseAPIMode Tunable

4876133

- move cli tests to daffodil-test-integration since they require stage to run successfully DAFFODIL-2953

olabusayoT force-pushed the daf-2904-testingParseUnparseModeTunable branch from 56be768 to 4876133 Compare November 6, 2024 19:24

stevedlawrence approved these changes Nov 6, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add use of DAFFODIL_TDML_API_INFOSETS enviroment variable #1365

Add use of DAFFODIL_TDML_API_INFOSETS enviroment variable #1365

olabusayoT commented Nov 6, 2024 •

edited

Loading

stevedlawrence Nov 6, 2024

stevedlawrence Nov 6, 2024

stevedlawrence Nov 6, 2024

stevedlawrence Nov 6, 2024

stevedlawrence left a comment

stevedlawrence Nov 6, 2024


		val envs = Map("DAFFODIL_TDML_API_INFOSETS" -> "scala")

		runCLI(args"test -i -t $tdml byte_entities_6_08", envs = envs) { cli =>

Add use of DAFFODIL_TDML_API_INFOSETS enviroment variable #1365

Are you sure you want to change the base?

Add use of DAFFODIL_TDML_API_INFOSETS enviroment variable #1365

Conversation

olabusayoT commented Nov 6, 2024 • edited Loading

stevedlawrence Nov 6, 2024

Choose a reason for hiding this comment

stevedlawrence Nov 6, 2024

Choose a reason for hiding this comment

stevedlawrence Nov 6, 2024

Choose a reason for hiding this comment

stevedlawrence Nov 6, 2024

Choose a reason for hiding this comment

stevedlawrence left a comment

Choose a reason for hiding this comment

stevedlawrence Nov 6, 2024

Choose a reason for hiding this comment

olabusayoT commented Nov 6, 2024 •

edited

Loading