Warn about Multiple Child Elements with Same Name #1321

olabusayoT · 2024-09-26T21:01:33Z

currently when we have multiple child elements in a sequence with the same name, daffodil allows it, while choice branches return a UPA error and unordered sequences return an SDE. We want to generate a warning when this happens in an ordered sequence so we refactor code to warn if the local names are the same. This has the effect of warning whether namespaces are supported or unsupported
we group using elementChildren instead of groupMembers since we want to consider all children since regardless of the model groups, their immediate children end up as siblings in the infoset and can cause potential ambiguities for some infoset types
add test to check we get UPA error with choices
add tests to exercise new warning (same/different namespaces, but same local name)

stevedlawrence · 2024-09-26T21:22:54Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SequenceGroup.scala

-        )
+  }
+
+  private lazy val checkMembersHaveUniqueNamesInNamespaces: Map[Boolean, Seq[Term]] = {


Might suggest a different name for this since it isn't really checking anything, it's just grouping things.

stevedlawrence · 2024-09-26T21:41:04Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SequenceGroup.scala

+
+  private lazy val checkIfMultipleChildrenWithSameName: Unit = {
+    val nonUniqueNameChildren =
+      checkMembersHaveUniqueNamesInNamespaces.filter(_._1 == true).values


I wonder if this needs a different grouping than with the unorderd sequences?

The goal of this check is to warn in case the schema is used with an infoset that don't support duplicate fields, and those kinds of infosets probably also don't support namespaces. So we probably want to warn even if they have different namespaces, which means we need to group by name only and not by QName.

Also, we kindof already have a similar check with WarnID.NamespaceDifferencesOnly. I wonder if that check should be removed in place of this one, since this new check is really just a more general version of that check? The existing one is about same-name elements next to each other, this one is about same-name elements that are siblings. Anything that this new check warns about will also warn for the existing check.

stevedlawrence · 2024-09-26T21:41:44Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SequenceGroup.scala

+      checkMembersHaveUniqueNamesInNamespaces.filter(_._1 == true).values
+    nonUniqueNameChildren.foreach { children =>
+      children.head.SDW(
+        WarnID.MultipleChildElementsWithSameName,


Maybe this wants to be WarnID.NamespaceDifferencesOnly?

I think I prefer the idea of MultipleChildElementsWithSameName subsuming NamespaceDifferencesOnly instead of the other way around, because in sequence group, the namespaces could be different or the same with the same name and it should still trigger MultipleChildElementsWithSameName. Actually I'm wondering if we ougt to leave NamespaceDifferenceOnly alone since it's very clear that it's for the case where the name is the same and the namespace is different, and MultipleChildElementsWithSameName is for the case where the name is the same regardless of whether the namespace is different or the same?

I guess there are kindof few different cases that are are being considered for warnings here:

Infoset representations that do not support namespaces. We warn here if immediate siblings have the same name (but different namespaces). This is the existing NamespaceDifferencesOnly check.

Infoset representations that do not support siblings with the same name (but do support namespaces). We warn here if any siblings have the same QName.

Infoset representations that do not support namespaces OR siblings with the same name. We warn here if any siblings have the same name (but different namespaces).

I think we definitely need to warn for 3 (infosets that support neither namespaces nor same sibling names are common). If we implement 3, that will automatically match 1 and 2. If we have separate checks for 1, 2, and 3, we would end up with duplicate warnings. For example, something that warns for 3, will also warn for 2 and maybe 1 if the siblings are adjacent. Maybe this is fine, which would allow you to suppress 3 warnings but not 1, for example. Or maybe we only warn for 3, and the warning message can be descriptive enough to mention the different implications depending on what an infoset does and does no support?

Agreed NamespaceDifferencesOnly probably isn't the right id for this warning. There are a number of other kinds of things we'll eventually want to warn about that are all about related to infosets that don't support certain things, for example anonymous choices, nested choices, nullable elements. Do we want to eventually plan to warn for all of these kinds of tings under a single warn ID (e.g. LimitedInfosetRepresentation)? Or do we want specific IDs for each type of limitation?

Examples just to make sure im following

val type1 = <xs:element> <xs:complexType> <xs:sequence> <xs:element name="a" type="xs:string"/> <xs:element ref="u:a"/> <xs:element name="b" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element> val type2 = { <xs:element> <xs:complexType> <xs:sequence> <xs:element name="a" type="xs:string"/> <xs:element name="b" type="xs:string"/> <xs:choice> <xs:element ref="tns:a"/> <xs:element name="a" type="xs:string"/> </xs:choice> </xs:sequence> </xs:complexType> </xs:element> } val type3 = { <xs:element> <xs:complexType> <xs:sequence> <xs:element name="a" type="xs:string"/> <xs:element name="b" type="xs:string"/> <xs:element ref="u:a"/> </xs:sequence> </xs:complexType> </xs:element> }

I think the existing checkIfMultipleChildrenWithSameName takes care of 2 and 3. 2 will warn if they have the same namespace and 3 will warn if they have different namespaces. I don't think we're supposed to be checking for what the current infoset supports so I think we can use the same warn id for them. I am inclined to leave 1 with the warnId NamespaceDifferencesOnly rather than deprecating it since it's a well established warning. It might be confusing for the users to suddenly change the warning?

I'm inclined to have separate warnIds for different limitations. @mbeckerle thoughts?

stevedlawrence · 2024-10-01T12:51:17Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SequenceGroup.scala

-      Assert.invariant(gm.isInstanceOf[ElementBase])
-      gm.asInstanceOf[ElementBase].namedQName
+  private lazy val checkUnorderedSequenceMembersHaveUniqueNamesInNamespaces: Unit = {
+    groupedMembersWithSameName().values.foreach { children =>


Just from the name groupMembersWithSameName, it's not clear that this includes the namespace. If I just read this I think I would assume it only looks at the name. Maybe adding includesNamespae = true just to make it explicit. Or maybe alternatively name it groupMembersWithSameQName, and change the parameter to excludeNamespace which defaults to false, and the other use of this would be groupmembersWithSameQName(excludeNamespace = true).

Another option, you could do something like

def groupMembersBy[T](func: (ElementBase) => T): Map[T, Seq[Term]] = { ... .groupBy { func(gm.asIsntanceOf[ElementBase]) } }

And then the usage could be groupMembersBy(_.namedQName) and groupMembersBy(_.name). This makes it much more clear what your grouping by, and could be used for other things in the future.

stevedlawrence · 2024-10-01T13:11:11Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SequenceGroup.scala

+    includeNamespace: Boolean = true
+  ): Map[Boolean, Seq[Term]] = {
+    val childrenGroupedByName = groupMembers
+      .filter { m => m.isInstanceOf[LocalElementDecl] || m.isInstanceOf[ElementRef] }


This filter is new from the old implementation. I think it's fine for the unordered sequences check since unordered sequences only allow local element decls or refs. But I don't think it works for the checkIfMultipleChildrenWithSameName check. Normal ordered sequences are allowed to have things other than elements (e.g. choice, group, sequence), and their immediate children need to be considered for the multiple children with same name check, since they all end up as siblings in the infoset of local element decl's and references and cause potential ambiguities for some infoset types.

I think we really need to consider all children regardless of model groups. We probably already have a member for this, we have lots of members about siblings and children. Maybe elementChildren is the right thing?

I think elementChildren will work for the generic groupMembersBy since we do checkMembersAreAllElementOrElementRef before checkUnorderedSequenceMembersHaveUniqueNamesInNamespaces, so we won't get to the latter if the former fails. So I removed the filter and changed groupMembers to elementChildren

stevedlawrence · 2024-10-01T13:55:08Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SequenceGroup.scala

+      checkMembersHaveUniqueNamesInNamespaces.filter(_._1 == true).values
+    nonUniqueNameChildren.foreach { children =>
+      children.head.SDW(
+        WarnID.MultipleChildElementsWithSameName,


I guess there are kindof few different cases that are are being considered for warnings here:

Infoset representations that do not support namespaces. We warn here if immediate siblings have the same name (but different namespaces). This is the existing NamespaceDifferencesOnly check.

Infoset representations that do not support siblings with the same name (but do support namespaces). We warn here if any siblings have the same QName.

Infoset representations that do not support namespaces OR siblings with the same name. We warn here if any siblings have the same name (but different namespaces).

I think we definitely need to warn for 3 (infosets that support neither namespaces nor same sibling names are common). If we implement 3, that will automatically match 1 and 2. If we have separate checks for 1, 2, and 3, we would end up with duplicate warnings. For example, something that warns for 3, will also warn for 2 and maybe 1 if the siblings are adjacent. Maybe this is fine, which would allow you to suppress 3 warnings but not 1, for example. Or maybe we only warn for 3, and the warning message can be descriptive enough to mention the different implications depending on what an infoset does and does no support?

Agreed NamespaceDifferencesOnly probably isn't the right id for this warning. There are a number of other kinds of things we'll eventually want to warn about that are all about related to infosets that don't support certain things, for example anonymous choices, nested choices, nullable elements. Do we want to eventually plan to warn for all of these kinds of tings under a single warn ID (e.g. LimitedInfosetRepresentation)? Or do we want specific IDs for each type of limitation?

- currently when we have multiple child elements in a sequence with the same name, daffodil allows it, while choice branches return a UPA error and unordered sequences return an SDE. We want to generate a warning when this happens in an ordered sequence so we refactor code to warn if the local names are the same. This has the effect of warning whether namespaces are supported or unsupported - we group using elementChildren instead of groupMembers since we want to consider all children since regardless of the model groups, their immediate children end up as siblings in the infoset and can cause potential ambiguities for some infoset types - add test to check we get UPA error with choices - add tests to exercise new warning (same/different namespaces, but same local name) DAFFODIL-2736

olabusayoT requested review from mbeckerle and stevedlawrence September 26, 2024 21:01

stevedlawrence reviewed Sep 26, 2024

View reviewed changes

stevedlawrence reviewed Oct 1, 2024

View reviewed changes

olabusayoT requested review from jadams-tresys and stevedlawrence October 1, 2024 19:43

olabusayoT force-pushed the daf-2736-warnMultipleChildrenWithSameName branch from 2b01108 to 7c4a219 Compare October 11, 2024 19:41

olabusayoT force-pushed the daf-2736-warnMultipleChildrenWithSameName branch from 7c4a219 to 97e98bc Compare October 11, 2024 19:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Warn about Multiple Child Elements with Same Name #1321

Warn about Multiple Child Elements with Same Name #1321

olabusayoT commented Sep 26, 2024 •

edited

Loading

stevedlawrence Sep 26, 2024

stevedlawrence Sep 26, 2024

stevedlawrence Sep 26, 2024

olabusayoT Sep 27, 2024

stevedlawrence Oct 1, 2024

olabusayoT Oct 1, 2024

stevedlawrence Oct 1, 2024

stevedlawrence Oct 1, 2024

olabusayoT Oct 1, 2024

stevedlawrence Oct 1, 2024

Warn about Multiple Child Elements with Same Name #1321

Are you sure you want to change the base?

Warn about Multiple Child Elements with Same Name #1321

Conversation

olabusayoT commented Sep 26, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

olabusayoT commented Sep 26, 2024 •

edited

Loading