Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn about Multiple Child Elements with Same Name #1321

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

olabusayoT
Copy link
Contributor

@olabusayoT olabusayoT commented Sep 26, 2024

  • currently when we have multiple child elements in a sequence with the same name, daffodil allows it, while choice branches return a UPA error and unordered sequences return an SDE. We want to generate a warning when this happens in an ordered sequence so we refactor code to warn if the local names are the same. This has the effect of warning whether namespaces are supported or unsupported
  • we group using elementChildren instead of groupMembers since we want to consider all children since regardless of the model groups, their immediate children end up as siblings in the infoset and can cause potential ambiguities for some infoset types
  • add test to check we get UPA error with choices
  • add tests to exercise new warning (same/different namespaces, but same local name)

DAFFODIL-2736

)
}

private lazy val checkMembersHaveUniqueNamesInNamespaces: Map[Boolean, Seq[Term]] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might suggest a different name for this since it isn't really checking anything, it's just grouping things.


private lazy val checkIfMultipleChildrenWithSameName: Unit = {
val nonUniqueNameChildren =
checkMembersHaveUniqueNamesInNamespaces.filter(_._1 == true).values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this needs a different grouping than with the unorderd sequences?

The goal of this check is to warn in case the schema is used with an infoset that don't support duplicate fields, and those kinds of infosets probably also don't support namespaces. So we probably want to warn even if they have different namespaces, which means we need to group by name only and not by QName.

Also, we kindof already have a similar check with WarnID.NamespaceDifferencesOnly. I wonder if that check should be removed in place of this one, since this new check is really just a more general version of that check? The existing one is about same-name elements next to each other, this one is about same-name elements that are siblings. Anything that this new check warns about will also warn for the existing check.

checkMembersHaveUniqueNamesInNamespaces.filter(_._1 == true).values
nonUniqueNameChildren.foreach { children =>
children.head.SDW(
WarnID.MultipleChildElementsWithSameName,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this wants to be WarnID.NamespaceDifferencesOnly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer the idea of MultipleChildElementsWithSameName subsuming NamespaceDifferencesOnly instead of the other way around, because in sequence group, the namespaces could be different or the same with the same name and it should still trigger MultipleChildElementsWithSameName. Actually I'm wondering if we ougt to leave NamespaceDifferenceOnly alone since it's very clear that it's for the case where the name is the same and the namespace is different, and MultipleChildElementsWithSameName is for the case where the name is the same regardless of whether the namespace is different or the same?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there are kindof few different cases that are are being considered for warnings here:

  1. Infoset representations that do not support namespaces. We warn here if immediate siblings have the same name (but different namespaces). This is the existing NamespaceDifferencesOnly check.
  2. Infoset representations that do not support siblings with the same name (but do support namespaces). We warn here if any siblings have the same QName.
  3. Infoset representations that do not support namespaces OR siblings with the same name. We warn here if any siblings have the same name (but different namespaces).

I think we definitely need to warn for 3 (infosets that support neither namespaces nor same sibling names are common). If we implement 3, that will automatically match 1 and 2. If we have separate checks for 1, 2, and 3, we would end up with duplicate warnings. For example, something that warns for 3, will also warn for 2 and maybe 1 if the siblings are adjacent. Maybe this is fine, which would allow you to suppress 3 warnings but not 1, for example. Or maybe we only warn for 3, and the warning message can be descriptive enough to mention the different implications depending on what an infoset does and does no support?

Agreed NamespaceDifferencesOnly probably isn't the right id for this warning. There are a number of other kinds of things we'll eventually want to warn about that are all about related to infosets that don't support certain things, for example anonymous choices, nested choices, nullable elements. Do we want to eventually plan to warn for all of these kinds of tings under a single warn ID (e.g. LimitedInfosetRepresentation)? Or do we want specific IDs for each type of limitation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Examples just to make sure im following

    val type1 =
      <xs:element>
        <xs:complexType>
          <xs:sequence>
            <xs:element name="a" type="xs:string"/>
            <xs:element ref="u:a"/>
            <xs:element name="b" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element> 
    val type2 = {
      <xs:element>
        <xs:complexType>
          <xs:sequence>
            <xs:element name="a" type="xs:string"/>
            <xs:element name="b" type="xs:string"/>
            <xs:choice>
              <xs:element ref="tns:a"/>
              <xs:element name="a" type="xs:string"/>
            </xs:choice>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    }
    val type3 = {
      <xs:element>
        <xs:complexType>
          <xs:sequence>
            <xs:element name="a" type="xs:string"/>
            <xs:element name="b" type="xs:string"/>
            <xs:element ref="u:a"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    }

I think the existing checkIfMultipleChildrenWithSameName takes care of 2 and 3. 2 will warn if they have the same namespace and 3 will warn if they have different namespaces. I don't think we're supposed to be checking for what the current infoset supports so I think we can use the same warn id for them. I am inclined to leave 1 with the warnId NamespaceDifferencesOnly rather than deprecating it since it's a well established warning. It might be confusing for the users to suddenly change the warning?

I'm inclined to have separate warnIds for different limitations. @mbeckerle thoughts?

Assert.invariant(gm.isInstanceOf[ElementBase])
gm.asInstanceOf[ElementBase].namedQName
private lazy val checkUnorderedSequenceMembersHaveUniqueNamesInNamespaces: Unit = {
groupedMembersWithSameName().values.foreach { children =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just from the name groupMembersWithSameName, it's not clear that this includes the namespace. If I just read this I think I would assume it only looks at the name. Maybe adding includesNamespae = true just to make it explicit. Or maybe alternatively name it groupMembersWithSameQName, and change the parameter to excludeNamespace which defaults to false, and the other use of this would be groupmembersWithSameQName(excludeNamespace = true).

Another option, you could do something like

def groupMembersBy[T](func: (ElementBase) => T): Map[T, Seq[Term]] = {
  ...
    .groupBy { func(gm.asIsntanceOf[ElementBase]) }
}

And then the usage could be groupMembersBy(_.namedQName) and groupMembersBy(_.name). This makes it much more clear what your grouping by, and could be used for other things in the future.

includeNamespace: Boolean = true
): Map[Boolean, Seq[Term]] = {
val childrenGroupedByName = groupMembers
.filter { m => m.isInstanceOf[LocalElementDecl] || m.isInstanceOf[ElementRef] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This filter is new from the old implementation. I think it's fine for the unordered sequences check since unordered sequences only allow local element decls or refs. But I don't think it works for the checkIfMultipleChildrenWithSameName check. Normal ordered sequences are allowed to have things other than elements (e.g. choice, group, sequence), and their immediate children need to be considered for the multiple children with same name check, since they all end up as siblings in the infoset of local element decl's and references and cause potential ambiguities for some infoset types.

I think we really need to consider all children regardless of model groups. We probably already have a member for this, we have lots of members about siblings and children. Maybe elementChildren is the right thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think elementChildren will work for the generic groupMembersBy since we do checkMembersAreAllElementOrElementRef before checkUnorderedSequenceMembersHaveUniqueNamesInNamespaces, so we won't get to the latter if the former fails. So I removed the filter and changed groupMembers to elementChildren

checkMembersHaveUniqueNamesInNamespaces.filter(_._1 == true).values
nonUniqueNameChildren.foreach { children =>
children.head.SDW(
WarnID.MultipleChildElementsWithSameName,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there are kindof few different cases that are are being considered for warnings here:

  1. Infoset representations that do not support namespaces. We warn here if immediate siblings have the same name (but different namespaces). This is the existing NamespaceDifferencesOnly check.
  2. Infoset representations that do not support siblings with the same name (but do support namespaces). We warn here if any siblings have the same QName.
  3. Infoset representations that do not support namespaces OR siblings with the same name. We warn here if any siblings have the same name (but different namespaces).

I think we definitely need to warn for 3 (infosets that support neither namespaces nor same sibling names are common). If we implement 3, that will automatically match 1 and 2. If we have separate checks for 1, 2, and 3, we would end up with duplicate warnings. For example, something that warns for 3, will also warn for 2 and maybe 1 if the siblings are adjacent. Maybe this is fine, which would allow you to suppress 3 warnings but not 1, for example. Or maybe we only warn for 3, and the warning message can be descriptive enough to mention the different implications depending on what an infoset does and does no support?

Agreed NamespaceDifferencesOnly probably isn't the right id for this warning. There are a number of other kinds of things we'll eventually want to warn about that are all about related to infosets that don't support certain things, for example anonymous choices, nested choices, nullable elements. Do we want to eventually plan to warn for all of these kinds of tings under a single warn ID (e.g. LimitedInfosetRepresentation)? Or do we want specific IDs for each type of limitation?

- currently when we have multiple child elements in a sequence with the same name, daffodil allows it, while choice branches return a UPA error and unordered sequences return an SDE. We want to generate a warning when this happens in an ordered sequence so we refactor code to warn if the local names are the same. This has the effect of warning whether namespaces are supported or unsupported
- we group using elementChildren instead of groupMembers since we want to consider all children since regardless of the model groups, their immediate children end up as siblings in the infoset and can cause potential ambiguities for some infoset types
- add test to check we get UPA error with choices
- add tests to exercise new warning (same/different namespaces, but same local name)

DAFFODIL-2736
@olabusayoT olabusayoT force-pushed the daf-2736-warnMultipleChildrenWithSameName branch from 7c4a219 to 97e98bc Compare October 11, 2024 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants