Skip to content

svd2ir: rework handling of names with placeholders#136

Open
datdenkikniet wants to merge 5 commits intoembassy-rs:mainfrom
datdenkikniet:clusters
Open

svd2ir: rework handling of names with placeholders#136
datdenkikniet wants to merge 5 commits intoembassy-rs:mainfrom
datdenkikniet:clusters

Conversation

@datdenkikniet
Copy link
Copy Markdown
Contributor

@datdenkikniet datdenkikniet commented Apr 19, 2026

To summarize what this PR does:

  1. Fieldsets with non-array placeholders are supported
  2. As a side-effect, non-array placeholder fieldsets that were supported previously by pretending that they were arrays (such as field{A, B}set turning into fieldset(n: usize)) are now no longer treated as arrays, unless they actually are arrays (field{A,B}set becomes fieldAset and fieldBset)

For a table overview, see: #136 (comment)

To explain the real life use case that this solves:

Some parts of Renesas' SVDs have the following pattern:

<register>
    <dim>2</dim>
    <dimIncrement>0x4</dimIncrement>
    <dimIndex>P0B,P2B,P3B,P4B</dimIndex>
    <name>BUSSCNT%s</name>
</register>
<register>
    <dim>4</dim>
    <dimIncrement>0x4</dimIncrement>
    <dimIndex>MBIU,RAM0</dimIndex>
    <name>BUSSCNT%s</name>
</register>

Currently, chiptool crashes while generating fieldsets for such patterns, as it normalizes both BUSSCNT%s to BUSSCNT and fails to find unique names for the individual fieldsets.

In order to improve that behaviour, we add an extra level of mapping:

  1. We create a fieldset with the name stripped of its placeholder (adding a number at the end if there are duplicates)
  2. We map the full register name to this fieldset name
  3. We use the full register name to generate the register fieldset access, unless the fieldset is an actual array. Then we use the numeric access pattern (fieldset_name(n: usize)) to get fieldsets at offsets.

This has the pleasant side effect of resolving the fact that convert_periperhal converts input like this:

<register>
    <dim>2</dim>
    <dimIncrement>0x4</dimIncrement>
    <dimIndex>A,B</dimIndex>
    <name>DLLCR%s</name>
</register>
<register>
    <dim>2</dim>
    <dimIncrement>0x4</dimIncrement>
    <dimIndex>MAIN,BACKUP</dimIndex>
    <name>INST%sREG0</name>
</register>

into

impl Peripheral {
    fn dlccr(&self, n: usize) -> Reg<Dlccr>;
    fn instreg0(&self, n: usize) -> Reg<_>;
}

instead, we now get this (identical fieldsets are returned, but at different offsets):

impl Peripheral {
    fn dlccra(&self) -> Reg<Dlccr>;
    fn dlccrb(&self) -> Reg<Dlccr>;
    fn instmainreg0(&self) -> Reg<_>;
    fn instbackupreg0(&self) -> REg<_>;
}

We can probably do the same for fields, but this PR doesn't do that yet.

This fixes #90 and closes #92

@datdenkikniet datdenkikniet changed the title Support converting SVDs with (duplicate) clustered names to IR svd2ir: Support converting SVDs with (duplicate) clustered names to IR Apr 19, 2026
@datdenkikniet datdenkikniet changed the title svd2ir: Support converting SVDs with (duplicate) clustered names to IR svd2ir: Support converting SVDs with (duplicate) clustered names Apr 19, 2026
@datdenkikniet datdenkikniet force-pushed the clusters branch 2 times, most recently from 4f94476 to 9a9d61d Compare April 19, 2026 18:27
@inferiorhumanorgans
Copy link
Copy Markdown
Contributor

inferiorhumanorgans commented Apr 19, 2026

This works with the SVDs for:

  • RA2A1
  • RA4M1

This does not work for these SVDs:

  • RA4L1
Error: Failed to find unique name for ["CANFD", "CFDRMDF_11"]. n: ["CANFD", "CFDRMDF_11"]
  • RA6M5
Err: on rename Fieldset "CANFD::regs::CFDTMDF1__01", new name "canfd::regs::Cfdtmdf101" already exist
Err: on rename Fieldset "CANFD::regs::CFDTMDF1__11", new name "canfd::regs::Cfdtmdf111" already exist

Also, it would be nice to keep the indices noted somewhere (appended to the description, as a replacement for %s in the description, or as a separate ir field).

@datdenkikniet
Copy link
Copy Markdown
Contributor Author

Added an ugly-hack workaround for now. It seems to handle the RA4L1 SVD now, but not sure it'll also hold up to the transforms. There is a better way to do it but I cannot come up with it right now :P

@datdenkikniet datdenkikniet marked this pull request as draft April 19, 2026 20:56
@inferiorhumanorgans
Copy link
Copy Markdown
Contributor

Yeah, there are lots of ugly hacks to deal with these SVDs unfortunately (and the less said about the other metadata I'm ingesting the better 😂).

FWIW the MCUs I've got on hand are the 2A1, 4L1, 4M1, 6M5, and 8M1 so that's what I'm checking. The 8M1's SVD causes the svd-parser validation to choke so I'm skipping that for now.

I'm testing these with the ra-pac workflow (just extract-peripherals ra4m1). I can walk you through setting that up if you'd like but that also depends on some transforms that aren't in the main branch so it may not be worth the time.

@datdenkikniet datdenkikniet force-pushed the clusters branch 2 times, most recently from e145d1e to 81c3d1a Compare April 20, 2026 06:11
@datdenkikniet datdenkikniet marked this pull request as ready for review April 20, 2026 06:12
@datdenkikniet
Copy link
Copy Markdown
Contributor Author

datdenkikniet commented Apr 20, 2026

Did some mucking about and have come to the conclusion that having an underscore suffix, and just checking if the new name has been used at all, is easiest and is most likely to resolve most issues. There is probably a better way to find a unique name beyond "iterate until we've found one that doesn't exist yet" but I don't want to figure that one out just yet

@inferiorhumanorgans
Copy link
Copy Markdown
Contributor

I think the more elegant solution would be to include the indicies in the name, but that has the potential for unreasonably long fieldset names. What I landed on was to keep a hashmap of seen fieldset names to keep track of the counts.

@datdenkikniet
Copy link
Copy Markdown
Contributor Author

to include the indicies in the name

Like you say, this could cause really long names. Additionally, it doesn't necessarily improve the naming: inserting the dimension indices at the end of the name is equally "bad" IMO, and inserting them at the placeholder name doesn't really make it any better either.

Take, for example, CFDRMDF%s_1 with indicies 1-7, CFDRMDF%s_1 with indicies 8-15, and CFDRMDF%s_11 with indicies 8-15 (picked from the RA4L1 SVD). They're not arrays (by the standards implemented in this PR at least) because the placeholder is not at the end. We'd end up with:

Approach CFDRMDF%s_1,1-7 CFDRMDF%s_1,8-15 CFDRMDF%s_11, 8-15
main CFDRMDF_1 CFDRMDF_1 (causes collision) CFDRMDF_11
This PR CFDRMDF_1 CFDRMDF_1_1 CFDRMDF_11
With indices at placeholder CFDRMDF1234567_1 CFDRMDF_89101112131415_1 CFDRMDF89101112131415_11
With indices at end CFDRMDF_1_1234567 CFDRMDF_1_89101112131415 CFDRMDF_11_89101112131415

I don't really see including the indices helping much of anything. We could/should include them in the description, but then they become stale as soon as anyone merges or deletes anything, and having some sort of two-way binding seems really heavy.

@datdenkikniet datdenkikniet marked this pull request as draft April 21, 2026 20:58
@inferiorhumanorgans
Copy link
Copy Markdown
Contributor

In the case of CFDRMDF%s_1, the field itself is CFDRMDF1 or "RX Message Buffer Data Field 1 Registers". For the indicies I would collapse contiguous ranges to m_n with an end result along the lines of CFDRMDF_1_m_n. I've not actually read any of the docs on the CANFD peripheral but I'd expect the indicies are individual mailboxes.

That breaks down with PFS where you get constructs like P00%sPFS where the intended index is actually 00 or 000. PmnnPFS breaks down to m = index of the port peripheral, nn = index of the pin on the port.

Ultimately I'm not so concerned with the generated form as with PFS I rewrote the whole thing by hand because the fieldsets were so useless, and I may do the same with CANFD.

@datdenkikniet datdenkikniet changed the title svd2ir: Support converting SVDs with (duplicate) clustered names svd2ir: rework handling of names with placeholders Apr 22, 2026
@datdenkikniet
Copy link
Copy Markdown
Contributor Author

I would collapse contiguous ranges to m_n with an end result along the lines of CFDRMDF_1_m_n

I may have misunderstood the intent of your comment, but to make sure we're on the same page: I agree that this as an end result would be great, but it also sounds like something that is way out of scope for svd2ir. Merging fieldsets based on compatibility (or assuming that they are compatible based on names) sounds like a transform-level concern, not an svd2ir-level concern. svd2ir should mirror the SVD as closely as possible, which it would do with this change. This approach doesn't merge anything, but just uses what the SVD tells us (names containing placeholders refer to the same thing by definition) to generate a slightly more concise IR.

An alternative would be that we always expand all placeholders (except for arrays, [%s]), e.g. generate 8 fieldsets for CFDRMDF{0,1,2,3,4,5,6,7}_1 and leave it up to the user to re-merge them, but that seems unnecessary since the IR is made to deduplicate identical fieldsets, and we know that their definitions are identical.

Since we'll have to do manual transforms to get it into a good shape anyways, I don't think the naming scheme matters beyond that it is mostly recognizable as belonging to certain inner blocks, which it is with the current naming scheme.

Ultimately I'm not so concerned with the generated form

OK, let's conclude the naming discussion then: the name of the fieldset doesn't really matter so long as it is unique and can be associated with the inner block functions that use it, because whatever it becomes, it is likely to get renamed anyways. The current approach does that, while preserving the names we could already generate, and doesn't generate comically long names.

Opinions? I don't think bikeshedding the naming thing any more will help.

@inferiorhumanorgans
Copy link
Copy Markdown
Contributor

Opinions? I don't think bikeshedding the naming thing any more will help.

Agreed.

However as of 163fc34 chiptool still chokes on the CANFD peripheral in the 4L1 and 6M5.

4L1:

thread 'main' (20068057) panicked at src/transform/mod.rs:84:60:
called `Result::unwrap()` on an `Err` value:
Err: on rename Fieldset "CANFD::regs::CFDRMDF_1_1", new name "canfd::regs::Cfdrmdf11" already exist
Err: on rename Fieldset "CANFD::regs::CFDRMDF_1_3", new name "canfd::regs::Cfdrmdf13" already exist
Err: on rename Fieldset "CANFD::regs::CFDRMDF_1_2", new name "canfd::regs::Cfdrmdf12" already exist

6M5:

thread 'main' (20068572) panicked at src/transform/mod.rs:84:60:
called `Result::unwrap()` on an `Err` value:
Err: on rename Fieldset "CANFD::regs::CFDTMDF1__1_1", new name "canfd::regs::Cfdtmdf111" already exist
Err: on rename Fieldset "CANFD::regs::CFDTMDF1__0_1", new name "canfd::regs::Cfdtmdf101" already exist

The minimal set of transforms I'm applying is:

  • Sanitize
  • DeleteUselessEnums
  • Sort

@datdenkikniet
Copy link
Copy Markdown
Contributor Author

Ahh, okay: Sanitize is removing underscores from fieldset names, but underscores are the thing we use for separating... I'll look into what we can do about that

@datdenkikniet
Copy link
Copy Markdown
Contributor Author

datdenkikniet commented Apr 23, 2026

OK, after some thinking, I think the way to go here is:

  1. Introduce a RenameFieldsets transform
  2. Merge this PR as-is, and force the user to choose better names before calling Sanitize.

I've listed some other alternatives in #141, but I don't think any of them are better than this.

ETA: Have opened #141

With RenameFieldsets, I end up with the following:

  - !RenameFieldsets
    from: CANFD::regs::CFDRMDF_(\d+)_(\d+)
    to: CANFD::regs::CFDRMDF${1}copy${2}
  - !Sanitize
  - !MergeFieldsets
    from: canfd::regs::Cfdrmdf(\d+)(copy(\d+))?
    to: canfd::regs::Cfdrmdf

but of course, in this case, one could:

  - !MergeFieldsets
    from: CANFD::regs::CFDRMDF_(\d+)(_(\d+))?
    to: CANFD::regs::CFDRMDf
  - !Sanitize

instead, even without RenameFieldsets, because the register can actually be merged.

@inferiorhumanorgans
Copy link
Copy Markdown
Contributor

Another alternative might be to have the counts zero padded. so that 11,1 → 011_001 and 1,1 → 001_001 don't collide. It would be nice (but is not a deal breaker for me) to not have to pre-sanitize input.

@datdenkikniet
Copy link
Copy Markdown
Contributor Author

datdenkikniet commented Apr 29, 2026

The only way to truly avoid it is to change the name-generation strategy based on how Sanitize does its renames, but that would mean that changes to Sanitize would directly mean that there have to be changes to svd2ir, which I don't like.

After all, we could decide that Sanitize should strip leading zeros following an underscore, and we'll be right back in the same situation. I know that it's easy to come up with hypotheticals like this, but I don't think doing special cases "just" to be able to Sanitize immediately is the right approach.

Let's see if dirbaio finds time to review this and/or has general comments.

@datdenkikniet datdenkikniet marked this pull request as ready for review April 29, 2026 20:43
@inferiorhumanorgans
Copy link
Copy Markdown
Contributor

inferiorhumanorgans commented May 3, 2026

BTW this construct is no longer turning into an array (from the 8M1 SVD):

<register>
    <dim>2</dim>
    <dimIncrement>0x2</dimIncrement>
    <dimIndex>1,2</dimIndex>
    <name>PVD%sCR1</name>
    <description>Voltage Monitor %s Circuit Control Register 1</description>
8m1-system

This SVD has the added distinction of crashing Helix.

@datdenkikniet
Copy link
Copy Markdown
Contributor Author

datdenkikniet commented May 3, 2026

BTW this construct is no longer turning into an array (from the 8M1 SVD):

Yes, that's intentional (but changing it is trivial): only maybe-arrays that have a [%s] placeholder or end in a %s placeholder that is purely numeric are turned into array access.

The reasoning is that only [%s]-placeholders are "actually" arrays (going by the SVD spec, "A list of cluster names can be build using the placeholder %s. Use the placeholder [%s] at the end of the identifier to generate arrays in the header file."), and things ending in %s with only numerical dimensions still "read correctly" if array access is used (but, going purely by the SVD spec, it's not correct to do that).

Registers with placeholders in the middle, such as your example (also note that placeholders can also be non-numeric, see for example BUSMCNT%s in R7FA4M1AB.svd) aren't arrays at all, and treating them as such creates a pretty terrible disconnect between the actual register name (pvd2cr1) and the way it is accessed through the PAC .pvdcr1(1). Also note the "off-by-one" array accessor (pvd2cr1 is at index 1, not 2). In the BUSMCNT case: busmcntm4i is accessed through busmcnt(0), which simply makes no sense. It also messes up assumptions/guarantees about the uniqueness of a name.

Name Placeholders Old access Notes New access
BUS%sCNTR 1-2 buscntr({0,1}) Not unique: separate BUS%sCNTR with different dims or BUSCNTR%s may exist. Access offset does not correspond to actual name bus1cntr(),bus2cntr()
BUS%sCNTR 3-4 N/A svd2ir fails bus3cntr(), bus4cntr()
BUS%sCNTR SPI,USB buscntr({0,1}) Not unique: separate BUS%sCNTR with different dims or BUSCNTR%s may exist. busspicntr(), bususbcntr()
BUS%sCNTR I2C,MDIO N/A svd2ir fails busi2ccntr(), busmdiocntr()
BUSCNTR[%s] 0-1 buscntr({0,1}) - buscntr({0,1})
BUSCNTR%s 1-2 buscntr({0,1}) Not unique: separate BUSCNTR%s with different dims or BUS%sCNTR may exist. Access offset does not correspond to actual name buscntr1(), buscntr2() (as of febf739)
BUSCNTR%s 3-4 N/A svd2ir fails buscntr3(), buscntr4()
BUSCNTR%s SPI, USB buscntr({0,1}) Not unique: separate BUSCNTR%s with different dims or BUS%sCNTR may exist. buscntrspi(), buscntrusb()
BUSCNTR%s I2C,MDIO N/A svd2ir fails buscntri2c(), buscntrmdio()

This SVD has the added distinction of crashing Helix.

Lol, that's not great

@inferiorhumanorgans
Copy link
Copy Markdown
Contributor

Registers with placeholders in the middle, such as your example (also note that placeholders can also be non-numeric, see for example BUSMCNT%s in R7FA4M1AB.svd) aren't arrays at all,

It's certainly easy enough to apply a transform after, but those registers are intended to be arrays and the trailing number is not intended to be an array index. In this case PVD%sCR1 means that there's an array of control register 1 where there might be multiple control registers each with different sets fields for each voltage monitor. So you'd want arrays of voltage monitor registers each with CR0 and CR1.

@datdenkikniet
Copy link
Copy Markdown
Contributor Author

datdenkikniet commented May 4, 2026

In this case PVD%sCR1 means that there's an array of control register 1 where there might be multiple control registers each with different sets fields for each voltage monitor.

I don't understand what you are trying to say here. The SVD only tells us: this register exists more than once, with this amount of bytes between them. svd2ir cannot/should not try to figure out that there are additional registers that happen to have the same placeholders and that overlap so that they can form a block. There is nothing in the SVD that can tell us this, only guesswork and happy coincidences that we should really leave up to those writing transforms.

The SVD doesn't even put these registers in a cluster, so there's nothing telling us that they should somehow be grouped together beyond the fact that their dimensions happen to be equal, and that the expanded version of these overlaps. The only reason we want them to be an array is because you (or I) can deduce it from the SVD, but the SVD itself doesn't tell us directly (which is what svd2ir's concern is)

but those registers are intended to be arrays

I disagree. The SVD doesn't contain any information that directly tells us that they are arrays (if they had been, the [%s] array placeholder should have been used). It only tells us that there are multiple instances of the same register at some offset from each other.

The register you gave as an example has a non-array placeholder, which is the only thing that is causing the change you see.

The effect of the change is that the access pattern in the PAC changes like this:

Register Old Access New access
SYS.PVD1CR0 SYSC.PVDCR0(0) SYSC.PVD1CR0()
SYS.PVD2CR0 SYSC.PVDCR0(1) SYSC.PVD2CR0()
SYS.PVD1CR1 SYSC.PVDCR1(0) SYSC.PVD1CR1()
SYS.PVD2CR1 SYSC.PVDCR1(1) SYSC.PVD2CR1()

Alternatively, going back to BUSMCNT%s, which has dimensions M4I,M4D,SYS,DMA:

Register Old Access New access
BUSMCNTM4I BUSMCNT(0) BUSMCNTM4I()
BUSMCNTM4D BUSMCNT(1) BUSMCNTM4D()
BUSMCNTSYS BUSMCNT(2) BUSMCNTSYS()
BUSMCNTDMA BUSMCNT(3) BUSMCNTDMA()

The old access is just wrong: it disconnects the actual placeholder from the thing you're accessing entirely, with the only benefit being that we can access it in an array-like manner (which doesn't help: how on earth does this random number map to an actual register offset/name that we care about?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Chiptool chokes on registers with duplicate names and dimIindex

2 participants