Error on reading files where <c> element misses the "r" attribute #465

Open
fabiospiga opened this issue Aug 22, 2024 · 4 comments · May be fixed by #514

Comments

@fabiospiga

I need to read a file in which the <row> and <c> elements lack the "r" attribute, which is apparently optional in the OpenXML structure.

Here's an example:
https://www.atih.sante.fr/sites/default/files/public/content/3968/fichier_complementaire_ccam_descriptive_a_usage_pmsi_2021_v2.xlsx

The raised exception is:
Cannot invoke "java.lang.Integer.intValue()" because the return value of "org.dhatim.fastexcel.reader.SimpleXmlReader.getIntAttribute(String)" is null

This happens because the null Integer cannot be unboxed into the int rowIndex in org.dhatim.fastexcel.reader.RowSpliterator#next.
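Here is a minimal sketch of the failure mode. It assumes only what the stack trace shows: an attribute lookup returns a boxed Integer, which is null when the attribute is absent, and the result is assigned to an int. Java's auto-unboxing then calls intValue() on the null reference. That produces a NullPointerException of the form quoted above, with the detailed message appearing on Java 14+ with helpful NullPointerException messages.

The class and method names below are hypothetical stand-ins, not fastexcel's code. resolveRowIndex shows one possible null-safe alternative: it falls back to the previous row index plus one.

```java
import java.util.Map;

public class UnboxingDemo {
    // Hypothetical stand-in for an Integer-returning attribute lookup:
    // returns null when the attribute is absent, like the getIntAttribute in the trace.
    static Integer getIntAttribute(Map<String, String> attrs, String name) {
        String value = attrs.get(name);
        return value == null ? null : Integer.valueOf(value);
    }

    // Null-safe variant: use "r" when present, otherwise assume the next row.
    static int resolveRowIndex(Map<String, String> rowAttrs, int previousRowIndex) {
        Integer r = getIntAttribute(rowAttrs, "r");
        return r != null ? r : previousRowIndex + 1;
    }

    public static void main(String[] args) {
        Map<String, String> withR = Map.of("r", "5");
        Map<String, String> withoutR = Map.of();
        System.out.println(resolveRowIndex(withR, 0));    // 5
        System.out.println(resolveRowIndex(withoutR, 5)); // 6
        try {
            int rowIndex = getIntAttribute(withoutR, "r"); // auto-unboxing a null Integer
            System.out.println(rowIndex);
        } catch (NullPointerException e) {
            System.out.println("caught " + e.getClass().getSimpleName());
        }
    }
}
```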

Could you please provide support for this use case?

Thanks and best regards,
Fabio

@fabiospiga fabiospiga changed the title Error on reading files where <c> element is misses the "r" attribute Error on reading files where <c> element misses the "r" attribute Aug 22, 2024
@RamsesGomez

Yes, I have the exact same problem.

@ochedru
Collaborator

ochedru commented Sep 17, 2024

You are right, the r attribute is optional. We should check how this is handled in Apache POI. For example, what happens if some elements miss the r attribute, but not all?
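For files where some elements have r and some do not, one plausible rule is sketched below. It is an assumption, not Apache POI's documented behavior:

- An explicit r always wins.
- A cell without r takes the column right after the previous cell in the same row.

The hypothetical CellColumnResolver applies that rule to a single row, given each cell's r value or null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CellColumnResolver {
    // Converts the column letters of an A1-style reference to a 0-based index ("C7" -> 2, "AA1" -> 26).
    static int columnIndexFromRef(String ref) {
        int col = 0;
        for (int i = 0; i < ref.length(); i++) {
            char ch = ref.charAt(i);
            if (ch < 'A' || ch > 'Z') {
                break; // stop at the row digits
            }
            col = col * 26 + (ch - 'A' + 1);
        }
        return col - 1;
    }

    // refs.get(i) is the "r" attribute of the i-th <c> in one row, or null if absent.
    // Explicit references win; a missing one is assumed to follow the previous cell
    // (the first cell of a row without "r" lands in column 0).
    static List<Integer> resolveColumns(List<String> refs) {
        List<Integer> columns = new ArrayList<>();
        int previous = -1;
        for (String ref : refs) {
            int col = (ref != null) ? columnIndexFromRef(ref) : previous + 1;
            columns.add(col);
            previous = col;
        }
        return columns;
    }

    public static void main(String[] args) {
        // Mixed row: explicit B1, two cells without "r", explicit E1, one more without "r".
        List<String> refs = Arrays.asList("B1", null, null, "E1", null);
        System.out.println(resolveColumns(refs)); // [1, 2, 3, 4, 5]
    }
}
```

An open question this rule does not settle is what to do when an explicit r points *before* an inferred column. Comparing against Apache POI on real files would be the way to decide.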

@ursjoss

ursjoss commented Feb 19, 2025

We have such a case as well. Interestingly, opening the seemingly broken file in Excel or LibreOffice Calc and saving it again lets fastexcel process the file successfully. It looks like those programs add the optional r attribute.
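To check whether a particular file lacks the attribute, and whether resaving it added the attribute back, you can count the <row> and <c> elements that have no r directly in the worksheet XML. The sketch below uses the JDK's java.util.zip and StAX (javax.xml.stream) APIs.

Two assumptions apply. First, the class name is hypothetical. Second, the entry path xl/worksheets/sheet1.xml is the usual location of the first sheet, but the workbook's relationships define the actual path.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class MissingRefScanner {
    // Returns {rowsWithoutR, cellsWithoutR} for one worksheet XML stream.
    static int[] countMissingRefs(InputStream sheetXml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // don't process DTDs from untrusted files
        XMLStreamReader reader = factory.createXMLStreamReader(sheetXml);
        int rows = 0;
        int cells = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    // null namespace URI: match the unqualified "r" attribute
                    boolean missingR = reader.getAttributeValue(null, "r") == null;
                    if (missingR && name.equals("row")) rows++;
                    if (missingR && name.equals("c")) cells++;
                }
            }
        } finally {
            reader.close();
        }
        return new int[] {rows, cells};
    }

    static void print(int[] counts) {
        System.out.println("rows without r: " + counts[0] + ", cells without r: " + counts[1]);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 1) {
            // Assumption: first worksheet at xl/worksheets/sheet1.xml (common, not guaranteed).
            try (ZipFile zip = new ZipFile(args[0])) {
                ZipEntry entry = zip.getEntry("xl/worksheets/sheet1.xml");
                if (entry == null) {
                    System.out.println("xl/worksheets/sheet1.xml not found");
                    return;
                }
                try (InputStream in = zip.getInputStream(entry)) {
                    print(countMissingRefs(in));
                }
            }
        } else {
            // Inline sample: one row with r, one row whose two cells lack r.
            String sample = "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
                    + "<sheetData><row r=\"1\"><c r=\"A1\"><v>1</v></c></row>"
                    + "<row><c><v>2</v></c><c><v>3</v></c></row></sheetData></worksheet>";
            print(countMissingRefs(new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8))));
            // rows without r: 1, cells without r: 2
        }
    }
}
```

Run it on the original file and on the resaved copy. If the observation above holds, the resaved copy should report zero for both counts.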

ursjoss added a commit to ursjoss/fastexcel that referenced this issue Feb 19, 2025
Inspired by the apache poi approach
@ursjoss ursjoss linked a pull request Feb 19, 2025 that will close this issue
ursjoss added a commit to ursjoss/fastexcel that referenced this issue Feb 20, 2025
Tracks both row and column indices to allow falling back to those
to determine row and/or cell address for Excel files that do not
provide the (optional) reference attribute 'r'.

Inspired by the apache poi approach.
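Below is a minimal sketch of the idea in the commit message above. It is not the code from PR #514. It uses a hypothetical tracker that:

- advances a row counter at each <row>, honoring r when present;
- restarts the column counter for every row;
- converts the inferred position back into an A1-style cell address for cells that lack r.

```java
public class CellAddressTracker {
    private int rowIndex = -1;    // 0-based index of the current <row>
    private int columnIndex = -1; // 0-based index of the last <c> seen in the current row

    // Call on each <row>; rowNumber is the 1-based "r" value, or null if absent.
    void startRow(Integer rowNumber) {
        rowIndex = (rowNumber != null) ? rowNumber - 1 : rowIndex + 1;
        columnIndex = -1; // column counting restarts in every row
    }

    // Call on each <c> that lacks "r"; returns its inferred A1-style address.
    String nextCellWithoutRef() {
        columnIndex++;
        return columnName(columnIndex) + (rowIndex + 1);
    }

    // Converts a 0-based column index to Excel letters: 0 -> "A", 25 -> "Z", 26 -> "AA".
    static String columnName(int index) {
        StringBuilder sb = new StringBuilder();
        for (int n = index + 1; n > 0; n = (n - 1) / 26) {
            sb.append((char) ('A' + (n - 1) % 26));
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        CellAddressTracker t = new CellAddressTracker();
        t.startRow(null);                           // first row without "r" -> row 1
        System.out.println(t.nextCellWithoutRef()); // A1
        System.out.println(t.nextCellWithoutRef()); // B1
        t.startRow(null);                           // next row -> row 2
        System.out.println(t.nextCellWithoutRef()); // A2
        t.startRow(10);                             // explicit r="10"
        System.out.println(t.nextCellWithoutRef()); // A10
    }
}
```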
@ursjoss

ursjoss commented Feb 21, 2025

Opening the sample file provided by @fabiospiga and saving it also seems to fix the file in a way that lets fastexcel process it successfully (same as with our file). For testing, FastExcelReaderTest compares certain attributes of a file parsed with fastexcel against those of the same file read with Apache POI.

PR #514 fully fixes the issue we experienced with our file. With our file added to FastExcelReaderTest, the test passes (after increasing the byte array max override sufficiently). However, the file provided by Fabio does not pass that test, even with the changes in PR #514. Still, the initial exception is gone for that file as well:

Cannot invoke "java.lang.Integer.intValue()" because the return value of "org.dhatim.fastexcel.reader.SimpleXmlReader.getIntAttribute(String)" is null

I looked into further fixes, to some extent, so that fastexcel would process that file with the same outcome as Apache POI, but I stopped.

I wanted to add a file to FastExcelReaderTest but was unable to: our file contains confidential customer data, and Fabio's file still does not pass the test.

Maybe somebody else could contribute a file that can be added to the PR as a test case.


4 participants