Export leads to corrupted file #136

Open
chrhansk opened this issue Mar 12, 2024 · 15 comments

@chrhansk

I am trying to export data using WriteVTK v1.18.2 to be examined with paraview. Specifically, the data is based on a rectilinear grid of size 41 x 41.
The output script is given by

output.jl

The resulting output file is the following (zipped to allow upload):

output_file.zip

The file is corrupted and unreadable as reported by paraview:

(   6.005s) [paraview        ]       vtkXMLParser.cxx:364    ERR| vtkXMLDataParser (0x5a6b5f2f9110): Error parsing XML in stream at line 17, column 3, byte index 811: not well-formed (invalid token)
(   6.055s) [paraview        ]       vtkXMLReader.cxx:576    ERR| vtkXMLRectilinearGridReader (0x5a6b5e915670): Error parsing input file.  ReadXMLInformation aborting.

This is confirmed using other XML libraries. Is there something wrong in the calls to the exporting or is this some error in the processing inside WriteVTK?
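The well-formedness check described above can be reproduced with any standard XML parser. Below is a minimal sketch using Python's standard library (`xml.etree.ElementTree`, which is built on expat). The sample document is a made-up stand-in for a VTK file with a raw appended section, not the attached file, and `check_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def check_well_formed(data: bytes):
    """Return None if `data` is well-formed XML, else (line, column) of the first error."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        return err.position  # (line, column) as reported by expat
    return None

# Hypothetical stand-in for a VTK XML file with an appended data section.
header = (
    b'<?xml version="1.0"?>\n'
    b'<VTKFile type="RectilinearGrid" byte_order="LittleEndian">\n'
    b'<AppendedData encoding="raw">\n'
)
footer = b'\n</AppendedData>\n</VTKFile>\n'
raw_payload = b'_\x00\x01\x02\xff\xfe'  # '_' marks the start of the data

print(check_well_formed(header + b'_AAEC' + footer))     # text payload: parses
print(check_well_formed(header + raw_payload + footer))  # raw bytes: error position
```

A strict parser rejects the second document as soon as it reaches a byte that is not legal XML text, which matches the kind of "invalid token" error in the log above.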

@jipolanco
Member

I was able to run your script and generate the output file. I had no problem opening and visualising the file in ParaView 5.12. I also checked that the file I generated is identical to the one contained in your zip file.

I'm not sure what could be the problem. Maybe your operating system?

A few things you could try (see here for details):

  1. Pass compress = false to vtk_grid to disable compression;
  2. Pass append = false to vtk_grid to avoid writing raw binary data to XML files.

@chrhansk
Author

This certainly is interesting. In terms of operating systems, I tried this on 64-bit Linux, also using ParaView 5.12. However, the exact same file seems to work on macOS 14.4 (Apple M1) with ParaView 5.11. Is there something strange going on with the binary XML (endianness, word size)?

@jipolanco
Member

Strange, I'm also on 64-bit Linux.

If it's an endianness problem, I would expect to see the issue on big-endian machines, since the binary data was written in little-endian order (WriteVTK decides this based on the value of ENDIAN_BOM).
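To illustrate the endianness point, here is a small Python sketch of what happens when a length field written in little-endian order is read with the opposite byte order. It assumes a 32-bit unsigned field purely for illustration; `sys.byteorder` plays the role that ENDIAN_BOM plays in Julia, and the strings are the values of the `byte_order` attribute in VTK XML files.

```python
import struct
import sys

# Value of the VTK XML `byte_order` attribute matching the host machine,
# analogous to inspecting ENDIAN_BOM in Julia.
host_order = "LittleEndian" if sys.byteorder == "little" else "BigEndian"

n = 13448  # e.g. byte count of a 41 x 41 array of Float64 values
little = struct.pack("<I", n)  # bytes as written on a little-endian machine

right = struct.unpack("<I", little)[0]  # reader honours the declared byte order
wrong = struct.unpack(">I", little)[0]  # reader assumes the opposite byte order

print(host_order, right, wrong)
```

Both machines in this thread are little-endian x86-64 systems, so a mismatch of this kind would not explain the difference between them.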

@chrhansk
Author

Well, I switched to base64 as you suggested and everything seems to work fine.

As an aside, I find the naming of the options to vtk_grid a little unfortunate: "Append" signals to me that data is appended to an existing file, which is not the case. As far as I see it, there are three formats: text, base64, and raw binary. The compression then applies to the latter two formats. Maybe this could be made a little clearer in the API (for example using a suitable union type as argument)?

I was wondering about the format, since I didn't find anything official regarding raw binary data in XML files. I wasn't aware that this is an unofficial extension from Kitware. Naturally, any normal XML reader will flag the file as invalid...

I would suspect that the problem lies in the data in the header, since zlib is quite well tested in general. Maybe the number of blocks / lengths is read differently based on some specifics of the underlying system. Since this should not be the case, this looks like an error in ParaView / the file type specification, and not an error specific to your package...
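For reference, a sketch of the kind of header being discussed. It assumes the layout given in the VTK file format documentation for compressed data: number of blocks, uncompressed block size, size of the last partial block, then one compressed size per block, each stored as UInt32 or UInt64 according to the file's `header_type` attribute. The helper names and the 32768-byte block size are hypothetical choices for this example, not WriteVTK internals.

```python
import struct
import zlib

def _fmt(header_type: str) -> str:
    # Little-endian unsigned integer format for the chosen header word size.
    return "<Q" if header_type == "UInt64" else "<I"

def pack_compressed(data: bytes, block_size: int = 32768,
                    header_type: str = "UInt64") -> bytes:
    """Compress `data` block by block and prepend a header (assumed layout)."""
    fmt = _fmt(header_type)
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    compressed = [zlib.compress(b) for b in blocks]
    # 0 means that the last block is a full block.
    last_size = len(blocks[-1]) % block_size if blocks else 0
    fields = [len(blocks), block_size, last_size] + [len(c) for c in compressed]
    return b"".join(struct.pack(fmt, f) for f in fields) + b"".join(compressed)

def unpack_compressed(buf: bytes, header_type: str = "UInt64") -> bytes:
    """Inverse of pack_compressed; `header_type` must match the writer's."""
    fmt = _fmt(header_type)
    word = struct.calcsize(fmt)

    def read(k: int) -> int:
        return struct.unpack_from(fmt, buf, k * word)[0]

    nblocks = read(0)
    sizes = [read(3 + i) for i in range(nblocks)]
    offset = (3 + nblocks) * word
    out = []
    for size in sizes:
        out.append(zlib.decompress(buf[offset:offset + size]))
        offset += size
    return b"".join(out)

data = bytes(range(256)) * 500
buf = pack_compressed(data)
print(unpack_compressed(buf) == data)
```

In this sketch, reading the header with the wrong word size (for example `header_type="UInt32"` on a UInt64 header) yields wrong block sizes and a zlib error. That is a decoding failure after the XML has been parsed, whereas the log in this issue shows the XML parser itself failing.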

@jipolanco
Member

> As an aside, I find the naming of the options to vtk_grid a little unfortunate: "Append" signals to me that data is appended to an existing file, which is not the case. As far as I see it, there are three formats: text, base64, and raw binary. The compression then applies to the latter two formats. Maybe this could be made a little clearer in the API (for example using a suitable union type as argument)?

I agree that the naming could be better. In the VTK specification, "append" refers to writing data to an "AppendedData" section at the end of the XML file. In this section, data can be written either as base64 or raw binary, and in both cases it can be optionally compressed. In WriteVTK, "append" implies that data will be written as raw binary, since appending base64 data is simply not implemented. I understand how this can be unclear. And yes, as mentioned e.g. in these VTK docs, writing raw binary data to an XML file goes against the XML specification. However, in practice ParaView has generally no issues when reading such files. And when working with large datasets, it's the fastest and most compact possible format (if one wants to stay within the possibilities of VTK XML files).
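The size argument can be made concrete with a short Python sketch. It compares the raw binary size of one Float64 value per point on a 41 x 41 grid with the size of the same bytes encoded as base64; the data values themselves are made up, and compression is left out to isolate the encoding overhead.

```python
import base64
import struct

values = [float(i) for i in range(41 * 41)]     # one Float64 per grid point
raw = struct.pack(f"<{len(values)}d", *values)  # raw binary: 8 bytes per value
b64 = base64.b64encode(raw)                     # base64: 4 characters per 3 bytes

print(len(raw), len(b64), round(len(b64) / len(raw), 3))
```

Base64 costs roughly a third more space than raw binary, plus the encoding and decoding time, which is negligible for a grid of this size but adds up for very large datasets.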

@MarDiehl

MarDiehl commented Jun 2, 2024

I'm using ParaView in version 5.12.0 on Linux and also have issues with appended data. It seems that

> However, in practice ParaView has generally no issues when reading such files.

does not always hold.

With append = false, everything works as expected.

MarDiehl added a commit to damask-multiphysics/Damask.jl that referenced this issue Jun 2, 2024
@jipolanco
Member

@MarDiehl thanks for reporting this. Would you be able to provide a minimal working example to test the issue?

@MarDiehl

MarDiehl commented Jun 3, 2024

I am using Damask.jl (https://github.com/eisenforschung/Damask.jl)
An example file that is created if I change https://github.com/eisenforschung/Damask.jl/blob/370fbffc9531437bc92750d8071d889bfc82e27a/src/Damask.jl#L500 to append = true is attached.

20grains16x16x16_tensionX_material_inc00.vti (extension is modified to allow upload to GitHub).

The error message from ParaView is

(   4.673s) [paraview        ]       vtkXMLParser.cxx:364    ERR| vtkXMLDataParser (0x5f601c0e1d10): Error parsing XML in stream at line 24, column 0, byte index 1550: not well-formed (invalid token)
(   4.681s) [paraview        ]       vtkXMLReader.cxx:576    ERR| vtkXMLImageDataReader (0x5f601c0329d0): Error parsing input file.  ReadXMLInformation aborting.
(   4.957s) [paraview        ]       vtkXMLParser.cxx:364    ERR| vtkXMLDataParser (0x5f601c0e1d10): Error parsing XML in stream at line 24, column 0, byte index 1550: not well-formed (invalid token)
(   4.957s) [paraview        ]       vtkXMLReader.cxx:576    ERR| vtkXMLImageDataReader (0x5f601c0329d0): Error parsing input file.  ReadXMLInformation aborting.

ParaView reports

Client Information:
Version: 5.12.0
VTK Version: 9.3.20231030
Qt Version: 5.15.13
vtkIdType size: 64bits
Embedded Python: On
Python Library Path: /usr/lib/python3.12
Python Library Version: 3.12.3 (main, Apr 23 2024, 09:16:07) [GCC 13.2.1 20240417]
Python Numpy Support: On
Python Numpy Path: /usr/lib/python3.12/site-packages/numpy
Python Numpy Version: 1.26.4
Python Matplotlib Support: On
Python Matplotlib Path: /usr/lib/python3.12/site-packages/matplotlib
Python Matplotlib Version: 3.8.3
Python Testing: Off
MPI Enabled: On
Disable Registry: Off
Test Directory: 
Data Directory: 
SMP Backend: TBB
SMP Max Number of Threads: 8
OpenGL Vendor: Intel
OpenGL Version: 4.6 (Core Profile) Mesa 24.0.7-arch1.3
OpenGL Renderer: Mesa Intel(R) UHD Graphics 620 (WHL GT2)
Accelerated filters overrides available: No

Connection Information:
Remote Connection: No

@jipolanco
Member

jipolanco commented Jun 3, 2024

Once more, I wish I could reproduce the issue, but I have no trouble reading that file in ParaView 5.12.1. Note that I'm using the official binaries from Kitware.

My version information, in case it helps:

Client Information:
Version: 5.12.1
VTK Version: 9.3.20231030
Qt Version: 5.15.10
vtkIdType size: 64bits
Embedded Python: On
Python Library Path: /home/jipolanco/opt/ParaView-5.12.1-MPI-Linux-Python3.10-x86_64/lib/python3.10
Python Library Version: 3.10.13 (main, May 23 2024, 07:05:53) [GCC 10.2.1 20210130 (Red Hat 10.2.1-11)]
Python Numpy Support: On
Python Numpy Path: /home/jipolanco/opt/ParaView-5.12.1-MPI-Linux-Python3.10-x86_64/lib/python3.10/site-packages/numpy
Python Numpy Version: 1.25.2
Python Matplotlib Support: On
Python Matplotlib Path: /home/jipolanco/opt/ParaView-5.12.1-MPI-Linux-Python3.10-x86_64/lib/python3.10/site-packages/matplotlib
Python Matplotlib Version: 3.7.2
Python Testing: Off
MPI Enabled: On
ParaView Build ID: superbuild 32fea3bb560dae5130b02c087816520f7c71671a (!1198)
Disable Registry: Off
Test Directory: 
Data Directory: 
SMP Backend: TBB
SMP Max Number of Threads: 12
OpenGL Vendor: AMD
OpenGL Version: 4.6 (Core Profile) Mesa 24.0.8
OpenGL Renderer: AMD Radeon Graphics (radeonsi, gfx1103_r1, LLVM 18.1.1, DRM 3.57, 6.8.11-300.fc40.x86_64)
Accelerated filters overrides available: No

Connection Information:
Remote Connection: No

I'm guessing from your Python paths that you're using the ParaView binaries from your Linux distribution. Could you check if the issue disappears when using the official binaries?

@MarDiehl

MarDiehl commented Jun 3, 2024

The official binary works. ParaView and vtk (which are separate builds) from the Arch Linux repositories fail. Both depend on libxml2 (current version 2.12.7) and pugixml (current version 1.14).

As far as I can see, ParaView vendors libxml2 2.9.12 (via vtk, https://gitlab.kitware.com/vtk/vtk/-/tree/14d1d855e377876b5dc73a64da566d0fc3e852ff/ThirdParty/libxml2) and pugixml 1.11.4 (via vtk, https://gitlab.kitware.com/vtk/vtk/-/tree/14d1d855e377876b5dc73a64da566d0fc3e852ff/ThirdParty/pugixml).

To me it seems plausible that these libraries are used for reading XML files and the newer versions are more strict when it comes to valid XML files.

@chrhansk
Author

chrhansk commented Jun 3, 2024

Well, to my mind the problem is the following: Despite how it looks, the VTK format is not XML. I cannot stress this enough. XML does not allow binary data, it is purely text-based. The binary data embedded in the files makes this format "homemade" loosely based on XML (unless of course you encode the binary data in base64, in which case the format may comply with XML specifications).
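The distinction can be checked mechanically. The Python sketch below tests whether a byte string is valid UTF-8 consisting only of characters in the XML 1.0 `Char` production (tab, line feed, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF). The function name `is_xml_text` is a hypothetical helper, and UTF-8 is assumed as the document encoding.

```python
import base64

def is_xml_text(data: bytes) -> bool:
    """True if `data` is valid UTF-8 made only of characters allowed by XML 1.0."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False

    def allowed(cp: int) -> bool:
        return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
                or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)

    return all(allowed(ord(ch)) for ch in text)

payload = bytes(range(256))                    # arbitrary binary data
print(is_xml_text(payload))                    # False
print(is_xml_text(base64.b64encode(payload)))  # True
```

Base64 output uses only letters, digits, `+`, `/` and `=`, so it also avoids the markup characters `<` and `&`, which is why base64-encoded data can sit inside an XML element without escaping.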

@chrhansk
Author

chrhansk commented Jun 3, 2024

I also think that the problem is not caused by WriteVTK.jl and only manifests itself here. Specifically, I think that the format is ad-hoc and a bit under-specified. Additionally, using an XML parser for a file that is not an XML file is just asking for trouble in my opinion.

In my case I generated a file which parsed correctly on one system and incorrectly on another. This really should not happen. I think in order to get this properly fixed, the issue needs to be raised with the devs at Kitware, who are after all in control of the file format itself...

@MarDiehl

MarDiehl commented Jun 3, 2024

I agree. I'll file a bug with Kitware.

@MarDiehl

MarDiehl commented Jun 4, 2024

I tried to reproduce this but can't anymore. Presumably because ParaView was updated from 5.12.0 to 5.12.1 and rebuilt with ffmpeg 7 (Note: 5.12.0 from the official build works).
The separate VTK library (I use it via Python) in version 9.3 and without the ffmpeg rebuild still fails.

When looking for bugs, I found https://gitlab.kitware.com/paraview/paraview/-/issues/20982. The solution for me is to avoid the binary format. Compressed base64 is good for my purposes. IMHO it's a bad idea to define an XML-based file format that is not compatible with XML.

@jipolanco: I think the HDF5-based format (https://docs.vtk.org/en/latest/design_documents/VTKFileFormats.html#vtkhdf-file-format) is the better option and it seems that Kitware directs the development efforts towards it. Would including that in WriteVTK.jl be something you're interested in?

@jipolanco
Member

I agree with both of you that putting binary data in XML files is asking for trouble, especially when VTK uses standard XML libraries to read and write files. Unfortunately, for very large datasets, binary ("appended") data provides by far the fastest and most compact way of dealing with data, which is why it's the default format in WriteVTK.jl.

@MarDiehl Definitely, I'm very interested in the new VTKHDF format, which would solve this issue as well as being more convenient to work with. In fact I've been using it for a while in a separate project (not public yet). Adding support for VTKHDF in WriteVTK.jl, possibly as an alternative backend, would make a lot of sense, and it has been discussed in a few issues like #125. But I would need to find the time or someone motivated to implement this.
