JAXB marshaller and unmarshaller using streaming to handle large files

Chavaillaz/jaxb-stream

JAXB Streaming

This library lets you read and write a list of elements (even of different types, as long as they share the same parent) item by item from and to an XML file. The goal is to avoid loading large amounts of data into memory when processing large files.

Installation

The dependency is available on Maven Central (check there for the latest version):

<dependency>
    <groupId>com.chavaillaz</groupId>
    <artifactId>jaxb-stream</artifactId>
</dependency>

Usage

You can find the following example in the StreamingTest class. Note that this library also works with JAXB classes generated from an XSD file. In that case, have a look at the StreamingXsdTest class.

Context

You are storing different types of metrics in an XML file. Because of memory constraints and the number of entries, you cannot load them all at once, as plain JAXB would do when unmarshalling the file into the container class.

To process them anyway, you can use this library to read or write them item by item.

In this example, an interface Metric is implemented by multiple metric types:

  • Disk metrics (class DiskMetric, XML element disk)
  • Memory metrics (class MemoryMetric, XML element memory)
  • Processor metrics (class ProcessorMetric, XML element processor)

Each metric defines an XML element using the @XmlRootElement annotation. These metrics would usually be stored in the container class MetricsList, which represents a list of metrics. The container also defines an XML element, here metrics, which is the root tag of the file.

Below is an XML file from that example:

<?xml version="1.0" ?>
<metrics>
    <disk>
        <disk>/</disk>
        <freePartitionSpace>688865050624</freePartitionSpace>
        <usablePartitionSpace>544384016384</usablePartitionSpace>
        <totalCapacity>700001001472</totalCapacity>
    </disk>
    <memory>
        <freeMemory>521889952</freeMemory>
        <maxMemory>8589934592</maxMemory>
        <totalMemory>536870912</totalMemory>
    </memory>
    <processor>
        <systemLoad>0.25</systemLoad>
        <processLoad>0.18</processLoad>
        <availableProcessors>16</availableProcessors>
    </processor>
    ...
</metrics>

Writing elements

For example, to write two metrics (memory and processor metrics), the following code can be used:

try (StreamingMarshaller marshaller = new StreamingMarshaller(MetricsList.class)) {
    marshaller.open(new FileOutputStream(fileName));
    marshaller.write(MemoryMetric.class, new MemoryMetric());
    marshaller.write(ProcessorMetric.class, new ProcessorMetric());
    ...
}

Note that you can also pass the root element tag name instead of MetricsList.class.
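
To make the idea of writing item by item concrete, here is a minimal sketch that uses only the StAX API shipped with the JDK. It is not the implementation of StreamingMarshaller: the class ItemStreamWriter and its methods are hypothetical names chosen for illustration, and each item is reduced to a map of field names to values instead of a JAXB object. It only shows the general pattern of opening the root tag once, writing each item as soon as it is available, and closing the root tag at the end.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper for illustration only, not part of jaxb-stream.
final class ItemStreamWriter implements AutoCloseable {

    private final XMLStreamWriter writer;

    // Opens the document and writes the start tag of the root element.
    ItemStreamWriter(OutputStream output, String rootTag) {
        try {
            writer = XMLOutputFactory.newFactory().createXMLStreamWriter(output, "UTF-8");
            writer.writeStartDocument();
            writer.writeStartElement(rootTag);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Cannot start the document", e);
        }
    }

    // Writes a single item immediately, so nothing accumulates in memory.
    void write(String tag, Map<String, String> fields) {
        try {
            writer.writeStartElement(tag);
            for (Map.Entry<String, String> field : fields.entrySet()) {
                writer.writeStartElement(field.getKey());
                writer.writeCharacters(field.getValue());
                writer.writeEndElement();
            }
            writer.writeEndElement();
            writer.flush();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Cannot write item " + tag, e);
        }
    }

    // Closes every element that is still open (here the root) and flushes.
    @Override
    public void close() {
        try {
            writer.writeEndDocument();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Cannot end the document", e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        try (ItemStreamWriter items = new ItemStreamWriter(output, "metrics")) {
            items.write("memory", Map.of("freeMemory", "521889952"));
            items.write("processor", Map.of("systemLoad", "0.25"));
        }
        System.out.println(new String(output.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

The try-with-resources block mirrors the usage of the marshaller above: the root tag is closed when the block exits, even if writing an item fails.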

Reading elements

For example, to read the written metrics (memory and processor metrics), the following code can be used:

try (StreamingUnmarshaller unmarshaller = new StreamingUnmarshaller(MemoryMetric.class, ProcessorMetric.class)) {
    unmarshaller.open(new FileInputStream(fileName));
    unmarshaller.iterate((type, element) -> doWhatYouWant(element));
}

Alternatively, you can iterate over the elements yourself:

try (StreamingUnmarshaller unmarshaller = new StreamingUnmarshaller(MemoryMetric.class, ProcessorMetric.class)) {
    unmarshaller.open(new FileInputStream(fileName));
    while (unmarshaller.hasNext()) {
        doWhatYouWant(unmarshaller.next(YourObject.class));
    }
}

Note that if the classes given to the StreamingUnmarshaller do not have the @XmlRootElement annotation (for example if they are generated by XJC from an XSD), you can pass the tag names together with the classes using a Map.
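
The reason tag names matter is that, when reading item by item, each item is recognized by the name of its element directly under the root tag. The following sketch illustrates this with the StAX API from the JDK only. It is an assumption about the general technique, not the code of StreamingUnmarshaller, and the class ItemTagCounter is a hypothetical name. Instead of unmarshalling each item with JAXB, it simply counts the items per tag name while keeping only the current parsing event in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration only, not part of jaxb-stream.
final class ItemTagCounter {

    // Counts the direct children of the root element by tag name.
    static Map<String, Integer> countItems(InputStream input) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(input);
            int depth = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    // Depth 1 is the root tag, depth 2 is one item of the stream.
                    if (depth == 2) {
                        counts.merge(reader.getLocalName(), 1, Integer::sum);
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Cannot read the document", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String xml = "<metrics>"
                + "<memory><freeMemory>1</freeMemory></memory>"
                + "<processor><systemLoad>0.25</systemLoad></processor>"
                + "<memory><freeMemory>2</freeMemory></memory>"
                + "</metrics>";
        InputStream input = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countItems(input));
    }
}
```

Tracking the depth keeps nested elements from being mistaken for items: in the example file, the inner disk element of a disk metric sits one level deeper and is therefore not counted.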

Complex XML file structure

If the XML file you would like to create or read has a complex structure (meaning the stream of elements does not start right after the root tag), you can extend both the marshaller and the unmarshaller and override the following methods:

  • createDocumentStart in StreamingMarshaller to write the start of the XML file before the stream of elements
  • close in StreamingMarshaller to write the end of the XML file (note that tags are closed automatically)
  • skipDocumentStart in StreamingUnmarshaller to reach the stream of elements in the document
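
To show what reaching the stream of elements can look like, the sketch below skips everything that precedes a nested container tag and then lists the tag names of the items inside it. It relies on the StAX API from the JDK only. The class NestedItemReader, its methods, and the sample document layout (report, header, footer) are hypothetical and do not reflect the actual signature of skipDocumentStart, which should be checked in the library source.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration only, not part of jaxb-stream.
final class NestedItemReader {

    // Advances the reader past the start tag of the container; false if it is absent.
    static boolean skipTo(XMLStreamReader reader, String containerTag) throws XMLStreamException {
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && containerTag.equals(reader.getLocalName())) {
                return true;
            }
        }
        return false;
    }

    // Returns the tag names of the direct children of the container, in document order.
    static List<String> itemTags(InputStream input, String containerTag) {
        List<String> tags = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(input);
            if (skipTo(reader, containerTag)) {
                int depth = 0; // depth relative to the container
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        if (++depth == 1) {
                            tags.add(reader.getLocalName());
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        if (depth == 0) {
                            break; // end tag of the container: the stream is over
                        }
                        depth--;
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Cannot read the document", e);
        }
        return tags;
    }

    public static void main(String[] args) {
        String xml = "<report><header><title>Daily</title></header>"
                + "<metrics><memory><freeMemory>1</freeMemory></memory><processor/></metrics>"
                + "<footer/></report>";
        InputStream input = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(itemTags(input, "metrics"));
    }
}
```

Stopping at the end tag of the container means that whatever follows the stream of elements (the footer in this sample) is never interpreted as an item.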

Contributing

If you have a feature request or have found a bug, you can:

  • Write an issue
  • Create a pull request

If you want to contribute:

  • Please write tests covering all your changes
  • Ensure you didn't break the build by running mvn test
  • Fork the repo and create a pull request

License

This project is licensed under the Apache License 2.0.