flush behavior during packing #33
Flushing is always a balancing act between efficiency and immediacy. As such I'm not sure there is one right answer to this question. This really needs a top-down design process to consider what level of control is possible and useful in different use cases.
Yes, makes sense, there is a trade-off. Yesterday I sketched up a small test to see what the current behaviour looks like:

```java
public void testFlushBehavior() {
    final int[] flushes = {0};
    Map<List<String>, List<String>> testData = Map.of(
            Arrays.asList("Key1", "Key2"), Arrays.asList("Value1", "Value2"),
            Arrays.asList("Key3", "Key4"), Arrays.asList("Value3", "Value4"));
    ByteArrayOutputStream out = new ByteArrayOutputStream() {
        @Override
        public void flush() throws IOException {
            super.flush();
            flushes[0]++;
        }
    };
    for (TransitFactory.Format format : TransitFactory.Format.values()) {
        Writer writer = TransitFactory.writer(format, out);
        writer.write(testData);
        System.err.format("Number of flushes for format %s: %d\n", format, flushes[0]);
        out.reset();
        flushes[0] = 0;
    }
}
```

The result with an otherwise pristine transit-java source tree is:
It looks to me that the current behaviour is different for MSGPACK and JSON, and indeed looks good for MSGPACK. The reason for the different behaviour is that …

The reason I am a bit concerned about the number of flushes for the JSON writer is that I worry about wear levelling on SSDs: for FileOutputStreams, the operating system forces all data to disk and blocks on every flush. The same goes for unnecessarily splitting up data units in network protocols.

Thinking about policy: do you think it makes sense to introduce some kind of policy argument during construction of the Transit writers? I see use cases for both behaviours but find the current one a bit surprising and inconsistent. ;)
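To make the policy idea concrete, here is a minimal, self-contained sketch of what a constructor-time flush policy could control. Everything here is hypothetical: `FlushPolicy`, `writeAll`, and the class name are illustrative names, not part of the transit-java API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class FlushPolicyDemo {

    /** Hypothetical knob a writer factory could accept at construction time. */
    enum FlushPolicy {
        AFTER_EACH_ELEMENT, // roughly the current JSON behaviour
        AFTER_WRITE,        // flush once when write() returns
        NEVER               // leave all flushing to the caller
    }

    /** Hypothetical emitter loop consulting the policy. */
    static void writeAll(OutputStream out, byte[][] elements, FlushPolicy policy)
            throws IOException {
        for (byte[] element : elements) {
            out.write(element);
            if (policy == FlushPolicy.AFTER_EACH_ELEMENT) {
                out.flush();
            }
        }
        if (policy == FlushPolicy.AFTER_WRITE) {
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[][] elements = { {1}, {2}, {3} };
        for (FlushPolicy policy : FlushPolicy.values()) {
            final int[] flushes = {0};
            ByteArrayOutputStream out = new ByteArrayOutputStream() {
                @Override
                public void flush() {
                    flushes[0]++;
                }
            };
            writeAll(out, elements, policy);
            // Three elements yield 3, 1, and 0 flushes respectively.
            System.out.println(policy + ": " + flushes[0]);
        }
    }
}
```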
@puredanger how about flushing only at the boundary with user code? When I call transit/write I don't expect any intermediate flushes, at most one at the end. The best would probably be to do no flushes at all and leave them entirely to the user, but that change could be backwards-incompatible.
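One way to get that behaviour today, without touching transit-java, is to hand the writer a wrapper stream that swallows intermediate `flush()` calls. `DeferredFlushOutputStream` below is a hypothetical helper sketched for this thread, not an existing class:

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Wrapper that ignores flush() calls made during serialization and only
 * pushes data to the underlying stream when the caller asks for it
 * explicitly, or on close().
 */
public class DeferredFlushOutputStream extends FilterOutputStream {

    public DeferredFlushOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Delegate directly; FilterOutputStream would otherwise write byte by byte.
        out.write(b, off, len);
    }

    @Override
    public void flush() {
        // Swallow intermediate flushes triggered by the writer.
    }

    /** Flush the underlying stream at the user-code boundary. */
    public void flushUnderlying() throws IOException {
        out.flush();
    }

    @Override
    public void close() throws IOException {
        out.flush();
        out.close();
    }
}
```

The writer would then be constructed as `TransitFactory.writer(format, new DeferredFlushOutputStream(out))`, with a single `flushUnderlying()` call after `write` returns.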
Two issues popped up in transit-clj which are better recorded here:
In essence:
transit-java/src/main/java/com/cognitect/transit/impl/AbstractEmitter.java, line 189 (commit cff7111)
The flushWriter call flushes the output channel after serializing every single element of a data structure, causing a performance drop: the process has to wait until the operating system has forced every newly serialized bit of data onto the disk (or, for other kinds of streams, has fragmented it at the network level).

I don't think the flush is necessary there; it should move to the end of the whole serialization process in write. I would be happy to help or provide patches, just let me know. Thanks! :)
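One detail worth noting in support of moving the flush: simply interposing a `BufferedOutputStream` does not absorb these per-element flushes, because `BufferedOutputStream.flush()` empties its buffer and then flushes the underlying stream as well. A quick self-contained check (class name is illustrative only):

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class BufferingDoesNotHelp {
    public static void main(String[] args) throws IOException {
        final int[] flushes = {0};
        ByteArrayOutputStream sink = new ByteArrayOutputStream() {
            @Override
            public void flush() {
                flushes[0]++;
            }
        };
        BufferedOutputStream buffered = new BufferedOutputStream(sink, 8192);
        for (int i = 0; i < 5; i++) {
            buffered.write(i);
            buffered.flush(); // propagates to sink.flush() every time
        }
        System.out.println("Underlying flushes: " + flushes[0]); // prints 5
    }
}
```

So per-element flushes defeat library-level buffering as well as OS-level caching, which is why the flush has to move rather than just be buffered around.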