flush behavior during packing #33
Flushing is always a balancing act between efficiency and immediacy. As such I'm not sure there is one right answer to this question. This really needs a top-down design process to consider what level of control is possible and useful in different use cases.
Yes, makes sense, there is a trade-off. Yesterday I sketched up a small test to see what the current behaviour looks like:

```java
public void testFlushBehavior() {
    final int[] flushes = {0};
    Map<List<String>, List<String>> testData = Map.of(
            Arrays.asList("Key1", "Key2"), Arrays.asList("Value1", "Value2"),
            Arrays.asList("Key3", "Key4"), Arrays.asList("Value3", "Value4"));
    ByteArrayOutputStream out = new ByteArrayOutputStream() {
        @Override
        public void flush() throws IOException {
            super.flush();
            flushes[0]++;
        }
    };
    for (TransitFactory.Format format : TransitFactory.Format.values()) {
        Writer writer = TransitFactory.writer(format, out);
        writer.write(testData);
        System.err.format("Number of flushes for format %s: %d\n", format, flushes[0]);
        out.reset();
        flushes[0] = 0;
    }
}
```

The result with an otherwise pristine transit-java source tree is:
It looks to me that the current behaviour is different for MSGPACK and JSON, and indeed looks good for MSGPACK. The reason for the different behaviour is that …

The reason I am a bit concerned about the number of flushes for the JSON writer is that I worry about wear levelling on SSDs: for FileOutputStreams, the operating system forces all data to disk and blocks on every flush. The same goes for unnecessarily splitting up data units in network protocols.

Thinking about policy: do you think it makes sense to introduce some kind of policy argument during construction of the Transit writers? I see use cases for both behaviours but find the current one a bit surprising and inconsistent. ;)
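To make the policy idea concrete, here is a minimal, self-contained sketch of what a constructor-time flush policy could control. Everything here is hypothetical: `FlushPolicy`, `writeAll`, and the class name are illustrative names, not part of the transit-java API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class FlushPolicyDemo {

    /** Hypothetical knob a writer factory could accept at construction time. */
    enum FlushPolicy {
        AFTER_EACH_ELEMENT, // roughly the current JSON behaviour
        AFTER_WRITE,        // flush once when write() returns
        NEVER               // leave all flushing to the caller
    }

    /** Hypothetical emitter loop consulting the policy. */
    static void writeAll(OutputStream out, byte[][] elements, FlushPolicy policy)
            throws IOException {
        for (byte[] element : elements) {
            out.write(element);
            if (policy == FlushPolicy.AFTER_EACH_ELEMENT) {
                out.flush();
            }
        }
        if (policy == FlushPolicy.AFTER_WRITE) {
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[][] elements = { {1}, {2}, {3} };
        for (FlushPolicy policy : FlushPolicy.values()) {
            final int[] flushes = {0};
            ByteArrayOutputStream out = new ByteArrayOutputStream() {
                @Override
                public void flush() {
                    flushes[0]++;
                }
            };
            writeAll(out, elements, policy);
            // Three elements yield 3, 1, and 0 flushes respectively.
            System.out.println(policy + ": " + flushes[0]);
        }
    }
}
```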
@puredanger how about flushing only at the boundary with user code? When I call transit/write I don't expect any intermediate flushes, at most one at the end. The best would probably be to do no flushes at all and leave them entirely to the user, but that change could be backwards-incompatible.
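One way to get that behaviour today, without touching transit-java, is to hand the writer a wrapper stream that swallows intermediate `flush()` calls. `DeferredFlushOutputStream` below is a hypothetical helper sketched for this thread, not an existing class:

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Wrapper that ignores flush() calls made during serialization and only
 * pushes data to the underlying stream when the caller asks for it
 * explicitly, or on close().
 */
public class DeferredFlushOutputStream extends FilterOutputStream {

    public DeferredFlushOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Delegate directly; FilterOutputStream would otherwise write byte by byte.
        out.write(b, off, len);
    }

    @Override
    public void flush() {
        // Swallow intermediate flushes triggered by the writer.
    }

    /** Flush the underlying stream at the user-code boundary. */
    public void flushUnderlying() throws IOException {
        out.flush();
    }

    @Override
    public void close() throws IOException {
        out.flush();
        out.close();
    }
}
```

The writer would then be constructed as `TransitFactory.writer(format, new DeferredFlushOutputStream(out))`, with a single `flushUnderlying()` call after `write` returns.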
Two issues popped up in transit-clj which are better recorded here:
In essence:
transit-java/src/main/java/com/cognitect/transit/impl/AbstractEmitter.java, line 189 (commit cff7111)
The flushWriter call flushes the output channel after serializing every single element of a data structure, causing a performance drop: the process has to wait until the operating system has forced every newly serialized bit of data onto the disk (or, for other kinds of streams, has fragmented it at the network level).

I don't think the flush is necessary there; it should move to the end of the whole serialization process in write. I would be happy to help or provide patches, just let me know. Thanks! :)
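One detail worth noting in support of moving the flush: simply interposing a `BufferedOutputStream` does not absorb these per-element flushes, because `BufferedOutputStream.flush()` empties its buffer and then flushes the underlying stream as well. A quick self-contained check (class name is illustrative only):

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class BufferingDoesNotHelp {
    public static void main(String[] args) throws IOException {
        final int[] flushes = {0};
        ByteArrayOutputStream sink = new ByteArrayOutputStream() {
            @Override
            public void flush() {
                flushes[0]++;
            }
        };
        BufferedOutputStream buffered = new BufferedOutputStream(sink, 8192);
        for (int i = 0; i < 5; i++) {
            buffered.write(i);
            buffered.flush(); // propagates to sink.flush() every time
        }
        System.out.println("Underlying flushes: " + flushes[0]); // prints 5
    }
}
```

So per-element flushes defeat library-level buffering as well as OS-level caching, which is why the flush has to move rather than just be buffered around.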