Using select_streams argument traverses file twice #60

chkothe · 2020-04-08T19:40:01Z

Currently, when one uses the select_streams argument, it will first traverse the file's chunks to find/parse the headers, and sets some flags used subsequently for a check during the load_xdf main loop, which traverses and parses the file again*. It does that using an alternative code path that mirrors the chunk traversal and header parsing of the main loop, but currently skips, among others, some error checking in case of file corruptions etc.

Side note: I do like the idea of breaking load_xdf up into smaller subroutines, and I like the generator approach of _read_chunks, i.e., there may be a possibility to refactor load_xdf in this overall style, especially if the logic were to get more complex over the years. However, as it stands, the current main loop is still pretty simple and easy to follow (esp when one reads it alongside the spec), and could continue to serve as an approachable reference implementation for future language ports for a couple more years. So I'm not ready to pick a side at this point, also considering the effort that a full & clean refactor would come down to.

For now, maybe a way to reconcile the code duplication (which I hope is temporary) and double-traversal of the file could be to add a self.skip (as in, skip processing chunks of this stream) bool in the StreamData constructor, and we could move the matching logic there or into a helper method/function. This way, it would run the first time a header is encountered, and then whenever one sees a chunk of that stream, one can, roughly, do an if streams[StreamId].skip: continue near the place where it currently does that check. We could then earmark the remainder of that alternative code path (parse_chunks, _read_chunks, ...) for future consideration when or if we take on a refactor of load_xdf in this general style (maybe with a git tag).

I'd be willing to implement the suggested change (using StreamData.skip) this week if there's no objection. I think this may also get us a closer to a future simple and fast load-only-headers option for load_xdf (some time soon I'm hoping to have a separate discussion on that).

(*): The double-traversal may not sound like much, it'd be relatively more costly on a network file system.

The text was updated successfully, but these errors were encountered:

cbrnr · 2020-04-09T07:51:31Z

I agree that duplicating chunk reading is not ideal, but at that time it was the quickest solution, because I could use functions from an alternative XDF reader I'd written. Also, these functions were mainly intended for querying the XDF file (resolve_streams, match_streaminfos), which I think would be nice to keep.

If you want to put in the effort to de-duplicate chunk reading code paths this would be great, provided that there are no regressions in functionality (i.e. keeping the two functions I've mentioned before).

cboulay · 2020-12-29T21:00:51Z

@cbrnr - I'm doing a little cleanup of pyxdf, deduplicating code, eliminating the double-traversal when using select_streams, adding some other shortcuts, etc.

Am I right in assuming that resolve_streams only needs stream headers? Indeed the functions it calls (parse_xdf, parse_chunks, _read_chunks) only operate on stream headers so the answer seems like an obvious "yes", but those functions' doc strings don't state this explicitly so I was just wondering if you were planning on building these out in the future?

In my deduplication effort, I'm reusing your _read_chunks method, but this means adding a lot of other functionality to it, only to disable that functionality with stream_headers_only=True when being called via parse_xdf (via resolve_streams). Is this compatible with your use-case?

cbrnr · 2020-12-30T11:01:03Z

I'll answer in #74.

cboulay mentioned this issue Dec 29, 2020

Deduplication, support select_streams on single pass, support headers-only #74

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using select_streams argument traverses file twice #60

Using select_streams argument traverses file twice #60

chkothe commented Apr 8, 2020

cbrnr commented Apr 9, 2020

cboulay commented Dec 29, 2020

cbrnr commented Dec 30, 2020

Using select_streams argument traverses file twice #60

Using select_streams argument traverses file twice #60

Comments

chkothe commented Apr 8, 2020

cbrnr commented Apr 9, 2020

cboulay commented Dec 29, 2020

cbrnr commented Dec 30, 2020