editscript composition and optimization #2

Open
EmergentBehavior opened this issue May 10, 2018 · 5 comments

@EmergentBehavior

First, I think this library is pretty interesting. I was wondering about one use case though: let's say you have entity A_t0 (where t is analogous to a time step) and you have an editscript e_0->1 to describe the transformation needed to get A_t0 to A_t1. If you capture an editscript for transformations at each time step (if there is a change), you'd have a collection of e, right? Then if you want to get the present state of A you could just concatenate all those editscripts together (to describe changes between t0 and tN). Have you tried this use case?

I wonder, though, whether the patching process would slow down once the editscript gets large enough. If so, it would be helpful to have some sort of editscript optimizer that reduces it to the minimal editscript needed to get from A_t0 to A_tN.

@huahaiy
Contributor

huahaiy commented May 10, 2018

For your first paragraph, yes, editscript is designed to do just that. (get-edits e) returns a vector. These vectors can be concatenated to represent a larger change. BTW, I added a `combine` function.
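
As a rough illustration of concatenating per-step edits, here is a minimal Python sketch. It does not use editscript's API: the (path, op, value) tuples, the `apply_edits` helper, and the sample data are all assumptions made up for this example, and it only handles nested dicts with non-empty paths.

```python
from copy import deepcopy

# Toy edit format (an assumption for illustration, not editscript's API):
# (path, op, value), where op is "+" (add), "-" (delete) or "r" (replace).


def apply_edits(state, edits):
    """Return a new nested dict with the edits applied in order."""
    result = deepcopy(state)
    for path, op, value in edits:
        *parents, key = path
        node = result
        for parent in parents:
            node = node[parent]
        if op == "-":
            del node[key]
        else:  # "+" and "r" both set the value at the path
            node[key] = value
    return result


a_t0 = {"name": "A", "tags": {"x": 1}}
e_0_1 = [(("tags", "y"), "+", 2)]
e_1_2 = [(("name",), "r", "A2"), (("tags", "x"), "-", None)]

# Concatenating the per-step scripts gives one script from t0 to t2.
e_0_2 = e_0_1 + e_1_2
a_t2 = apply_edits(a_t0, e_0_2)
```

Applying the concatenated script in one pass yields the same state as applying each step in turn, which is the property the use case relies on.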

For the second, it is a very interesting question. I have not encountered cases where the patching process takes too long. When such cases do appear, I will think about an optimizer.

On the other hand, editscript is designed with stream processing in mind. An editscript should be conceptualized as a chunk in a potentially endless stream of changes. So it is more meaningful to worry about data integrity, compression, windowing, etc., rather than the sizes of individual editscripts. Optimizers in these contexts are indeed what I am very interested in.

Basically, I consider editscript a part of Clojure's data-oriented effort, which tries to elevate the level of abstraction of data from the level of characters or bytes to that of maps, sets, vectors, and lists. So instead of talking about byte streams, we can talk about change streams in terms of these data structures.

Do I make sense?

@pepe

pepe commented May 11, 2018

I haven't had a chance to try editscript yet, but I think it will play nicely with Specter. It seems to me they have a similar view of the data.

@EmergentBehavior
Author

@huahaiy Thanks for the answer. My latter paragraph was considering a scenario in event streaming where I rebuild the "present" version of an entity by composing all historical mutations over its entire history of existence (if checkpointing or other strategies weren't used).

@huahaiy
Contributor

huahaiy commented May 12, 2018

@EmergentBehavior Your scenario sounds similar to mine.

Given an editscript, there are indeed some opportunities to optimize. For example, if a sub-tree will later be deleted, all edits made inside that sub-tree can be safely removed without affecting the end result.
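
The sub-tree pruning idea can be sketched in Python under simplifying assumptions. This is not editscript code: the (path, op, value) tuple format and the `prune_doomed_edits` name are hypothetical, and the sketch assumes map-like nodes addressed by key. Sequences, where a deletion shifts later indices, would need more care.

```python
# Toy edit format (an assumption for illustration, not editscript's API):
# (path, op, value), where op is "+" (add), "-" (delete) or "r" (replace).


def prune_doomed_edits(edits):
    """Drop edits made strictly inside a sub-tree that is deleted later on."""
    deleted_later = set()
    kept = []
    # Scan backwards so that deleted_later only holds deletions that occur
    # after the edit currently being examined.
    for path, op, value in reversed(edits):
        # Only strict prefixes count: the delete itself, and any edit at
        # exactly the deleted path, must stay in the script.
        doomed = any(path[:n] in deleted_later for n in range(1, len(path)))
        if not doomed:
            kept.append((path, op, value))
        if op == "-":
            deleted_later.add(path)
    kept.reverse()
    return kept


script = [
    (("cache", "hits"), "r", 10),   # inside "cache", which is deleted below
    (("name",), "r", "A2"),
    (("cache", "misses"), "+", 3),  # inside "cache", which is deleted below
    (("cache",), "-", None),
    (("cache",), "+", {}),          # re-added after the delete
    (("cache", "hits"), "+", 0),    # made after the delete, so it is kept
]
pruned = prune_doomed_edits(script)
```

Here the two edits inside "cache" that precede its deletion are dropped, while the edit made after "cache" is re-added survives.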

Such optimization may require the editscript to record some kind of identifiers for internal nodes. I will think about these.

Meanwhile, my current focus is to further improve the diffing speed. I am working on fingerprinting the data to avoid drilling down into sub-trees that have the same content.
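
One way to picture fingerprint-based skipping is a Merkle-style content hash per sub-tree, compared before recursing. The Python sketch below is only a guess at the general technique, not the approach editscript takes: `fingerprint`, `diff`, the id-keyed cache, and the (path, op, value) output format are all assumptions, and it handles only nested dicts with string keys that are not mutated while the cache is in use.

```python
import hashlib


def fingerprint(value, cache):
    """Merkle-style content hash, memoized per object id in `cache`."""
    key = id(value)
    if key not in cache:
        h = hashlib.sha256()
        if isinstance(value, dict):
            h.update(b"dict")
            for k in sorted(value):
                h.update(repr(k).encode("utf-8"))
                h.update(fingerprint(value[k], cache).encode("utf-8"))
        else:
            h.update(b"leaf" + repr(value).encode("utf-8"))
        cache[key] = h.hexdigest()
    return cache[key]


def diff(old, new, cache, path=()):
    """Return toy (path, op, value) edits that turn old into new."""
    if fingerprint(old, cache) == fingerprint(new, cache):
        return []  # same content: skip the whole sub-tree
    if not (isinstance(old, dict) and isinstance(new, dict)):
        return [(path, "r", new)]
    edits = []
    for k in old:
        if k not in new:
            edits.append((path + (k,), "-", None))
        else:
            edits.extend(diff(old[k], new[k], cache, path + (k,)))
    for k in new:
        if k not in old:
            edits.append((path + (k,), "+", new[k]))
    return edits


old = {"big": {"a": 1, "b": {"c": 2}}, "n": 1}
new = {"big": {"a": 1, "b": {"c": 2}}, "n": 2, "m": 3}
edits = diff(old, new, cache={})
```

Computing the fingerprints still walks each tree once, so the saving comes from caching them or maintaining them incrementally, so that repeated diffs against mostly unchanged data can skip identical sub-trees.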

@huahaiy huahaiy self-assigned this Jun 26, 2020
@huahaiy huahaiy added the enhancement New feature or request label Jun 26, 2020
@huahaiy
Contributor

huahaiy commented Jun 26, 2020

Implementing some obvious optimizations should be a good starting point.
