Explore alternative record architecture without "key indexes" #60

Closed
pospi opened this issue Sep 2, 2019 · 4 comments
Labels
enhancement New feature or request

Comments

@pospi
Member

pospi commented Sep 2, 2019

When I ran through the first pass of inter-DNA linking, we stored "base" entries (*) holding the address of the first version of an entry, mostly so that record IDs stay consistent between entry updates and across networks. After implementing the second pass, which does not use "base" entries as targets but instead writes metadata around the target link as a JSON-based entry, storing such consistent record hashes seems less necessary.

(*) These have since been renamed to "key indexes"; please substitute as appropriate when you see the older terminology.

It may be possible to link directly between entries whilst always referring to them by their consistent initial hash without incurring any additional storage overhead. For example, we would no longer need the consistently-identified EVENT_BASE_ENTRY_TYPE linking to the underlying EVENT_ENTRY_TYPE that has a roaming hash.

  • When creating new records via create_record, instead of storing the base entry address, just return the initial hash coming from commit_entry.
  • When calling update_record there would no longer be any need to dereference the entry; however, reading the entry metadata in order to determine the most recent version hash may be necessary. The initial hash (rather than most recent entry hash) would be returned from this method as an identifier for the record that remains consistent between updates.
    • Another option is that update_record should accept a revision ID (read: actual entry hash) rather than a record ID (read: hash of first entry), which would necessitate returning this record metadata in responses (see Retrieve full revision history in all record responses #40). This approach would also be better for avoiding undesirable update conflicts.
  • delete_record may have the same revision ID / record ID concern as for update, with the addition that there is no longer any base entry to delete.
  • read_record_entry takes the record ID initially returned from commit_entry and automatically follows the update metadata through to the latest version of the entry; there is no longer any reason to dereference the base entry. We may optionally wish to validate that no previous versions of the provided entry address exist, to ensure that revision IDs cannot be incorrectly used as record locators.
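As a rough illustration of the flow in the list above, here is a minimal in-memory sketch. It is not HDK code: `Store`, its fields, `latest_revision` and the integer `Address` type are all hypothetical stand-ins, and the update metadata is modelled as two plain maps rather than real DHT metadata. Only the names `commit_entry`, `create_record`, `update_record` and `read_record_entry` come from the discussion.

```rust
use std::collections::HashMap;

/// Stand-in for an entry address; a real DHT would use a content hash.
type Address = u64;

/// Hypothetical in-memory model of entries plus their update metadata.
#[derive(Default)]
struct Store {
    entries: HashMap<Address, String>,
    replaced_by: HashMap<Address, Address>, // old version -> next version
    replaces: HashMap<Address, Address>,    // new version -> previous version
    next_address: Address,
}

impl Store {
    /// Store an entry and return its (fake) address.
    fn commit_entry(&mut self, data: &str) -> Address {
        self.next_address += 1;
        self.entries.insert(self.next_address, data.to_string());
        self.next_address
    }

    /// The record ID is simply the address of the first version; no base entry.
    fn create_record(&mut self, data: &str) -> Address {
        self.commit_entry(data)
    }

    /// Follow update metadata from any version to the most recent one.
    fn latest_revision(&self, mut addr: Address) -> Address {
        while let Some(&next) = self.replaced_by.get(&addr) {
            addr = next;
        }
        addr
    }

    /// Update by record ID; the identifier returned is still the record ID.
    fn update_record(&mut self, record_id: Address, data: &str) -> Option<Address> {
        // Reject unknown addresses and revision IDs passed as record IDs.
        if !self.entries.contains_key(&record_id) || self.replaces.contains_key(&record_id) {
            return None;
        }
        let latest = self.latest_revision(record_id);
        let new_addr = self.commit_entry(data);
        self.replaced_by.insert(latest, new_addr);
        self.replaces.insert(new_addr, latest);
        Some(record_id)
    }

    /// Read by record ID, refusing revision IDs used as record locators.
    fn read_record_entry(&self, record_id: Address) -> Option<&String> {
        if self.replaces.contains_key(&record_id) {
            return None; // a previous version exists, so this is not an initial hash
        }
        self.entries.get(&self.latest_revision(record_id))
    }
}
```

In this sketch, creating a record and updating it twice leaves `read_record_entry(id)` returning the third version, while passing the address of a later revision returns `None`.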

Aside from restructuring the zome link! definitions to remove the indirection, I don't think anything needs to change in the linking API. Provided all links continue to use the initial version of an entry, they should all still be readable in a single query for field traversals. It'll just be different link type names.

@pospi pospi added the enhancement New feature or request label Sep 2, 2019
@pospi pospi added this to the Production-ready core components milestone Sep 2, 2019
@pospi
Member Author

pospi commented Sep 27, 2019

Other considerations and potential patterns to explore:

  • Is shadowing link data in the entry fields advisable, in order to speed up reads? This would also mean updates to link field data were reflected in the entry, so one could just follow the entry changelog to see when updates occurred (rather than also having to inspect related link entries). Of course, this would not apply to "indirect indexes" which use an intermediary entry to hold a compound key value.
  • Which is optimal:
    • read methods that crawl links for each version of an initial entry; or
    • linking everything to the initial entry (as currently implemented); or
    • duplicating all links alongside the new version of the entry?
    • ...probably the latter, since it is consistent with the idea of versioning "carrying over" unmodified data into the new version, and results in "always complete", easily reconstructable version data (rather than the complex link traversal logic that some of the other configurations would need).
  • Should we actually be doing the "stable ID" thing, or does this cause issues with update logic & network partitions? (CAS updates are easy to do conflict detection on since the exact version is specified in each update.) If we don't, we need to build index synchronisation logic to clean up old versions linked in other DNAs, or manage cascading updates so that records in remote DNAs can make accurate inferences about the number of external records linking in.
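To make two of the options above concrete, here is a small sketch combining a CAS-style update (the caller names the exact revision it is replacing) with "carrying over" unmodified links onto the new version. Everything in it is an assumption for illustration: `Records`, `UpdateError`, the `(tag, target)` link tuples and the integer addresses are hypothetical and do not correspond to any existing API.

```rust
use std::collections::HashMap;

type Address = u64; // stand-in for an entry hash

#[derive(Debug, PartialEq)]
enum UpdateError {
    UnknownRecord,
    /// The caller's revision is stale; carries the current head revision.
    Conflict { current: Address },
}

/// Hypothetical index: record ID -> head revision, and revision -> its links.
#[derive(Default)]
struct Records {
    head: HashMap<Address, Address>,
    links: HashMap<Address, Vec<(String, Address)>>,
    next_address: Address,
}

impl Records {
    /// Create a record; its first revision doubles as the record ID.
    fn create(&mut self, links: Vec<(String, Address)>) -> Address {
        self.next_address += 1;
        let id = self.next_address;
        self.head.insert(id, id);
        self.links.insert(id, links);
        id
    }

    /// Compare-and-swap update: succeeds only if `expected_revision` is the head.
    fn update(
        &mut self,
        record_id: Address,
        expected_revision: Address,
        changed: Vec<(String, Address)>,
    ) -> Result<Address, UpdateError> {
        let current = *self.head.get(&record_id).ok_or(UpdateError::UnknownRecord)?;
        if current != expected_revision {
            return Err(UpdateError::Conflict { current });
        }
        // Carry over every link whose tag was not changed, then add the new ones,
        // so each revision holds a complete set of links.
        let mut links: Vec<(String, Address)> = self
            .links
            .get(&current)
            .cloned()
            .unwrap_or_default()
            .into_iter()
            .filter(|(tag, _)| !changed.iter().any(|(t, _)| t == tag))
            .collect();
        links.extend(changed);
        self.next_address += 1;
        let revision = self.next_address;
        self.links.insert(revision, links);
        self.head.insert(record_id, revision);
        Ok(revision)
    }
}
```

A second writer that still holds the old revision gets `Conflict { current }` back instead of silently overwriting, and the old revision's links stay untouched, so any version can be reconstructed without traversing its neighbours.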

@pospi
Member Author

pospi commented Nov 14, 2019

Another nod towards "shadowing the link data in the entry fields": see valueflows/vf-apps#5 (comment)

For constraints like "if the event action is not defined as input and/or output, it should not be related to a process", not including the link field data in the entry actually makes validation logic far more difficult.

We also need to ensure support for calling in to bridged DNAs during validation calls in order to fulfill constraints like the above. (CC @pdaoust)
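For illustration, this is roughly what the quoted constraint could look like when the link fields are shadowed in the entry. The struct, the field names `input_of` / `output_of`, and especially the action table are assumptions made for this sketch, not the actual entry definitions or the authoritative ValueFlows action list.

```rust
/// Hypothetical, simplified event entry with its process links shadowed as fields.
struct EconomicEvent {
    action: String,
    input_of: Option<String>,  // address of the process this event feeds, if any
    output_of: Option<String>, // address of the process this event results from, if any
}

/// Illustrative action table (an assumption): can this action relate to a process at all?
fn action_relates_to_process(action: &str) -> bool {
    matches!(action, "consume" | "use" | "work" | "produce")
}

/// With the link data in the entry, the constraint is a local field check.
fn validate_event(event: &EconomicEvent) -> Result<(), String> {
    let related = event.input_of.is_some() || event.output_of.is_some();
    if related && !action_relates_to_process(&event.action) {
        return Err(format!(
            "action '{}' is not defined as input or output, so the event must not reference a process",
            event.action
        ));
    }
    Ok(())
}
```

Without the shadowed fields, the same check would have to query link entries for the event during validation, which is the difficulty described above.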

@pospi pospi changed the title from Explore alternative record architecture without "base entries" to Explore alternative record architecture without "key indexes" Jan 9, 2020
@pospi
Member Author

pospi commented Mar 12, 2020

Further reflections to be had RE https://infocentral.org/drafts/PrinciplesDraft.html

@pospi
Member Author

pospi commented Apr 27, 2022

Closing this one as superseded, given the newer approach of keeping indexes separate from CRUD entries, not to mention the new Holochain architecture of using headers for updates & deletes.

It does seem we are on the final pass of indexing cleanup, and with #84 (comment) and #264 being addressed, we should be in a good place for an MVP.

@pospi pospi closed this as completed Apr 27, 2022