
Search for usks - INCOMPATIBLE CHANGE REQUIRES MATCHING UPDATE IN plugin-Library #8

Open: wants to merge 42 commits into base: master

@debbiedub commented Oct 11, 2022

Changes to search for and find USKs instead of SSKs and to handle USKs in the database.

The most important changes are:

  • Change the database to store editions separately from the URI (USKs are stored with edition 0 in the database). This makes different editions of the same page appear as one entry in the database.
  • Add a last fetched field in the database.
  • Change the indexes to distinguish between NEW and NEW_EDITION. Also separate PROCESSED_KSKs and PROCESSED_USKs from DONE. The NEW_EDITION index is ordered on last fetched instead of last handled.
  • Control fetching from NEW, NEW_EDITION and FAILED queues separately.
  • Change the packing format of TermEntry sent to Library to allow sending USKs. This requires a corresponding fix in plugin-Library, which receives these TermEntrys.
  • Change all SSK links that could be USKs into USKs before they are entered into the database.
  • Catch the redirect when a fetched USK has a new edition, and simply add it to the queue again. For NEW_EDITION this puts it at the top of the queue; for NEW it puts it at the end.
  • Subscribe to new editions of the USKs found.
  • Show an info file on the Spider GUI.
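The queue changes in the list above can be pictured with a small Java sketch. It is an illustration under assumptions, not the plugin's code: the status names NEW, NEW_EDITION and FAILED and the "last handled" / "last fetched" orderings come from the description, while FetchQueues, Page, nextBatch and the per-queue limits are hypothetical. The real plugin presumably keeps these indexes in its database rather than in in-memory priority queues.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch, not the plugin's real classes: one priority queue per
 * status, each with its own ordering, drained with independent limits.
 */
final class FetchQueues {

    /** Queue names taken from the PR; DONE and PROCESSED_* are left out here. */
    enum Status { NEW, NEW_EDITION, FAILED }

    /** Minimal page record: URI plus the two timestamps the PR mentions. */
    static final class Page {
        private final String uri;
        private final long lastHandled;
        private final long lastFetched;

        Page(String uri, long lastHandled, long lastFetched) {
            this.uri = uri;
            this.lastHandled = lastHandled;
            this.lastFetched = lastFetched;
        }

        String uri() { return uri; }
        long lastHandled() { return lastHandled; }
        long lastFetched() { return lastFetched; }
    }

    private final Map<Status, PriorityQueue<Page>> queues = new EnumMap<>(Status.class);

    FetchQueues() {
        Comparator<Page> byLastHandled = Comparator.comparingLong(Page::lastHandled);
        Comparator<Page> byLastFetched = Comparator.comparingLong(Page::lastFetched);
        // NEW and FAILED: oldest "last handled" first, so a page re-added with a
        // newer timestamp lands behind everything already queued.
        queues.put(Status.NEW, new PriorityQueue<>(byLastHandled));
        queues.put(Status.FAILED, new PriorityQueue<>(byLastHandled));
        // NEW_EDITION: ordered on "last fetched" instead, oldest fetch first.
        queues.put(Status.NEW_EDITION, new PriorityQueue<>(byLastFetched));
    }

    void add(Status status, Page page) {
        queues.get(status).add(page);
    }

    /** Takes up to limits.get(s) pages from each queue; a missing limit means 0. */
    List<Page> nextBatch(Map<Status, Integer> limits) {
        List<Page> batch = new ArrayList<>();
        for (Status s : Status.values()) { // declaration order: NEW, NEW_EDITION, FAILED
            int max = limits.getOrDefault(s, 0);
            PriorityQueue<Page> q = queues.get(s);
            for (int i = 0; i < max && !q.isEmpty(); i++) {
                batch.add(q.poll());
            }
        }
        return batch;
    }
}
```

Keeping one queue per status is what allows FAILED to be throttled on its own, for example by giving it a limit of 0 while NEW and NEW_EDITION pages are still fetched.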

The purpose is to have newly found pages fetched before attempting to
fetch all failed pages again.
This is an attempt that was not working.
The change of the page affects the list of pages and might confuse the
iterator.
USKs treated as USKs, ...
Each USK is now stored once in the database and not once per edition.
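The idea of storing each USK once, with the edition kept apart from the URI, can be sketched with plain string handling. This is a minimal sketch that assumes the usual Freenet convention that edition N of USK@key/site/N/ is the same document as SSK@key/site-N/. Treating any docname ending in -<digits> as a possible USK is a heuristic, and UskNormalizer, sskToUsk, databaseKey and edition are hypothetical names; the plugin itself presumably uses fred's URI classes rather than regular expressions.

```java
import java.util.Optional;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not the plugin's API): rewrites SSK links that follow
 * the USK edition convention into USKs, and derives a per-page database key
 * in which the edition is always 0, with the real edition kept separately.
 */
final class UskNormalizer {

    // SSK@<key>/<docname>-<edition>[/<path>]; the last "-<digits>" is the edition.
    private static final Pattern SSK_WITH_EDITION =
            Pattern.compile("^SSK@([^/]+)/([^/]+)-(\\d+)(/.*)?$");

    // USK@<key>/<docname>/<edition>[/<path>]; USK editions may be negative.
    private static final Pattern USK =
            Pattern.compile("^USK@([^/]+)/([^/]+)/(-?\\d+)(/.*)?$");

    private UskNormalizer() {}

    private static String path(Matcher m) {
        return m.group(4) == null ? "" : m.group(4);
    }

    /** "SSK@k/site-5/a.html" becomes "USK@k/site/5/a.html"; other URIs give empty. */
    public static Optional<String> sskToUsk(String uri) {
        Matcher m = SSK_WITH_EDITION.matcher(uri);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of("USK@" + m.group(1) + "/" + m.group(2) + "/" + m.group(3) + path(m));
    }

    /** Same USK with edition 0, so all editions of a page share one key. */
    public static Optional<String> databaseKey(String uskUri) {
        Matcher m = USK.matcher(uskUri);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of("USK@" + m.group(1) + "/" + m.group(2) + "/0" + path(m));
    }

    /** The edition number, to be stored in its own field next to the key. */
    public static OptionalLong edition(String uskUri) {
        Matcher m = USK.matcher(uskUri);
        if (!m.matches()) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(m.group(3)));
        } catch (NumberFormatException tooLarge) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        String usk = sskToUsk("SSK@abc,def,AQACAAE/my-site-5/index.html").get();
        System.out.println(usk);                      // USK@abc,def,AQACAAE/my-site/5/index.html
        System.out.println(databaseKey(usk).get());   // USK@abc,def,AQACAAE/my-site/0/index.html
        System.out.println(edition(usk).getAsLong()); // 5
    }
}
```

Because every edition of a page maps to the same edition-0 key, a newly found edition updates the existing row (and its edition field) instead of creating another one.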
@debbiedub changed the title from "Search for usks" to "Search for usks - INCOMPATIBLE CHANGE REQUIRES MATCHING UPDATE IN plugin-Library" on Oct 11, 2022
@ArneBab (Contributor) commented Nov 27, 2022

Thank you! I’ll try to get this into 1496 and create test JARs before then, so that interested people can test this before it is pushed to all users.


@Juiceman (Contributor) commented:

Can I help get this merged somehow? This needs the plugin-Library changes merged first... Is Library backwards compatible with existing indexes, or do both need to go in at the same time?

@debbiedub (Author) commented:

The changes I have made in plugin-Library do not affect using the library to find things in any of the indices, and the indices are stored in the same way in Freenet/Hyphanet.

The problem is for nodes that run the combination of plugin-Spider and plugin-Library to create the index. The communication between the two plugins has changed, so the current version of plugin-Library (without my fixes) cannot receive the information that plugin-Spider finds with this PR.

@ArneBab (Contributor) commented Jul 23, 2024

That makes it much less risky to release, then: the unchanged Spider + Library combination already breaks after 3-4 updates, so there’s no new breakage. Thank you!

That matters because the changes to Library are huge.

Where are the freenet.copied packages in Library copied from? Are they from the fred source tree?

@debbiedub (Author) commented:

The freenet.copied packages are from the fred source tree.

One alternative is not to merge my changes into plugin-Library, and instead let plugin-Library be the plugin that almost every node uses to read the index, and nothing else. Eventually, the old features for creating the index could be removed. The functions that actually create the index could then either live in a stand-alone repo or be merged into plugin-Spider, since the two are always used together anyway.

One problem with this is that the code that maintains the B-trees is shared between reading and writing the index. When I restructured plugin-Library, I split it into src, shared, and updater, where shared is used both for reading and for writing, and both when running as part of the plugin (plugin-Library) and when running outside the node. Unless this part is factored out of the plugin, as I have done by moving it to shared, there will be multiple implementations of it.

@debbiedub (Author) commented:

My current thinking is this:

| Function            | Current main   | Debbie's current solution | Suggestion                     | Comment                              |
|---------------------+----------------+---------------------------+--------------------------------+--------------------------------------|
| Reading (a plugin)  | plugin-Library | plugin-Library (src)      | plugin-Library (src)           |                                      |
| Shared              | plugin-Library | plugin-Library (shared)   | plugin-Library (shared)        | Shared between reading and creating  |
| Creating            | plugin-Library | plugin-Library (uploader) | plugin-Spider (TBD)            | Compile and run dependency on Shared |
| Tools               |                | plugin-Library (uploader) | plugin-Spider (TBD)            | Tools to maintain the index          |
| Crawling (a plugin) | plugin-Spider  | plugin-Spider             | plugin-Spider (plugins.Spider) |                                      |

The suggested structure would make much more sense than my current approach, which I started with the aim of not modifying plugin-Spider.
