
Rewrite sync module in optimal way #974

Open
al-indigo opened this issue Apr 9, 2023 · 1 comment
Labels: backend (bug is related to backend), enhancement (resolving the issue would improve some part of the system), optimization (something that improves performance or storage space)

Comments

@al-indigo (Member)

We have a module responsible for cross-site sync. For historical reasons that module (https://github.com/ispras/lingvodoc/blame/heavy_refactor/lingvodoc/views/v2/sync.py) works in batch mode: it grabs every change that needs to be synced and sends it all in a single request/transaction.

That will not work anymore (at all), since it consumes an unholy amount of memory. We need to do it differently. The main points are:

  1. The code should split the sync process into reasonable (and configurable) parts. Instead of grabbing all the dictionaries with all their contents, we should form sensible chunks: the dictionaries themselves, batches of lexical entries, etc., with configurable batch sizes (100 per sync request, for example); see the batching sketch after this list.
  2. The sync should run as our usual async tasks (executable by Celery or by our forking mechanism); see the Celery sketch below.
  3. The sync tasks should cache the sync result in a local cache. If the main server has consumed (or given us) an update successfully, we should mark the synced data in Redis instead of re-acquiring the state of the main server; see the Redis sketch below.
  4. The sync tasks should form a linear queue. No lexical entry should be sent to the main site until its parent objects (e.g. dictionaries, perspectives, fields) have actually been created, and their existence should be confirmed via the task mechanism from point 2; the Celery sketch below shows one way to enforce this ordering.
  5. The whole login process should use the external IAM service, Keycloak. We should authenticate and verify authorization for objects through our central Keycloak instance; see the token sketch below. @princessfruittt should eventually work out a way to hand out a sequence of client_ids on this side.
  6. (possibly hard) If it's possible to prioritize these sync transactions, they must get the highest priority. I haven't found a way to do this through SQLAlchemy yet.
  7. (possibly hard) If there is a near-binary format for data transmission/serialization/deserialization, it makes sense to use it here. I know about protobuf/gRPC/HTTP 2.0, but I'm not sure there is nothing more appropriate; see the serialization example below.
  8. The web interface for this feature already exists. It may need revision (I'm fine with the last version, except that it is synchronous). After changes 1-5 it obviously has to become async.
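
A minimal sketch of the batching from point 1, assuming SQLAlchemy-style queries. `iter_batches`, `session` and the ordering columns are illustrative placeholders, not the final API:

```python
BATCH_SIZE = 100  # configurable batch size per sync request

def iter_batches(query, batch_size=BATCH_SIZE):
    """Yield successive lists of at most `batch_size` rows from a query."""
    offset = 0
    while True:
        batch = query.limit(batch_size).offset(offset).all()
        if not batch:
            break
        yield batch
        offset += batch_size

# Hypothetical usage; `session` and `LexicalEntry` stand in for the
# project's real session and model:
# for entries in iter_batches(
#         session.query(LexicalEntry)
#                .order_by(LexicalEntry.client_id, LexicalEntry.object_id)):
#     send_sync_request(entries)
```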
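
For points 2 and 4, a hedged sketch of chaining the tasks so the queue stays strictly linear. The `celery_app` instance, broker URL, task bodies and the `*_batch` variables are placeholders; in practice we would reuse the project's existing Celery setup:

```python
from celery import Celery, chain

# Placeholder app; lingvodoc's real Celery instance would be reused.
celery_app = Celery("sync", broker="redis://localhost:6379/0")

@celery_app.task
def sync_dictionaries(batch):
    """Push a batch of dictionaries to the main site (stub)."""
    ...

@celery_app.task
def sync_perspectives(batch):
    """Push a batch of perspectives (stub)."""
    ...

@celery_app.task
def sync_lexical_entries(batch):
    """Push a batch of lexical entries (stub)."""
    ...

dict_batch, persp_batch, entry_batch = [], [], []  # placeholders

# A chain executes strictly one task after another, so lexical entries
# are only sent once their parent objects are confirmed (point 4).
workflow = chain(
    sync_dictionaries.si(dict_batch),
    sync_perspectives.si(persp_batch),
    sync_lexical_entries.si(entry_batch),
)
workflow.apply_async()
```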
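
For point 3, a sketch of marking synced objects in Redis, assuming our usual (client_id, object_id) composite ids; the key naming is made up for illustration:

```python
import redis

r = redis.StrictRedis(host="localhost", port=6379, db=0)

def mark_synced(kind, client_id, object_id):
    """Remember that this object of `kind` has reached the main server."""
    r.sadd(f"sync:done:{kind}", f"{client_id}/{object_id}")

def is_synced(kind, client_id, object_id):
    """Check the local cache instead of re-querying the main server."""
    return r.sismember(f"sync:done:{kind}", f"{client_id}/{object_id}")
```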
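
For point 5, a sketch of fetching a service token from Keycloak via the standard OIDC client-credentials flow; the host and realm in `TOKEN_URL` are placeholders for our central instance:

```python
import requests

TOKEN_URL = ("https://keycloak.example.org/realms/lingvodoc"
             "/protocol/openid-connect/token")  # placeholder realm/host

def get_access_token(client_id, client_secret):
    """Obtain a service token from Keycloak's OIDC token endpoint."""
    resp = requests.post(
        TOKEN_URL,
        data={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

# Each sync request would then carry the token:
# headers = {"Authorization": f"Bearer {get_access_token(cid, secret)}"}
```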
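
For point 7, MessagePack is one concrete binary option (my example here, not something settled in this issue) alongside the protobuf/gRPC route; a round-trip looks like this:

```python
import json

import msgpack  # lighter-weight binary format; protobuf/gRPC are alternatives

batch = [{"client_id": 1, "object_id": 42, "content": "слово"}] * 100

packed = msgpack.packb(batch, use_bin_type=True)
unpacked = msgpack.unpackb(packed, raw=False)
assert unpacked == batch

# Binary payloads are noticeably smaller than JSON for large batches:
print(len(packed), len(json.dumps(batch).encode("utf-8")))
```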
@al-indigo added the backend, enhancement and optimization labels on Apr 9, 2023
@al-indigo pinned this issue on Apr 9, 2023
@al-indigo (Member, Author)

This issue is low priority for now (though let's discuss it at tomorrow's meeting), but it is important in itself and looks time-consuming, so I've pinned it for a while.
