feat: delete lots of data from indexers when leaving project #469

EvanHahn · 2024-02-06T21:56:44Z

When you leave a project, we now delete all indexed data outside of the auth namespace. Fixes #427.

Co-Authored-By: Andrew Chou [email protected]
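To make the description concrete, here is a minimal sketch of "delete everything indexed outside `auth`". The namespace list and the `Map`-based index are assumptions made for illustration; they are not taken from this PR's diff.

```javascript
// Assumed namespace list, for illustration only.
const NAMESPACES = ['auth', 'config', 'data', 'blobIndex', 'blob']

/**
 * Returns the namespaces whose indexed data would be cleared on leave:
 * everything except `auth`.
 * @param {string[]} namespaces
 * @returns {string[]}
 */
function namespacesToClear(namespaces) {
  return namespaces.filter((namespace) => namespace !== 'auth')
}

/**
 * Hypothetical in-memory index: a Map of namespace -> indexed docs.
 * Removes every namespace except `auth`, mutating the Map in place.
 * @param {Map<string, unknown[]>} index
 */
function clearIndexedData(index) {
  // Copy the keys first so we never delete while iterating the Map itself.
  for (const namespace of namespacesToClear([...index.keys()])) {
    index.delete(namespace)
  }
}
```

Keeping the filter in its own function makes it easy to test the "never touch `auth`" rule separately from the deletion itself.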

src/capabilities.js

src/index-writer/index.js

src/utils.js

gmaclennan

Reading through this and related code I have some thoughts:

- We need to close cores before we unlink/purge them - this should cancel replication requests and close file descriptors
- We should probably unreplicate cores that we are deleting before deleting them
- We should write the updated role for leaving a project before deleting anything
- We should probably wait for (pre)sync to complete before deleting a project - ensuring connected peers get all the data from the departing device, including the role record that they have left.
- I'm not entirely clear about the state of a project instance after leaving - not everything is closed, but I'm not sure if the instance will work any more, or just have undefined behaviour?
- I'm not sure about deleting schemas from the indexer separately from deleting the multi-core-indexer state - what happens if we don't delete a schema in a namespace but we do delete the multi-core-indexer?
- I'm worried we're introducing complicated states where the indexes could be out-of-sync with the underlying cores. Would it be safer to just delete all indexes and index reader states and then re-index the cores we have left?
- I think we need to maintain some kind of left project state, so that when a new project instance is created for a project that has been left, it doesn't try to re-sync everything and re-create the writer cores.

Sorry for inconclusive review - this PR raises a lot of questions / highlights some potential issues.
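The ordering suggested in the first four points above could be sketched roughly as follows. Every method on `project` and `core` here is a hypothetical stand-in rather than the real Mapeo or Hypercore API, and the sketch says nothing about whether Hypercore already handles some of these steps itself.

```javascript
/**
 * Hypothetical leave sequence. The point is the order of the steps,
 * not the method names, which are all assumptions.
 * @param {object} project
 */
async function leaveProject(project) {
  // 1. Record the departure before deleting anything.
  await project.writeLeftRole()
  // 2. Give connected peers a chance to receive that record.
  await project.waitForSync()

  for (const core of project.coresToDelete()) {
    // 3. Stop replicating the core we are about to delete.
    await core.unreplicate()
    // 4. Close it, releasing file descriptors.
    await core.close()
    // 5. Only then remove its data from disk.
    await core.purge()
  }

  // 6. Finally, drop the indexed data that pointed at those cores.
  await project.clearIndexes()
}
```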

src/datastore/index.js

src/mapeo-project.js

EvanHahn · 2024-04-05T02:25:42Z

> We need to close cores before we unlink/purge them - this should cancel replication requests and close file descriptors

I think we're doing that here:

https://github.com/digidem/mapeo-core-next/blob/cdebfafa772958729f5ade5e64c04010b160b710/src/mapeo-project.js#L624-L629

> We should probably unreplicate cores that we are deleting before deleting them

I'm not super familiar with its internals but it looks like Hypercore might do this for us?

> We should write the updated role for leaving a project before deleting anything

Done in ab4d9c1.

> I'm not sure about deleting schemas from the indexer separately from deleting the multi-core-indexer state - what happens if we don't delete a schema in a namespace but we do delete the multi-core-indexer?

At worst, we might've left a project but not cleaned up one or both of the indexers due to a crash or other failure.

We could add something to check this at startup. Pseudocode:

onStartup(() => {
  // `for...of` iterates the projects themselves; `for...in` would iterate keys.
  for (const leftProject of getLeftProjects()) {
    leftProject.makeSureCoresAndIndexesAreDestroyed()
  }
})
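A slightly fuller version of that startup check might look like the sketch below. `getLeftProjects` and `makeSureCoresAndIndexesAreDestroyed` are the hypothetical names from the pseudocode above, not existing functions, and collecting failures instead of throwing is a design assumption so that one broken project does not block cleanup of the others.

```javascript
/**
 * Runs cleanup for every left project and reports which ones failed.
 * @param {() => Iterable<object> | Promise<Iterable<object>>} getLeftProjects
 *   Hypothetical lookup of projects this device has left.
 * @returns {Promise<Array<{ project: object, error: unknown }>>}
 */
async function cleanUpLeftProjects(getLeftProjects) {
  const failures = []
  for (const leftProject of await getLeftProjects()) {
    try {
      // Expected to be idempotent: safe to call when nothing is left to delete.
      await leftProject.makeSureCoresAndIndexesAreDestroyed()
    } catch (error) {
      failures.push({ project: leftProject, error })
    }
  }
  return failures
}
```

The caller can log the returned failures and retry them on the next startup.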

> I'm worried we're introducing complicated states where the indexes could be out-of-sync with the underlying cores. Would it be safer to just delete all indexes and index reader states and then re-index the cores we have left?

I defer to your expertise here but I think we're just clearing the indexes and closing/deleting the cores. I suppose cores could have data in them that hasn't made it to the indexes and never will, but I think that's fine?

This feels like one of a few questions we haven't answered about leaving projects. Others include:

> We should probably wait for (pre)sync to complete before deleting a project - ensuring connected peers get all the data from the departing device, including the role record that they have left.

> I'm not entirely clear about the state of a project instance after leaving - not everything is closed, but I'm not sure if the instance will work any more, or just have undefined behaviour?

> I think we need to maintain some kind of left project state, so that when a new project instance is created for a project that has been left, it doesn't try to re-sync everything and re-create the writer cores.

I think these are separate tasks, but I agree.
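As a rough illustration of the "left project state" idea, one option is to record left project IDs and consult that record whenever a project instance is opened. The `ProjectRegistry` class and the option names it returns are hypothetical and do not exist in the codebase.

```javascript
/**
 * Hypothetical registry that remembers which projects this device has left.
 * A real version would persist this state rather than keep it in memory.
 */
class ProjectRegistry {
  #leftProjectIds = new Set()

  /** @param {string} projectId */
  markLeft(projectId) {
    this.#leftProjectIds.add(projectId)
  }

  /** @param {string} projectId */
  hasLeft(projectId) {
    return this.#leftProjectIds.has(projectId)
  }

  /**
   * Returns the (assumed) options a new project instance would be opened with.
   * A left project neither re-syncs nor re-creates its writer cores.
   * @param {string} projectId
   */
  openOptions(projectId) {
    const left = this.hasLeft(projectId)
    return { projectId, syncEnabled: !left, createWriterCores: !left }
  }
}
```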

EvanHahn · 2024-04-11T18:17:37Z

I'm going to file the following issues if that seems reasonable to you:

- At startup, clean up left projects that failed to delete

  If we start to leave a project and the app crashes, we may fail to remove data from indexers. At startup, we should find left projects and remove the "orphaned" data.

- Test leaving a project while offline, then connecting and syncing

  We don't test what happens when someone leaves a project while offline, then syncs that to others. This test may uncover bugs.

- Test project instance after leaving

  We don't test what happens when someone leaves a project, then tries to perform certain operations on it. This test may uncover bugs or missing features.

Do those three tasks seem reasonable?

EvanHahn · 2024-04-25T14:25:16Z

Just filed #583, #584, and #585 based on the above.

gmaclennan

This seems OK to me. I admit I can't quite get my head around the possible edge cases, but what is here looks like it should work. It could be worth spending some time writing more detailed tests of how we expect devices that have left a project to behave when connecting to others. I think the other aims of project leaving are:

- Remove others' data to free up disk space - can we test for this?
- Keep own data so that it can be recovered later - can we test for this?

EvanHahn · 2024-05-09T16:08:40Z

> Remove others' data to free up disk space - can we test for this?

> Keep own data so that it can be recovered later - can we test for this?

Good idea. I'll plan to do this in #585.
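One way such a test could express the two aims is a helper that compares the documents present before and after leaving. This is only a sketch: the document shape (`id`, `namespace`, `deviceId`) is an assumption, and it treats `auth` records from other devices as allowed to remain, in line with the PR description.

```javascript
/**
 * Checks the two aims of leaving against a before/after snapshot.
 * @param {Array<{ id: string, namespace: string, deviceId: string }>} docsBefore
 * @param {Array<{ id: string, namespace: string, deviceId: string }>} docsAfter
 * @param {string} ownDeviceId
 * @returns {string[]} Human-readable problems; empty when both aims hold.
 */
function checkLeaveAims(docsBefore, docsAfter, ownDeviceId) {
  const remainingIds = new Set(docsAfter.map((doc) => doc.id))
  const problems = []
  for (const doc of docsBefore) {
    const isOwn = doc.deviceId === ownDeviceId
    const remains = remainingIds.has(doc.id)
    // Aim: keep own data so that it can be recovered later.
    if (isOwn && !remains) problems.push(`own doc ${doc.id} was lost`)
    // Aim: remove others' data (outside `auth`) to free up disk space.
    if (!isOwn && doc.namespace !== 'auth' && remains) {
      problems.push(`doc ${doc.id} from another device was kept`)
    }
  }
  return problems
}
```

A real test would build the two snapshots from an actual project before and after leaving, then assert that the returned list is empty.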

feat: delete lots of data from indexers when leaving project

842cb7c

EvanHahn requested a review from gmaclennan February 6, 2024 21:56

EvanHahn commented Feb 6, 2024

EvanHahn added 3 commits February 20, 2024 07:08

Merge branch 'main' into remove-indexes-when-leaving

17f7d32

Fix merge mistake with duplicate function

9a296f6

Merge branch 'main' into remove-indexes-when-leaving

cdebfaf

EvanHahn marked this pull request as ready for review March 27, 2024 22:09

gmaclennan reviewed Apr 4, 2024

EvanHahn added 2 commits April 4, 2024 20:46

Merge branch 'main' into remove-indexes-when-leaving

9f22adb

Assign LEFT role earlier

ab4d9c1

EvanHahn added 2 commits April 10, 2024 19:03

Merge branch 'main' into remove-indexes-when-leaving

1ce862f

Clean up deletion of indexed data

0b6aec3

EvanHahn requested a review from gmaclennan April 10, 2024 20:24

Merge branch 'main' into remove-indexes-when-leaving

c95f371

EvanHahn mentioned this pull request Apr 11, 2024

chore: fix flaky "project leave" tests #553

Merged

Merge branch 'main' into remove-indexes-when-leaving

cc9dd7c

gmaclennan approved these changes May 9, 2024

Merge branch 'main' into remove-indexes-when-leaving

59868d5

EvanHahn self-assigned this May 13, 2024

EvanHahn added 3 commits May 20, 2024 18:24

Use real multi-core-indexer

9b276cb

Use real @mapeo/sqlite-indexer

30d8b56

Use exact version for sqlite indexer

90b0818

EvanHahn merged commit 22aef9f into main May 20, 2024
7 checks passed

EvanHahn deleted the remove-indexes-when-leaving branch May 20, 2024 18:40
