indexserver: add debug endpoint for deleting repository shards #485

Open

wants to merge 11 commits into main
Conversation

@ggilmore (Contributor) commented Nov 17, 2022

This PR adds a new debug endpoint /debug/delete?id=$REPO_ID. When a user calls this URL, all the shards associated with this ID will be deleted (or tombstoned if they're in compound shards).

Given the frequent memory-map incidents that we had this week, this PR should make it easier for users to recover when they're in that state.

They should be able to:

  1. Grab a list of all IDs / repositories that are indexed on the instance:
    wget -q -O - http://localhost:6072/debug/indexed
    ID              Name
    736075603       wut
    1309754292      whatever
  2. Extract some subset of repo IDs via some method like awk | tail/head:
    wget -q -O - http://localhost:6072/debug/indexed | awk '{print $1}' | tail -n +2 | head -n 2
    736075603
    1309754292
  3. Pipe their chosen ids to this new debug route:
    wget -q -O - http://localhost:6072/debug/indexed | awk '{print $1}' | tail -n +2 | head -n 2 | xargs -I {} -- wget -q -O -  "http://localhost:6072/debug/delete?id={}"

This should also be useful for customers who want to try incremental indexing (and need a faster way to revert than waiting for a new build).
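
Roughly, the handler boils down to something like this (a minimal sketch; handleDebugDelete and the response text are illustrative, only the deleteShards helper added in this PR is assumed):

    func handleDebugDelete(indexDir string) http.HandlerFunc {
        return func(w http.ResponseWriter, r *http.Request) {
            // /debug/delete?id=$REPO_ID
            id, err := strconv.ParseUint(r.URL.Query().Get("id"), 10, 32)
            if err != nil {
                http.Error(w, "invalid id", http.StatusBadRequest)
                return
            }
            // delete (or tombstone) every shard associated with this repository ID
            if err := deleteShards(indexDir, uint32(id)); err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                return
            }
            fmt.Fprintf(w, "deleted shards for repository %d\n", id)
        }
    }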


(notes)

  • This PR still isn't final; I still need to add basic tests for deleteShards.

  • I decided to put this logic entirely in main.go. It's quite similar to the logic used in build/builder.go, but:

    • I didn't want to export this as a public function; I can't see anyone else needing to reuse this shard-deletion logic (this debug endpoint should be the only user).
    • Builder has its own custom logic around shard deletion that's battle-tested (deleting shards only when it successfully finishes a new build).
  • I think the use of build.Options here isn't great. That object is so huge, and we ultimately only need the indexDir, repoID, and repoName out of it - everything else is irrelevant for this use case. I tried experimenting with refactoring some of those interfaces, but I ended up just creating a bunch of pass-through methods instead (from https://www.amazon.com/Philosophy-Software-Design-John-Ousterhout/dp/1732102201). What are your thoughts on this? Is it worth the refactor? Can we create a smaller "options" object that doesn't have so many extraneous fields? (Rough sketch of what I mean below.)
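
For illustration only, one possible shape for that narrower options object (the name and fields are hypothetical, not something in this PR):

    // deleteOpts is the small subset of build.Options that shard deletion
    // actually needs; everything else in build.Options is irrelevant here.
    type deleteOpts struct {
        indexDir string
        repoID   uint32
        repoName string
    }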

@ggilmore ggilmore requested a review from a team November 17, 2022 23:08
@stefanhengl (Member)

> I think the use of build.Options here isn't great. That object is so huge, and we ultimately only need the indexDir, repoID, and repoName out of it - everything else is irrelevant for this use case. I tried experimenting with refactoring some of those interfaces, but I ended up just creating a bunch of pass-through methods instead (from https://www.amazon.com/Philosophy-Software-Design-John-Ousterhout/dp/1732102201). What are your thoughts on this? Is it worth the refactor? Can we create a smaller "options" object that doesn't have so many extraneous fields?

I would leave it as is. It is very "Zoekt" to pass around options structs. I believe you could make do with IndexOptions, but you need FindAllShards(), which doesn't really have to be implemented on top of build.Options and should maybe be a function instead.

However, I think at some point Zoekt would profit from a unified concept of a shard, i.e. operations (find(id), delete(id), ...) that abstract over compound shards, simple shards, and delta shards.
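
Purely as a sketch of what I mean (names are made up, nothing like this exists in Zoekt today):

    // shardStore is a made-up name for that unified concept.
    type shardStore interface {
        // Find returns every shard (simple, compound, or delta) that contains
        // the given repository.
        Find(repoID uint32) []shard
        // Delete removes the repository: simple shards are unlinked, compound
        // shards are tombstoned, and delta shard chains are removed as a unit.
        Delete(repoID uint32) error
    }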

@stefanhengl (Member)

You might as well add your nice recipe "How to delete a repo" from the PR description to the documentation of the debug endpoint.

@@ -124,6 +250,109 @@ func TestMain(m *testing.M) {
os.Exit(m.Run())
}

func createTestNormalShard(t *testing.T, indexDir string, r zoekt.Repository, numShards int, optFns ...func(options *build.Options)) []string {
@ggilmore (Contributor, Author)

Mostly copied this from the builder package; should we create a testutil package?

Member

yup, maybe it's time. Not blocking for this PR though.
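
For reference, a rough sketch of what a shared helper in a hypothetical testutil package could look like, based on the public build API (the package name and exact shape are assumptions, not part of this PR):

    package testutil

    import (
        "testing"

        "github.com/sourcegraph/zoekt"
        "github.com/sourcegraph/zoekt/build"
    )

    // CreateTestShard indexes one trivial document for r into indexDir and
    // returns the resulting shard paths.
    func CreateTestShard(t *testing.T, indexDir string, r zoekt.Repository, optFns ...func(*build.Options)) []string {
        t.Helper()

        opts := build.Options{IndexDir: indexDir, RepositoryDescription: r}
        opts.SetDefaults()
        for _, fn := range optFns {
            fn(&opts)
        }

        b, err := build.NewBuilder(opts)
        if err != nil {
            t.Fatal(err)
        }
        if err := b.Add(zoekt.Document{Name: "main.go", Content: []byte("package main\n")}); err != nil {
            t.Fatal(err)
        }
        if err := b.Finish(); err != nil {
            t.Fatal(err)
        }
        return opts.FindAllShards()
    }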

@ggilmore ggilmore marked this pull request as ready for review November 19, 2022 01:54
@ggilmore ggilmore requested review from a team and stefanhengl November 19, 2022 01:54
@ggilmore (Contributor, Author)

@sourcegraph/search-core PTAL

@stefanhengl (Member) left a comment

LGTM!

Resolved review thread on cmd/zoekt-sourcegraph-indexserver/main.go (outdated)

@keegancsmith (Member) left a comment

We already have very similar logic to this in cleanup.go, which is run when we are told to stop indexing a repo. It feels like you should be able to reuse that? I took a look at cleanup and I believe it would be relatively straightforward. Additionally, the function for this living in cleanup.go sounds good to me, rather than growing main.go.

Two resolved review threads on cmd/zoekt-sourcegraph-indexserver/main.go (outdated)
@ggilmore (Contributor, Author)

@sourcegraph/search-core PTAL

I have:

  1. moved the logic from main.go into cleanup.go
  2. changed the deletion logic to use zoekt.IndexFilePaths instead of stat-ing directly
  3. added a new sub-routine to cleanup() that ensures that we delete any metadata files that don't have corresponding shards (this is necessary after the change introduced in "2" since there is no longer an enforced order for how we delete files)
  4. changed the logic for deleteShards to scan the entire indexDir instead of using build.Options.FindAllShards() (following the style of other functions in cleanup)
  5. changed the debugHandler to take out a global indexDir lock
  6. wired up the shardLogger to the deleteShards implementation

// Is this repository inside a compound shard? If so, set a tombstone
// instead of deleting the shard outright.
if zoekt.ShardMergingEnabled() && maybeSetTombstone([]shard{s}, repoID) {
	shardsLog(indexDir, "tomb", []shard{s})
@ggilmore (Contributor, Author) commented Nov 29, 2022

Note: looking at the implementation of shardsLog, I'm unsure whether it's thread-safe (what happens if there are two writers to the same file?).

The requirement that callers of deleteShards hold the global index lock protects us against this, but I wonder if we should make this more explicit (e.g. a dedicated mutex just for shardsLog).
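
Something like this is what I mean by a dedicated mutex (a stripped-down sketch; the real shardsLog writes via a rolling logger and its format may differ):

    // shardsLogMu is a hypothetical guard serializing writers within this process.
    var shardsLogMu sync.Mutex

    func shardsLog(indexDir, action string, shards []shard) {
        shardsLogMu.Lock()
        defer shardsLogMu.Unlock()

        // Append one line per shard (file name and format are illustrative).
        f, err := os.OpenFile(filepath.Join(indexDir, "shards.log"), os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
        if err != nil {
            return
        }
        defer f.Close()
        for _, s := range shards {
            // assumes shard exposes its on-disk path
            fmt.Fprintf(f, "%s %s %s\n", time.Now().UTC().Format(time.RFC3339), action, s.Path)
        }
    }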

Member

Quoting https://github.com/natefinch/lumberjack:

> Lumberjack assumes that only one process is writing to the output files. Using the same lumberjack configuration from multiple processes on the same machine will result in improper behavior.

// isn't present in indexDir.
//
// Users must hold the global indexDir lock before calling deleteShards.
func deleteShards(indexDir string, repoID uint32) error {
@stefanhengl (Member) commented Nov 30, 2022

Quoting from cleanup:

simple := shards[:0]
for _, s := range shards {
	if shardMerging && maybeSetTombstone([]shard{s}, repo) {
		shardsLog(indexDir, "tombname", []shard{s})
	} else {
		simple = append(simple, s)
	}
}

if len(simple) == 0 {
	continue
}

removeAll(simple...)

Can't we just call that instead? I guess that is what @keegancsmith was referring to(?). We could factor it out into its own function and call it from your handler and from cleanup. WDYT?

@ggilmore (Contributor, Author)

@stefanhengl PTAL at my latest commit which tries to factor out the logic.

To be honest, I don't like the shape of this shared interface much at all: indexDir and shards are both passed in, there is no contract that indexDir needs to contain those shards, indexDir is only passed for logging purposes, etc. LMK what you think.

Comment on lines +161 to +162
// remove any Zoekt metadata files in the given dir that don't have an
// associated shard file
Member

Is this unrelated cleanup? I.e., should it maybe be another PR?

@ggilmore (Contributor, Author)

No, I added this functionality since we switched the implementation. Before, the implementation made sure to delete the metadata file before its associated shard file. Since we switched to using zoekt.IndexFilePaths that order isn't guaranteed anymore (leaving open the possibility of a metadata file not having an associated shard if we delete the shard and then crash). I added this additional logic to cleanup so that we would have a background process that would reap those "stranded" files.

I am happy to pull this bit into a separate PR though.
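
The new cleanup subroutine boils down to something like this (a simplified sketch assuming the ".meta" suffix convention; the helper in the diff may differ):

    // removeStrandedMeta deletes any metadata file in indexDir whose
    // corresponding shard file no longer exists.
    func removeStrandedMeta(indexDir string) error {
        metas, err := filepath.Glob(filepath.Join(indexDir, "*.meta"))
        if err != nil {
            return err
        }
        for _, meta := range metas {
            shardPath := strings.TrimSuffix(meta, ".meta")
            if _, err := os.Stat(shardPath); errors.Is(err, os.ErrNotExist) {
                if err := os.Remove(meta); err != nil {
                    return err
                }
            }
        }
        return nil
    }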

Comment on lines +335 to +336
// Deleting shards in reverse sorted order (2 -> 1 -> 0) always ensures that we don't leave an inconsistent
// state behind even if we crash.
Member

But if we do crash, we will have partial state and be none the wiser anyway. So I think this is just extra complexity without making us any safer than before.

In fact, I'd argue deleting shard 0 first is the best bet, since that is what we actually check in other parts. So if that is missing, we end up deleting the other ones. Then adding a check in cleanup to remove stranded indexes would make sense.

At the end of the day we are relying on the fact that os.Remove is super fast. Not ideal; really we should introduce some other way of being atomic.
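
In code, "delete 0 first" is just removing the paths in ascending sorted order (a sketch; the helper name is illustrative):

    // removeShardFiles deletes shard files in ascending order so that shard 0
    // (the file other code checks for) disappears first; a crash midway then
    // leaves only higher-numbered shards for a cleanup pass to reap.
    func removeShardFiles(paths []string) {
        sort.Strings(paths)
        for _, p := range paths {
            if err := os.Remove(p); err != nil {
                log.Printf("failed to remove %s: %v", p, err)
            }
        }
    }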

@ggilmore (Contributor, Author)

Maybe we can add an extra file that acts as a tombstone?

For a given repo, if repo.tombstone.json exists, Zoekt should for all intents and purposes treat the repo as deleted: webserver should unload all of its associated shards, and cleanup should have a reaper that would eventually delete all the shards + metafiles before deleting the tombstone file.
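
A minimal sketch of that idea (the marker file name is hypothetical):

    // isRepoTombstoned reports whether a repo-level tombstone marker exists for
    // repoID. If it does, webserver would skip the repo's shards and cleanup
    // would eventually remove the shards, the metadata files, and finally the
    // marker itself.
    func isRepoTombstoned(indexDir string, repoID uint32) bool {
        marker := filepath.Join(indexDir, fmt.Sprintf("r%d.tombstone.json", repoID))
        _, err := os.Stat(marker)
        return err == nil
    }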

//
// If one of the provided shards is a compound shard and the repository is contained within it,
// the repository is tombstoned instead.
func deleteOrTombstone(indexDir string, repoID uint32, shardMerging bool, shards ...shard) {
Member

I see you mentioned you didn't like that we have to pass in indexDir and shardMerging to this func. Maybe instead it should be part of some helper struct that holds that state.

Minor nit: I'd make repoID come after the shardMerging param, i.e. the more "static" something is, the closer it should be to the front of the arg list. Just a habit from functional languages and currying.

@ggilmore (Contributor, Author)

Got it...I'll play around with passing around some sort of helper to see if that's less awkward. Something like:

type ServerConfig struct {
	shardMerging bool
	indexDir     string
}
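
...and then deleteOrTombstone could hang off it, so callers stop threading indexDir and shardMerging through every call (sketch only):

    func (c *ServerConfig) deleteOrTombstone(repoID uint32, shards ...shard) {
        deleteOrTombstone(c.indexDir, repoID, c.shardMerging, shards...)
    }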

// If one of the provided shards is a compound shard and the repository is contained within it,
// the repository is tombstoned instead.
func deleteOrTombstone(indexDir string, repoID uint32, shardMerging bool, shards ...shard) {
var simple []shard
Member

I see you got rid of the filtering hack; I guess just because now that it is a func it is safer that way?

@ggilmore (Contributor, Author)

By filtering hack I assume you mean the shardMap := getShards(indexDir) line? Yes, I decided to pass the list of shards associated with the ID directly since cleanup() already has this information. If we're sharing the function implementation, gathering the entire shardMap seemed redundant.

Resolved review threads on cmd/zoekt-sourcegraph-indexserver/main.go and cmd/zoekt-sourcegraph-indexserver/cleanup.go (outdated)
@keegancsmith (Member)

What is the status of this PR? What can I do to help unblock/land?
