
Conversation

@Mvrxel commented Nov 7, 2025

Hi there!

I opened the issue about the “Too many bytes read” error in the deleteAsync function and decided to give it a try.

Does this seem like a reasonable direction to fix it? :)

Summary:

Implement truly paginated chunk deletion to avoid “Too many bytes read” errors when deleting large entries. Remove unnecessary doc reads, delete chunks by ID, and return an exclusive nextStartOrder = order + 1. Re‑export internal.chunks.deleteChunksPage for entries.deleteSync.

Problem:

Unbounded queries and extra reads of embedding/content docs caused > 16 MB reads, breaking deletion tests and large‑entry cleanup.

Changes:

  • Add paginated deletion loop with byte‑budget thresholds (see the sketch after this list).

  • Delete chunks directly by ID (no content/embedding fetch).

  • Use nextStartOrder = lastOrder + 1 for exclusive pagination.

  • Restore export const deleteChunksPage = internalMutation(...).

  • Keep entries.deleteSync/Async logic; now loops safely until empty.
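
Roughly, the new deletion pass has the shape of the sketch below. This is an illustration, not the exact diff: the index name "entryId_order", the entryId argument, PAGE_SIZE, and estimateChunkSize are placeholders, while dataUsedSoFar, BANDWIDTH_PER_TRANSACTION_HARD_LIMIT, and the per-document deletes mirror the actual change.

```ts
// Sketch of one paginated deletion pass (placeholder names noted above).
import { Doc, Id } from "./_generated/dataModel";
import { MutationCtx } from "./_generated/server";

const PAGE_SIZE = 64; // placeholder page size
const BANDWIDTH_PER_TRANSACTION_HARD_LIMIT = 8 * 1024 * 1024; // 8 MB budget

function estimateChunkSize(chunk: Doc<"chunks">): number {
  return JSON.stringify(chunk).length; // crude per-document byte estimate
}

async function deleteChunksPageSketch(
  ctx: MutationCtx,
  entryId: Id<"entries">,
  startOrder: number
): Promise<{ isDone: boolean; nextStartOrder: number }> {
  // Bounded read: at most one page of chunks at or after startOrder.
  const chunks = await ctx.db
    .query("chunks")
    .withIndex("entryId_order", (q) =>
      q.eq("entryId", entryId).gte("order", startOrder)
    )
    .take(PAGE_SIZE);

  let dataUsedSoFar = 0;
  let lastOrder = startOrder;
  for (const chunk of chunks) {
    // Delete by ID; no extra get of the content/embedding documents.
    if ("embeddingId" in chunk.state) {
      await ctx.db.delete(chunk.state.embeddingId);
    }
    await ctx.db.delete(chunk.contentId);
    await ctx.db.delete(chunk._id);
    lastOrder = chunk.order;
    dataUsedSoFar += estimateChunkSize(chunk);
    if (dataUsedSoFar > BANDWIDTH_PER_TRANSACTION_HARD_LIMIT) {
      // Over budget: stop here and let the caller schedule another page.
      return { isDone: false, nextStartOrder: lastOrder + 1 };
    }
  }
  // Exclusive cursor: the next page starts strictly after the last order seen.
  return { isDone: chunks.length < PAGE_SIZE, nextStartOrder: lastOrder + 1 };
}
```

Returning an exclusive nextStartOrder (lastOrder + 1) means the next pass never re-reads the last chunk deleted by the previous one.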

Testing:

Unit and large‑entry tests pass (deleteSync, deleteByKeySync). Multiple paginated passes complete within read limits.


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

pkg-pr-new bot commented Nov 13, 2025


npm i https://pkg.pr.new/get-convex/rag/@convex-dev/rag@52

commit: 4f02d52

dataUsedSoFar += estimateEmbeddingSize(embedding);
await ctx.db.delete(chunk.state.embeddingId);
}
await ctx.db.delete(chunk.state.embeddingId);
Contributor:

My concern here is whether there can be multiple references to the same embeddingId, in which case trying to delete one that no longer exists will fail. And I believe deletes today require reading the document anyway, so we may as well get it and use that to figure out if it's putting us over the limit?

Author:

Yeah, that can happen, but as I understand it, it shouldn’t occur for “ready” chunks since each one has its own embedding. Only “replaced” ones can share the same embeddingId, but we don’t delete those again. So wouldn’t it make more sense to skip the pre‑get and just wrap the delete in a try‑catch?
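
A minimal sketch of that idea (not from the PR; it assumes ctx.db.delete throws when the document is already gone):

```ts
// Sketch only: skip the pre-get and tolerate an already-deleted embedding.
try {
  await ctx.db.delete(chunk.state.embeddingId);
} catch {
  // Embedding already gone (e.g. shared with a "replaced" chunk); ignore.
}
```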

Contributor:

Ah, good call - but regardless, we need to know how much data is being loaded, right? If we just delete, we don't know how much "read bandwidth" was used. My assumption here is that a delete adds to the read bandwidth, but that it won't be added twice if we get and then delete.
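
For comparison, a sketch of the get-then-delete accounting being suggested here (illustrative only; it reuses estimateEmbeddingSize and dataUsedSoFar from the existing code):

```ts
// Sketch: read the embedding first so its size counts toward the byte budget,
// then delete it; the assumption is the delete doesn't add a second read.
const embedding = await ctx.db.get(chunk.state.embeddingId);
if (embedding !== null) {
  dataUsedSoFar += estimateEmbeddingSize(embedding);
  await ctx.db.delete(chunk.state.embeddingId);
}
```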

await ctx.db.delete(chunk.contentId);
await ctx.db.delete(chunk._id);
lastOrder = chunk.order;
if (dataUsedSoFar > BANDWIDTH_PER_TRANSACTION_HARD_LIMIT) {
Contributor:

Do you have an idea for why this isn't already preventing an issue?

Author:

I think the previous version was doing the get for embedding and content before checking the limit, so it was already hitting the limit during those reads, before even reaching the check.

Contributor:

Still, the "hard limit" is set to 8 MB, which is half of the actual transaction limit. So either it was being called from another transaction that had already read 8 MB, or it's reading twice as much as it thinks here, which could be the case if get and delete are accounted for separately. Were you seeing this when it was being called directly, e.g. when your code was calling entries.deleteAsync? Or was it failing on some subsequent page? Maybe an issue here is that it does the first page eagerly here: https://github.com/get-convex/rag/blob/main/src/component/entries.ts#L552 instead of always doing it async.

Author:

I tried rolling chunks.ts back to the previous version and moving the work to workers, but the error is still there. It seems the real issue is in deleteChunksPageHandler: it still does those big gets before deleting, and since the delete also counts as a read, the I/O roughly doubles. Even with workers, a single page can still hit the 16 MB limit on its own.

Contributor:

Yeah, so the big issue here is the double counting. I don't want this to rely on stopping at 15.9 MB read instead of 16.1 MB; it should stop at 8 MB. A simple test would be a mutation that reads a big document and then deletes it, and seeing whether the dashboard lists the data read as doubled. If we just remove the get, I don't know how we count the size of what we deleted. Another approach is to use a heuristic and/or set limits so documents can't get bigger than 100k, say, and then just fetch 80 of them per page.
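
A sketch of that test as a standalone probe (hypothetical; the "chunks" table name and the return value are just for illustration):

```ts
// Hypothetical probe mutation: read a large document, then delete it, and
// compare the function's reported database bandwidth in the dashboard to the
// document's size. Roughly 1x suggests get + delete share one read; roughly
// 2x suggests they are accounted for separately.
import { v } from "convex/values";
import { internalMutation } from "./_generated/server";

export const readThenDeleteProbe = internalMutation({
  args: { id: v.id("chunks") },
  handler: async (ctx, args) => {
    const doc = await ctx.db.get(args.id); // counted as a read
    if (doc !== null) {
      await ctx.db.delete(args.id); // does this add a second read of the same doc?
    }
    return doc ? JSON.stringify(doc).length : 0; // rough size of what was read
  },
});
```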
