
Fix DB backup crash on large databases by streaming writes #1909

Open

moktamd wants to merge 1 commit into paperclipai:master from moktamd:fix/streaming-db-backup

Conversation


@moktamd moktamd commented Mar 27, 2026

Thinking path: Server crashes during scheduled backups → RangeError: Invalid string length in backup-lib.ts → the entire SQL dump is buffered in a lines[] array and joined into one string → for databases >~250MB the joined string exceeds V8 limits → fix by streaming writes directly to disk instead of buffering.

Fixes #1843

Backup accumulated all SQL in memory before writing to disk. For databases exceeding ~250MB, lines.join("\n") hit V8's string length limit, crashing the server and killing all agent sessions.

  • runDatabaseBackup: writes SQL incrementally via createWriteStream with backpressure handling instead of buffering in a string array
  • runDatabaseRestore: reads the dump file line-by-line via createReadStream/readline instead of readFile into one string

Memory usage is now bounded regardless of database size.

Replace in-memory string accumulation in runDatabaseBackup with
createWriteStream (with backpressure handling), and replace readFile
in runDatabaseRestore with createReadStream/readline line-by-line
processing. Memory usage is now bounded regardless of database size.

greptile-apps bot commented Mar 27, 2026

Greptile Summary

This PR replaces the in-memory SQL buffering in runDatabaseBackup with an incremental createWriteStream approach (with backpressure handling), and rewrites runDatabaseRestore to use createReadStream/readline instead of a single readFile. The core approach is sound and directly addresses the V8 string-length RangeError on large databases.

Two reliability issues exist in the new backup path that should be addressed before merging:

- Partial backup file on error: The write stream is opened at the start of the try block but the finally clause only closes the SQL connection. Any mid-backup failure (SQL error, write error) leaves a partial .sql file on disk with a normal timestamp-based name. The previous implementation was atomic — no file was written unless all SQL was fully buffered first. A partial file could be mistaken for a valid backup.
- Unhandled stream error event: fs.WriteStream emits an 'error' event for OS-level write failures (disk full, permissions, etc.). Without a registered 'error' listener, this triggers an unhandled EventEmitter error in Node.js, crashing the process — exactly the failure mode the PR intends to fix.

The PR description includes a thinking path and describes what was done and why, but is missing the "how we can verify it works" and "any risks" sections per the contribution guidelines. Please add verification steps (e.g. how to simulate a large-database backup and confirm memory is bounded) and a risks section to the PR description.

Confidence Score: 4/5

Not safe to merge as-is — two P1 issues in the backup path could leave orphaned partial files on disk and leave the process vulnerable to crashes on disk-write failures.

The core streaming approach is correct and resolves the V8 OOM crash. However, two concrete defects exist in the new write path: missing stream cleanup on error (a regression in atomicity vs. the old code) and a missing 'error' event listener that can still crash the process on disk I/O failures. Both are straightforward to fix but represent real production risks.

packages/db/src/backup-lib.ts — specifically the stream lifecycle management around error paths in runDatabaseBackup.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/db/src/backup-lib.ts | Streaming rewrite correctly avoids the V8 string-length crash; however, the new implementation has two P1 issues: the write stream is not cleaned up on error (leaving a partial file on disk) and missing an 'error' event listener (unhandled stream errors can still crash the process). |

Reviews (1): Last reviewed commit: "Fix DB backup crash on large databases b..."

Comment on lines 153 to +158:

```diff
 try {
   await sql`SELECT 1`;

-  const lines: string[] = [];
-  const emit = (line: string) => lines.push(line);
-  const emitStatement = (statement: string) => {
-    emit(statement);
-    emit(STATEMENT_BREAKPOINT);
+  mkdirSync(opts.backupDir, { recursive: true });
+  const backupFile = resolve(opts.backupDir, `${filenamePrefix}-${timestamp()}.sql`);
+  const stream = createWriteStream(backupFile, { encoding: "utf8" });
```

**P1: Partial backup file left on disk if error occurs**

The write stream is opened at the very start of the `try` block (line 158), but the `finally` block only calls `await sql.end()`. If any SQL query or write call throws an error mid-backup, the stream is never closed/destroyed and the partial `.sql` file is silently left on disk.

This is a regression from the previous implementation, where `writeFile` was called only after all SQL was fully buffered — meaning a failure during SQL queries left no file at all. Now a partial file (with a normal timestamp-based filename) persists, which could be picked up by restore tooling or confuse operators.

The stream should be cleaned up in a `finally`-style guard. For example:

```typescript
mkdirSync(opts.backupDir, { recursive: true });
const backupFile = resolve(opts.backupDir, `${filenamePrefix}-${timestamp()}.sql`);
const stream = createWriteStream(backupFile, { encoding: "utf8" });

let backupComplete = false;
try {
  // ... all writes ...
  await new Promise<void>((resolve, reject) =>
    stream.end((err: Error | null | undefined) => (err ? reject(err) : resolve())));
  backupComplete = true;
} finally {
  if (!backupComplete) {
    stream.destroy();
    try { unlinkSync(backupFile); } catch { /* ignore */ }
  }
}
```

Comment on lines +158 to 163:

```typescript
const stream = createWriteStream(backupFile, { encoding: "utf8" });

const writeLine = async (line: string): Promise<void> => {
  const ok = stream.write(line + "\n");
  if (!ok) await new Promise<void>((resolve) => stream.once("drain", resolve));
};
```

**P1: No `error` event listener on the write stream**

`fs.WriteStream` emits an `'error'` event when a write fails (e.g., disk full, permission denied). Without a registered listener, this fires as an unhandled EventEmitter error in Node.js, which crashes the entire process — ironically the same outcome the PR is trying to prevent.

The `stream.end()` callback at line 513 only catches errors from the final flush/close, not from the intermediate `stream.write()` calls inside `writeLine`. A mid-stream disk error would still kill the process.

Attach an error listener before writing:

```typescript
const stream = createWriteStream(backupFile, { encoding: "utf8" });

// Capture the first write error so it can be re-thrown via writeLine
let streamError: Error | null = null;
stream.once("error", (err) => { streamError = err; });

const writeLine = async (line: string): Promise<void> => {
  if (streamError) throw streamError;
  const ok = stream.write(line + "\n");
  if (!ok) await new Promise<void>((resolve) => stream.once("drain", resolve));
  if (streamError) throw streamError;
};
```



Development

Successfully merging this pull request may close these issues.

DB backup crashes server with RangeError when database exceeds ~250MB

1 participant