Fix DB backup crash on large databases by streaming writes #1909

moktamd wants to merge 1 commit into paperclipai:master
Conversation
Replace in-memory string accumulation in runDatabaseBackup with createWriteStream (with backpressure handling), and replace readFile in runDatabaseRestore with createReadStream/readline line-by-line processing. Memory usage is now bounded regardless of database size.
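The bounded-memory write pattern described above can be sketched as follows. This is an illustrative sketch, not the actual `backup-lib.ts` code: `streamLines` and the file path are hypothetical names, and only the backpressure handling mirrors the PR.

```typescript
// Minimal sketch of streamed writes with backpressure handling using Node's
// fs streams. streamLines is an illustrative helper, not the real API.
import { createWriteStream, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

async function streamLines(file: string, lines: Iterable<string>): Promise<void> {
  const stream = createWriteStream(file, { encoding: "utf8" });
  for (const line of lines) {
    // write() returns false once the internal buffer is full; awaiting
    // 'drain' before the next write keeps memory bounded.
    const ok = stream.write(line + "\n");
    if (!ok) await new Promise<void>((resolve) => stream.once("drain", resolve));
  }
  // end() flushes remaining data; its callback reports flush/close errors.
  await new Promise<void>((resolve, reject) =>
    stream.end((err?: Error | null) => (err ? reject(err) : resolve())),
  );
}

// Usage: write two statements, then read the file back.
const demo = join(tmpdir(), "stream-sketch.sql");
streamLines(demo, ["SELECT 1;", "SELECT 2;"]).then(() => {
  console.log(readFileSync(demo, "utf8"));
});
```

The key point is that at no time does the full dump exist in memory; only one line plus the stream's internal buffer is resident.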
**Greptile Summary**

This PR replaces the in-memory SQL buffering in `runDatabaseBackup` with streamed writes.

**Confidence Score: 4/5** — Not safe to merge as-is: two P1 issues in the backup path could leave orphaned partial files on disk and leave the process vulnerable to crashes on disk-write failures. The core streaming approach is correct and resolves the V8 OOM crash. However, two concrete defects exist in the new write path: missing stream cleanup on error (a regression in atomicity vs. the old code) and a missing `'error'` event listener that can still crash the process on disk I/O failures. Both are straightforward to fix but represent real production risks.

**Important Files Changed:** `packages/db/src/backup-lib.ts` — specifically the stream lifecycle management around error paths in `runDatabaseBackup`.
**Prompt To Fix All With AI**

This is a comment left during a code review.
Path: packages/db/src/backup-lib.ts
Line: 153-158
Comment:
**Partial backup file left on disk if error occurs**
The write stream is opened at the very start of the `try` block (line 158), but the `finally` block only calls `await sql.end()`. If any SQL query or write call throws an error mid-backup, the stream is never closed/destroyed and the partial `.sql` file is silently left on disk.
This is a regression from the previous implementation, where `writeFile` was called only after all SQL was fully buffered — meaning a failure during SQL queries left no file at all. Now a partial file (with a normal timestamp-based filename) persists, which could be picked up by restore tooling or confuse operators.
The stream should be cleaned up in a `finally`-style guard. For example:
```typescript
mkdirSync(opts.backupDir, { recursive: true });
const backupFile = resolve(opts.backupDir, `${filenamePrefix}-${timestamp()}.sql`);
const stream = createWriteStream(backupFile, { encoding: "utf8" });
let backupComplete = false;
try {
  // ... all writes ...
  await new Promise<void>((resolve, reject) =>
    stream.end((err: Error | null | undefined) => (err ? reject(err) : resolve())),
  );
  backupComplete = true;
} finally {
  if (!backupComplete) {
    stream.destroy();
    try { unlinkSync(backupFile); } catch { /* ignore */ }
  }
}
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: packages/db/src/backup-lib.ts
Line: 158-163
Comment:
**No `error` event listener on the write stream**
`fs.WriteStream` emits an `'error'` event when a write fails (e.g., disk full, permission denied). Without a registered listener, this fires as an unhandled EventEmitter error in Node.js, which crashes the entire process — ironically the same outcome the PR is trying to prevent.
The `stream.end()` callback at line 513 only catches errors from the final flush/close, not from the intermediate `stream.write()` calls inside `writeLine`. A mid-stream disk error would still kill the process.
Attach an error listener before writing:
```typescript
const stream = createWriteStream(backupFile, { encoding: "utf8" });
// Capture the first write error so it can be re-thrown via writeLine
let streamError: Error | null = null;
stream.once("error", (err) => { streamError = err; });
const writeLine = async (line: string): Promise<void> => {
  if (streamError) throw streamError;
  const ok = stream.write(line + "\n");
  if (!ok) await new Promise<void>((resolve) => stream.once("drain", resolve));
  if (streamError) throw streamError;
};
```
How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "Fix DB backup crash on large databases b..."
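Taken together, the two review comments above amount to a single guarded write path. The sketch below combines both fixes; `backupToFile` and `produceLines` are hypothetical names introduced for illustration, not the real `backup-lib.ts` API. It also waits on `'error'` as well as `'drain'`, so a stream that errors mid-backpressure cannot leave `writeLine` hanging:

```typescript
// Combined sketch of both fixes: surface stream errors through writeLine,
// and remove the partial file if the backup does not complete.
// backupToFile/produceLines are hypothetical names for illustration.
import { createWriteStream, mkdirSync, readFileSync, unlinkSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, resolve } from "node:path";

async function backupToFile(
  backupDir: string,
  fileName: string,
  produceLines: (writeLine: (line: string) => Promise<void>) => Promise<void>,
): Promise<string> {
  mkdirSync(backupDir, { recursive: true });
  const backupFile = resolve(backupDir, fileName);
  const stream = createWriteStream(backupFile, { encoding: "utf8" });

  // Fix 2: a registered 'error' listener prevents an unhandled EventEmitter
  // error from crashing the process; writeLine re-throws the captured error.
  let streamError: Error | null = null;
  stream.once("error", (err) => { streamError = err; });

  const writeLine = async (line: string): Promise<void> => {
    if (streamError) throw streamError;
    const ok = stream.write(line + "\n");
    if (!ok) {
      // Wait for 'drain' OR 'error', so an erroring stream cannot hang here.
      await new Promise<void>((done) => {
        const finish = () => {
          stream.removeListener("drain", finish);
          stream.removeListener("error", finish);
          done();
        };
        stream.once("drain", finish);
        stream.once("error", finish);
      });
    }
    if (streamError) throw streamError;
  };

  // Fix 1: track completion and clean up the partial file on any failure.
  let complete = false;
  try {
    await produceLines(writeLine);
    await new Promise<void>((res, rej) =>
      stream.end((err?: Error | null) => (err ? rej(err) : res())),
    );
    complete = true;
  } finally {
    if (!complete) {
      stream.destroy();
      try { unlinkSync(backupFile); } catch { /* best-effort cleanup */ }
    }
  }
  return backupFile;
}

// Usage: write a tiny backup and read it back.
backupToFile(join(tmpdir(), "bk-sketch"), "demo.sql", async (writeLine) => {
  await writeLine("SELECT 1;");
}).then((file) => console.log(readFileSync(file, "utf8")));
```

The `complete` flag rather than a `catch` block keeps the cleanup on every failure path, including errors thrown from the final `stream.end()` flush.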
```diff
  try {
    await sql`SELECT 1`;

-   const lines: string[] = [];
-   const emit = (line: string) => lines.push(line);
-   const emitStatement = (statement: string) => {
-     emit(statement);
-     emit(STATEMENT_BREAKPOINT);
+   mkdirSync(opts.backupDir, { recursive: true });
+   const backupFile = resolve(opts.backupDir, `${filenamePrefix}-${timestamp()}.sql`);
+   const stream = createWriteStream(backupFile, { encoding: "utf8" });
```
```diff
+   const stream = createWriteStream(backupFile, { encoding: "utf8" });
+
+   const writeLine = async (line: string): Promise<void> => {
+     const ok = stream.write(line + "\n");
+     if (!ok) await new Promise<void>((resolve) => stream.once("drain", resolve));
+   };
```
Thinking path: Server crashes during scheduled backups → RangeError: Invalid string length in backup-lib.ts → the entire SQL dump is buffered in a lines[] array and joined into one string → for databases >~250MB the joined string exceeds V8 limits → fix by streaming writes directly to disk instead of buffering.
Fixes #1843

Backup accumulated all SQL in memory before writing to disk. For databases exceeding ~250MB, `lines.join("\n")` hit V8's string length limit, crashing the server and killing all agent sessions.

- `runDatabaseBackup`: writes SQL incrementally via `createWriteStream` with backpressure handling instead of buffering in a string array
- `runDatabaseRestore`: reads the dump file line-by-line via `createReadStream`/`readline` instead of `readFile` into one string

Memory usage is now bounded regardless of database size.
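The restore-side change can be sketched like this. The `STATEMENT_BREAKPOINT` value below is an assumed placeholder, and `readStatements` is an illustrative helper rather than the actual `runDatabaseRestore` implementation:

```typescript
// Sketch of the restore-side change: stream the dump line-by-line instead of
// reading it into one string. STATEMENT_BREAKPOINT is an assumed marker value,
// not necessarily the real one in backup-lib.ts.
import { createReadStream, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { createInterface } from "node:readline";

const STATEMENT_BREAKPOINT = "--> statement-breakpoint"; // assumed marker value

async function* readStatements(file: string): AsyncGenerator<string> {
  const rl = createInterface({
    input: createReadStream(file, { encoding: "utf8" }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  let buffer: string[] = [];
  for await (const line of rl) {
    if (line === STATEMENT_BREAKPOINT) {
      if (buffer.length) yield buffer.join("\n");
      buffer = [];
    } else {
      buffer.push(line);
    }
  }
  if (buffer.length) yield buffer.join("\n"); // trailing statement, no marker
}

// Usage: split a small dump into statements without loading it whole.
(async () => {
  const f = join(tmpdir(), "restore-sketch.sql");
  writeFileSync(f, ["SELECT 1;", STATEMENT_BREAKPOINT, "SELECT 2;"].join("\n"));
  for await (const statement of readStatements(f)) console.log(statement);
})();
```

Only one statement's worth of lines is buffered at a time, which is what makes restore memory bounded regardless of dump size.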