Windows: invalid UTF-8 CSV read crashes process instead of returning duckdb::Error
When reading a CSV file containing invalid UTF-8 bytes with ignore_errors=false, the Rust duckdb crate crashes the process on Windows instead of returning a normal duckdb::Error.
The same SQL returns a detailed DuckDB error in:
- DuckDB CLI on Windows
- Python DuckDB binding
- Rust duckdb crate on Linux
So this looks specific to the Windows Rust binding / bundled library error path.
Environment
Rust crate:
duckdb = { version = "1.10503.1", features = ["bundled", "parquet"] }
Observed on:
- Windows
- duckdb crate: 1.10503.1
- features: bundled, parquet
DuckDB CLI comparison:
Reproduction
Create a CSV file with one invalid UTF-8 byte:
mkdir -p data
printf 'event_time_ms,device_id,bytes\n1,ok-1,100\n2,bad-\xff,200\n3,ok-2,300\n' > data/duckdb_invalid_utf8.csv
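The `\xff` escape in `printf` is not supported by every shell, and `mkdir -p` is not native to cmd.exe, so the command above may not produce the same bytes everywhere on Windows. As a rough alternative, the fixture could be written from Rust with only the standard library. The names `INVALID_UTF8_CSV` and `write_invalid_utf8_csv` are made up for this sketch and are not part of the duckdb crate.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// The same bytes the printf command is meant to produce,
/// including the lone 0xFF byte in the third line.
pub const INVALID_UTF8_CSV: &[u8] =
    b"event_time_ms,device_id,bytes\n1,ok-1,100\n2,bad-\xff,200\n3,ok-2,300\n";

/// Hypothetical helper: creates `dir` if needed, writes the fixture into it,
/// and returns the path of the written file.
pub fn write_invalid_utf8_csv(dir: &Path) -> io::Result<PathBuf> {
    fs::create_dir_all(dir)?;
    let path = dir.join("duckdb_invalid_utf8.csv");
    fs::write(&path, INVALID_UTF8_CSV)?;
    Ok(path)
}
```

Calling `write_invalid_utf8_csv(Path::new("data"))` from the project root should give the same `data/duckdb_invalid_utf8.csv` the SQL below refers to.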
Minimal Rust example:
use duckdb::Connection;

fn main() -> anyhow::Result<()> {
    let conn = Connection::open_in_memory()?;
    let sql = r#"
        select event_time_ms, device_id, bytes
        from read_csv(
            ['data/duckdb_invalid_utf8.csv'],
            auto_detect=false,
            delim=',',
            quote='"',
            escape='"',
            header=true,
            null_padding=true,
            strict_mode=true,
            ignore_errors=false,
            hive_partitioning=false,
            columns={'event_time_ms': 'bigint', 'device_id': 'string', 'bytes': 'bigint'}
        )
    "#;
    let mut stmt = conn.prepare(sql)?;
    let rows = stmt.query_map([], |row| {
        Ok((
            row.get::<_, i64>(0)?,
            row.get::<_, String>(1)?,
            row.get::<_, i64>(2)?,
        ))
    })?;
    let rows: Vec<(i64, String, i64)> = rows.collect::<duckdb::Result<_>>()?;
    println!("{rows:?}");
    Ok(())
}
Run:
cargo run --example duckdb_test
Actual Behavior on Windows
The process crashes with:
exit code: 0xc0000409
STATUS_STACK_BUFFER_OVERRUN
No duckdb::Error is returned to Rust.
This is especially problematic because callers cannot catch the error and retry with ignore_errors=true.
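Until the crash is fixed, one possible stopgap is to scan the file for invalid UTF-8 before handing it to DuckDB, and only then decide whether to run with ignore_errors=true. The sketch below uses only the standard library. The function names are hypothetical, it reads the whole file into memory, and Rust's UTF-8 validation is not guaranteed to agree with DuckDB's in every case, so treat it as a pre-check rather than a replacement for the real error.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the 1-based number of the first line that is not valid UTF-8,
/// or None if every line decodes cleanly.
/// Splitting on b'\n' is safe here because 0x0A never occurs inside a
/// multi-byte UTF-8 sequence.
pub fn first_invalid_utf8_line(bytes: &[u8]) -> Option<usize> {
    bytes
        .split(|&b| b == b'\n')
        .position(|line| std::str::from_utf8(line).is_err())
        .map(|idx| idx + 1)
}

/// Hypothetical convenience wrapper: loads the file and runs the check.
pub fn check_csv_file(path: &Path) -> io::Result<Option<usize>> {
    let bytes = fs::read(path)?;
    Ok(first_invalid_utf8_line(&bytes))
}
```

For the reproduction file above, this reports line 3, the same line DuckDB names in its error. A caller could use a `Some(_)` result to choose ignore_errors=true up front instead of relying on catching the error.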
Expected Behavior
The crate should return a normal duckdb::Error, similar to DuckDB CLI / Python binding / Linux Rust behavior.
For example, DuckDB CLI on Windows returns:
Invalid Input Error:
CSV Error on Line: 3
Original Line: 2,bad-?,200
Invalid unicode (byte sequence mismatch) detected. This file is not utf-8 encoded.
Possible Solution: Set the correct encoding, if available, to read this CSV File (e.g., encoding='UTF-16')
Possible Solution: Enable ignore errors (ignore_errors=true) to skip this row
file = data/duckdb_invalid_utf8.csv
delimiter = , (Set By User)
quote = " (Set By User)
escape = " (Set By User)
new_line = \n (Auto-Detected)
header = true (Set By User)
skip_rows = 0 (Auto-Detected)
comment = (empty) (Auto-Detected)
strict_mode = true (Set By User)
date_format = (Auto-Detected)
timestamp_format = (Auto-Detected)
null_padding = 1
sample_size = 20480
ignore_errors = false
all_varchar = 0
Linux Behavior
On Linux, the same Rust example returns a normal error:
Error: Invalid Input Error: CSV Error on Line: 3
Original Line: 2,bad-?,200
Invalid unicode (byte sequence mismatch) detected. This file is not utf-8 encoded.
Possible Solution: Set the correct encoding, if available, to read this CSV File (e.g., encoding='UTF-16')
Possible Solution: Enable ignore errors (ignore_errors=true) to skip this row
file = data/duckdb_invalid_utf8.csv
delimiter = , (Set By User)
quote = " (Set By User)
escape = " (Set By User)
new_line = \n (Auto-Detected)
header = true (Set By User)
skip_rows = 0 (Auto-Detected)
comment = (empty) (Auto-Detected)
strict_mode = true (Set By User)
date_format = (Auto-Detected)
timestamp_format = (Auto-Detected)
null_padding = 1
sample_size = 20480
ignore_errors = false
all_varchar = 0
Caused by:
Error code 1: Unknown error code
ignore_errors=true Works
Changing only this option:
ignore_errors=true
correctly skips the invalid row and returns:
(1, "ok-1", 100)
(3, "ok-2", 300)
Summary
The issue seems to be that, on Windows, an invalid UTF-8 CSV error path crashes before the Rust caller receives a duckdb::Error.
Expected: return a normal error, as the DuckDB CLI, the Python binding, and the Rust crate on Linux do.