Windows: invalid UTF-8 CSV read crashes process instead of returning duckdb::Error #774

@lukoou3

Description

When reading a CSV file containing invalid UTF-8 bytes with ignore_errors=false, the Rust duckdb crate crashes the process on Windows instead of returning a normal duckdb::Error.

The same SQL returns a detailed DuckDB error in:

  • DuckDB CLI on Windows
  • Python DuckDB binding
  • Rust duckdb crate on Linux

So this looks specific to the Windows Rust binding / bundled library error path.

Environment

Rust crate:

duckdb = { version = "1.10503.1", features = ["bundled", "parquet"] }

Observed on:

Windows
duckdb crate: 1.10503.1
features: bundled, parquet

DuckDB CLI comparison:

DuckDB v1.5.2

Reproduction

Create a CSV file with one invalid UTF-8 byte:

mkdir -p data
printf 'event_time_ms,device_id,bytes\n1,ok-1,100\n2,bad-\xff,200\n3,ok-2,300\n' > data/duckdb_invalid_utf8.csv
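The commands above are Unix shell commands, so on Windows they need something like Git Bash or WSL. As a portable alternative, the same file can be written from Rust using only the standard library. This is a minimal sketch; the helper name `write_invalid_utf8_csv` is made up for illustration and is not part of the duckdb crate.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes the reproduction CSV. The third line contains a lone 0xFF byte,
/// which can never appear in valid UTF-8.
/// (Hypothetical helper, not part of the duckdb crate.)
fn write_invalid_utf8_csv(path: &Path) -> io::Result<()> {
    // Create the parent directory (e.g. `data/`) if it does not exist yet.
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    let bytes: &[u8] =
        b"event_time_ms,device_id,bytes\n1,ok-1,100\n2,bad-\xff,200\n3,ok-2,300\n";
    fs::write(path, bytes)
}

fn main() -> io::Result<()> {
    write_invalid_utf8_csv(Path::new("data/duckdb_invalid_utf8.csv"))
}
```

The byte string is identical to the `printf` argument, so the resulting file should be byte-for-byte the same on every platform.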

Minimal Rust example:

use duckdb::Connection;

fn main() -> anyhow::Result<()> {
    let conn = Connection::open_in_memory()?;

    let sql = r#"
        select event_time_ms, device_id, bytes
        from read_csv(
            ['data/duckdb_invalid_utf8.csv'],
            auto_detect=false,
            delim=',',
            quote='"',
            escape='"',
            header=true,
            null_padding=true,
            strict_mode=true,
            ignore_errors=false,
            hive_partitioning=false,
            columns={'event_time_ms': 'bigint', 'device_id': 'string', 'bytes': 'bigint'}
        )
    "#;

    let mut stmt = conn.prepare(sql)?;
    let rows = stmt.query_map([], |row| {
        Ok((
            row.get::<_, i64>(0)?,
            row.get::<_, String>(1)?,
            row.get::<_, i64>(2)?,
        ))
    })?;

    let rows: Vec<(i64, String, i64)> = rows.collect::<duckdb::Result<_>>()?;
    println!("{rows:?}");

    Ok(())
}

Run (with the code saved as examples/duckdb_test.rs and anyhow added as a dependency):

cargo run --example duckdb_test

Actual Behavior on Windows

The process crashes with:

exit code: 0xc0000409
STATUS_STACK_BUFFER_OVERRUN

No duckdb::Error is returned to Rust.

This is especially problematic because callers cannot catch the error and retry with ignore_errors=true.
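Until the crash is fixed, one possible stopgap is to check the file for invalid UTF-8 in Rust before handing it to DuckDB, and then choose between the strict read and ignore_errors=true up front. The sketch below uses only the standard library. The function names are hypothetical, it reads the whole file into memory, and it only approximates DuckDB's own validation (it assumes the file is meant to be UTF-8 and that lines end in `\n`).

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the 1-based number of the first line that contains invalid UTF-8,
/// or `None` if the whole buffer is valid UTF-8.
/// (Hypothetical helper, not part of the duckdb crate.)
fn first_invalid_utf8_line(bytes: &[u8]) -> Option<usize> {
    match std::str::from_utf8(bytes) {
        Ok(_) => None,
        Err(e) => {
            // Count the newlines that precede the first invalid byte.
            let newlines_before = bytes[..e.valid_up_to()]
                .iter()
                .filter(|&&b| b == b'\n')
                .count();
            Some(newlines_before + 1)
        }
    }
}

/// Reads the file fully and runs the check above on its contents.
fn check_file(path: &Path) -> io::Result<Option<usize>> {
    Ok(first_invalid_utf8_line(&fs::read(path)?))
}

fn main() -> io::Result<()> {
    match check_file(Path::new("data/duckdb_invalid_utf8.csv"))? {
        Some(line) => println!("invalid UTF-8 on line {line}; avoid the strict read"),
        None => println!("file is valid UTF-8"),
    }
    Ok(())
}
```

For the reproduction file, `first_invalid_utf8_line` reports line 3, the same line the DuckDB CLI names in its error. This only sidesteps the crash for this one error class; it does not address whatever makes the error path abort on Windows.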

Expected Behavior

The crate should return a normal duckdb::Error, as the DuckDB CLI, the Python binding, and the Rust crate on Linux do.

For example, DuckDB CLI on Windows returns:

Invalid Input Error:
CSV Error on Line: 3
Original Line: 2,bad-?,200
Invalid unicode (byte sequence mismatch) detected. This file is not utf-8 encoded.

Possible Solution: Set the correct encoding, if available, to read this CSV File (e.g., encoding='UTF-16')
Possible Solution: Enable ignore errors (ignore_errors=true) to skip this row

  file = data/duckdb_invalid_utf8.csv
  delimiter = , (Set By User)
  quote = " (Set By User)
  escape = " (Set By User)
  new_line = \n (Auto-Detected)
  header = true (Set By User)
  skip_rows = 0 (Auto-Detected)
  comment = (empty) (Auto-Detected)
  strict_mode = true (Set By User)
  date_format =  (Auto-Detected)
  timestamp_format =  (Auto-Detected)
  null_padding = 1
  sample_size = 20480
  ignore_errors = false
  all_varchar = 0

Linux Behavior

On Linux, the same Rust example returns a normal error:

Error: Invalid Input Error: CSV Error on Line: 3
Original Line: 2,bad-?,200
Invalid unicode (byte sequence mismatch) detected. This file is not utf-8 encoded.

Possible Solution: Set the correct encoding, if available, to read this CSV File (e.g., encoding='UTF-16')
Possible Solution: Enable ignore errors (ignore_errors=true) to skip this row

  file = data/duckdb_invalid_utf8.csv
  delimiter = , (Set By User)
  quote = " (Set By User)
  escape = " (Set By User)
  new_line = \n (Auto-Detected)
  header = true (Set By User)
  skip_rows = 0 (Auto-Detected)
  comment = (empty) (Auto-Detected)
  strict_mode = true (Set By User)
  date_format =  (Auto-Detected)
  timestamp_format =  (Auto-Detected)
  null_padding = 1
  sample_size = 20480
  ignore_errors = false
  all_varchar = 0

Caused by:
    Error code 1: Unknown error code

ignore_errors=true Works

Changing only this option:

ignore_errors=true

correctly skips the invalid row and returns:

(1, "ok-1", 100)
(3, "ok-2", 300)

Summary

On Windows, the error path for invalid UTF-8 in a CSV read appears to crash the process before the Rust caller receives a duckdb::Error.

Expected: return a normal error, as the DuckDB CLI, the Python binding, and the Rust crate on Linux do.
