Reading remote (az storage) delta tables causes segfault #154

Open
LucaDe opened this issue Feb 15, 2025 · 17 comments

@LucaDe

LucaDe commented Feb 15, 2025

Hi all, I've been observing a weird crash (uncaught target signal 11 (Segmentation fault) - core dumped) of the entire process while reading data from a delta table in an Azure storage container. The error does not occur when reading a local delta table OR when running on an ARM64 setup (e.g. a local MacBook).
Since there is no specific error message, it is hard to understand the actual issue.
Any tips are highly appreciated 🙌

Other cases where I could confirm that things work as expected on the amd64 setup:

  • using the deprecated node duckdb package
  • reading a normal CSV file (no delta_scan) from Azure (a sketch of this working baseline follows below)
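
For reference, a minimal sketch of that working CSV baseline (not part of the original report), assuming the same Azure secret as in the repro script below and a hypothetical data.csv blob in the same container:

import { DuckDBInstance } from '@duckdb/node-api'

const instance = await DuckDBInstance.create(':memory:');
const connection = await instance.connect();
await connection.run(`CREATE SECRET secret1 (TYPE AZURE,CONNECTION_STRING '${process.env.AZURE_CONNECTION_STRING}');`);
await connection.run(`SET azure_transport_option_type='curl'`);
// Plain CSV reads over az:// succeed on both architectures; only delta_scan crashes.
const csvResult = await connection.runAndReadAll(`SELECT * FROM read_csv('az://data/data.csv')`);
console.log(csvResult.getRows());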

Setup for reproducing the issue

Simple package.json including the @duckdb/node-api package

{
  "name": "node-duck",
  "type": "module",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "description": "",
  "dependencies": {
    "@duckdb/node-api": "^1.2.0-alpha.14"
  }
}

Simple JS file demonstrating the behavior

import { DuckDBInstance } from '@duckdb/node-api'

const main = async () => {
    const instance = await DuckDBInstance.create(':memory:');
    const connection = await instance.connect();

    // This step works in all scenarios
    const result = await connection.runAndReadAll(`SELECT * FROM delta_scan('./data')`);
    console.log(`Local data results`, {
        rows: result.getRows(),
    });

    await connection.run(`CREATE SECRET secret1 (TYPE AZURE,CONNECTION_STRING '${process.env.AZURE_CONNECTION_STRING}');`);
    await connection.run(`SET azure_transport_option_type='curl'`);
    
    // This step fails when running on an AMD64 arch
    const resultRemote = await connection.runAndReadAll(`SELECT * FROM delta_scan('az://data')`);
    console.log(`Remote data results`, {
        rows: resultRemote.getRows(),
    });
}

await main();

Sample Dockerfile to reproduce

FROM node:22-bookworm-slim

RUN apt-get update && apt-get -y install ca-certificates

COPY index.js .
COPY package.json .
COPY data data/

RUN npm install

ENTRYPOINT ["node", "index.js"]

To run the example, I copied this delta-table to my local env as well as to an Azure storage container.

Building & running the image (AZURE_CONNECTION_STRING env var needed)

# Breaks
docker build -t duck:test --platform linux/amd64 .
docker run --platform linux/amd64 -e AZURE_CONNECTION_STRING=$AZURE_CONNECTION_STRING -it duck:test
# Works
docker build -t duck:test --platform linux/arm64 .
docker run --platform linux/arm64 -e AZURE_CONNECTION_STRING=$AZURE_CONNECTION_STRING -it duck:test
@jraymakers
Contributor

Thanks for the detailed report.

Unfortunately, since I don't have access to an Azure environment, this will be difficult for me to reproduce.

You mention that you also tested this using the "classic" Node client. What version of that client did you use? It was only very recently upgraded to DuckDB 1.2.0.

It would be useful if you could report whether this happens in your environment with other ways of using DuckDB, such as the CLI or another up-to-date client (e.g. Python).

@LucaDe
Author

LucaDe commented Feb 15, 2025

Thanks for the swift response @jraymakers

Yes, can confirm that the most recent 1.2.0 version of the classic "duckdb" package does not have this issue!
Executing the 1.2.0 CLI within the image mentioned above also does not have the issue (I have to set azure_transport_option_type to curl to fix SSL CA cert connection issues, but besides that it runs smoothly).

@jraymakers
Contributor

jraymakers commented Feb 15, 2025

I see, thanks for the additional info.

I'm puzzled by what could be causing this, given the very specific platform & environment, and the difficulty of reproducing it on my own will make it hard for me to learn more. I may need to ask for assistance.

In the meantime, if you are able to further reduce the repro case, or reproduce in any other environment, that would likely help. For example, it could be helpful to know which part of runAndReadAll and getRows is triggering the problem, which could be accomplished by just running run, followed by (say) getChunk.
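
A minimal sketch of that reduced check, reusing the connection and secret from the repro script above (the chunk-reading method name is an assumption and may differ between alpha releases of @duckdb/node-api):

// Run only, without materializing results; per the follow-up below, this alone already crashes on amd64.
const result = await connection.run(`SELECT * FROM delta_scan('az://data')`);
console.log('run finished');
// Only reached if run itself succeeds: try pulling a single chunk.
const chunk = await result.fetchChunk();
console.log('rows in first chunk:', chunk?.rowCount);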

@carlopi
Collaborator

carlopi commented Feb 15, 2025

I also have a weird(-ish) question that might help track this down: what are the platforms reported by DuckDB for the CLI and the Node package?
To get them, just reporting the result of the PRAGMA platform SQL query should be enough.

And if they were to be different, could you possibly also check the Python package (both its platform and whether Delta works there or not)?

Thanks!
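
A quick sketch of that check from the Node client (the CLI equivalent is simply running PRAGMA platform; in the shell):

import { DuckDBInstance } from '@duckdb/node-api'

const instance = await DuckDBInstance.create(':memory:');
const connection = await instance.connect();
const result = await connection.runAndReadAll('PRAGMA platform;');
console.log(result.getRows()); // e.g. [ [ 'linux_amd64_gcc4' ] ]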

@LucaDe
Author

LucaDe commented Feb 15, 2025

Before going deeper on both of your comments, I installed the segfault-handler package & registered it as a listener. Maybe the output below helps. Another interesting observation is that the result varies from run to run: it either causes the segfault shown below or just prints "killed", indicating an OOM situation.

PID 5455 received SIGSEGV for address: 0x800c8af780
/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x3248)[0x4031a4f248]
/lib/x86_64-linux-gnu/libc.so.6(+0x3c050)[0x4000b9c050]
/lib/x86_64-linux-gnu/libc.so.6(+0x16e78c)[0x4000cce78c]
/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSs9_M_mutateEmmm+0x115)[0x400092dfd5]
/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSs12_M_leak_hardEv+0x5c)[0x400092e14c]
/root/.duckdb/extensions/v1.2.0/linux_amd64_gcc4/delta.duckdb_extension(_ZNSt8__detail9_CompilerISt12regex_traitsIcEE22_M_insert_char_matcherILb0ELb0EEEvv+0x31)[0x40ec8b4d21]
/root/.duckdb/extensions/v1.2.0/linux_amd64_gcc4/delta.duckdb_extension(_ZNSt8__detail9_CompilerISt12regex_traitsIcEE7_M_atomEv+0x90)[0x40ec8bc700]
/root/.duckdb/extensions/v1.2.0/linux_amd64_gcc4/delta.duckdb_extension(_ZNSt8__detail9_CompilerISt12regex_traitsIcEE14_M_alternativeEv+0xd8)[0x40ec8bcee8]
/root/.duckdb/extensions/v1.2.0/linux_amd64_gcc4/delta.duckdb_extension(_ZNSt8__detail9_CompilerISt12regex_traitsIcEE14_M_disjunctionEv+0x19)[0x40ec8bd119]
/root/.duckdb/extensions/v1.2.0/linux_amd64_gcc4/delta.duckdb_extension(_ZNSt8__detail9_CompilerISt12regex_traitsIcEEC1EPKcS5_RKSt6localeNSt15regex_constants18syntax_option_typeE+0x320)[0x40ec8bd820]
/root/.duckdb/extensions/v1.2.0/linux_amd64_gcc4/delta.duckdb_extension(+0x615d53)[0x40ec8bdd53]
/root/.duckdb/extensions/v1.2.0/linux_amd64_gcc4/delta.duckdb_extension(+0x618dc7)[0x40ec8c0dc7]
/root/.duckdb/extensions/v1.2.0/linux_amd64_gcc4/delta.duckdb_extension(+0x6195ad)[0x40ec8c15ad]
/root/.duckdb/extensions/v1.2.0/linux_amd64_gcc4/delta.duckdb_extension(+0x619ef8)[0x40ec8c1ef8]
/root/.duckdb/extensions/v1.2.0/linux_amd64_gcc4/delta.duckdb_extension(+0x619fd0)[0x40ec8c1fd0]
/node_modules/@duckdb/node-bindings-linux-x64/libduckdb.so(_ZN6duckdb19ParquetScanFunction23ParquetScanBindInternalERNS_13ClientContextENS_10unique_ptrINS_15MultiFileReaderESt14default_deleteIS4_ELb1EEENS_10shared_ptrINS_13MultiFileListELb1EEERNS_6vectorINS_11LogicalTypeELb1EEERNSB_ISsLb1EEENS_14ParquetOptionsE+0x29e)[0x402f48119e]
/node_modules/@duckdb/node-bindings-linux-x64/libduckdb.so(_ZN6duckdb19ParquetScanFunction15ParquetScanBindERNS_13ClientContextERNS_22TableFunctionBindInputERNS_6vectorINS_11LogicalTypeELb1EEERNS5_ISsLb1EEE+0x244)[0x402f484cc4]
/node_modules/@duckdb/node-bindings-linux-x64/libduckdb.so(_ZN6duckdb6Binder25BindTableFunctionInternalERNS_13TableFunctionERKNS_16TableFunctionRefENS_6vectorINS_5ValueELb1EEESt13unordered_mapISsS7_NS_33CaseInsensitiveStringHashFunctionENS_29CaseInsensitiveStringEqualityESaISt4pairIKSsS7_EEENS6_INS_11LogicalTypeELb1EEENS6_ISsLb1EEE+0x2e5)[0x402e26f7f5]
/node_modules/@duckdb/node-bindings-linux-x64/libduckdb.so(_ZN6duckdb6Binder4BindERNS_16TableFunctionRefE+0x89c)[0x402e270a5c]
/node_modules/@duckdb/node-bindings-linux-x64/libduckdb.so(_ZN6duckdb6Binder4BindERNS_8TableRefE+0x1c5)[0x402e2bf5b5]
/node_modules/@duckdb/node-bindings-linux-x64/libduckdb.so(_ZN6duckdb6Binder8BindNodeERNS_10SelectNodeE+0x3c)[0x402e22267c]
/node_modules/@duckdb/node-bindings-linux-x64/libduckdb.so(_ZN6duckdb6Binder8BindNodeERNS_9QueryNodeE+0xab)[0x402e2bfebb]
/node_modules/@duckdb/node-bindings-linux-x64/libduckdb.so(_ZN6duckdb6Binder4BindERNS_9QueryNodeE+0x5b)[0x402e2c0afb]
/node_modules/@duckdb/node-bindings-linux-x64/libduckdb.so(_ZN6duckdb6Binder4BindERNS_15SelectStatementE+0x39)[0x402e22c389]
/node_modules/@duckdb/node-bindings-linux-x64/libduckdb.so(_ZN6duckdb6Binder4BindERNS_12SQLStatementE+0x225)[0x402e2c0a95]
/node_modules/@duckdb/node-bindings-linux-x64/libduckdb.so(_ZN6duckdb7Planner10CreatePlanERNS_12SQLStatementE+0x92)[0x402e2cb362]
/node_modules/@duckdb/node-bindings-linux-x64/libduckdb.so(_ZN6duckdb13ClientContext31CreatePreparedStatementInternalERNS_17ClientContextLockERKSsNS_10unique_ptrINS_12SQLStatementESt14default_deleteIS6_ELb1EEENS_12optional_ptrISt13unordered_mapISsNS_18BoundParameterDataENS_33CaseInsensitiveStringHashFunctionENS_29CaseInsensitiveStringEqualityESaISt4pairIS3_SC_EEELb1EEE+0x2e9)[0x402ee37409]
/node_modules/@duckdb/node-bindings-linux-x64/libduckdb.so(_ZN6duckdb13ClientContext23CreatePreparedStatementERNS_17ClientContextLockERKSsNS_10unique_ptrINS_12SQLStatementESt14default_deleteIS6_ELb1EEENS_12optional_ptrISt13unordered_mapISsNS_18BoundParameterDataENS_33CaseInsensitiveStringHashFunctionENS_29CaseInsensitiveStringEqualityESaISt4pairIS3_SC_EEELb1EEENS_21PreparedStatementModeE+0x2dd)[0x402ee37edd]
/node_modules/@duckdb/node-bindings-linux-x64/libduckdb.so(+0x17951ae)[0x402ee381ae]
/node_modules/@duckdb/node-bindings-linux-x64/libduckdb.so(_ZN6duckdb13ClientContext32RunFunctionInTransactionInternalERNS_17ClientContextLockERKSt8functionIFvvEEb+0x69)[0x402ee2b9f9]
/node_modules/@duckdb/node-bindings-linux-x64/libduckdb.so(_ZN6duckdb13ClientContext15PrepareInternalERNS_17ClientContextLockENS_10unique_ptrINS_12SQLStatementESt14default_deleteIS4_ELb1EEE+0x144)[0x402ee2c644]
/node_modules/@duckdb/node-bindings-linux-x64/libduckdb.so(_ZN6duckdb13ClientContext7PrepareERKSs+0xc3)[0x402ee4a813]
qemu: uncaught target signal 11 (Segmentation fault) - core dumped

@LucaDe
Author

LucaDe commented Feb 15, 2025

In the meantime, if you are able to further reduce the repro case, or reproduce in any other environment, that would likely help. For example, it could be helpful to know which part of runAndReadAll and getRows is triggering the problem, which could be accomplished by just running run, followed by (say) getChunk.

Just awaiting a .run call already triggers the issue

@jraymakers
Contributor

Given the stack trace above, I'd be curious if just awaiting prepare triggers the problem as well.

@LucaDe
Author

LucaDe commented Feb 15, 2025

I also have a weird(-ish) question that might help track this down: what are the platforms reported by DuckDB for the CLI and the Node package? To get them, just reporting the result of the PRAGMA platform SQL query should be enough.

Node Package: linux_amd64_gcc4
CLI: linux_amd64_gcc4

So both are equal! Can check the Python lib as well though :)

[screenshot of the PRAGMA platform output]

@LucaDe
Author

LucaDe commented Feb 15, 2025

Given the stack trace above, I'd be curious if just awaiting prepare triggers the problem as well.

Yes, can confirm that await connection.prepare(`SELECT * FROM delta_scan('az://data')`); triggers the same segfault as above

@carlopi
Collaborator

carlopi commented Feb 15, 2025

@LucaDe, thanks for checking!

@jraymakers
Contributor

Unfortunately even that simple example seems to require an Azure account (which I don't have).

@LucaDe
Author

LucaDe commented Feb 15, 2025

@LucaDe, thanks for checking!

Sure, checked Python as well. Same platform (linux_amd64_gcc4), and reading the delta table from Azure works as expected.

@LucaDe
Author

LucaDe commented Feb 15, 2025

Unfortunately even that simple example seems to require an Azure account (which I don't have).

Yeah, super annoying. I can check if the same behavior happens with azurite (never used it before).
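
For anyone trying the same, a rough sketch of a secret pointing at azurite's well-known development account; the account name, key, and endpoint below are azurite's published defaults rather than values from this issue, so double-check them against the azurite docs:

// Assumes azurite is listening on its default blob port 10000 on localhost.
await connection.run(`CREATE SECRET azurite_secret (
    TYPE AZURE,
    CONNECTION_STRING 'DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1;'
);`);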

@jraymakers
Contributor

Given the stack trace above, the most likely causes are:

  • A bug in either the Azure or Delta extension.
  • A bug in some other native component that's corrupting memory. Relevant (DuckDB-related) native components include Node Neo and DuckDB itself. I suppose there's also Node, but that seems an unlikely culprit.
  • Some interplay between two or more of the above.

I'm not sure what to make of the apparent platform-specificity. There might be some memory corruption that's just more likely to result in an error on some platforms (e.g. Linux AMD64), or it might be somehow specific to the platform, perhaps because of some conditional compilation or compiler-specific behavior.

The fact that this repros by just calling prepare (after a few successful calls to run) rules out a lot of the code in Node Neo. There's not a lot of code involved in the path to prepare, especially without binding any parameters. If it is a bug in Node Neo, we should be able to construct a repro that doesn't depend on extensions. But it's not clear what that would be; there are a lot of test cases that call prepare with more complex statements than the example that are passing just fine.

So, although we've only seen a repro on Node Neo so far, I tentatively suspect the cause is elsewhere (such as the Azure or Delta extension) and something about Node Neo on Linux x64 makes the problem more likely to surface.

It would be interesting to try to make a pure C program that makes the same calls as Node Neo in this case.

@jraymakers added this to the Limbo milestone Feb 16, 2025
@LucaDe
Author

LucaDe commented Feb 16, 2025

Yeah, it's somehow at the intersection of Neo, Azure & Delta.
I was able to confirm that the same issue occurs when connecting to a local Azure storage container using azurite (again amd64 breaks, arm64 works).
While creating a reproducible setup I could slim it down even further (without azurite or an actual Azure container). It's reproducible in the following setup:

Dockerfile

FROM node:22-bookworm-slim

WORKDIR /app

COPY index.js .
RUN npm init -y && npm i @duckdb/node-api

ENTRYPOINT ["node", "index.js"]

index.js

import { DuckDBInstance } from "@duckdb/node-api";

const main = async () => {
  const instance = await DuckDBInstance.create(":memory:");
  const connection = await instance.connect();

  await connection.run(`CREATE SECRET secret1 (TYPE AZURE,CONNECTION_STRING 'ABC');`);
  
  console.log("Running prepare");
  // This step fails when running on an AMD64 arch & succeeds on ARM64
  await connection.prepare(`SELECT * FROM delta_scan('az://testing/data')`);
  console.log("Prepare done");
};

await main();

ARM64

docker build -t duck:test --platform linux/arm64 .
docker run --platform linux/arm64 -it duck:test

# Throws the expected error "[Error: IO Error: Hit DeltaKernel FFI error (from: While trying to read from delta table: 'az://testing/data/'): Hit error: 8 (ObjectStoreError) with message (Error interacting with object store: Generic MicrosoftAzure error: Account must be specified)]"

AMD64

docker build -t duck:test --platform linux/amd64 .
docker run --platform linux/amd64 -it duck:test

# Fails with crash or segfault

@jraymakers
Contributor

Thanks for the slimmed-down repro above. With those steps, I was able to reproduce locally.

I tried reproducing in a pure C program, but I was unable to. However, I was able to reproduce by adding a minimal test harness function to Node Neo:

Napi::Value test_issue154(const Napi::CallbackInfo& info) {
    auto env = info.Env();
    std::cout << "start" << std::endl;
    duckdb_database db;
    std::cout << "open" << std::endl;
    duckdb_open(":memory:", &db);
    duckdb_connection conn;
    std::cout << "connect" << std::endl;
    duckdb_connect(db, &conn);
    duckdb_result res;
    std::cout << "query" << std::endl;
    duckdb_query(conn, "CREATE SECRET secret1 (TYPE AZURE,CONNECTION_STRING 'ABC');", &res);
    duckdb_prepared_statement prepared;
    std::cout << "prepare" << std::endl;
    duckdb_prepare(conn, "SELECT * FROM delta_scan('az://testing/data')", &prepared);
    std::cout << "done" << std::endl;
    return env.Undefined();
}

When run in a linux/amd64 docker container, the above outputs:

start
open
connect
query
prepare
free(): invalid pointer

So, it seems some combination of the way the native binaries are built in Node (using node-gyp and the node-addon-api) and the use of the Azure & Delta extensions triggers this problem. I thought for a while that perhaps I was mismanaging memory in Node Neo (i.e. double-freeing), but the fact that the above example reproduces the problem eliminates that possibility.

Unfortunately, this means I don't know how to solve the problem. It seems most likely that there's a bug in the Azure or Delta extensions that is exposed in this particular build environment.

@LucaDe
Author

LucaDe commented Feb 17, 2025

Thanks for further checking. I cross-posted this within the delta extension issues for now 🤞
