Issue found on page 'Conversion between DuckDB and Python' #4268

Open
nickzoic opened this issue Dec 5, 2024 · 1 comment

nickzoic commented Dec 5, 2024

Page URL: https://duckdb.org/docs/api/python/conversion.html

I'm unsure whether this is a documentation issue, an issue with DuckDB itself, or whether I'm just doing it wrong, but I tried to follow a couple of the examples under https://duckdb.org/docs/api/python/conversion#dict using the DuckDB Python package, version 1.1.3.

I can't get the first example, where "key" and "value" lists are meant to turn into a MAP, to work no matter what I try.
The second example, introduced with "Otherwise we'll try to convert it to a STRUCT", comes out as a MAP.
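For reference, here is a plain-Python sketch of the rule as the linked docs state it. `conversion_target` is a hypothetical helper, not part of DuckDB, and it assumes the conditions the docs give: exactly the keys "key" and "value", both holding lists of the same length. Any other dict falls back to a STRUCT.

```python
def conversion_target(value: dict) -> str:
    """Hypothetical sketch of the docs' dict rule; not DuckDB's implementation."""
    # A MAP needs exactly the two field names "key" and "value" ...
    if set(value) == {"key", "value"}:
        keys, values = value["key"], value["value"]
        # ... holding two lists of the same length (assumed to be Python lists).
        if isinstance(keys, list) and isinstance(values, list) and len(keys) == len(values):
            return "MAP"
    # Every other dict is converted to a STRUCT.
    return "STRUCT"


print(conversion_target({"key": [1, 2, 3], "value": ["one", "two", "three"]}))  # MAP
print(conversion_target({1: "one", "2": 2, "three": [1, 2, 3], False: True}))  # STRUCT
```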

example:

import duckdb
sesh = duckdb.connect()

def func3() -> dict[str,list[str]|list[int]]:
    return { "key": [ 1, 2, 3 ], "value": [ "one", "two", "three" ] } 
sesh.create_function("func3", func3)

try:
    print(sesh.sql("select func3()"))
except Exception as exc:
    print(f"func3(): {exc}")


def func4() -> dict[str|int|bool,str|int|list[int]|bool]:
    return { 
        1: "one",
        "2": 2,
        "three": [1, 2, 3],
        False: True
    } 
sesh.create_function("func4", func4)

try:
    print(sesh.sql("select func4()"))
except Exception as exc:
    print(f"func4(): {exc}")

output:

func3(): Conversion Error: Type VARCHAR can't be cast as UNION(u1 VARCHAR[], u2 BIGINT[]). VARCHAR can't be implicitly cast to any of the union member types: VARCHAR[], BIGINT[]
┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                               func4()                                                │
│ map(union(u1 varchar, u2 bigint, u3 boolean), union(u1 varchar, u2 bigint, u3 bigint[], u4 boolean)) │
├──────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {1=one, 2=2, three=[1, 2, 3], false=true}                                                            │
└──────────────────────────────────────────────────────────────────────────────────────────────────────┘
@soerenwolfers
Collaborator

The type annotation in your first example has to be

 -> dict[int, str]

if you want the behavior described in the docs. I agree it's odd for a Python user to put a technically wrong type annotation on their function just to make a consuming library happy, which is why DuckDB also lets you specify the input and output types explicitly in the create_function call, e.g.,

sesh.create_function("func3", func3, [], dict[int, str])

To get the type you're aiming for in your second example, similarly specify

-> dict[str|int|bool,str|int|list[int]|bool]:

In general, the docs you linked are about replacement scans, where there is no way to provide explicit type annotations, rather than about UDFs. For example,

import duckdb
import numpy as np

a = np.array([{
    1: "one",
    "2": 2,
    "three": [1, 2, 3],
    False: True,
}])
print(duckdb.query("SELECT * FROM a"))

does infer the type the docs describe: it creates a STRUCT with the four stringified keys and the corresponding correct value types. @Tishj might know more details or have more insights.
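The key handling described here amounts to converting each dict key to a string before it becomes a STRUCT field name. A minimal sketch, where `to_struct_value` is a hypothetical helper and Python's `str()` is assumed to produce the same field names DuckDB uses:

```python
def to_struct_value(value: dict) -> dict[str, object]:
    """Hypothetical sketch: STRUCT field names are the dict's keys as strings."""
    # Values are passed through unchanged; only the keys are stringified.
    return {str(key): item for key, item in value.items()}


row = {1: "one", "2": 2, "three": [1, 2, 3], False: True}
struct = to_struct_value(row)
print(list(struct))  # ['1', '2', 'three', 'False']
```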
