
Update quickbooks with realistic data type error #95

Draft
joellabes wants to merge 8 commits into main from update-quickbooks-with-realistic-data-type-error

Conversation

@joellabes (Collaborator) commented Dec 11, 2025

First swing at the problem described in #55 (comment), where an agent shouldn't be asked to modify a source table directly.

This PR depends on #85 and its new approach to packages + alignment of dbt authoring layer versions between core and fusion. It does the following things:

  • During setup, change the column to an integer by converting it to a unix epoch, instead of null::integer. This should also make the task easier for agents to solve, if they preview the data and realise what the values represent.
  • Change the solution to expect the agent to create its own versions of the staging models and disable the originals. This is a difficult task which took a lot of Googling to find the docs for, but it is the dbtonic way of solving the underlying problem.
  • Add Snowflake and Fusion support to the quickbooks family of tasks (not yet propagated across 002, 003, 004).
  • Add yq to the duckdb/core image, to modify dbt_project.yml.
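As a rough illustration of the first bullet, the setup-time type change could look something like the following. This is a sketch only: the table and column names are hypothetical (not the actual quickbooks schema), and it assumes DuckDB's epoch() function and its ALTER ... SET DATA TYPE ... USING syntax.

```sql
-- Hypothetical setup step: turn a timestamp column into a unix epoch integer,
-- so the "wrong type" looks like realistic data rather than null::integer.
-- Table and column names are illustrative only.
alter table raw.quickbooks_invoices
    alter created_at set data type bigint
    using cast(epoch(created_at) as bigint);
```

An agent previewing the data would then see plausible-looking ten-digit integers rather than nulls, which is the hint described above.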

@joellabes (Collaborator, Author)

@bstancil would be good to get a sense check from you here too, before I go further

@bstancil (Collaborator) left a comment

Left a couple smaller comments. Two other larger ones:

  1. One thing to note here is that the solution doesn't actually matter. If the agent solves this by overriding the staging tables, great, but it doesn't have to do it in the same way. It could, in theory, blow up the original data type, and then just create staging models that read what's in the source table. So long as the dbt tests pass (staging models exist and contain what we want them to contain), the eval will pass. It doesn't require them to be created in the same way. (We could solve that by checking the source table, or just checking the data types of the source table.)

One other note about that too - if we want it to solve it in this particular way - especially if it's dependent on creating a new model - we probably have to tell it that we want that. It's unlikely to create these models with those names unless the assignment gives it that instruction. (Another alternative would be to say "solve this problem WITHOUT changing the source data", add a test to make sure that source table doesn't change, and get rid of the tests for the staging tables. So long as it gets the right result without breaking the source, we'd consider it passing.)

  2. On the migrations, the reason this isn't in Snowflake is because the duckdb-to-snowflake script doesn't work for quickbooks. I don't remember the exact error, but if you run the migration ade migrate duckdb-to-snowflake --include quickbooks --use-database-export, I believe it has issues with some sort of data type conversion. (I'm trying it now, and so far so good, but will report back when I see what it does.)

Also, several quickbooks tasks don't work with fusion because you can't (to my knowledge) inspect the raw SQL. There might be some workarounds for that, though.

#12

RUN apt-get update && apt-get install -y \
tmux asciinema \
curl \
&& curl -sSL https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 \
Collaborator

Technically this isn't necessary because the duckdb dockerfile isn't used for any of the migrations you're working on here, but it's probably fine to leave it in to keep it consistent with the others.

Collaborator Author

The solution includes modifying dbt_project.yml to disable the overridden models, which is why this is now needed
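For context, disabling overridden package models is a small change to dbt_project.yml, which is what yq is there to script. A minimal sketch, assuming a hypothetical package name (the real package and model paths may differ):

```yaml
# dbt_project.yml (excerpt) -- package name below is hypothetical
models:
  quickbooks_source:
    # Disable the packaged staging models in favour of the
    # project's own overriding versions.
    +enabled: false
```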

Collaborator

As a broader note on these migrations, if the change that's needed is fairly small (eg, replacing one function with another), I try to do it inline in the script, because it's easier to see what's changing. Fully replacing a file can be hard to follow, since you don't know what's different between the versions. That's a judgement call, though. How much do we need to update here?

Collaborator Author

These files aren't in the original project, because they're initially installed from the package hub. I guess they could be copied from the project's packages directory inside the Docker container and then have the one-line change applied, but that feels kinda janky too.

@bstancil (Collaborator)

Ah, actually, the load into snowflake happens fine? So never mind on at least that part of bullet 2

@joellabes (Collaborator, Author)

So long as the dbt tests pass (staging models exist and contain what we want them to contain), the eval will pass. It doesn't require them to be created in the same way.

Yep got it - although over time I'd imagine we add more checks to ensure that the actual problem has been solved, not just that it's stumbled into a correct-looking answer.

It could, in theory, blow up the original data type, and then just create staging models that read what's in the source table.

This, for example, would pass the test but destroy the data. So adding "source tables are the same shape as they were at the outset" would be another eval condition to add, as you note.
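That extra eval condition could take the form of a singular dbt test against information_schema. A hedged sketch, in which the table, column, and expected types are all hypothetical placeholders:

```sql
-- tests/assert_source_shape_unchanged.sql (hypothetical singular test)
-- Fails (returns rows) if the raw column's data type has drifted from
-- what it was at the outset of the task. Names and types are illustrative.
select column_name, data_type
from information_schema.columns
where lower(table_name) = 'quickbooks_invoices'
  and lower(column_name) = 'created_at'
  and lower(data_type) not in ('bigint', 'integer')
```

A test like this would catch the "blow up the source table" shortcut while leaving the agent free to choose how it builds the staging models.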

One other note about that too - if we want it to solve it in this particular way - especially if it's dependent on creating a new model - we probably have to tell it that we want that. It's unlikely to create these models with those names unless the assignment gives it that instruction. (Another alternative would be to say "solve this problem WITHOUT changing the source data", add a test to make sure that source table doesn't change, and get rid of the tests for the staging tables. So long as it gets the right result without breaking the source, we'd consider it passing.)

Yeah Ganz and I were discussing that this morning - when you look at the SWE-bench tasks, for example, their prompts are full, development-ready GitHub issues. So "this is broken, fix it" or "make a table with these characteristics" are probably too brief. I do think that having a second LLM play the requester, meting out additional information if/when the agent asks followup questions, would be a very good path to go down, but I'll make a separate issue to discuss that.

@bstancil (Collaborator)

On the "Source tables are the same shape" point, yeah, that would require adding a couple new seed tables to the test, so that's easy.

And on the question about prompt detail, I'm a bit mixed on that. Though I guess it depends on how exactly we want to use it. Without thinking about it too much, I think I might say that every task should have two "prompts" (though they aren't prompts, but are like, instructions):

  1. Detailed, which explains exactly what needs to be done, a la SWE-bench.
  2. Realistic, which is more like how someone would actually instruct a bot to do it. "Figure out what's broken and fix it" is definitely a thing I've told it before, and that strikes me as how people would generally actually do it.

@joellabes (Collaborator, Author) commented Dec 18, 2025

2. "Figure out what's broken and fix it" is definitely a thing I've told it before, and that strikes me as how people would generally actually do it.

Me too, but that's my opening gambit in a back-and-forth with the LLM, which is the bit that's missing at the moment.
