
Update quickbooks with realistic data type error #95

Draft
joellabes wants to merge 8 commits into main from update-quickbooks-with-realistic-data-type-error

Conversation

@joellabes (Collaborator) commented Dec 11, 2025

First swing at the problem described in #55 (comment), where an agent shouldn't be asked to modify a source table directly.

This PR depends on #85 and its new approach to packages + alignment of dbt authoring layer versions between core and fusion. It does the following things:

  • During setup, change the column to an integer by converting it to a unix epoch, instead of null::integer. This should also make the task easier for agents to solve, if they preview the data and realise what the values represent.
  • Change the solution to expect the agent to create its own versions of the staging models and disable the originals. This is a difficult task which took a lot of Googling to find the docs for, but it is the dbtonic way of solving the underlying problem.
  • Add Snowflake and Fusion support to the quickbooks family of tasks (not yet propagated across 002, 003, 004).
  • Add yq to the duckdb/core image, to modify dbt_project.yml.
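As a rough illustration of the first bullet, the setup-time type change could look something like the following. This is a sketch only: the table and column names are hypothetical (not the actual quickbooks schema), and it assumes DuckDB's epoch() function and its ALTER ... SET DATA TYPE ... USING syntax.

```sql
-- Hypothetical setup step: turn a timestamp column into a unix epoch integer,
-- so the "wrong type" looks like realistic data rather than null::integer.
-- Table and column names are illustrative only.
alter table raw.quickbooks_invoices
    alter created_at set data type bigint
    using cast(epoch(created_at) as bigint);
```

An agent previewing the data would then see plausible-looking ten-digit integers rather than nulls, which is the hint described above.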

@joellabes (Collaborator, Author)

@bstancil would be good to get a sense check from you here too, before I go further

@bstancil (Collaborator) left a comment

Left a couple smaller comments. Two other larger ones:

  1. One thing to note here is that the solution doesn't actually matter. If the agent solves this by overriding the staging tables, great, but it doesn't have to do it in the same way. It could, in theory, blow up the original data type, and then just create staging models that read what's in the source table. So long as the dbt tests pass (staging models exist and contain what we want them to contain), the eval will pass. It doesn't require them to be created in the same way. (We could solve that by checking the source table, or just checking the data types of the source table.)

One other note about that too - if we want it to solve it in this particular way - especially if it's dependent on creating a new model - we probably have to tell it that we want that. It's unlikely to create these models with those names unless the assignment gives it that instruction. (Another alternative would be to say "solve this problem WITHOUT changing the source data", add a test to make sure that source table doesn't change, and get rid of the tests for the staging tables. So long as it gets the right result without breaking the source, we'd consider it passing.)

  2. On the migrations, the reason this isn't in Snowflake is because the duckdb-to-snowflake script doesn't work for quickbooks. I don't remember the exact error, but if you run the migration ade migrate duckdb-to-snowflake --include quickbooks --use-database-export, I believe it has issues with some sort of data type conversion. (I'm trying it now, and so far so good, but will report back when I see what it does.)

Also, several quickbooks tasks don't work with fusion because you can't (to my knowledge) inspect the raw SQL. There might be some workarounds for that, though.

#12

RUN apt-get update && apt-get install -y \
tmux asciinema \
curl \
&& curl -sSL https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 \
Collaborator

Technically this isn't necessary because the duckdb dockerfile isn't used for any of the migrations you're working on here, but it's probably fine to leave it in to keep it consistent with the others.

Collaborator Author

The solution includes modifying dbt_project.yml to disable the overridden models, which is why this is now needed
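For context, disabling overridden package models is a small change to dbt_project.yml, which is what yq is there to script. A minimal sketch, assuming a hypothetical package name (the real package and model paths may differ):

```yaml
# dbt_project.yml (excerpt) -- package name below is hypothetical
models:
  quickbooks_source:
    # Disable the packaged staging models in favour of the
    # project's own overriding versions.
    +enabled: false
```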

Collaborator

As a broader note on these migrations, if the change that's needed is fairly small (eg, replacing one function with another), I try to do it inline in the script, because it's easier to see what's changing. Fully replacing a file can be hard to follow, since you don't know what's different between the versions. That's a judgement call, though. How much do we need to update here?

Collaborator Author

These files aren't in the original project, because they're initially installed from the package hub. I guess they could be copied from the project's packages directory inside the Docker container and then have the one-line change applied, but that feels kinda janky too.

@bstancil (Collaborator)

Ah, actually, the load into snowflake happens fine? So never mind on at least that part of bullet 2

@joellabes (Collaborator, Author)

So long as the dbt tests pass (staging models exist and contain what we want them to contain), the eval will pass. It doesn't require them to be created in the same way.

Yep got it - although over time I'd imagine we add more checks to ensure that the actual problem has been solved, not just that it's stumbled into a correct-looking answer.

It could, in theory, blow up the original data type, and then just create staging models that read what's in the source table.

This, for example, would pass the test but destroy the data. So adding "source tables are the same shape as they were at the outset" would be another eval condition to add, as you note.
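That extra eval condition could take the form of a singular dbt test against information_schema. A hedged sketch, in which the table, column, and expected types are all hypothetical placeholders:

```sql
-- tests/assert_source_shape_unchanged.sql (hypothetical singular test)
-- Fails (returns rows) if the raw column's data type has drifted from
-- what it was at the outset of the task. Names and types are illustrative.
select column_name, data_type
from information_schema.columns
where lower(table_name) = 'quickbooks_invoices'
  and lower(column_name) = 'created_at'
  and lower(data_type) not in ('bigint', 'integer')
```

A test like this would catch the "blow up the source table" shortcut while leaving the agent free to choose how it builds the staging models.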

One other note about that too - if we want it to solve it in this particular way - especially if it's dependent on creating a new model - we probably have to tell it that we want that. It's unlikely to create these models with those names unless the assignment gives it that instruction. (Another alternative would be to say "solve this problem WITHOUT changing the source data", add a test to make sure that source table doesn't change, and get rid of the tests for the staging tables. So long as it gets the right result without breaking the source, we'd consider it passing.)

Yeah Ganz and I were discussing that this morning - when you look at the SWE-bench tasks, for example, their prompts are full, development-ready GitHub issues. So "this is broken, fix it" or "make a table with these characteristics" are probably too brief. I do think that having a second LLM play the requester, meting out additional information if/when the agent asks followup questions, would be a very good path to go down, but I'll make a separate issue to discuss that.

@bstancil (Collaborator)

On the "Source tables are the same shape" point, yeah, that would require adding a couple new seed tables to the test, so that's easy.

And on the question about prompt detail, I'm a bit mixed on that. Though I guess it depends on how exactly we want to use it. Without thinking about it too much, I think I might say that every task should have two "prompts" (though they aren't prompts, but are like, instructions):

  1. Detailed, which explains exactly what needs to be done, a la SWE-bench.
  2. Realistic, which is more like how someone would actually instruct a bot to do it. "Figure out what's broken and fix it" is definitely a thing I've told it before, and that strikes me as how people would generally actually do it.

@joellabes (Collaborator, Author) commented Dec 18, 2025

2. "Figure out what's broken and fix it" is definitely a thing I've told it before, and that strikes me as how people would generally actually do it.

Me too, but that's my opening gambit in a back-and-forth with the LLM, which is the bit that's missing at the moment.
