Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support capturing parameterized queries #159

Open
seasick opened this issue Jan 25, 2022 · 4 comments
Open

Support capturing parameterized queries #159

seasick opened this issue Jan 25, 2022 · 4 comments

Comments

@seasick
Copy link

seasick commented Jan 25, 2022

Brief description of the problem

Currently parameters of e.g. dbSendQuery are not considered when storing the statement and its return value.

When capturing two identical queries with different parameters only the result of the last query persists because the second capturing will overwrite the captured result of the first.

# Simplified code ahead ---v
capture_db_requests({
  dbSendQuery(con, "SELECT * FROM airlines WHERE carrier = $1 LIMIT 1", list("9E"))
  # capturing this overwrites the result of the previous capture, because only the statement is hashed
  dbSendQuery(con, "SELECT * FROM airlines WHERE carrier = $1 LIMIT 1", list("AA")) 
})

with_mock_db({
  # Both queries will get "AA" airlines
  dbSendQuery(con, "SELECT * FROM airlines WHERE carrier = $1 LIMIT 1", list("9E"))
  dbSendQuery(con, "SELECT * FROM airlines WHERE carrier = $1 LIMIT 1", list("AA"))
})
I've build a small test case - click me
# tests/testthat/test-parameterized-query.R
with_mock_path(path = file.path(temp_dir, "parameteterized-query"), {
    capture_db_requests({
        suppressMessages(con <- nycflights13_create_sqlite(verbose = FALSE))

        result <- dbSendQuery(con, "SELECT * FROM airlines WHERE carrier = $1 LIMIT 1", list("9E"))
        dbClearResult(result)

        result <- dbSendQuery(con, "SELECT * FROM airlines WHERE carrier = $1 LIMIT 1", list("AA"))
        dbClearResult(result)

        dbDisconnect(con)
    })

    with_mock_db({
        test_that("fetching queries with parameters", {
            suppressMessages(con <- nycflights13_create_sqlite(verbose = FALSE))

            # make a query
            result <- dbSendQuery(con, "SELECT * FROM airlines WHERE carrier = $1 LIMIT 1", list("9E"))
            rows <- DBI::dbFetch(result)
            dbClearResult(result)

            expect_equal(rows$carrier, "9E")

            # make a different one
            result <- dbSendQuery(con, "SELECT * FROM airlines WHERE carrier = $1 LIMIT 1", list("AA"))
            rows <- DBI::dbFetch(result)
            dbClearResult(result)

            expect_equal(rows$carrier, "AA")

            dbDisconnect(con)
        })
    })
})

Is there currently a way to have identical queries with different parameters and different captures?

@seasick seasick changed the title Capture different parameterized queries Support capturing parameterized queries Jan 25, 2022
@jonkeane
Copy link
Collaborator

Thanks for this issue, fantastic repro and discussion! Yup, I have not (yet) added other arguments to the hashing algorithm that tells queries apart. It shouldn't be too complicated to add it to the hashing system. Are you interested in sending a pull request doing that (You've already got the hard part done in the test case there!)? I'm more than happy to review something or send some pointers about where to look/what to alter...

@seasick
Copy link
Author

seasick commented Jan 31, 2022

@jonkeane, I can give it a shot. I also would be more than happy about some pointers before going into it 😄

@jonkeane
Copy link
Collaborator

Oops, sorry this notification got lost in my inbox 🙈

As a first pass I would try expanding the hash function to include at least one more argument where you can put the parameters. Since we don't really care about the hash doing anything other than being unique, you could even do something like dput(params) to get a string representation of those values at this point.

dittodb/R/paths.R

Lines 29 to 33 in 75e3ce0

hash <- function(string, n = 6) {
string <- clean_statement(string)
return(substr(digest(as.character(string)), 1, n))
}

and then for recording alter somewhere around:

hash(statement)

The recording bit is a little funny, the expression in dbSendQueryTrace there is executed at the end of a normal dbSendQuery invocation (the trace_dbi() function basically inserts that call at the end). So you'll have access to all of the values that have been written there (like params). You might need to do something to ensure that params exists before adding it to the hash, since not all backends support it (it's optional in the DBI spec).

and for reading alter somewhere around (which should be more straight forward):

hash = hash(statement),

In case you don't already know about this: you can set set_dittodb_debug_level() to an integer >2 to see lots of extra messages which I've found helpful when debugging this kind of thing!

@seasick
Copy link
Author

seasick commented Feb 23, 2022

Oops, sorry this notification got lost in my inbox 🙈

Same here 🙈 😁

I will take a stab at it in the coming days or weeks, not sure when I got the time for it. But it doesn't sound to complex from your description - I feared it would be harder.

Thanks for taking the time putting that together!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants