This is a substantially re-written fork of Stefan Milton Bache's squr
package.
The squr
(pronounced "skewer") package provides a set of tools
for managing structured query language (SQL) files in R projects.
It promotes the separation of logic and query scripts and eases the process
of reading, parameterizing, and composing SQL queries.
The primary changes in this fork are: the removal of several features that are
unnecessary for my own purposes (transactions, INSERT
statements, ignore blocks,
composing SQL snippets with +
) although I may add them back at a later date.
Additionally, I've made significant internal changes to handle the (specific
to me) situation of needing to run queries with very long IN
clauses with
more than 1000 elements. This includes new parameter specification methods
that use the helper functions IN()
and LIKE()
. Finally there is a minimal
system for documenting the associated SQL files.
squr uses DBI::sqlInterpolate
and DBI::dbQuoteString
to parse values, but
also allows for arbitrary string manipulation of SQL queries via sq_replace
.
Interpolating values is not as secure as using each database's
parameterized query functionality.
Because of this, squr is aimed at users for whom SQL injection attacks are a fairly low concern. This may include situations like,
- you interact with a local database on your machine with limited or no internet connectivity
- SQL parameter input values are always generated by you, not unknown remote user input
- you write code against a database for which you only have
SELECT
privileges - the code you write with squr is not being run by anyone other than you (e.g. not in web apps like Shiny)
The farther away from these sorts of situations you are, the less likely it is that squr is a secure choice. However, whenever the values originate from user input (e.g. in a Shiny/web application, or web services, etc), approprite precautions should still be taken (type checking, whitelisting, etc.)
OWASP has some good guidelines on how to prevent SQL injection in your applications.
In the following example the sql/query.sql
file is read,
parameterized, and sent. There is no database connectivivty in squr
,
this is left fully flexible and out of scope.
# Simple example of a query function wrapper. This part varies depending
# on database, drivers, etc, but needs only setup once.
rodbc <- function(query){
ch <- RODBC::odbcDriverConnect("<connectionstring>")
on.exit(RODBC::odbcClose(ch))
RODBC::sqlQuery(ch, query)
}
# Example of reading file, setting a parameter value, and sending the query,
# using the `sq_*` ("skew") functions.
result <-
sq_file("sql/query") %>%
sq_set(Param = value) %>%
sq_send(.with = rodbc)
The corresponding query.sql
file could look something like:
SELECT *
FROM TheTable
WHERE Param = @Param
Note that many arguments in squr
are prefixed with a dot; this is to
avoid name clash with values in ...
arguments.
Arguments are not actually bound to parameters in sq_set
, they are merely
recorded in the object. Parameters are bound just before being sent to the
database in sq_send
.
Note:
- the
.sql
extension can be omitted, - when called from within a package, the path is relative to the
inst
allation folder.
There is also sq_text
, which is a way to use inline SQL. Both sq_file
and
sq_text
produces S3 objects which are lists with an additional class
sq
to enable a few methods (e.g. print
).
Suppose we have the following SQL:
SELECT *
FROM table
WHERE column IN @column
and we'd like to be able to run this query with various collections of values
inserted for @column
. In squr
we can do the following:
result <- sq_text(.query = "select * from table where column in @column") %>%
sq_set(column = IN(1:10)) %>%
sq_send(.with = rodbc)
Wrapping the vector of values to be bound with IN()
ensures that they are
properly formatted and wrapped in parens. This will work for only a single value,
converting IN(1)
into column IN (1)
. Also, if the vector of values passed
is longer than 1000 squr
will split it into chunks of size at most 1000 and
then send multiple queries for each chunk, rbind
ing the results.
Splitting IN
clauses like this only works for one IN
clause per SQL query.
If our query looked like this:
SELECT *
FROM table
WHERE column LIKE @column
we can pass in values for @column
like this:
result <- sq_text(.query = "select * from table where column like @column") %>%
sq_set(column = LIKE("foo",side = "l",wildcard = "%")) %>%
sq_send(.with = rodbc)
The resulting bound query would look like this:
SELECT *
FROM table
WHERE column LIKE '%foo'
There is also a function sq_replace
that is a wrapper for gsub
that allows
you to edit SQL text, but only prior to setting parameter values with sq_set
.
Since values are appropriately quoted when they are bound, the default
replacements will not work for dynamically specifying column and table names.
However, you can use sq_value
explicitly (this is the function used
internally to prepare a value for SQL):
-- The SQL file
SELECT [Date]
, [CustomerId]
, [CustomerName]
, @Feature
FROM Customers
WHERE Date BETWEEN @DateFrom AND @DateTo
# R
result <-
sq_file("customers") %>%
sq_set(DateFrom = Sys.Date() - 10,
DateTo = Sys.Date(),
Feature = sq_value("Turnover", quote = "[")) %>%
sq_send(.with = rodbc)
squr allows for limited documentation within your .sql files, using a very simple syntax along the lines of roxygen2. All commented out lines at the beginning of a .sql file are assumed to be documentation. An example illustrating the format is:
-- Test Query
--
-- Pulls `column1` and `column1` from `my_table` filtering on a particular value for
-- `condition`.
--
-- @param foo a value for condition, e.g. "bar"
-- @param bar a value for another condition, this time with a
-- longer description across multiple lines.
-- @functions my_fun, your_fun, lots_of_fun,
-- more_fun, final_fun
-- @scripts script1.R, script2.R
select
column1,
column2
from
my_table
where
condition = @foo
another_condition = @bar
The first line, Test Query
is the query title. The next section forms the
description, with a blank commented line above and below.
Only three tags are recognized:
@param
a single word denoting the name of the paramter, a space and then a description of the parameter@functions
a comma separated list of functions that the SQL query is used in, e.g. for when squr is used within another package@scripts
a comma separated list of .R scripts that the SQL query is used in, e.g. for when squr is used in an RStudio project.
Other tags are ignored.
A similar (but different) project for Clojure (with ports for some other languages) by @krisajenkins is Yesql.