- Namespace: thi.ng.trio.query
Triplestores not just provide high flexibility in terms of data organization and storage, but also for querying purposes. The query engine implemented in this namespace is conceptually based on W3C SPARQL and also boroughs some of its terminology, however does not entail any RDF specifics or semantics. Datalog (as used by Datomic) too is similar in approach and both are essentially based on describing & matching patterns in a directed graph.
Feature | Status |
---|---|
Query planning | INPROGRESS |
Query compilation | DONE |
Sub-queries | DONE |
Multi-graph queries | DONE |
Direct query of clj triple seqs | DONE |
Filtering (per sub-query) | DONE |
Joins (fail fast) | DONE |
Optional joins | DONE |
Union (match alternatives) | DONE |
Subtraction (negation) | DONE |
Async / parallel sub-query execution | TODO |
Arbitrary-length paths queries | INPROGRESS |
Result graph construction | DONE |
Result graph extraction | INPROGRESS |
Result var binding injection | DONE |
Result var aliasing/transformation | DONE |
Result ordering on arbitrary vars | DONE |
Result grouping | DONE |
Result aggregate expressions | DONE |
Aggregate filters | DONE |
Compiler error reporting | INPROGRESS |
More generally, the overall engine design was guided by:
- no macro magic, completely functional
- use of plain hash maps to formulate and dynamically compose query specs
- easy to read syntax, no cryptic symbols
- use of vanilla Clojure fns for predicates, binding & aggregate fns
Currently, the query engine’s public API consist of only these two functions:
compile-query
compiles a query specquery
executes a query and accepts both pre-compiled queries or raw query specs.
For both functions an exception is thrown if there’re any compile errors. The query compiler itself is described in detail below.
(defn compile-query
"Pre-compiles a query spec hashmap into a query function to be
executed with `query`. Throws an exception if compilation fails
due to illegal args in spec."
[q]
(if-let [type (some #{:select :ask :construct :describe} (keys q))]
(compile-query-step identity q type)
(err/illegal-arg! "Unsupported query type")))
(defn query
"Takes a single query spec (raw or compiled) and executes it.
Returned result format depends on query type. A failed query
either results in nil or an empty collection."
[q]
(let [res ((if (fn? q) q (compile-query q)) ::empty)]
(if (seq res) res)))
To introduce some of the query engine features, let’s define a small graph of relationships:
Subject | Predicate | Object |
---|---|---|
alice | friend | bob |
alice | age | 34 |
bob | friend | carl |
bob | nick | bobby |
carl | spouse | alice |
carl | father | donald |
carl | age | 42 |
bob | father | emily |
emily | age | 23 |
freya | mother | emily |
emily | nick | em |
donald | friend | emily |
We can visualize this table with the following Graphviz snippet (utilizing some Elisp helper functions defined in libraryofbabel.org:
In Clojure this graph could be defined like this (using a hashmap is just one option (see Clojure collection conversion in trio.core namespace).
(require '[thi.ng.trio.core :as trio])
(require '[thi.ng.trio.query :as q])
(def g
(trio/as-model
'{alice {friend bob, age 34}
bob {friend carl, father emily, nick bobby}
carl {father donald, spouse alice, age 42}
donald {friend emily}
emily {age 23, nick em}
freya {mother emily}}))
Let’s query this graph to find some information about all parents & children:
- perform alternative queries to bind
?p
to fathers or mothers of any child?c
- optionally retrieve child’s age
?a
- inject result var binding/mapper to format age as descriptive string
- sort results by child name
(q/query
{:select :*
:from g
:query [{:where '[[?p father ?c]]}
{:union '[[?p mother ?c]]}
{:optional '[[?c age ?a]]}]
:bind {'?a (fn [res a] (if a (str (res '?c) " is " a " years old")))}
:order-asc '?c})
;; => ({?p carl, ?c donald} {?a "emily is 23 years old", ?p freya, ?c emily} {?a "emily is 23 years old", ?p bob, ?c emily})
We know Carl’s spouse is Alice, but we don’t have explicity stated her
as Donald’s mother and therefore she didn’t appear in the results.
However, we could create a :construct
query that produces a number
of inferred facts based on the following rules (of course these
wouldn’t stand up in the real world, but let’s ignore that point on
this occasion):
- for every
?f
father of?c
and where?m
spouse of?f
, we declare?m
mother of?c
- for every
?m
mother of?c
and where?f
spouse of?m
, we declare?f
father of?c
- for every
?f
father of?c
and?m
mother of?c
, we declare?f
spouse?m
(and vice versa) - declare a new
parent
relationship for each father/mother - assign gender information for each parent
- declare a new
child-of
relationship for each child
In order to create these new facts we can use a :construct
type
query and we will also compile it first:
(def infer-parents
(q/compile-query
{:from g
:construct '[[?f father ?c] [?m mother ?c] ;; rule 1 + 2
[?f spouse ?m] [?m spouse ?f] ;; rule 3
[?f parent ?c] [?m parent ?c] ;; rule 4
[?f gender male] [?m gender female] ;; rule 5
[?c child-of ?f] [?c child-of ?m]] ;; rule 6
:query [{:where '[[?f father ?c] [?f spouse ?m]]}
{:union '[[?m mother ?c] [?m spouse ?f]]}
{:union '[[?f father ?c] [?m mother ?c]]}]}))
(q/query infer-parents)
;; #{[freya spouse bob] [emily child-of bob] [carl parent donald] [bob gender male]
;; [alice parent donald] [carl father donald] [freya mother emily] [donald child-of alice]
;; [alice gender female] [alice mother donald] [bob spouse freya] [emily child-of freya]
;; [freya gender female] [bob parent emily] [carl spouse alice] [donald child-of carl]
;; [bob father emily] [alice spouse carl] [carl gender male] [freya parent emily]}
Now we can add these inferred triples to our graph and test it.
Note: Graphs have set-like semantics and only store unique triples.
So even though our :construct
query produces some triples which are
already in the graph, no duplicates will be added.
;; build new graph with inferred triples
(def g2 (trio/add-triples g (q/query infer-parents)))
;; select all parents from new graph and group by child
(q/query {:select :* :from g2 :query [{:where '[[?p parent ?c]]}] :group '?c})
;; => {emily [{?p bob, ?c emily} {?p freya, ?c emily}], donald [{?p carl, ?c donald} {?p alice, ?c donald}]}
The new graph with inferred triples marked in red:
As a final introductory example, let’s demonstrate how to extract
anything we know about subjects matched via an initial query/search,
in this case people with nicknames and who are younger than 40. A
:describe
query returns all triples in which a bound result var is
either a subject or object.
(q/query {:describe '?p :from g2 :query [{:where '[[?p nick ?n] [?p age ?a]] :filter {'?a #(< % 40)}}]})
;; => #{[emily child-of bob] [emily child-of freya] [emily nick em] [emily age 23]
;; [donald friend emily] [freya mother emily] [freya parent emily] [bob parent emily] [bob father emily]}
The extracted sub-graph defined by the result triples (edges in red are the matched query terms):
The role of the query compiler is to validate, analyze and translate query specs (plain hashmaps) into actual working code, as well as to reduce pre-processing and optimize code paths for repetitive queries.
As shown in the above examples, the query engine supports several query types, all of which are described in further detail below. Common to all query types is the concept of using one or more subqueries (each of which describes one or more graph patterns with variables to be matched). Each sub-query also defines a set operation how its individual result set is to be merged with already existing results to produce the final solution(s).
Query compilation is realized via the compile-query-step
multimethod
in order to cleanly separate logic for each query element and to keep
the code legible & steps composable.
(defmulti compile-query-step
(fn [qfn q type] type))
Essentially, each compile step is using functional composition to wrap
its predecessor and combine that result with its own. To allow
fail-fast behaviors for some phases, the namespaced ::empty
keyword
is used internally as sentinel to distinguish the initially empty
result set from an earlier failed search (where results are nil
or
an empty seq).
Before discussing the compilation steps in detail, we first define and take a look at the various top-level query types and their options:
Purpose: Matches patterns in graph, returns sequence of all found solutions. Each solution is a map of unique qvar bindings.
Use cases:
- General graph queries / pattern matching
- Result transformations
Key | Value(s) | Description | Required |
---|---|---|---|
:select | :*, qvar(s), map | Defines which qvars appear in solution | Y |
(See Result var projection) | |||
:from | trio.core/PModelSelect instance | Default triplestore to use for all sub-queries | Y/N |
(only mandatory if any sub-query doesn’t provide its own :from ) | |||
:query | [{sub-query} …] | Seq of any number of sub-queries (see Sub-queries) | Y |
:async | boolean | If true, execute sub-queries asynchronously in parallel | N |
Note: Not yet implemented | |||
:bind | fn or bindings map | Inject/transform qvars in final solutions | N |
(see Query options) | |||
:values | ’{?v1 #{a b} ?v2 c ..} | Pre-bind qvars to given values for all subqueries | N |
in order to constrain search. | |||
If a value set is specified for a qvar, attempts all options given | |||
Note: The same effect can be achieved with filters, | |||
though pre-binding will produce optimized queries | |||
:distinct | boolean | If true, results only include distinct bindings | |
:aggregate | {?v fn …} | Compute aggregate values and inject into solution | N |
(See Result aggregation & grouping) | |||
:filter | (fn [res] ..) or {?v fn..} | Filter final solutions | N |
:order-asc | ‘?var | Sort solutions by qvar (ascending) | N |
’[?v1 ?v2 ..] | Nested sort by first ?v1, ?v2 … | ||
:order-desc | same as :order-asc | Sort solutions by qvar (descending) | N |
:group | ‘?var | Group solutions by qvar | N |
(fn [res] group-key) | Group solutions by return value of fn | ||
Note: returns solutions as map (not as seq) |
The order of processing steps of a :select
query is as follows:
- Compile query spec
- Produce solution seq via iterative sub-query processing
- Inject qvars given via
:bind
option in all results - Sort solution seq via
:order-asc
or:order-desc
options - Project variables
- Perform grouping, if enabled
- Compute & inject aggregations (if grouped, apply aggregation fns for each key’s solution set)
- Final result filtering
The query compiler uses thi.ng/validate to verify all query specs and
provide useful error messages. The follow spec serves as formal (but
non-exhaustive) definition of a valid top-level :select
query.
Further validation occurs for certain query elements (e.g. during
sub-query dispatch, see below).
(def validator-select
{:select (v/alts
[(v/symbol) (v/keyword) (v/sequential)]
"must be a qvar, keyword or seq")
#?@(:clj
[:from (v/optional
(v/alts
[(v/instance clojure.lang.IDeref)
(v/satisfies api/PModelSelect) (v/satisfies api/PModelConvert)]
"must be an instance of IDeref or satisfy the PModelSelect or PModelConvert protocols"))])
:query {:* (v/map)}
:bind (v/optional
(v/alts
[(v/map) (v/function)]
"must be a map or fn"))
:group (v/optional
(v/alts
[(v/symbol) (v/sequential) (v/function)]
"must be a qvar, qvar seq or fn"))
:aggregate (v/optional (v/map))
:filter (v/optional
(v/alts
[(v/map) (v/function)]
"must be a map or fn"))})
(defmethod compile-query-step :select
[qfn {:keys [select bind order-asc order-desc distinct] :as q} _]
(debug :select select)
(validate {:from (collect-source-graphs q)} {:from (v/required)})
(validate q validator-select)
(cond->
(compile-query-step qfn q :query-dispatch)
bind (compile-query-step bind :bind)
order-asc (compile-query-step order-asc :order-asc)
order-desc (compile-query-step order-desc :order-desc)
true (compile-query-step q :project-vars)
distinct (compile-query-step q :distinct-bindings)))
Purpose: Matches patterns in graph(s) and binds values to qvars, then constructs new triples using bound qvars solutions. Returns set of constructed triples.
Use cases:
- Rule-based inference
- Triple mapping/updating
- Reification
- Annotation
- Adding of provenance data
- Graph transformation (in combination w/
:describe
)
Key | Value(s) | Description | Required |
---|---|---|---|
:construct | [pattern …] | solutions contain all used qvars | yes |
:from | trio.core/PModelSelect instance | default triplestore to use for all sub-queries | yes |
:query | [{sub-query} …] | seq of any number of sub-queries (see Sub-queries) | yes |
:bind | fn or bindings map | inject/transform qvars in final solutions | no |
(see Query options) |
Any other options supported by :select
will be removed from the
query spec.
(defn construct-triples
[patterns res]
(->> res
(mapcat
(fn [r]
(map
(fn [[s p o]]
(let [s (if (qvar? s) (r s) s)
p (if (qvar? p) (r p) p)
o (if (qvar? o) (r o) o)]
(if (and s p o) (api/triple s p o))))
patterns)))
(filter identity)
(set)))
(defmethod compile-query-step :construct-triples
[qfn {:keys [construct] :as q} _]
(fn [res] (->> res qfn (construct-triples construct))))
(defmethod compile-query-step :construct
[qfn q _]
(let [q (-> q
(dissoc :order-asc :order-desc :group)
(assoc :select :*))]
(-> qfn
(compile-query-step q :select)
(compile-query-step q :construct-triples))))
(defmethod compile-query-step :ask
[qfn q _]
(let [q (-> q
(dissoc :order-asc :order-desc :bind :group :aggregate)
(assoc :select :*))
qfn (compile-query-step qfn q :select)]
(fn [res] (if (seq (qfn res)) true false))))
Purpose: Extracts sub-graph for nodes matched via query. Returns
sub-graph as set of triples. If used with multiple graphs (defined via
:from
in sub-queries), :describe
queries will return a combined
result set from all graphs.
Use cases:
- REST APIs
- Sub-graph extraction
- Selective graph merging
- Graph partitioning/segmentation
- Complete removal of resources from graph(s)
Options:
Key | Value(s) | Description | Required |
---|---|---|---|
:describe | ‘?var | select sub-graphs of nodes in the value set of single qvar | yes |
’[?v1 ?v2 …] | select sub-graphs of given qvar value sets | ||
:from | trio.core/PModelSelect instance | default triplestore to use for all sub-queries | yes |
:query | [{sub-query} …] | seq of any number of sub-queries (see Sub queries) | yes |
:bind | fn or bindings map | inject/transform qvars in final solutions | no |
(see Query options) |
Any other options supported by :select
will be removed from the
query spec.
(defmethod compile-query-step :describe
[qfn {:keys [describe] :as q} _]
(let [q (-> q
(dissoc :order-asc :order-desc :group :aggregate)
(assoc :select :*))
qfn (compile-query-step qfn q :select)
describe (if (sequential? describe) describe [describe])
graphs (collect-source-graphs q)]
(fn [res]
(let [vars (select-keys (accumulate-result-vars (qfn res)) describe)]
(if (seq vars)
(reduce-kv
(fn [acc _ vals]
(reduce
(fn [acc from]
(-> acc
(into (mapcat #(api/select from % nil nil) vals))
(into (mapcat #(api/select from nil nil %) vals))))
acc graphs))
#{} vars))))))
Every main query spec must contain a :query
key which is a vector or
seq of subqueries and which themselves are maps. The engine currently
supports the following sub-query types:
Type | Key | Options | Description |
---|---|---|---|
join | :where | :from :filter :bind :values | join/intersection of given patterns (fail fast) |
join optional | :optional | :from :filter :bind :values | natural join of given patterns |
union | :union | :from :filter :bind :values | merge results with existing |
negation | :minus | :from :filter :bind :values | remove results from existing |
IMPORTANT: Each sub-query must specify one or more search patterns
which are always joined in order and are fail fast (i.e. if one
pattern fails the entire sub-query fails). The specified set operation
(e.g. optional join) is only applied to merge a sub-query’s result set
with the already computed solution. That also means, any query type
other than a :where
is potentially meaningless as first (or only)
sub-query: For example, a :minus
sub-query stated as first sub-query
would attempt to subtract its results from a still empty result set
and therefore constitute a no-op…
Each sub-query can define its own filters & variable injections and other options. These are passed via the options keys listed in the above table.
The :filter
& :bind
options both assume a function which is
applied to each unique solution map. However, for most use cases it is
more convenient to only specify these functions for individual query
vars and the following helpers allow us to do so.
For example this sub-query asks for artists under 25 and injects a new
qvar ?fullname
into each result:
{:where '[[?s :type :artist] [?s :age ?a] [?s :fname ?fn] [?s :surname ?sn]]
:filter {'?a #(< % 25)}
:bind {'?fullname (fn [res _] (str (res '?fn) " " (res '?sn)))}}
If specified as map with qvars as keys, :filter
functions must accept a single arg, i.e. the value of the qvar binding
to check. Any truthy return value will keep the result in the set of
solutions.
(defn compile-filter-fn
"Takes a map of qvar bindings as keys and filter fns as values.
Returns fn which accepts a single result map and returns true if all
given filter fns succeed. Filter fns must accept a single arg, i.e.
the qvar's value in the result map. Filter fns are only called if
the qvar is bound. If a filter is defined for a qvar which does not
ever appear in a solution, the related query will produce no
results."
[filters]
(if filters
(if (fn? filters)
filters
(fn [res]
(every?
(fn [[k f]] (let [rv (res k)] (if-not (nil? rv) (f rv) false)))
filters)))))
On the other hand, :bind
fns must accept two args:
- the full result map and
- the related qvar’s value.
The return value will be bound to the specified qvar in each
result/solution, unless the function returns nil
.
Important: For subqueries, qvar injection via :bind
is only
applied post-filtering.
(defn compile-bind-fn
"Takes a map of qvar bindings as keys and bind fns as values.
Returns a fn which accepts a single result map and returns updated
map with injected/updated qvars. Bind fns must accept two args: the
full result map and the value of the fn's associated qvars. The fn's
return value is used as new binding for the qvar unless nil. Bind
fns are called regardless if qvar is currently bound (non-nil). This
is in order to allow for the injection of new vars into the
solution."
[bindings]
(if bindings
(if (fn? bindings)
bindings
(fn [res]
(reduce-kv
(fn [acc k f]
(let [rv (f res (res k))]
(if (nil? rv) acc (assoc acc k rv))))
res bindings)))))
In order to support multi-graph queries, each sub-query can specify its
own graph to act on using the :from
option. If not specified, the
sub-query inherits the value stated in the main spec.
;; Multigraph example, define two graphs:
;; world-graph contains stats of countries & cities
;; uk-graph contains UK specifics
(def world-graph
(trio/as-model
'{usa {type country}
brazil {type country}
uk {type country}
nyc {type city, population 8.4e6, located-in usa}
sao-paulo {type city, population 20.3e6, located-in brazil}
london {type city, population 9.8e6, located-in uk}
manchester {type city, population 2.55e6, located-in uk}
cambridge {type city, population 1.24e5, located-in uk}}))
(def uk-graph
(trio/as-model
'[[LDN name london]
[MAN name manchester]
[CAM name cambridge]
[[CAM LDN MAN] type city]
[[brixton camden westminster] part-of LDN]
[[ardwick bradford hulme] part-of MAN]
[[abbey castle market] part-of CAM]
[[brixton camden westminster] type borough]
[[abbey ardwick bradford castle hulme market] type ward]]))
Visualization of the two graphs combined, with the UK graph shown in red:
;; find all cities w/ population > 1 mil
;; optionally find city regions for any city in UK graph
;; return sorted by ?country -> ?city -> ?region
(q/query
{:select '[?country ?city ?region]
:query [{:where '[[?country type country]
[?city located-in ?country]
[?city population ?pop]]
:filter {'?pop #(> % 1e6)}
:from world-graph}
{:optional '[[?c name ?city] [?region part-of ?c]]
:from uk-graph}]
:order-asc '[?country ?city ?region]})
;; => ({?country brazil, ?city sao-paulo} {?country uk, ?city london, ?region brixton}
;; {?country uk, ?city london, ?region camden} {?country uk, ?city london, ?region westminster}
;; {?country uk, ?city manchester, ?region ardwick} {?country uk, ?city manchester, ?region bradford}
;; {?country uk, ?city manchester, ?region hulme} {?country usa, ?city nyc})
As discussed above, specifying a :filter
allows us to constrain the
solution space of a query, however a filter is always applied after
each sub-query and so might have to deal with a large amount of
extraneous results. Whereas a filter can specify arbitrary criteria,
for some use cases we’re only interested in certain qvars taking a
number of possible constants/values and it would be more efficient to
pre-constrain the search space. This is achieved via the :values
option:
;; Example graph of randomly generated log messages
(def g
(->> #(vector
(rand-nth [:server :queue :scheduler :render :db :api])
(rand-nth [:debug :info :warning])
(rand-nth ["m1" "m2" "m3" "m4" "m5" "m6" "m7" "m8" "m9"]))
(repeatedly 100)
(trio/as-model)))
;; using :filter - select all messages for :server or :db process
;; note: the pattern [?p ?log ?msg] matches *all* triples in the graph
(q/query
{:select :*
:from g
:query [{:where '[[?p ?log ?msg]] :filter {'?p #{:server :db}}}]})
;; => ({?p :server, ?log :debug, ?msg "m7"} {?p :db, ?log :warn, ?msg "m9"} ...)
;; using :values - constrain ?p to either :server or :db *prior* to querying
;; produces same result, but with reduced effort
(q/query
{:select :*
:from g
:query [{:where '[[?p ?log ?msg]] :values {'?p #{:server :db}}}]})
Important: If :values
are also defined for the top-level query,
then they’re merged with the :values
bindings of the sub-query
(using merge-with
and combining both value sets).
(defn query-opts
"Takes a query options map and injects compiled versions for
:filter & :bind keys."
[opts]
(-> opts
(update-in [:filter] compile-filter-fn)
(update-in [:bind] compile-bind-fn)))
(defn- maybe-deref
[x] (try @x (catch #?(:clj Exception :cljs js/Error) e x)))
(defn- unpack-model
[model]
(let [model (maybe-deref model)]
(if (satisfies? api/PModelSelect model)
model
(if (satisfies? api/PModelConvert model)
(api/as-model model)
(err/illegal-arg! "Referenced data model doesn't implement PModelSelect")))))
(defmethod compile-query-step :join
[qfn {:keys [from values where] :as q} _]
(debug :join where values)
(let [opts (query-opts q)]
(fn [res]
(let [a (qfn res)]
(if (or (= ::empty a) (seq a))
(let [b (select-join (unpack-model from) values opts where)
res' (if (= ::empty a) b (join a b))]
res')
a)))))
(defmethod compile-query-step :minus
[qfn {:keys [from values minus] :as q} _]
(debug :minus minus values)
(let [opts (query-opts q)]
(fn [res]
(let [a (qfn res)]
(if (seq a)
(thi.ng.trio.query/minus a (select-join (unpack-model from) values opts minus))
a)))))
(defmethod compile-query-step :optional
[qfn {:keys [from values optional] :as q} _]
(debug :join-opt optional values)
(let [opts (query-opts q)]
(fn [res]
(let [a (qfn res)]
(if (or (= ::empty a) (seq a))
(let [b (select-join (unpack-model from) values opts optional)
res' (if (= ::empty a) b (join-optional a b))]
res')
a)))))
(defmethod compile-query-step :union
[qfn {:keys [from values union] :as q} _]
(debug :union union values)
(let [opts (query-opts q)]
(fn [res]
(let [a (qfn res)
b (select-join (unpack-model from) values opts union)
res' (if (= ::empty a) b (thi.ng.trio.query/union a b))]
res'))))
(defmethod compile-query-step :query-dispatch
[qfn {:keys [from values query]} _]
(debug :query-dispatch query)
(loop [qfn qfn, query query, first? true]
(if query
(let [q (first query)
from (or (:from q) from)
from' (maybe-deref from)
q (if (or (satisfies? api/PModelSelect from')
(satisfies? api/PModelConvert from'))
(assoc q :from from))
q (if values
(assoc q :values (merge-with (fnil into #{}) values (:values q)))
(update-in q [:values] (fnil identity {})))
_ (when (and first? (seq (select-keys q [:optional :minus :union])))
(err/illegal-arg! "First sub-query type must be of type :where"))
qfn (cond
(:where q) (compile-query-step qfn q :join)
(:optional q) (compile-query-step qfn q :optional)
(:minus q) (compile-query-step qfn q :minus)
(:union q) (compile-query-step qfn q :union))]
(recur qfn (next query) false))
qfn)))
(defmethod compile-query-step :bind
[qfn bind _]
(debug :bindings bind)
(let [bind (compile-bind-fn bind)]
(fn [res] (map bind (qfn res)))))
(defmethod compile-query-step :order-asc
[qfn order _]
(debug :order-asc order)
(fn [res] (order-asc order (qfn res))))
(defmethod compile-query-step :order-desc
[qfn order _]
(debug :order-desc order)
(fn [res] (order-desc order (qfn res))))
(defmethod compile-query-step :distinct-bindings
[qfn spec _]
(if (:group spec)
(fn [res]
(reduce-kv (fn [acc k v] (assoc acc k (into #{} v))) {} (qfn res)))
(fn [res]
(into #{} (qfn res)))))
All top-level query types are based on an initial :select
query and
result variable projection & filtering is the final transformation
step of a :select
. During this phase the engine does the following
(in this order):
- result seq is transformed into a map when
:group
is enabled (optional) - aggregation functions are run (optional)
- results are passed through top-level filter (optional)
- any qvars not requested by the user (but which are present in the solutions) are removed
- any qvars with aliasing or transformation fns are renamed/processed
Important: This processing order implies that any top-level filters can only be applied to “normal” or aggregated qvars (i.e. not their renamed/transformed versions).
For :select
queries, the engine supports the implicit renaming of
qvars in the set of solutions, as well as the option to transform a
qvar’s value during the final projection.
:select value | Description |
---|---|
:* | Keep all bound qvars in result maps |
‘?v | Result maps only contain given qvar |
’[?v1 ?v2 …] | Result maps only contain given qvars |
{‘?v (fn [res] (* 2 (res ‘?v)))} | Result maps only contain qvars used as keys in the map |
and their values as transformed by fn | |
Note: The fn receives the full result map | |
{‘?alias ?orig} | Result maps only contain ?alias but values from ?orig |
{‘?alias {:use ‘?v :fn (fn [v] …)}} | Result maps only contain qvars used as keys in the map |
but take (optionally transformed) value of qvar given for :use | |
Note: the :fn key is optional | |
[?v1 {?v2 #(…) ?v3 {:use … :fn …}}] | Combination of all above options is possible too |
As shown in the table, in order to rename and/or transform a qvar binding during projection we need to use a map in one of these formats:
;; solution contains ?a and ?b-new (as alias for ?b)
:select ['?a {'?b-new '?b}]
;; solution contains ?a and ?b-new (as alias for ?b and transformed ?b values)
:select ['?a {'?b-new {:use '?b :fn #(* 100.0 %)}}]
;; solution contains ?a and ?b (transformation fn receives full result map)
:select ['?a {'?b (fn [res] (* (res '?b) (inc (res '?tax))))}]
;; same as above, but only ?b (don't need to wrap map in vector)
:select {'?b (fn [res] (* (res '?b) (inc (res '?tax))))}
The following example graph maps authors to their books & prices:
;; data store of authors, books & prices
(def ds
(trio/as-model
[[:a1 :author [:b1 :b2]]
[:a2 :author [:b1 :b3]]
[:a3 :author :b4]
[:b1 :price 10]
[[:b2 :b4] :price 20]
[:b3 :price 5]]))
Let’s query this graph to return all books with price < 20 and calculate their retail price incl. taxes:
(q/query
{:select ['?b {'?net '?p, '?retail {:use '?p :fn #(* 1.2 %)}}]
:from ds
:query '[{:where [[?a :author ?b] [?b :price ?p]]}]
:filter {'?p #(< % 20)}})
;; => ({?b :b3, ?net 5, ?retail 6.0}
;; {?b :b1, ?net 10, ?retail 12.0}
;; {?b :b1, ?net 10, ?retail 12.0})
(defn renamed-vars
[binding-map]
(reduce-kv
(fn [vars k v]
(if (and (map? v) (:as v))
(assoc vars (:as v) v)
(assoc vars k v)))
{} binding-map))
(defn selected-var-keys
[qvars]
(reduce
(fn [vars v]
(if (map? v)
(merge vars v)
(assoc vars v v)))
{} qvars))
(defn parse-var-bind-spec
[v spec] (if (map? spec) [(or (:use spec) v) (:fn spec)] [nil spec]))
(defn compute-select-bindings
[res svars]
(reduce-kv
(fn [acc v spec]
(let [[k f] (parse-var-bind-spec v spec)
v' (if k
(if (fn? f) (f (res k)) (res k))
(f res))]
(if v'
(assoc acc v v')
acc)))
{} svars))
(defn validate-selected-vars
[svars]
(if (svars :*)
(if (> (count svars) 1)
(err/illegal-arg! (str "can't use :* selector with other vars: " (vals svars)))
(dissoc svars :*))
svars))
(defmethod compile-query-step :project-vars
[qfn {:keys [select group aggregate filter] :as q} _]
(debug :project-vars select)
(let [select (if (sequential? select) select [select])
svars (selected-var-keys select)
catch-all? (svars :*)
svars (validate-selected-vars svars)
agg-only? (every? (set (keys aggregate)) (keys svars))
filter (compile-filter-fn filter)]
(if group
(compile-query-step qfn [group aggregate svars filter agg-only? catch-all?] :project-groups)
(if (seq aggregate)
(if (and agg-only? (not catch-all?))
(compile-query-step qfn [aggregate svars filter] :project-vars-aggregates)
(compile-query-step qfn [aggregate svars catch-all? filter] :project-vars-mixed))
(if catch-all?
(compile-query-step qfn filter :project-vars-all)
(compile-query-step qfn [svars filter] :project-vars-simple))))))
The query engine supports the computation of aggregate values
(reductions) for both grouped and ungrouped result sets. Aggregated
qvars are defined via the top-level :aggregate
key which is a map
with qvars as keys and their reduction functions as values.
There are differences in how such aggregated qvars appear in the solution, based on if grouping is enabled or not. If no group criteria are given, aggregate qvars and their values are injected into every single result map of the solution seq. This table gives a better overview of the expected outcomes when aggregation is used:
Grouping | Aggregated vars selected | Non-aggregated vars selected | Solution format |
---|---|---|---|
Y | Y | Y | map keyed by group, each group a seq of result maps w/ agg. qvars injected in each |
Y | Y | N | map keyed by group, each key only single map of agg. qvars |
Y | N | Y | map keyed by group, each group a seq of result maps, no agg. qvars (pointless) |
N | Y | Y | seq of result maps, each w/ selected qvars and agg. qvars injected in each |
N | Y | N | map w/ only agg. qvars |
N | N | Y | seq of result maps, each w/ selected qvars |
Note: Aggregated qvars will not automatically appear in the solution
unless stated in the list of :select
qvars. They can be further
aliased & transformed during projection just as “normal” qvars…
Whereas aggregation is supported for all query types, result grouping
is only supported for :select
type queries and causes the solutions
to be returned as map instead of a flat sequence. The returned map is
keyed by the result value of the stated qvars or function given for
the :group
option in the query spec.
Multi-value clustering can be achieved by specifying several qvars as sequence. In this case, the keys of the solution map are unique tuples of the given qvar values. If a function is used to compute the grouping criteria, that function will be called for each individual result map and the return value is used to define grouping.
Let’s use once more the above example graph of book authors. We will now query this graph using grouping and compute some aggregate stats for each author. (Here we use some predefined aggregation functions, see section further below)
;; Aggregate query example
;; retrieve book prices, group by author
;; compute avg. & max price, group size per author
;; filter by avg.
(q/query
{:select '[?avg ?max ?num]
:from ds
:query '[{:where [[?a :author ?b] [?b :price ?p]]}]
:group '?a
:aggregate {'?avg {:use '?p :fn q/mean}
'?max {:use '?p :fn q/max}
'?num count}
:filter {'?avg #(> % 10)}})
;; => {:a3 {?avg 20.0, ?max 20, ?num 1}, :a1 {?avg 15.0, ?max 20, ?num 2}}
;; produce map of authors and their collected books
(q/query
{:select :books
:from ds
:query '[{:where [[?a :author ?b]]}]
:group '?a
:aggregate {:books {:use '?b :fn #(into #{} %)}}})
;; => {:a1 {:books #{:b2 :b1}}, :a2 {:books #{:b1 :b3}}, :a3 {:books #{:b4}}}
The following functions and compiler steps are used to produce optimized code paths based on the analysis of qvars selected for projection, both for grouped & ungrouped results.
(defn compile-group-fn
[group]
(if (fn? group)
group
(if (sequential? group)
(fn [res] (reduce #(conj % (res %2)) [] group))
(fn [res] (res group)))))
(defn compute-aggregates
[vars vals]
(reduce-kv
(fn [inner v spec]
(let [[k f] (parse-var-bind-spec v spec)]
(if k
(assoc inner v (f (map #(get % k) vals)))
(assoc inner v (f vals)))))
{} vars))
(defn project-aggregates-only
[avars svars filter]
(fn [vals]
(let [agg (compute-aggregates avars vals)]
(if (or (nil? filter) (filter agg))
(compute-select-bindings agg svars)))))
(defn project-aggregates-mixed
[avars svars all? flt]
(fn [vals]
(let [agg (compute-aggregates avars vals)
vals (map #(merge % agg) vals)
vals (if flt (filter flt vals) vals)]
(if (seq vals)
(if all?
vals
(map #(compute-select-bindings % svars) vals))))))
(defn project-aggregates-simple
[vars flt]
(fn [vals]
(let [vals (if flt (filter flt vals) vals)]
(if (seq vals)
(map #(compute-select-bindings % vars) vals)))))
(defn project-aggregates-all
[flt]
(fn [vals]
(let [vals (if flt (filter flt vals) vals)]
(if (seq vals) vals))))
(defmethod compile-query-step :project-vars-aggregates
[qfn [avars svars filter] _]
(debug :aggregates-only avars filter)
(let [project-fn (project-aggregates-only avars svars filter)]
(fn [res] (->> (qfn res) (project-fn)))))
(defmethod compile-query-step :project-vars-mixed
[qfn [avars svars all? flt] _]
(debug :mixed-aggregates avars svars all? flt)
(let [project-fn (project-aggregates-mixed avars svars all? flt)]
(fn [res] (->> (qfn res) (project-fn)))))
(defmethod compile-query-step :project-vars-simple
[qfn [vars flt] _]
(debug :simple-aggregates vars flt)
(let [project-fn (project-aggregates-simple vars flt)]
(fn [res] (->> (qfn res) (project-fn)))))
(defmethod compile-query-step :project-vars-all
[qfn flt _]
(debug :all-aggregates flt)
(let [project-fn (project-aggregates-all flt)]
(fn [res] (->> (qfn res) (project-fn)))))
(defn project-groups-with
[project-fn res]
(reduce-kv
(fn [acc k vals]
(if-let [v (project-fn vals)]
(assoc acc k v)
acc))
{} res))
(defmethod compile-query-step :group-aggregates-only
[qfn [avars svars filter] _]
(debug :aggregates-only avars filter)
(let [project-fn (project-aggregates-only avars svars filter)]
(fn [res] (->> (qfn res) (project-groups-with project-fn)))))
(defmethod compile-query-step :group-aggregates-mixed
[qfn [avars svars all? flt] _]
(debug :mixed-aggregates avars svars all? flt)
(let [project-fn (project-aggregates-mixed avars svars all? flt)]
(fn [res] (->> (qfn res) (project-groups-with project-fn)))))
(defmethod compile-query-step :group-aggregates-simple
[qfn [vars flt] _]
(debug :simple-aggregates vars flt)
(let [project-fn (project-aggregates-simple vars flt)]
(fn [res] (->> (qfn res) (project-groups-with project-fn)))))
(defmethod compile-query-step :group-aggregates-all
[qfn flt _]
(debug :all-aggregates flt)
(let [project-fn (project-aggregates-all flt)]
(fn [res] (->> (qfn res) (project-groups-with project-fn)))))
(defmethod compile-query-step :project-groups
[qfn [group aggregate svars filter agg-only? catch-all?] _]
(debug :project-groups group aggregate)
(let [group (compile-group-fn group)
qfn (fn [res] (group-by group (qfn res)))]
(if (seq aggregate)
(if (and agg-only? (not catch-all?))
(compile-query-step qfn [aggregate svars filter] :group-aggregates-only)
(compile-query-step qfn [aggregate svars catch-all? filter] :group-aggregates-mixed))
(if catch-all?
(compile-query-step qfn filter :group-aggregates-all)
(compile-query-step qfn [svars filter] :group-aggregates-simple)))))
(defn mean
"Aggregation function, computes mean of given values (as double)."
[vals] (double (/ (reduce + vals) (count vals))))
(defn median
"Aggregation function, computes median of given values."
[vals]
(nth (sort vals) (/ (count vals) 2)))
(defn min
"Aggregation function, computes minimum of given values."
[vals] (reduce clojure.core/min vals))
(defn max
"Aggregation function, computes maximum of given values."
[vals] (reduce clojure.core/max vals))
(defn percentile
"Aggregation HOF, takes percentile value (0-100), returns aggration
function accepting a seq of query var results and computes requested
percentile."
[n]
{:pre [(integer? n) (m/in-range? 0 100 n)]}
(fn [vals]
(m/percentile n (sort vals))))
(defn quartile
"Aggregation HOF, takes quartile value (1-4), returns aggration
function accepting a seq of query var results and computes requested
quartile subset."
[n]
{:pre [(integer? n) (m/in-range? 1 4 n)]}
(fn [vals]
(m/quartile n (sort vals))))
(defn bins-of
"Grouping HOF, accepts a bin size and query var, returns grouping
function binning values of qvar to steps of bin size."
[n qvar]
{:pre [(number? n) (pos? n) (qvar? qvar)]}
(fn [res] (* (m/floor (/ (res qvar) n)) n)))
We have seen how various phases of the query compiler define their own query spec validators. The following helper functions are used to apply them. If validation fails, query compilation halts and an error/exception is thrown with the error message containing a stringified map of explanations/reasons for each failed key.
;; failed validation example
(q/query {:select :* :query [{:where '[[?a author ?b]]}]})
;; => IllegalArgumentException error compiling query: {:from ("is required")}
(defn validate
[q spec]
(try
(let [[ret err] (v/validate q spec)]
(if-not err
ret
(err/throw! (str "Error compiling query: " err))))
(catch #?(:clj Exception :cljs js/Error) e
(err/throw! #?(:clj (.getMessage e) :cljs e)))))
The functions in this section do the actual heavy-lifting and together
with the select
method of a triplestore’s PModelSelect
protocol
implementation provide the pattern matching functionality the query
engine relies on.
(def ^:dynamic *auto-qvar-prefix* "?__q")
(def qvar?
"Returns true, if x is a qvar (a symbol prefixed with '?')"
(fn [x] (and (symbol? x) (= \? (.charAt ^String (name x) 0)))))
(defn auto-qvar?
"Returns true, if x is an auto-generated qvar
(a symbol prefixed with *auto-qvar-prefix*)"
[x] (and (symbol? x) (zero? (.indexOf ^String (name x) ^String *auto-qvar-prefix*))))
(defn auto-qvar
"Creates a new auto-named qvar (symbol)."
[] (gensym *auto-qvar-prefix*))
(defn qvar-name
[x] (-> x name (subs 1)))
(defn collect-source-graphs
[q] (->> q :query (into #{(:from q)} (map :from)) (filter identity)))
The table below provides an overview of possible property paths which can be defined for each search pattern. In principle these are based on the options defined for SPARQL, albeit using different syntax.
Path pattern | Description | Meaning |
---|---|---|
[?s [p1 p2] ?o] | fixed length path between ?s , ?o | equivalent to: {:where '[[?s p1 ?x] [?x p2 ?o]} |
[?s [:? p1 p2] ?o] | fixed length path, optional | if path exists fully, bind ?s , ?o |
[?s [:+ p1 p2] ?o] | full path, cyclic | if path exists fully one or more times, bind ?s , ?o |
[?s [:* p1 p2] ?o] | partial path, cyclic | if path exists partially one or more times, bind ?s , ?o |
[?s [:not p] ?o] | single edge not via p | equivalent to: {:where [?s ?pp ?o]} {:minus [?s p ?o]} |
[?s [:or p1 p2] ?o] | single edge p1 or p2 | equivalent to: {:where [?s p1 ?o]} {:union [?s p2 ?o]} |
Important: Please note, that property paths are not yet supported by
the highlevel query engine, but are actively being worked on.
So far, only the first four property path styles can be directly
queried (in isolation) using the low-level select-transitive
function described further below.
(defn resolve-path-pattern
"Takes a path triple pattern where the predicate is a seq of preds.
Returns seq of query patterns with injected temp qvars for inbetween
patterns. E.g.
[?s [p1 p2 p3] ?o]
=> ([?s p1 ?__q0] [?__q0 p2 ?__q1] [?__q1 p3 ?o])"
[[s p o]]
(let [vars (->> auto-qvar
(repeatedly (dec (count p)))
(cons s))]
(->> (concat (interleave vars p) [o])
(d/successive-nth 3 2))))
(defn resolve-patterns
[patterns]
(mapcat
(fn [[_ p :as t]]
(if (vector? p)
(resolve-path-pattern t)
[t]))
patterns))
(defn produce-patterns-with-bound-vars
"Takes a triple pattern (possibly with variables) and a map of
possible value sets (each *must* be a set or single value) for each var.
Produces lazy seq of resulting triple query patterns using cartesian
product of all values.
(produce-patterns-with-bound-vars
[?a :type ?b]
{?a #{\"me\" \"you\"} ?b #{\"foo\" \"bar\"})
=> ((\"me\" :type \"foo\") (\"me\" :type \"bar\")
(\"you\" :type \"foo\") (\"you\" :type \"bar\"))"
[[s p o] bindings]
(let [s (or (bindings s) s)
p (or (bindings p) p)
o (or (bindings o) o)]
(if (some set? [s p o])
(d/cartesian-product
(if (set? s) s #{s}) (if (set? p) p #{p}) (if (set? o) o #{o}))
[[s p o]])))
Proper query planning is still an active TODO item. The only step towards this so far is the following function, which sorts the patterns of an single sub-query based on an heuristic prioritizing patterns with the least qvars and therefore producing the most constrained, initial result set.
(defn sort-patterns
"Sorts a seq of triple patterns in dependency order using any
re-occuring vars. Triples with least qvars will be in head
position."
[patterns]
(let [q (map #(let [v (d/filter-tree qvar? %)] [(count v) v %]) patterns)
singles (->> q (filter #(= 1 (first %))) (mapcat second) set)]
(->> q
(sort-by (fn [[c v]] (- (* c 4) (count (filter singles v)))))
(map peek))))
(defn triple-verifier
"Takes a triple pattern (potentially with vars) and 3 booleans to
indicate which SPO is a var. Returns fn which accepts a result triple
and returns false if any of the vars clash (e.g. a qvar is used multiple
times but result has different values in each position or likewise, if
different vars relate to same values)."
[[ts tp to] vars varp varo]
(cond
(and vars varp varo) (cond
(= ts tp to) (fn [r] (= (r 0) (r 1) (r 2)))
(= ts tp) (fn [r] (and (= (r 0) (r 1)) (not= (r 0) (r 2))))
(= ts to) (fn [r] (and (= (r 0) (r 2)) (not= (r 0) (r 1))))
(= tp to) (fn [r] (and (= (r 1) (r 2)) (not= (r 0) (r 1))))
:else (constantly true))
(and vars varp) (if (= ts tp)
(fn [r] (= (r 0) (r 1)))
(fn [r] (not= (r 0) (r 1))))
(and vars varo) (if (= ts to)
(fn [r] (= (r 0) (r 2)))
(fn [r] (not= (r 0) (r 2))))
(and varp varo) (if (= tp to)
(fn [r] (= (r 1) (r 2)))
(fn [r] (not= (r 1) (r 2))))
:else (constantly true)))
(defn unique-bindings?
"Returns true if all values in the given map are unique, i.e.
no two keys are mapped to the same value."
[map] (== (count (into #{} (vals map))) (count map)))
(defn accumulate-result-vars
"Takes a seq of query result maps and returns a map of all found
qvars as keys and their value sets."
([res] (accumulate-result-vars {} nil res))
([acc vars res]
(let [acc (reduce
(fn [acc b]
(merge-with
(fn [a b] (if (set? a) (conj a b) (hash-set a b)))
acc (if vars (select-keys b vars) b)))
acc res)]
(reduce-kv
(fn [acc k v] (if (set? v) acc (assoc acc k #{v})))
acc acc))))
(defn select-renamed-keys
"Similar to clojure.core/select-keys, but instead of key seq takes a
map of keys to be renamed (keys in that map are the original keys to
be selected, their values the renamed keys in returned map)."
[ret map aliases]
(loop [ret ret, keys aliases]
(if keys
(let [kv (first keys)
entry (find map (key kv))]
(recur
(if entry
(assoc ret (val kv) (val entry))
ret)
(next keys)))
ret)))
(defn order-asc
[vars res]
(if (coll? vars)
(sort-by (fn [r] (reduce #(conj % (r %2)) [] vars)) res)
(sort-by (fn [r] (get r vars)) res)))
(defn order-desc
[vars res]
(if (coll? vars)
(sort-by
(fn [r] (reduce #(conj % (r %2)) [] vars))
#(- (compare % %2))
res)
(sort-by #(get % vars) #(- (compare % %2)) res)))
(defn distinct-result-set
[res]
(->> res
(reduce
(fn [acc r]
(let [vs (set (vals r))]
(if (acc vs) acc (assoc acc vs r))))
{})
(vals)))
(defn keywordize-result-vars
[res]
(map
(fn [r] (into {} (map (fn [[k v]] [(-> k qvar-name keyword) v]) r)))
res))
(defn triples->dot
"Takes a seq of triples and returns them as digraph spec in
Graphviz .dot format."
[triples]
(apply
str
(concat
"digraph G {\n"
"node[color=\"black\",style=\"filled\",fontname=\"Inconsolata\",fontcolor=\"white\"];\n"
"edge[fontname=\"Inconsolata\",fontsize=\"9\"];\n"
(map
(fn [t]
(let [[s p o] (map #(if (string? %) % (pr-str %)) t)]
(str "\"" s "\" -> \"" o "\" [label=\"" p "\"];\n")))
triples)
"}")))
(defn join
[a b]
(->> (set/join a b)
(seq)
#_(mapcat
(fn [k]
(if (unique-bindings? k)
[k])))))
(defn join-optional
[a b]
(loop [old (transient #{}), new (transient #{}), kb b]
(if kb
(let [kb' [(first kb)]
[old new] (loop [old old, new new, ka a]
(if ka
(let [ka' (first ka)
j (first (set/join [ka'] kb'))]
(if j
(recur (conj! old ka') (conj! new j) (next ka))
(recur old new (next ka))))
[old new]))]
(recur old new (next kb)))
(let [new (persistent! new)]
(if (seq new)
(into (apply disj (set a) (persistent! old)) new)
a)))))
(defn minus
[a b]
(let [vars (accumulate-result-vars b)]
(reduce-kv
(fn [res k v]
(let [v (if (coll? v) v [v])]
(reduce
(fn [res v] (vec (remove (fn [r] (= (r k) v)) res))) res v)))
a vars)))
(defn union
[a b]
(debug :union-ab a b)
(concat a b))
(defn unbound-var?
[bindings v? x] (and v? (not (bindings x))))
(defn bind-translator
[vs? vp? vo? [s p o]]
(if vs?
(if vp?
(if vo?
(fn [r] {s (r 0) p (r 1) o (r 2)})
(fn [r] {s (r 0) p (r 1)}))
(if vo?
(fn [r] {s (r 0) o (r 2)})
(fn [r] {s (r 0)})))
(if vp?
(if vo?
(fn [r] {p (r 1) o (r 2)})
(fn [r] {p (r 1)}))
(if vo?
(fn [r] {o (r 2)})
(fn [_] {})))))
(defn unbound-vars-translator
[bindings vs? vp? vo? [s p o]]
(if (unbound-var? bindings vs? s)
(if (unbound-var? bindings vp? p)
(if (unbound-var? bindings vo? o)
(fn [p] [nil nil nil])
(fn [p] [nil nil (p 2)]))
(if (unbound-var? bindings vo? o)
(fn [p] [nil (p 1) nil])
(fn [p] (assoc p 0 nil))))
(if (unbound-var? bindings vp? p)
(if (unbound-var? bindings vo? o)
(fn [p] [(p 0) nil nil])
(fn [p] (assoc p 1 nil)))
(if (unbound-var? bindings vo? o)
(fn [p] (assoc p 2 nil))
identity))))
(defn select-with-bindings-alts
[ds bindings opts [s p o :as t]]
(let [vs? (qvar? s), vp? (qvar? p), vo? (qvar? o)
vmap (bind-translator vs? vp? vo? t)
verify (triple-verifier t vs? vp? vo?)
flt (compile-filter-fn (opts :filter))
res-fn (if (:triples opts)
(if flt
#(if (verify %)
(let [vbinds (vmap %)]
(if (flt vbinds) [vbinds %])))
#(if (verify %) [(vmap %) %]))
(if flt
#(if (verify %)
(let [vbinds (vmap %)]
(if (flt vbinds) vbinds)))
#(if (verify %) (vmap %))))
qs (if vs? (bindings s) s)
qp (if vp? (bindings p) p)
qo (if vo? (bindings o) o)
_ (debug :select-alt qs qp qo)
res (api/select-with-alts ds qs qp qo)]
(if (seq res)
(loop [acc (transient []), res res]
(if res
(let [r (res-fn (first res))]
(if r
(recur (conj! acc r) (next res))
(recur acc (next res))))
(persistent! acc))))))
(defn select-with-bindings
[ds bindings opts [s p o :as t]]
(let [patterns (produce-patterns-with-bound-vars t bindings)
vs? (qvar? s), vp? (qvar? p), vo? (qvar? o)
vmap (bind-translator vs? vp? vo? t)
pmap (unbound-vars-translator bindings vs? vp? vo? t)
verify (triple-verifier t vs? vp? vo?)
flt (compile-filter-fn (opts :filter))
res-fn (if (:triples opts)
(if flt
#(if (verify %)
(let [vbinds (vmap %)]
(if (flt vbinds) [vbinds %])))
#(if (verify %) [(vmap %) %]))
(if flt
#(if (verify %)
(let [vbinds (vmap %)]
(if (flt vbinds) vbinds)))
#(if (verify %) (vmap %))))]
(debug :patterns patterns)
(loop [acc (transient []), ps patterns]
(if ps
(let [[qs qp qo] (pmap (vec (first ps)))
_ (debug :select qs qp qo)
res (api/select ds qs qp qo)]
(if (seq res)
(recur
(loop [acc acc, res res]
(if res
(let [r (res-fn (first res))]
(if r
(recur (conj! acc r) (next res))
(recur acc (next res))))
acc))
(next ps))
(recur acc (next ps))))
(persistent! acc)))))
(defn select-join
([ds patterns]
(select-join ds {} {} patterns))
([ds bindings {bind :bind flt :filter opt? :optional?} patterns]
(let [[p & ps] (sort-patterns patterns)
res (select-with-bindings ds bindings {} p)
join-fn (if opt? join-optional join)
flt (compile-filter-fn flt)
bind (compile-bind-fn bind)]
(if (seq res)
(loop [res res, ps ps]
(if ps
(let [binds (merge-with into (accumulate-result-vars res) bindings)
r' (select-with-bindings-alts ds binds {} (first ps))
res (if (seq r') (join-fn res r'))]
(if (seq res)
(recur res (next ps))))
(cond->> res
flt (filter flt)
bind (map bind))))))))
If using equal min/max search depth, resort to select-join
with
expanded pattern. In that case the predicate can be a seq [pred1
pred2 ..]
which is expanded cyclically for depth
hops, e.g:
(select-transitive ds '[?s [mother father] ?o] 3 3)
Else the fn takes a standard pattern with the subject or object either a given constant or qvar:
Select all ‘mother’ rels from 1 - 3 hops (i.e. from mother -> great-grandmother)
(select-transitive ds '[?s mother ?c] 1 3)
If the object is not a qvar, the search is executed in reverse direction:
(select-transitive ds '[?m mother 'toxi])
Select only grandchildren for given person:
(select-transitive ds '[john father ?c] 2 2) ;; same as... (select-join ds '[[john father ?p] [?p father ?c]])
(defn subvec-slices
"Takes a min & max count and returns function accepting a vector as
single arg. When called, returns vector of subvec slices each starting
at index 0 and with an increasing length from min to max."
[n1 n2]
(fn [path]
(mapv #(subvec path 0 %) (range n1 (inc (clojure.core/min (count path) n2))))))
(defn dfs-forward*
[ds s p sv acc min max]
(if (<= (count acc) max)
(let [acc (conj acc sv)
o (auto-qvar)
r (select-with-bindings ds {s sv} {} [s p o])]
(if (seq r)
(let [visited (set acc)
ovals (filter (comp not visited) (set (map o r)))]
(if (seq ovals)
(#?(:clj r/mapcat :cljs mapcat)
(fn [ov] (dfs-forward* ds o p ov acc min max))
ovals)
[acc]))
(if (> (count acc) min) [acc])))
[acc]))
(defn dfs-backward*
[ds o p ov acc min max]
(if (<= (count acc) max)
(let [acc (conj acc ov)
s (auto-qvar)
r (select-with-bindings ds {o ov} {} [s p o])]
(if (seq r)
(let [visited (set acc)
svals (filter (comp not visited) (set (map s r)))]
(if (seq svals)
(#?(:clj r/mapcat :cljs mapcat)
(fn [sv] (dfs-backward* ds s p sv acc min max))
svals)
[acc]))
(if (> (count acc) min) [acc])))
[acc]))
(defn select-transitive
([ds [s p o :as t]]
(if (vector? p)
(select-transitive ds t (count p) (count p))
(select-transitive ds t 1 1000000)))
([ds [s p o] mind maxd]
(let [mind (clojure.core/max mind 1)
maxd (clojure.core/max maxd 1)]
(if (= mind maxd)
(let [p (if (vector? p) p [p])
p (take mind (cycle p))]
(select-join ds {} {} (resolve-path-pattern [s p o])))
(let [vs? (qvar? s)
vo? (qvar? o)
v (auto-qvar)
conf (cond
(and vs? vo?) {:s s ;; [?s p ?o]
:o v
:bmap {}
:lookup (fn [b] [(b v) (b s)])
:bind (fn [p] {s (first p) o (peek p)})
:search dfs-forward*}
vo? {:s v ;; [x p ?o]
:o o
:bmap {v s}
:lookup (fn [b] [(b o) (b v)])
:bind (fn [p] {o (peek p)})
:search dfs-forward*}
vs? {:s s ;; [?s p x]
:o v
:bmap {v o}
:lookup (fn [b] [(b s) (b v)])
:bind (fn [p] {s (peek p)})
:search dfs-backward*})
{:keys [bind search]} conf
slices (subvec-slices (inc mind) (inc maxd))]
(->> (select-with-bindings ds (:bmap conf) {} [(:s conf) p (:o conf)])
(r/map (:lookup conf))
(r/mapcat #(search ds v p (% 0) [(% 1)] mind maxd))
(r/mapcat slices)
(r/map bind)
(into #{})))))))
(ns thi.ng.trio.query
(:refer-clojure :exclude [min max])
#?(:cljs
(:require-macros
[cljs-log.core :refer [debug info warn severe]]))
(:require
[thi.ng.trio.core :as api]
[thi.ng.dstruct.core :as d]
[thi.ng.dstruct.unionfind :as uf]
[thi.ng.math.core :as m]
[thi.ng.xerror.core :as err]
[thi.ng.validate.core :as v]
[clojure.set :as set]
[clojure.core.reducers :as r]
#?(:clj
[taoensso.timbre :refer [debug info warn error]])))
<<helpers>>
<<agg-helpers>>
<<joins>>
<<selectors>>
<<dfs>>
<<validators>>
<<compiler>>
<<compiler-api>>
(ns thi.ng.trio.test.query-ex1
(:require
[thi.ng.trio.core :as trio]
[thi.ng.trio.query :as q]
#?(:clj
[clojure.test :refer :all]
:cljs
[cemerick.cljs.test :refer-macros [is deftest]])))
<<test-ex1-a>>
<<test-ex1-c>>
<<test-ex1-d>>
(deftest test-example
(is (= '#{{?p carl ?c donald}
{?a "emily is 23 years old" ?p freya, ?c emily}
{?a "emily is 23 years old" ?p bob, ?c emily}}
(set
<<test-ex1-b>>
)))
(is (= '#{[freya spouse bob] [emily child-of bob]
[carl parent donald] [bob gender male]
[alice parent donald] [carl father donald]
[freya mother emily] [donald child-of alice]
[alice gender female] [alice mother donald]
[bob spouse freya] [emily child-of freya]
[freya gender female] [bob parent emily]
[carl spouse alice] [donald child-of carl]
[bob father emily] [alice spouse carl]
[carl gender male] [freya parent emily]}
(set (q/query infer-parents))))
(is (= '{emily [{?p bob ?c emily} {?p freya ?c emily}]
donald [{?p carl ?c donald} {?p alice ?c donald}]}
<<test-ex1-e>>
))
(is (= '#{[emily child-of bob] [emily child-of freya] [emily nick em] [emily age 23]
[donald friend emily] [freya mother emily] [freya parent emily] [bob parent emily] [bob father emily]}
(set
<<test-ex1-f>>
))))
(ns thi.ng.trio.test.query-ex2
(:require
[thi.ng.trio.core :as trio]
[thi.ng.trio.query :as q]
#?(:clj
[clojure.test :refer :all]
:cljs
[cemerick.cljs.test :refer-macros [is deftest]])))
<<test-ex2-a>>
(deftest text-example
(is (= '#{{?country brazil, ?city sao-paulo} {?country uk, ?city london, ?region brixton}
{?country uk, ?city london, ?region camden} {?country uk, ?city london, ?region westminster}
{?country uk, ?city manchester, ?region ardwick} {?country uk, ?city manchester, ?region bradford}
{?country uk, ?city manchester, ?region hulme} {?country usa, ?city nyc}}
(set
<<test-ex2-b>>
))))
(ns thi.ng.trio.test.query-ex3
(:require
[thi.ng.trio.core :as trio]
[thi.ng.trio.query :as q]
#?(:clj
[clojure.test :refer :all]
:cljs
[cemerick.cljs.test :refer-macros [is deftest]])))
<<test-ex3-a>>
(deftest test-example
(is (= '#{{?b :b3, ?net 5, ?retail 6.0}
{?b :b1, ?net 10, ?retail 12.0}}
(set
<<test-ex3-b>>
)))
(is (= '{:a3 {?avg 20.0, ?max 20, ?num 1}, :a1 {?avg 15.0, ?max 20, ?num 2}}
<<test-ex3-c>>
))
(is (= '{:a1 {:books #{:b2 :b1}}, :a2 {:books #{:b1 :b3}}, :a3 {:books #{:b4}}}
<<test-ex3-d>>
)))