Stratified aggregation #37
src/main/scala/datalog/execution/SemiNaiveExecutionEngine.scala
enum StorageAggOp:
  case SUM, COUNT, MIN, MAX

inline def getType(x: StorageConstant): Char = x match
I see why this is needed but one day would like to be able to remove it, as well as all the `asInstanceOf`. We should be able to extend the Constant type to manage this eventually.
Yes, I completely agree that this is a temporary solution. But I couldn't find another way that didn't require refactoring the entire Atom interface into a typed interface that forces declaring specific types for each 'column'. With dynamic types + domain interpretation, even if we assume that each constant has the appropriate type so we don't need to actively check it, we need to determine at runtime the specific types of the constants in order to apply the corresponding operator.
Unless you can come up with a better way to handle it, we either leave it as is or refactor the atom interface (not entirely trivial).
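To illustrate the runtime dispatch described above, here is a minimal sketch. It assumes a hypothetical `StorageConstant` defined as `Int | String` and a hypothetical `aggregate` helper; neither is taken from the PR, and the actual carac definitions may differ.

```scala
// Assumption: constants are untyped at the interface level (Int or String).
type StorageConstant = Int | String

enum StorageAggOp:
  case SUM, COUNT, MIN, MAX

// Tag the runtime type of a constant with a single character.
def getType(x: StorageConstant): Char = x match
  case _: Int    => 'i'
  case _: String => 's'

// Hypothetical helper: pick the concrete operator from the runtime type of
// the first value, trusting that all values in the group share that type.
def aggregate(op: StorageAggOp, values: Seq[StorageConstant]): StorageConstant =
  require(values.nonEmpty, "cannot aggregate an empty group")
  op match
    case StorageAggOp.COUNT => values.size
    case _ =>
      getType(values.head) match
        case 'i' =>
          val ints = values.map(_.asInstanceOf[Int])
          op match
            case StorageAggOp.SUM => ints.sum
            case StorageAggOp.MIN => ints.min
            case _                => ints.max
        case _ =>
          val strs = values.map(_.asInstanceOf[String])
          op match
            case StorageAggOp.MIN => strs.min
            case StorageAggOp.MAX => strs.max
            case _ => throw IllegalArgumentException("SUM is undefined for strings")
```

The casts are exactly the kind of `asInstanceOf` the review would like to remove; a typed atom interface would make the column type known statically and the dispatch unnecessary.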
case SUM(override val t: Term) extends AggOp(t)
case COUNT(override val t: Term) extends AggOp(t)
case MIN(override val t: Term) extends AggOp(t)
case MAX(override val t: Term) extends AggOp(t)
The one expression that is missing to have parity with the Soufflé docs is `B(n, m) :- A(n, m), B(m, s), 2 * s = 2 * sum s : { A(n, s) } + 2.` ... I suppose that is just regular arithmetic, or maybe I misunderstood what that example is doing. Do you know what that would look like using this syntax?
I have implemented a relatively simple version of aggregates, more similar to what can be found in the literature than Soufflé's. For example, in this version the grouping atoms must be over a single grouping relation (many grouping relations can be emulated by creating a new relation that joins them); and, as I implemented the aggregates before playing with arithmetic expressions, no arithmetic expressions can appear in the grouping atoms (I believe they are also syntactic sugar and can be emulated with other constructs, provided we include arithmetic expressions in carac at some point).
The closest you can get to coding the above rule is B(n, m) :- A(n, m), B(m, s), groupBy(A(n, x), Seq(n), SUM(x) -> s)
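As a rough illustration of what that rule derives, the following sketch evaluates its body once over plain Scala collections. The fact sets `a` and `b` are made-up sample data, and this is not how carac evaluates rules internally; it only mirrors the grouping semantics: group `A(n, x)` by `n`, sum `x` into `s`, then join with the other body atoms.

```scala
// Made-up sample facts for A and B (assumption, for illustration only).
val a: Set[(Int, Int)] = Set((1, 2), (1, 3), (2, 5))
val b: Set[(Int, Int)] = Set((2, 5), (3, 4))

// groupBy(A(n, x), Seq(n), SUM(x) -> s): one sum per distinct n.
val sums: Map[Int, Int] = a.groupMapReduce(_._1)(_._2)(_ + _)

// One application of B(n, m) :- A(n, m), B(m, s), groupBy(...):
// keep (n, m) when B holds some (m, s) with s equal to the sum for n.
val derived: Set[(Int, Int)] =
  for
    (n, m)  <- a
    (m2, s) <- b
    if m2 == m && sums.get(n).contains(s)
  yield (n, m)
```

Because the aggregate ranges over `A` only, the recursion through `B` does not pass through the aggregate, so the rule remains stratified.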
* Stratified aggregation
* Add some tests and examples, fix broken tests
* Handle anonymous variables in grouping atoms, add a few examples
* Simplify Grouping volcano operator, add constants generated by aggregates to domain
* Remove unused AST parameter
* Move creation of static operations, finish StagedSnippet
* Add bugfix comment
This PR adds support for stratified aggregation in carac. For example, the counting of the transitive closure can now be expressed as follows:
Changes
More specifically, it includes the following changes:
* `groupBy(ga(t1, ..., tn), [GV1, ..., GVm], AggOp.OP1(q1) -> AV1, ..., AggOp.OPk(qk) -> AVk)`. They are constructed using the `groupBy` object by specifying the grouping atom, the grouping variables and the different aggregations (operation, target and result). The grouping atoms are checked for validity.
* `PrecedenceGraph` now also checks that the aggregate rules are properly stratified, reusing logic from stratified negation.
* `CollectionsEDB` is extended with two new operations: distinct and groupMapReduce over `CollectionsRow`.
* `VolcanoStorageManager` and `DefaultStorageManager` rule evaluation logic now supports grouping operations.
* `GroupingLogicAtom` that has `AggOpNode` subtrees. The `StagedExecutionEngine` now builds the extended AST.
* `CopyEliminationPass` is updated to properly handle grouping atoms.
* `GroupingOp` is introduced, and implemented by the interpreter and for three compilation targets: `BytecodeCompiler`, `LambdaCompiler` and `QuoteCompiler`.
* `IROpTreeGenerator` now generates the grouping operations.
* `StagedSnippetExecutionEngine` and `StagedSnippetCompiler`
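The stratification requirement mentioned in the list above can be sketched as follows. This is a generic check under stated assumptions, not the `PrecedenceGraph` implementation: `Dep`, `reachable` and `isStratified` are hypothetical names, and an edge is marked `strict` when the dependency goes through negation or aggregation. A program is rejected when a strict edge lies on a dependency cycle.

```scala
// Hypothetical edge: relation `head` depends on relation `body`;
// `strict` marks a dependency through negation or aggregation.
final case class Dep(head: String, body: String, strict: Boolean)

// All relations that `from` depends on, directly or transitively.
def reachable(deps: Seq[Dep], from: String): Set[String] =
  @annotation.tailrec
  def loop(frontier: Set[String], seen: Set[String]): Set[String] =
    if frontier.isEmpty then seen
    else
      val next = deps.collect { case Dep(h, b, _) if frontier(h) => b }.toSet -- seen
      loop(next, seen ++ next)
  loop(Set(from), Set.empty)

// A strict edge head -> body is illegal if body depends back on head.
def isStratified(deps: Seq[Dep]): Boolean =
  !deps.exists(d => d.strict && reachable(deps, d.body)(d.head))

// Counting over a transitive closure: the aggregate sits above the recursion.
val tcCount = Seq(
  Dep("tc", "edge", strict = false),
  Dep("tc", "tc", strict = false),
  Dep("count", "tc", strict = true)
)

// Recursion through an aggregate: p aggregates over q, and q depends on p.
val cyclic = Seq(
  Dep("p", "q", strict = true),
  Dep("q", "p", strict = false)
)
```

With these definitions `isStratified(tcCount)` holds while `isStratified(cyclic)` does not, which is the same distinction stratified negation draws for negated atoms.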
Tests and examples
(`tc_count` test, with all execution engines)
Moreover, some additional tests illustrate stratified aggregation: