Stratified aggregation #37

Merged: 7 commits, Nov 9, 2023

@guillembartrina (Collaborator) commented Oct 19, 2023

This PR adds support for stratified aggregation in carac. For example, the counting of the transitive closure can now be expressed as follows:

val e = program.relation[Constant]("e")
val tc = program.relation[Constant]("tc")
val c = program.relation[Constant]("c")
val x, y, z = program.variable()

// Stratum 0
tc(x, y) :- e(x, y)
tc(x, y) :- (e(x, z), tc(z, y))

// Stratum 1
c(x, y) :- groupBy(tc(x, z), Seq(x), AggOp.COUNT(z) -> y)

// EDB
e("a", "b") :- ()
e("b", "c") :- ()
e("c", "d") :- ()
e("d", "e") :- ()
e("e", "f") :- ()
e("f", "g") :- ()
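
For readers unfamiliar with the semantics, here is a minimal plain-Scala sketch of what the two strata above would compute on this EDB. It is not carac code: `transitiveClosure`, `countByFirst` and the tuple encoding of facts are hypothetical names chosen for illustration, and set semantics is assumed.

```scala
// Plain-Scala sketch of the two strata; an illustration, not carac's implementation.
type Edge = (String, String)

// Stratum 0: naive fixpoint for
//   tc(x, y) :- e(x, y)
//   tc(x, y) :- (e(x, z), tc(z, y))
def transitiveClosure(e: Set[Edge]): Set[Edge] =
  var tc = e
  var changed = true
  while changed do
    val derived =
      for
        (x, z) <- e
        (z2, y) <- tc
        if z == z2
      yield (x, y)
    val next = tc ++ derived
    changed = next.size != tc.size
    tc = next
  tc

// Stratum 1: c(x, y) :- groupBy(tc(x, z), Seq(x), AggOp.COUNT(z) -> y)
// Only runs once tc is complete, which is what stratification guarantees.
def countByFirst(tc: Set[Edge]): Map[String, Int] =
  tc.toSeq.groupMapReduce(_._1)(_ => 1)(_ + _)

val edges: Set[Edge] =
  Set("a" -> "b", "b" -> "c", "c" -> "d", "d" -> "e", "e" -> "f", "f" -> "g")
val counts = countByFirst(transitiveClosure(edges))
```

On the chain above, `counts` maps each vertex to the number of vertices reachable from it, so "a" maps to 6 and "f" maps to 1; "g" has no outgoing edge and therefore no entry.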

Changes

More specifically, it includes the following changes:

  • The DSL is extended to support grouping atoms, of the form groupBy(ga(t1, ..., tn), [GV1, ..., GVm], AggOp.OP1(q1) -> AV1, ..., AggOp.OPk(qk) -> AVk). They are constructed using the groupBy object by specifying the grouping atom, the grouping variables and the aggregations (each with an operation, a target and a result). Grouping atoms are checked for validity.
  • PrecedenceGraph now also checks that the aggregate rules are properly stratified, reusing logic from stratified negation
  • A new specialized index aggregator is introduced for grouping predicates
  • CollectionsEDB is extended with two new operations: distinct and groupMapReduce over CollectionsRow
  • The VolcanoStorageManager and DefaultStorageManager rule evaluation logic now support grouping operations
  • The AST model is extended to support grouping atoms, with a new GroupingLogicAtom that has AggOpNode subtrees. The StagedExecutionEngine now builds the extended AST.
  • CopyEliminationPass is updated to properly handle grouping atoms
  • A new IROp GroupingOp is introduced, and implemented by the interpreter and for three compilation targets: BytecodeCompiler, LambdaCompiler and QuoteCompiler
  • IROpTreeGenerator now generates the grouping operations
  • Added support to StagedSnippetExecutionEngine and StagedSnippetCompiler
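
As an illustration of the distinct and groupMapReduce operations mentioned in the list, the sketch below groups rows by key columns and sums a target column using the Scala standard library. The `Row` alias and the `sumBy` helper are assumptions made for this example; they are not the CollectionsEDB or CollectionsRow API.

```scala
// Hypothetical row encoding: one Int per column. Not carac's CollectionsRow.
type Row = Seq[Int]

// Group `rows` by the column positions in `keys` and sum column `target`.
// Each output row holds the key columns followed by the aggregate.
def sumBy(rows: Seq[Row], keys: Seq[Int], target: Int): Seq[Row] =
  rows.distinct // set semantics: drop duplicate tuples before aggregating
    .groupMapReduce(row => keys.map(i => row(i)))(row => row(target))(_ + _)
    .map((key, total) => key :+ total)
    .toSeq

val rows: Seq[Row] = Seq(Seq(1, 10), Seq(1, 20), Seq(2, 5), Seq(1, 10))
val summed = sumBy(rows, keys = Seq(0), target = 1)
```

Here the duplicate row (1, 10) is removed first, so key 1 sums to 30 and key 2 sums to 5.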

Tests and examples

  • Check that malformed grouping atoms are properly detected
  • Check that non-stratifiable aggregation is properly detected and aliases are correctly resolved
  • Counting of transitive closure (tc_count test, with all execution engines)
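
The stratification condition being tested can be sketched as follows: a relation defined through an aggregate must not be (transitively) required by the relation it aggregates over. This is a simplified stand-alone check, assuming a hypothetical `Dep` edge type; it is not the PrecedenceGraph implementation and ignores alias resolution.

```scala
// head depends on body; `aggregated` marks a dependency through a grouping atom.
final case class Dep(head: String, body: String, aggregated: Boolean)

// All relations that `from` transitively depends on.
def dependencies(deps: Seq[Dep], from: String): Set[String] =
  var seen = Set.empty[String]
  var frontier = List(from)
  while frontier.nonEmpty do
    val current = frontier.head
    frontier = frontier.tail
    for d <- deps if d.head == current && !seen(d.body) do
      seen += d.body
      frontier = d.body :: frontier
  seen

// Stratifiable iff no aggregated dependency lies on a cycle.
def isStratifiable(deps: Seq[Dep]): Boolean =
  deps.filter(_.aggregated).forall(d => !dependencies(deps, d.body).contains(d.head))

// The tc/c program from the description: c aggregates over tc, tc never uses c.
val tcCount = Seq(Dep("tc", "e", false), Dep("tc", "tc", false), Dep("c", "tc", true))
// p aggregates over q while q depends on p: no valid stratification exists.
val cyclic = Seq(Dep("p", "q", true), Dep("q", "p", false))
```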

Moreover, some additional tests illustrate stratified aggregation:

  • Vertex degrees of a graph
  • Max vertex degree of a graph
  • Shortest edge in a graph
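
The test sources are not shown in this conversation, so the following is only a guess at what these three examples compute, written against plain Scala collections. The weighted edge triples and the use of out-degree are assumptions.

```scala
// Hypothetical weighted edges: (source, target, weight).
val weighted = Seq(("a", "b", 4), ("a", "c", 1), ("b", "c", 2))

// Vertex (out-)degrees: COUNT of targets grouped by source. In the PR's syntax
// this would presumably look like
//   degree(x, d) :- groupBy(edge(x, y), Seq(x), AggOp.COUNT(y) -> d)
val degree: Map[String, Int] = weighted.groupMapReduce(_._1)(_ => 1)(_ + _)

// Max vertex degree: MAX over the previous stratum's result.
val maxDegree: Int = degree.values.max

// Shortest edge: MIN over the weight column.
val shortest: Int = weighted.map(_._3).min
```

Note that the max degree depends on the degree relation being complete, which is why it sits in a later stratum.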

@guillembartrina changed the title from "WIP: Stratified aggregation" to "Stratified aggregation" Nov 4, 2023
@guillembartrina guillembartrina marked this pull request as ready for review November 4, 2023 08:33
enum StorageAggOp:
  case SUM, COUNT, MIN, MAX

inline def getType(x: StorageConstant): Char = x match
Owner

I see why this is needed, but I would like to be able to remove it one day, along with all the asInstanceOf casts. We should eventually be able to extend the Constant type to manage this.

Collaborator Author

Yes, I completely agree that this is a temporary solution. But I couldn't find another way that didn't require refactoring the entire Atom interface into a typed interface that forces declaring specific types for each 'column'. With dynamic types + domain interpretation, even if we assume that each constant has the appropriate type so we don't need to actively check it, we need to determine at runtime the specific types of the constants in order to apply the corresponding operator.

Unless you can come up with a better way to handle it, we either leave it as is or refactor the atom interface (not entirely trivial).
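
To make the trade-off concrete, here is a small sketch of the kind of runtime dispatch being discussed: the aggregate inspects the dynamic type of the first constant and then casts the rest. All names (`Constant`, `Op`, `aggregate`) are hypothetical and only model the idea; the real code uses StorageConstant, StorageAggOp and getType.

```scala
// Dynamically typed constants, as an untagged union (an assumption for this sketch).
type Constant = Int | String

enum Op:
  case SUM, COUNT, MIN, MAX

// Assumes all values in a column share one runtime type, as in the discussion.
def aggregate(op: Op, values: Seq[Constant]): Constant =
  require(values.nonEmpty, "cannot aggregate an empty group")
  op match
    case Op.COUNT => values.size // COUNT needs no type information
    case _ =>
      values.head match
        case _: Int =>
          val ints = values.map(_.asInstanceOf[Int])
          op match
            case Op.SUM => ints.sum
            case Op.MIN => ints.min
            case _      => ints.max // MAX; COUNT was handled above
        case _: String =>
          val strs = values.map(_.asInstanceOf[String])
          op match
            case Op.SUM => throw new IllegalArgumentException("SUM needs numeric constants")
            case Op.MIN => strs.min
            case _      => strs.max // MAX; COUNT was handled above
```

A typed Atom interface would let the operator be chosen statically per column, which is the refactoring mentioned above.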

case SUM(override val t: Term) extends AggOp(t)
case COUNT(override val t: Term) extends AggOp(t)
case MIN(override val t: Term) extends AggOp(t)
case MAX(override val t: Term) extends AggOp(t)
Owner

The one expression that is missing to have parity with the Soufflé docs is B(n, m) :- A(n, m), B(m, s), 2 * s = 2 * sum s : { A(n, s) } + 2... I suppose that is just regular arithmetic, or maybe I misunderstood what that example is doing. Do you know what that would look like using this syntax?

Collaborator Author

I have implemented a relatively simple version of aggregates, more similar to what can be found in the literature than Soufflé's. For example, in this version the grouping atoms must be over a single grouping relation (many grouping relations can be emulated by creating a new relation that joins them); and, as I implemented the aggregates before playing with arithmetic expressions, no arithmetic expressions can appear in the grouping atoms (I believe they are also syntactic sugar and can be emulated with other constructs, provided we include arithmetic expressions in carac at some point).
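
The emulation of several grouping relations by one joined relation can be sketched in plain Scala. The relations `r` and `s` and their contents are made up for illustration; in carac the auxiliary relation would be defined by an ordinary rule and then used as the single grouping atom.

```scala
// Two hypothetical relations r(x, y) and s(y, v).
val r = Set(("a", 1), ("a", 2), ("b", 2))
val s = Set((1, 10), (2, 20))

// Auxiliary relation: joined(x, y, v) :- r(x, y), s(y, v)
val joined =
  for
    (x, y) <- r
    (y2, v) <- s
    if y == y2
  yield (x, y, v)

// Single-relation grouping over the auxiliary relation: SUM(v) grouped by x.
val total: Map[String, Int] = joined.toSeq.groupMapReduce(_._1)(_._3)(_ + _)
```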

Collaborator Author

The closest you can get to coding the above rule is B(n, m) :- A(n, m), B(m, s), groupBy(A(n, x), Seq(n), SUM(x) -> s)

@aherlihy aherlihy merged commit d655b18 into aherlihy:main Nov 9, 2023
aherlihy pushed a commit that referenced this pull request Nov 28, 2023
* Stratified aggregation

* Add some tests and examples, fix broken tests

* Handle anonymous variables in grouping atoms, add a few examples

* Simplify Grouping volcano operator, add constants generated by aggregates to domain

* Remove unused AST parameter

* Move creation of static operations, finish StagedSnippet

* Add bugfix comment
aherlihy added a commit that referenced this pull request Nov 29, 2023
aherlihy pushed a commit that referenced this pull request Nov 29, 2023
aherlihy added a commit that referenced this pull request Dec 14, 2023