Add aggregate functions #2925

Merged · 3 commits · Nov 23, 2023

Conversation

@timabdulla (Contributor) commented Aug 30, 2023

This is an initial attempt to add support for aggregate functions, using the call pattern suggested here: #915 (comment) by @wolfgangwalther.

Note that I am aware that this PR is missing a number of things (at the very least, count(), support for HAVING, the proposed limit on query cost to prevent denial-of-service failures, and, of course, automated tests), and I haven't fully tested this yet beyond some initial sanity checks.

What I would really like to understand at this point is a) whether the technical direction I am taking is appropriate, and b) what the must-haves are before this feature could be considered mergeable.

Thank you for your time and attention!

timabdulla marked this pull request as draft August 30, 2023 16:37
@timabdulla (Contributor Author)

@steve-chavez would appreciate your thoughts when you have a moment!

@steve-chavez (Member)

@timabdulla Thanks for taking the initiative!

the proposed limit on query cost to prevent denial-of-service failures

This part seems different now that we have Impersonated Role Settings; we could leave the job to https://github.com/pgexperts/pg_plan_filter. For users that don't want to deal with aggregates, we could add a config option to enable/disable them (though that's better saved for the last step of the implementation).

One other thing we should consider is avoiding errors when clients add an aggregate but forget to specify GROUP BY. Using a window function when there is no GROUP BY (as mentioned on #2066 (comment), sorry this is all over the place) might be a good solution for this.
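To illustrate the window-function idea, a minimal sketch assuming a hypothetical invoices table (this is not the SQL PostgREST actually generates):

-- A plain aggregate next to an ordinary column requires a GROUP BY:
SELECT client_id, sum(total) FROM invoices;
-- ERROR:  column "invoices.client_id" must appear in the GROUP BY clause or be used in an aggregate function

-- The window-function form has no such requirement; every row is returned,
-- annotated with the overall sum:
SELECT client_id, sum(total) OVER () FROM invoices;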

@@ -496,6 +519,7 @@ setConfigLocalJson prefix keyVals = [setConfigLocal mempty (prefix, gucJsonVal k
arrayByteStringToText :: [(ByteString, ByteString)] -> [(Text,Text)]
arrayByteStringToText keyVal = (T.decodeUtf8 *** T.decodeUtf8) <$> keyVal

-- Investigate this
@steve-chavez (Member) commented Aug 30, 2023

@timabdulla You don't need to worry about this part.

We could say that these are "final aggregates" (json_agg, string_agg, etc.); they're used to send an "interchange format" (json, csv, xml, binary, etc.) to clients.

For this feature we're dealing with regular aggregates (sum, max, etc.), used to fold a number of rows into a postgres-native format.

@wolfgangwalther (Member)

If we think in terms of the whole "data representation" feature and the big discussion in #1582, then I'm not sure whether you can actually make a reasonable distinction between those aggregations. @aljungberg WDYT?

@timabdulla (Contributor Author)

Thanks for your comment @wolfgangwalther. To be honest, I don't have as much context on the bigger-picture elements of the architecture and the like, so I am not totally clear on the practical implications of your comment as it relates to this PR. Any examples or illustrations would be very helpful! Thank you.

@wolfgangwalther (Member)

I don't think there are any practical implications for this PR right now. This was just a thought, not more, yet.

@aljungberg (Contributor)

Commenting without being fully read in on this PR, just to answer the question: are the 'final' aggregates different from other aggregation (in our envisioned future)?

So I think the answer is 'no'. With pervasive data representations, json_agg, CSV representation, or even more exotic forms like "encode into a protobuf buffer" are all just pluggable transformations which we would no longer hard-code special cases for.

So json_agg would just be one of the built-in data representations when "what we have" (an intermediate type that needs to be aggregated) differs from "what we want" (output consistent with the negotiated content type application/json).

@steve-chavez (Member)

Using a window function for no GROUP BY (as mentioned on #2066 (comment), sorry this is all over the place) might be a good solution for this.

Considering the above, it might be possible to isolate a first pass of the feature to only supporting aggregates and not tie them to GROUP BY.

@timabdulla (Contributor Author)

Hey @steve-chavez,

Really appreciate your quick feedback!

In terms of GROUP BY, I was following the guidance from this comment, though not sure if it's still relevant:

#915 (comment)

Specifically, this part:

I thought a bit more about the proposed addition to the query string in the form of groupby=.... I think we can do without.
As long as we detect the possible aggregation functions via schema cache, we should know whenever an aggregate is called. Once that's the case, we can add all columns mentioned in the select= part, that are not aggregates to the GROUP BY. They'd need to be added anyway, if they were not functionally dependent on each other - and in that case it doesn't hurt to add them either.

The GROUP BY clause is automatically computed based on the columns selected. If only aggregates are selected, it defaults to collapsing down to one row, with aggregation occurring across the whole table (i.e. no GROUP BY).
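To make the derivation concrete, a sketch with hypothetical table and column names (not necessarily the literal SQL this PR generates):

-- ?select=client_id,total.sum()  derives the GROUP BY from the plain columns:
SELECT client_id, sum(total) FROM invoices GROUP BY client_id;

-- ?select=total.sum()  selects only aggregates, so no GROUP BY is added
-- and the whole table collapses to one row:
SELECT sum(total) FROM invoices;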

Let me know what you think. I'll also take a look at Impersonated Role Settings to get a better understanding of that feature.

@steve-chavez (Member)

In terms of GROUP BY, I was following the guidance from this comment, though not sure if it's still relevant:
#915 (comment)

Ah, I missed the above.

Regarding inferring the GROUP BY columns, I think it would be better to default to doing aggregates with the window function approach.

This way the elements in select are just transformations, they don't filter rows. Otherwise we might run into the same issue as /a?select=b!inner(*) which also filters rows (and later we found out that /a?select=b(*)&b=not.is.null was better as it's more explicit).

@wolfgangwalther Please give your feedback whenever you have the time.

@timabdulla (Contributor Author)

Ok, got it. Thanks for explaining. That seems a simple enough change to make: Basically without an explicit GROUP BY, default to doing an aggregate over an "empty" window (e.g. SUM(salary) OVER ()).

We would then need a new query parameter for groupby, but as you said, this could perhaps be kept for a separate PR. If we do that, that would also imply HAVING being a separate PR too, along with GROUP BY.

I also still need to add support for count() with no associated field, which seems like it may require its own separate bit of handling. Will do some thinking on that.

@wolfgangwalther (Member)

Regarding inferring the GROUP BY columns, I think it would be better to default to doing aggregates with the window function approach.

I disagree, for a couple of reasons:

  • An empty OVER () clause is almost useless for a window function. Almost always, you'd want to specify one anyway.
  • A "group by" by default would be very useful, though. You'd often want to collapse everything into one row, so it makes a useful default.

This way the elements in select are just transformations, they don't filter rows. Otherwise we might run into the same issue as /a?select=b!inner() which also filters rows (and later we found out that /a?select=b()&b=not.is.null was better as it's more explicit).

It's not really the same. The inner join example actually changed the underlying resource (i.e. "which rows go into the resultset"). This is not the case here. The same rows still go into the resultset, they are just aggregated into a single value. But they have been part of the computation and have not been filtered out before.

The semantics are still very clear.

So we should:

  • Auto-derive the group by clause based on selected columns.
  • Create new syntax to mark a function as a window function and allow specifying some kind of partition later on.

Just the very basic "allow aggregate functions and auto-derive group by" should be plenty for this PR at first, I think.

@steve-chavez (Member)

A "group by" by default would be very useful

It is very useful, I don't disagree with that.

The same rows still go into the resultset, they are just aggregated into a single value. But they have been part of the computation and have not been filtered out before.

Hm, the fact remains that adding an element in ?select= reduces the number of rows.

I fear that violating a semantic we agreed upon before (select shouldn't affect the number of rows) will result in inconsistency and tech debt later. This is what makes me think that groupby should be explicit.

How about special syntax for auto-derive group by though? Could be like

groupby=*
#or
groupby=!

Idea taken from https://stackoverflow.com/a/416764/4692662

@wolfgangwalther (Member)

Hm, the fact remains that adding an element in ?select= reduces the number of rows.

As I said before, the problem is not the number of output rows, which is really just a different "format" of the response. This is very similar to a different content-type, which is only changed via a header - but also returns a much different format.

The problem is the number of input rows. An implicit inner join does change the input and should not be part of select. An aggregation does not change the input rows and is fine there.

This is what makes me think that groupby should be explicit.

How about special syntax for auto-derive group by though?

A syntax for that would only ever make sense, if we plan to support manual specification of group by as well...

.. but would that ever be useful?

I doubt that anyone really needs to either:

  • hide columns in the output that they want to group on
  • select more columns, which are not grouped on, only to receive an error by postgres about that

None of that makes sense.

The only useful thing we could do would be to allow different grouping sets or rollup etc: https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-GROUPING-SETS

So if we want to introduce any syntax now, we should think about how to represent those as syntax... and then go on from there for the default / simple case. However, I'd feel fine using the default / simple case in the absence of any specific syntax.

One advantage of that is that we will also have the case of select=count covered, which already works right now, although not on purpose. Actually... I think this would be a problem in your interpretation anyway, @steve-chavez. Doing a query without select will fetch the full response, while adding a single column to select to make it select=count changes the query heavily. And we should certainly not require a group by clause for a query with only aggregate columns, because that's not required in SQL either.

@steve-chavez (Member)

Doing a query without select will fetch the full response, while adding a single column to select to make it select=count changes the query heavily. And we should certainly not require a group by clause for a query with only aggregate columns, because that's not required in SQL either.

You're right. My bad, I forgot about the count case.

I doubt that anyone really needs to either:

  • hide columns in the output that they want to group on
  • select more columns, which are not grouped on, only to receive an error by postgres about that

One advantage of that is, that we will also have the case of select=count covered, which already works right now, although not on purpose.

Great points. Agree, we should omit groupby then.

@timabdulla Sorry for the back and forth on this. Let's stick with Wolfgang's proposal.

@timabdulla (Contributor Author)

Thanks folks! Then I'll work to get this finished as per the original plan. One thing: I might omit support for HAVING, as it doesn't seem necessary for a minimal first release to me -- any thoughts on that?

@wolfgangwalther (Member)

One thing: I might omit support for HAVING, as it doesn't seem necessary for a minimal first release to me -- any thoughts on that?

Agreed.

@timabdulla (Contributor Author)

Hi folks,

I'm getting pretty close with my changes, but I had a few things I wanted to bring up:

'Hoisting' Aggregate Functions in the Case of Has-One Spreads

Imagine the following example (and please forgive me if the details are a little contrived): We have a table called projects that is related to another table, project_invoices, through a has-one relationship, and we have the following query: GET /projects?select=client_id,...project_invoices(invoice_total.sum()).

As it stands, the aggregation occurs within the subquery of the lateral join, and thus the sum basically does nothing: There is only one row within the subquery (this being a has-one relationship), and thus the sum of the invoice_total is the invoice_total itself.

Given this, I can see an argument for 'hoisting' the aggregate function in the case of has-one spreads: the aggregate (along with the associated GROUP BY terms) would be brought up to the top-level query. In the above example, that would mean we would get the sum of the invoice_total by client_id.
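A rough sketch of the two shapes, using a hypothetical schema (not the exact SQL PostgREST emits):

-- Without hoisting: the aggregate runs inside the lateral subquery, over the
-- single matching row, so sum(invoice_total) is just invoice_total:
SELECT p.client_id, pi.invoice_total_sum
FROM projects p
LEFT JOIN LATERAL (
  SELECT sum(invoice_total) AS invoice_total_sum
  FROM project_invoices
  WHERE project_invoices.project_id = p.id
) pi ON true;

-- With hoisting: the aggregate and GROUP BY move to the outer query,
-- yielding the sum of invoice_total per client_id:
SELECT p.client_id, sum(pi.invoice_total)
FROM projects p
LEFT JOIN project_invoices pi ON pi.project_id = p.id
GROUP BY p.client_id;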

But I then wonder a bit about consistency, as without resource spreading, this form of hoisting is not possible (or sensible, really), which means that GET /projects?select=client_id,project_invoices(invoice_total.sum()) would exhibit the same behaviour as shown before. The same would apply to has-many relationships, as spreading is not possible with has-many relationships.

It is important to note that in any case, there is also a workaround, in that if you invert the original query (i.e. GET /project_invoices?select=invoice_total.sum(),...projects(client_id)), you can get the sum of the invoice_total grouped by client_id.

The question: Should aggregate functions be hoisted in the case of has-one spreads? The implementation is not trivial, but it's also not hugely complex.

Order of Operations on Selects

Currently, the select syntax is implemented like so: ?select=alias:column->jsonField::cast.aggregateFunction(). With the addition of aggregate functions, the order of operations becomes somewhat ambiguous, specifically with respect to casting.

I have opted for casting to be performed on the column itself before aggregation, as this seems much more useful than casting the return value of the aggregate function itself. This would be most useful when performing an aggregation on a jsonb value: for instance, a number coming from a jsonb object would need to be cast to a numeric before aggregation can be performed (e.g. ?select=invoice_details->>total::numeric.sum()).
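Concretely, the cast applies to the extracted value before the aggregate sees it. A sketch with hypothetical names (not the literal generated SQL):

-- ?select=invoice_details->>total::numeric.sum()
SELECT sum((invoice_details->>'total')::numeric) FROM invoices;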

Please let me know if you have other thoughts or otherwise disagree with this choice.

count()

Support for count() is included, both as count(*) (?select=count(),project_id) and as count(column) (?select=project_id.count()). That said, I did wonder about the existing (accidental) support for count (?select=count).

I don't think it should be eliminated in this PR, but I would suggest that it should ultimately be eliminated as it is inconsistent with this feature and is slightly confusing. Perhaps it may be sensible to issue a deprecation notice of some kind, with a plan to eventually remove count? Although, I am not sure what form this elimination would take... You can have a column named count so maybe it's more trouble than it's worth to try to distinguish between the two cases (accidental count operation vs selecting a real column named count).

I am quite close to being finished, and I hope to have a final PR ready for review early next week. Thanks for reading!

@wolfgangwalther (Member)

Should aggregate functions be hoisted in the case of has-one spreads?

I am thinking of the spread operator as a "shortcut" to express something like this (pseudo-syntax):

GET /projects?select=client_id,project_invoices.invoice_total,project_invoices.invoice_date

Of course, this "direct access to a single nested field" is not possible with this syntax anyway, but the spread operator is kind of a shortcut to put multiple of those together.

As such, I consider whatever is inside the spread operator to be part of the outer query afterwards.

Since the spread operator is only possible with has-one relationships and, as you say, aggregation over 1 row doesn't make sense... I think the hoisting you propose is a very nice solution. It has nice semantics and should be explainable in the docs, I think.

In a perfect world, our query syntax would only allow us to specify sensible queries. Given that hoisting eliminates some variants that are not sensible and gives us better options... I don't see much argument against it.

👍

Currently, the select syntax is implemented like so: ?select=alias:column->jsonField::cast.aggregateFunction().

I can see reasons for casting the column before aggregation (as in the example you gave), but also after aggregation (thinking of data representations and how I actually want the final value to be rendered to the client).

Can we do ?select=alias:column->jsonField::cast.aggregateFunction()::cast, i.e. allow casting both before and after the aggregation?

That said, I did wonder about the existing (accidental) support for count. I don't think it should be eliminated in this PR, but I would suggest that it should ultimately be eliminated as it is inconsistent with this feature and is slightly confusing. Perhaps it may be sensible to issue a deprecation notice of some kind, with a plan to eventually remove count? Although, I am not sure what form this elimination would take...

Certainly out of scope for this PR, yes. The proper way to get rid of that is to validate all columns in select for whether they actually exist on the table / in the schema cache or not. Right now we just pass them on to PG and let it fail, but we discussed elsewhere that we should be more strict about that kind of validation. This is also needed to avoid some unwanted side effects when using computed relationships as computed columns, etc.; same story there.

@timabdulla (Contributor Author)

Thanks for the quick reply!

  1. I will add the hoisting then, stay tuned.
  2. Yes, it should be fairly easy to add an additional cast option. Will add that.
  3. Understood, let's leave for future work then.

@timabdulla (Contributor Author)

Random thought: Perhaps (not in this PR) there could be a possibility of has-many spreads, so long as aggregate functions are involved. Example use case: GET /invoices?select=customer_id,...invoice_line_items(line_amount.sum()). Basically, the line_amount would get hoisted into the top-level, and what you would end up with is the invoice total (defined to be the sum of all line_amounts) by the customer_id.
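A sketch of what the hoisted has-many case could produce, again with a hypothetical schema (speculative, since has-many spreads don't exist yet):

-- GET /invoices?select=customer_id,...invoice_line_items(line_amount.sum())
SELECT i.customer_id, sum(li.line_amount)
FROM invoices i
JOIN invoice_line_items li ON li.invoice_id = i.id
GROUP BY i.customer_id;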

@timabdulla (Contributor Author)

Ok, I now have something that I feel is pretty feature-complete. Still pending: code cleanup (not yet ready for review, I would say), automated tests, and docs. Hopefully I can knock that off this week and then bring it in for review. The PR has grown a fair bit in size, as handling all the various cases well took a bit of refactoring.

timabdulla changed the title from "First pass at adding aggregate functions" to "Add aggregate functions" Sep 18, 2023
timabdulla marked this pull request as ready for review October 17, 2023 19:27
@timabdulla (Contributor Author)

Ok, this PR is finally ready for review. The only thing I've yet to do is finish rounding out the tests, though I already have some tests in. That said, the actual code itself is more or less complete from my perspective. I welcome your comments and feedback; while I await that, I'll finish up the tests.

Overview

As a reminder, this PR enables support for aggregate functions.

The basic request syntax works like so: GET /table?select=foo.sum(),bar,baz. The group by is determined implicitly via the fields in the select list.

You may also optionally provide a cast for the output of the aggregate function, like so: GET /table?select=foo.sum()::text,bar,baz. This is distinct from the cast that can be provided on the input column. Here is an example showing both casts in use: GET /table?select=foo::int.sum()::text,bar,baz.

You may also provide a count() without reference to a specific field, like so: GET /table?select=count(). In this case, you can still provide an alias and/or an aggregate cast.

The following aggregate functions are supported: sum, min, max, avg, and count.
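As a rough illustration of how these requests map to SQL (hedged; foo, bar, and baz are the placeholder names above, and this is not necessarily the literal query PostgREST emits):

-- GET /table?select=foo::int.sum()::text,bar,baz
-- (input cast, aggregate, output cast, implicitly derived GROUP BY)
SELECT sum(foo::int)::text, bar, baz FROM "table" GROUP BY bar, baz;

-- GET /table?select=count()
SELECT count(*) FROM "table";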

Aggregate functions are also supported in the case of embedded resource spreads by means of 'hoisting'. Further detail provided here: https://github.com/PostgREST/postgrest/pull/2925/files#diff-2295ae0baafabcf593f2991ae456859ad5c3a8190ff532155da6e8760d08411cR611

Notes and observations

  • Generally speaking, I've been trying to nudge things in the direction of the ReadPlanTree being the abstract representation of the query to be generated, with the SQL generation code being more confined to the mechanical process of generation rather than making many decisions of its own. This makes it easier to reflect on and manipulate the query, and I think just makes things simpler to understand. For instance, we already had a function addDataRepresentationAliases in the Plan module, but automatically determining the alias for a field that arises from an operator on a JSON column was done in the Query.SqlFragment module. I've now combined that in a more general addAliases function in the Plan. This arose out of a specific need I had around needing to know the alias at plan time (i.e. not a proactive refactor), but it turns out I had similar needs multiple times, and this was a pattern I turned to frequently.
  • I think there's probably a need for a refactor of some of the core types. There's a parallel type hierarchy that is emerging between ApiRequest.Types and Plan.Types on a subset of the types (e.g. CoercibleOrderTerm and OrderTerm, CoercibleLogicTree and LogicTree, etc.) Additionally, Plan.Types ends up importing a lot of things from ApiRequest.Types, which makes it tricky to clearly see the delineation between the two.

Thank you folks. I look forward to hearing from you.

if ("*" `elem` map (\(field, _, _) -> cfName field) selectItems) && any hasOutputRep knownColumns
then rplan{select=concatMap (expandStarSelectItem knownColumns) selectItems}
else rplan
adjustContext context@ResolverContext{qi=ctxQI} (QualifiedIdentifier "" "pgrst_source") (Just a) = context{qi=ctxQI{qiName=a}}
@aljungberg (Contributor) commented Oct 18, 2023

Nice move sticking this bit of the action into adjustContext. That feels quite natural.

Somehow I do miss the explicitness of matching on from=..., fromAlias=...tblAlias directly in the plan, though; to me that really spelled out what condition triggers this special case. This is probably bike-shedding, but to get back a little of that, one could perhaps imagine renaming adjustContext to resolveTable, since that's basically what it does (if we are accessing indirectly via pgrst_source, we figure out the true underlying table, as documented; otherwise we're already resolved), and renaming a in the capture to fromAlias.

@timabdulla (Contributor Author)

I like the suggestion, will do!

-- We expand if either of the below are true:
-- * We have a '*' select AND there is an aggregate function in this ReadPlan's sub-tree.
-- * We have a '*' select AND the target table has at least one data representation.
-- We ignore any '*' selects that have an aggregate function attached (i.e for COUNT(*)).
@aljungberg (Contributor)

I wonder if there's a case where an aggregate function needs to operate on a data rep domain column. It doesn't seem out of the realm of possibility. In that case we'd need to expand these stars too, wouldn't we? But I guess that would be a future enhancement.

@timabdulla (Contributor Author)

The stars will be expanded in the case of data reps as well. I think the code comment I left is poorly worded, will revise.

In some sense, you can use aggregate functions already with data rep columns, but the semantics aren't great, tbh. For instance, if I am requesting JSON from an endpoint, then all data rep columns will be cast to JSON (by design). The issue is that even though you could use any of the aggregate functions, you would then be applying an aggregate function to a JSON column, which generally won't work.

I think some thought will be required in order to make data reps play nicely with aggregates, and I would concur that this should be a future enhancement, as this PR is already big enough.

@aljungberg (Contributor)

Yes, I imagine the same, and as discussed, one could even express the idea of aggregation itself as a kind of data rep. But no need to try to solve every problem at once. This looks like a nice feature, and it's ready to go as is.

Did you by any chance do any testing of what happens when domain driven data reps are involved as the source data for an aggregation? Like maybe a data rep formats dates as strings and then the results are fed through string_agg via your aggregation syntax. I don't think I saw any domains used in the tests. (If it doesn't work, I think that's fine, no need to let perfection be the enemy of good.)

@aljungberg (Contributor)

Agree with the strategy of doing most of the heavy lifting in the planning stage.

And yes, the 'shadow' type structure we're arriving at is perhaps a little awkward. Part of it started with data reps because we had different correctness guarantees in the query versus the plan, if I remember right, so we decided to not just extend the existing types.

@timabdulla (Contributor Author)

And yes, the 'shadow' type structure we're arriving at is perhaps a little awkward. Part of it started with data reps because we had different correctness guarantees in the query versus the plan, if I remember right, so we decided to not just extend the existing types.

Got it. It's not an impediment or anything to development, but there is a nagging feeling that perhaps it could be simplified, not that I have any proposal for how to accomplish that. At the moment, there may be little value in that anyway, but just noting my own experience. Thank you very much for your review!

@timabdulla (Contributor Author)

@steve-chavez I see some of my tests are failing due to test data differences for different versions of Postgres. Is that deliberate? It looks like test.car_model_sales is missing in PG <= 11.

@timabdulla (Contributor Author)

Hey @steve-chavez,

I think I added the missing test that should hopefully fix the coverage issue. Note that the very helpful tests that you added do not actually cover hoisting, as hoisting is only used in the case of spreads; without spreads, aggregates are applied within the context of the joined table (and thus not hoisted).

I've added a test to cover hoisting/spreads, but I'm afraid hoisting isn't as useful as I'd hoped without support for has-many spreads. That said, I think we can get an easy win once has-many spreads are supported.

I think adding support for domain reps would be good in the future too, but there are some conflicting semantics that need to be worked out.

@timabdulla (Contributor Author)

@steve-chavez seems like the latest tests fixed the code coverage issue and all checks are now passing. Two questions:

  1. Is the typical procedure to squash the PR commits before merge?
  2. Is there anything else needed before merge?

@steve-chavez (Member) commented Nov 22, 2023

@timabdulla Great work on adding the remaining tests!

Is the typical procedure to squash the PR commits before merge?

Yes, I'll squash here. Makes sense when all the commits are related to the same feature.

Is there anything else needed before merge?

  • I was thinking about enabling this feature with plan_filter.statement_cost_limit for users that want to control the cost of the generated queries (see the sketch after this list). But that doesn't seem right, since we would depend on a specific extension. Additionally, it would be messy in the code.
  • Thinking more about it, even regular resource embedding can be expensive. Users that want full control may want to disable it altogether. Which reminds me of the db-fk-embedding config that Wolfgang proposed here. WDYT? With that we could release this feature more safely. Then again, this is no good, as even with a single table the query could get expensive.
  • The only way to make this safe would be with feat: add request.param guc #1710, applied in the way it's suggested here. WDYT?
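For reference, a sketch of what the extension-based approach would look like (assumes the pg_plan_filter extension is installed; the role name is illustrative):

-- Reject statements whose estimated plan cost exceeds the limit,
-- scoped to a role that PostgREST impersonates:
ALTER ROLE web_anon SET plan_filter.statement_cost_limit = 100000;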

I've added a test to cover hoisting/spreads, but I'm afraid hoisting isn't as useful as I'd hoped without support for has-many spreads. That said, I think we can get an easy win once has-many spreads are supported.

  • Once we've added support for has-many, would the response format change? I believe many users will rely on aggregate functions as it is and then it would become a painful breaking change.

@timabdulla (Contributor Author)

The only way to make this safe would be with #1710, applied in the way it's suggested #915 (comment). WDYT?

What about adding a config option to this PR that makes aggregate functions opt-in (i.e. disabled by default)? That should be a relatively small change (I think) that would allow users to decide whether it's safe in their situation (for instance, they may have a small dataset, or they may have other safeguards, like the pg_plan_filter extension or a statement_timeout). Later on, more advanced scenarios could be supported, like the example you gave of using the request variables to determine what is safe or not.

Once we've added support for has-many, would the response format change? I believe many users will rely on aggregate functions as it is and then it would become a painful breaking change.

No, it wouldn't, I don't think.

@steve-chavez (Member) commented Nov 22, 2023

What about adding a config option to this PR that makes aggregate functions opt-in by default?

@timabdulla Sounds good! Maybe db-aggregate-functions?

@timabdulla (Contributor Author)

@timabdulla Sounds good! Maybe db-aggregate-functions?

Added!

timabdulla force-pushed the aggregate-functions branch 2 times, most recently from 38c26a7 to 4a4e73f November 22, 2023 17:36
@timabdulla (Contributor Author)

I went ahead and squashed the commits. I think all the CI steps should pass now. By the way, is there a way to run all the CI steps in the nix shell environment? Like one command to do the specs, the IO tests, linting, etc?

@steve-chavez (Member)

@timabdulla There's postgrest-git-hooks:

postgrest-git-hooks -h
postgrest-git-hooks
Usage: postgrest-git-hooks [-h|--help] [--hook <HOOK>] [--] <operation> [<mode>]
        <operation>: Operation. Can be one of: 'disable', 'enable' and 'run'
        <mode>: Mode. Can be one of: 'basic' and 'full' (default: 'basic')
        -h, --help: Prints help
        --hook: Hook. Can be one of: 'pre-commit' and 'pre-push' (default: 'pre-commit')

Enable or disable git pre-commit and pre-push hooks.

Basic is faster and will only run:
  - pre-commit: postgrest-style
  - pre-push: postgrest-lint

Full takes a lot more time and will run:
  - pre-commit: postgrest-style && postgrest-lint
  - pre-push: postgrest-check

Changes made by postgrest-style will be staged automatically.

Example usage:
  postgrest-git-hooks disable
  postgrest-git-hooks enable basic
  postgrest-git-hooks enable full

The "run" operation and "--hook" argument are only used internally.

But maybe it makes sense to add a single postgrest-test-full command.

@@ -45,7 +45,8 @@ prefix = "pgrst."
dbSettingsNames :: [Text]
dbSettingsNames =
(prefix <>) <$>
["db_anon_role"
["db_aggregates_enabled"
@steve-chavez (Member)

This was missing for the in-db config to work.

Will refactor it soon so it can't be missed.
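For context, a sketch of how this in-db config would be set (hedged; the role name is illustrative, and pgrst is the prefix shown in the diff above):

-- Enable aggregate functions via in-database configuration:
ALTER ROLE authenticator SET pgrst.db_aggregates_enabled = 'true';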

@@ -137,7 +138,8 @@ toText conf =
where
-- apply conf to all pgrst settings
pgrstSettings = (\(k, v) -> (k, v conf)) <$>
[("db-anon-role", q . T.decodeUtf8 . fromMaybe "" . configDbAnonRole)
[("db-aggregates-enabled", T.toLower . show . configDbAggregates)
@steve-chavez (Member)

Changed the name so it's more consistent with other boolean configs.

timabdulla and others added 3 commits November 23, 2023 13:38
The aggregate functions SUM(), MAX(), MIN(), AVG(), and COUNT() are now supported.
@steve-chavez steve-chavez merged commit 1c60b50 into PostgREST:main Nov 23, 2023
31 checks passed
@steve-chavez (Member)

@timabdulla Awesome work 🔥 ! Would you help me document this feature?

It should be in a new page https://postgrest.org/en/latest/references/api.html

https://github.com/PostgREST/postgrest-docs/tree/main/docs/references/api

@steve-chavez (Member)

But maybe it makes sense to add a single postgrest-test-full command.

@timabdulla Btw, forgot that postgrest-check already does that:

{
name = "postgrest-check";
docs =
''
Run most checks that will also run on CI, but only against the
latest PostgreSQL version.
This currently excludes the memory and spec-idempotence tests,
as those are particularly expensive.
'';
inRootDir = true;
}
''
${tests}/bin/postgrest-test-spec
${tests}/bin/postgrest-test-doctests
${tests}/bin/postgrest-test-io
${style}/bin/postgrest-lint
${style}/bin/postgrest-style-check
'';

@timabdulla (Contributor Author)

@timabdulla Awesome work 🔥 ! Would you help me document this feature?

It should be in a new page https://postgrest.org/en/latest/references/api.html

https://github.com/PostgREST/postgrest-docs/tree/main/docs/references/api

Thanks for all your help getting this over the line! It's very much appreciated. Sure thing - I will write something up shortly and leave it for your review.

timabdulla deleted the aggregate-functions branch November 29, 2023 20:32
@jdgamble555 commented Dec 23, 2023

Do the new aggregate functions include sorting by aggregate functions?

https://postgrest.org/en/stable/references/api/aggregate_functions.html


@timabdulla (Contributor Author)

@jdgamble555 no, not yet, but wouldn't be too tricky to add, I don't think.

With the size of this PR being so large, I felt it was better to defer that to a future PR, though it's obviously a super helpful feature.

@jaulz commented Jan 26, 2024

Does this PR actually support counting on embedded resources? Before it worked like this:
GET /countries?select=name,languages(count)

So is there any "modern" equivalent after this PR? 😊

@wolfgangwalther (Member)

Does this PR actually support counting on embedded resources? Before it worked like this: GET /countries?select=name,languages(count)

So is there any "modern" equivalent after this PR? 😊

This was never really intended to work. You can read about the new way here: https://postgrest.org/en/stable/references/api/aggregate_functions.html#the-case-of-count
