Re: Partial aggregates pushdown

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Partial aggregates pushdown
Date: 2021-10-15 19:31:33
Message-ID: 20211015193132.GQ20998@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Tomas Vondra (tomas(dot)vondra(at)enterprisedb(dot)com) wrote:
> On 10/15/21 17:05, Alexander Pyhalov wrote:
> >Tomas Vondra wrote on 2021-10-15 17:56:
> >>And then we should extend this for aggregates with more complex
> >>internal states (e.g. avg), by supporting a function that "exports"
> >>the aggregate state - similar to serial/deserial functions, but needs
> >>to be portable.
> >>
> >>I think the trickiest thing here is rewriting the remote query to call
> >>this export function, but maybe we could simply instruct the remote
> >>node to use a different final function for the top-level node?
> >
> >If we have some special export function, how should we find out that
> >the remote server supports this? Should it be a server property, or should
> >we somehow find that out while connecting to the server?
>
> Good question. I guess there could be some initial negotiation based on
> remote node version etc. And we could also disable this pushdown for older
> server versions, etc.

Yeah, I'd think we would only support it on versions where we know
it's available. That doesn't seem terribly difficult.

> But after that, I think we can treat this just like other definitions
> between local/remote node - we'd assume they match (i.e. the remote server
> has the export function), and then we'd get an error if it does not. If you
> need to use remote nodes without an export function, you'd have to disable
> the pushdown.
>
> AFAICS this works both for the case with explicit query rewrite (i.e. we send
> SQL with calls to the export function) and implicit query rewrite (where the
> remote node uses a different finalize function based on mode, specified by
> a GUC).

Not quite sure where to drop this, but I've always figured we'd find a
way to use the existing PartialAgg / FinalizeAggregate bits which are
used for parallel query when it comes to pushing down to foreign servers
to perform aggregates. That also gives us a way to serialize the results,
though we'd have to make sure that works across different
architectures. I've not looked to see if that's the case today.
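
To make the cross-architecture concern concrete, here is a minimal sketch in
Python of what a portable state encoding might look like. It assumes an
avg-like state of (count, sum) and a fixed big-endian layout; the layout and
the names export_state / import_state are hypothetical and are not the format
used by the existing serial/deserial functions.

```python
import struct

# Hypothetical wire layout (an assumption for illustration only):
# 8-byte signed integer count, then 8-byte IEEE 754 double sum.
# "!" selects network (big-endian) byte order with standard sizes and
# no padding, so the bytes do not depend on the sender's architecture.
STATE_FORMAT = "!qd"

def export_state(count, total):
    """Encode an avg-like partial state into an architecture-neutral form."""
    return struct.pack(STATE_FORMAT, count, total)

def import_state(buf):
    """Decode a state produced by export_state."""
    count, total = struct.unpack(STATE_FORMAT, buf)
    return count, total

wire = export_state(5, 36.0)
decoded = import_state(wire)
```

A native in-memory dump of the state would not give this guarantee, since
byte order and struct padding can differ between the two machines.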

Then again, being able to transform an aggregate into a partial
aggregate that runs as an actual SQL query would mean we could do partial
aggregate push-down against non-PG FDWs and that'd be pretty darn neat,
so maybe that's a better way to go, if we can figure out how.

(I mean, for avg it's pretty easy to just turn that into a SELECT that
grabs the sum and the count and use those. Other aggregates are more
complicated, though, and that approach doesn't work for them, so maybe
we need both?)
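
As a rough illustration of the avg case, here is a minimal sketch in Python,
with in-memory SQLite databases standing in for two foreign servers. The table
name t, the column x, and all helper names are hypothetical, and none of this
is postgres_fdw code. Each "remote" runs an ordinary SQL query returning the
sum and the count, and the local side combines those pairs into the average.

```python
import sqlite3

def make_remote(values):
    """Create an in-memory database standing in for one foreign server."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x REAL)")
    conn.executemany("INSERT INTO t (x) VALUES (?)", [(v,) for v in values])
    return conn

# The "partial aggregate" for avg(x), expressed as plain SQL that any
# SQL-speaking remote could run.
PARTIAL_AVG_SQL = "SELECT sum(x), count(x) FROM t"

def partial_avg(conn):
    """Run the partial query on one remote and return (sum, count)."""
    total, count = conn.execute(PARTIAL_AVG_SQL).fetchone()
    # sum() over zero rows yields NULL; treat that as 0 for combining.
    return (total or 0.0, count)

def combine_avg(partials):
    """Combine per-remote (sum, count) pairs into the overall average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None

remotes = [make_remote([1.0, 2.0, 3.0]), make_remote([10.0, 20.0])]
result = combine_avg([partial_avg(r) for r in remotes])
```

Averaging the two per-remote averages (2.0 and 15.0) would give 8.5 rather
than the correct 7.2, which is why the sum and the count have to travel
separately.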

Thanks,

Stephen
