Re: Partial aggregates pushdown

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
Subject: Re: Partial aggregates pushdown
Date: 2023-11-21 20:34:19
Message-ID: ZV0US_1taS8Utkxk@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 21, 2023 at 12:16:41PM -0500, Robert Haas wrote:
> On Mon, Nov 20, 2023 at 5:48 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > > I do have a concern about this, though. It adds a lot of bloat. It
> > > adds a whole lot of additional entries to pg_aggregate, and every new
> > > aggregate we add in the future will require a bonus entry for this,
> > > and it needs a bunch of new pg_proc entries as well. One idea that
> > > I've had in the past is to instead introduce syntax that just does
> > > this, without requiring a separate aggregate definition in each case.
> > > For example, maybe instead of changing string_agg(whatever) to
> > > string_agg_p_text_text(whatever), you can say PARTIAL_AGGREGATE
> > > string_agg(whatever) or string_agg(PARTIAL_AGGREGATE whatever) or
> > > something. Then all aggregates could be treated in a generic way. I'm
> > > not completely sure that's better, but I think it's worth considering.
> >
> > So use an SQL keyword to indicates a pushdown call? We could then
> > automate the behavior rather than requiring special catalog functions?
>
> Right. It would require more infrastructure in the parser, planner,
> and executor, but it would be infinitely reusable instead of needing a
> new thing for every aggregate. I think that might be better, but to be
> honest I'm not totally sure.

It would make it automatic. I guess we need to look at how big the
patch is to do it.

> > > I don't think the patch does a good job explaining why HAVING,
> > > DISTINCT, and ORDER BY are a problem. It seems to me that HAVING
> > > shouldn't really be a problem, because HAVING is basically a WHERE
> > > clause that occurs after aggregation is complete, and whether or not
> > > the aggregation is safe shouldn't depend on what we're going to do
> > > with the value afterward. The HAVING clause can't necessarily be
> > > pushed to the remote side, but I don't see how or why it could make
> > > the aggregate itself unsafe to push down. DISTINCT and ORDER BY are a
> > > little trickier: if we pushed down DISTINCT, we'd still have to
> > > re-DISTINCT-ify when combining locally, and if we pushed down ORDER
> > > BY, we'd have to do a merge pass to combine the returned values unless
> > > we could prove that the partitions were non-overlapping ranges that
> > > would be visited in the correct order. Although that all sounds
> > > doable, I think it's probably a good thing that the current patch
> > > doesn't try to handle it -- this is complicated already. But it should
> > > explain why it's not handling it and maybe even a bit about how it
> > > could be handling in the future, rather than just saying "well, this
> > > kind of thing is not safe." The trouble with that explanation is that
> > > it does nothing to help the reader understand whether the thing in
> > > question is *fundamentally* unsafe or whether we just don't have the
> > > right code to make it work.
> >
> > Makes sense.
>
> Actually, I think I was wrong about this. We can't handle ORDER BY or
> DISTINCT because we can't distinct-ify or order after we've already
> partially aggregated. At least not in general, and not without
> additional aggregate support functions. So what I said above was wrong
> with respect to those. Or so I believe, anyway. But I still don't see
> why HAVING should be a problem.

This should probably be documented in the patch.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2023-11-21 20:48:36 Re: pg_class.reltuples of brin indexes
Previous Message Bruce Momjian 2023-11-21 20:33:02 Re: vacuum_cost_limit doc description patch