Re: hashagg slowdown due to spill changes

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hashagg slowdown due to spill changes
Date: 2020-06-25 00:26:07
Message-ID: CAAKRu_YsSu2uTyi5wHUGk2VYmtc2giciNzciXhtr+L07uoNqSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 23, 2020 at 10:06 AM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2020-06-23 09:23:57 -0700, Melanie Plageman wrote:
> > On Mon, Jun 22, 2020 at 9:02 PM Andres Freund <andres(at)anarazel(dot)de>
> wrote:
> > > It's not this patch's fault, but none, really none, of this stuff
> should
> > > be in the executor.
> > >
> > >
> > Were you thinking it could be done in grouping_planner() and then the
> > bitmaps could be saved in the PlannedStmt?
> > Or would you have to wait until query_planner()? Or are you imagining
> > somewhere else entirely?
>
> I haven't thought about it in too much detail, but I would say
> create_agg_plan() et al. I guess there's some argument to be made to do
> it in setrefs.c, because we already do convert_combining_aggrefs() there
> (but I don't like that much).
>
> There's no reason to do it before we actually decided on one specific
> path, so doing it earlier than create_plan() seems unnecessary. And
> having it in agg specific code seems better than putting it into global
> routines.
>
> There's probably an argument for having a bit more shared code between
> create_agg_plan(), create_group_plan() and
> create_groupingsets_plan(). But even just adding a new extract_*_cols()
> call to each of those would probably be ok.
>
>
So, my summary of this point in the context of the other discussion
upthread is:

Planner should extract the columns that hashagg will need later during
planning. Planner should not have HashAgg/MixedAgg nodes request smaller
targetlists from their children with CP_SMALL_TLIST to avoid unneeded
projection overhead.
Also, even this extraction should only be done when the number of groups
is large enough to suspect a spill.

So, I wrote a patch that extracts the columns the same way as in
ExecInitAgg but in create_agg_plan() and it doesn't work because we
haven't called set_plan_references().

Then, I wrote a patch that does this in set_upper_references(), and it
seems to work. I've attached that one.
It is basically Jeff's patch (based somewhat on my patch) which extracts
the columns in ExecInitAgg but I moved the functions over to setrefs.c
and gave them a different name.

It's not very elegant.
I shoved it in at the end of set_upper_references(), but I think there
should be a nice way to do it while setting the references for each var
instead of walking over the nodes again.
Also, I think that the bitmapsets for the colnos should maybe be put
somewhere less prominent (than in the main Agg plan node?), since they
are only used in one small place.
I tried putting both bitmaps in an array of two bitmaps in the Agg node
(since there will always be two) to make it look a bit neater, but it
was pretty confusing and error prone to remember which one was
aggregated and which one was unaggregated.

Note that I didn't do anything with costing like only extracting the
columns if there are a lot of groups.

Also, I didn't revert the CP_SMALL_TLIST change in create_agg_plan() or
create_groupingsets_plan().

Not to stir the pot, but I did notice that hashjoin uses CP_SMALL_TLIST
in create_hashjoin_plan() for the inner side sub-tree and the outer side
one if there are multiple batches. I wondered what was different about
that vs hashagg (i.e. why it is okay to do that there).

--
Melanie Plageman

Attachment Content-Type Size
v1-0001-Move-extracting-columns-for-hashagg-to-planner.patch text/x-patch 15.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2020-06-25 01:05:22 Re: [PATCH] Initial progress reporting for COPY command
Previous Message Bruce Momjian 2020-06-24 23:38:37 Re: Default setting for enable_hashagg_disk