Re: hashagg slowdown due to spill changes

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hashagg slowdown due to spill changes
Date: 2020-06-25 09:10:24
Message-ID: 20200625091024.dhslw2gqgqtgjsoh@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 24, 2020 at 05:26:07PM -0700, Melanie Plageman wrote:
>On Tue, Jun 23, 2020 at 10:06 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>> Hi,
>>
>> On 2020-06-23 09:23:57 -0700, Melanie Plageman wrote:
>> > On Mon, Jun 22, 2020 at 9:02 PM Andres Freund <andres(at)anarazel(dot)de>
>> wrote:
>> > > It's not this patch's fault, but none, really none, of this stuff
>> should
>> > > be in the executor.
>> > >
>> > >
>> > Were you thinking it could be done in grouping_planner() and then the
>> > bitmaps could be saved in the PlannedStmt?
>> > Or would you have to wait until query_planner()? Or are you imagining
>> > somewhere else entirely?
>>
>> I haven't thought about it in too much detail, but I would say
>> create_agg_plan() et al. I guess there's some argument to be made to do
>> it in setrefs.c, because we already do convert_combining_aggrefs() there
>> (but I don't like that much).
>>
>> There's no reason to do it before we actually decided on one specific
>> path, so doing it earlier than create_plan() seems unnecessary. And
>> having it in agg specific code seems better than putting it into global
>> routines.
>>
>> There's probably an argument for having a bit more shared code between
>> create_agg_plan(), create_group_plan() and
>> create_groupingsets_plan(). But even just adding a new extract_*_cols()
>> call to each of those would probably be ok.
>>
>>
>So, my summary of this point in the context of the other discussion
>upthread is:
>
>Planner should extract the columns that hashagg will need later during
>planning. Planner should not have HashAgg/MixedAgg nodes request smaller
>targetlists from their children with CP_SMALL_TLIST to avoid unneeded
>projection overhead.
>Also, even this extraction should only be done when the number of groups
>is large enough to suspect a spill.
>

IMO we should extract the columns irrespectedly of the estimates,
otherwise we won't be able to handle underestimates efficiently.

>
>Not to stir the pot, but I did notice that hashjoin uses CP_SMALL_TLIST
>in create_hashjoin_plan() for the inner side sub-tree and the outer side
>one if there are multiple batches. I wondered what was different about
>that vs hashagg (i.e. why it is okay to do that there).
>

Yeah. That means that if we have to start batching during execution, we
may need to spill much more datai. I'd say that's a hashjoin issue that
we should fix too (in v14).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2020-06-25 09:15:03 Re: Missing some ifndef FRONTEND at the top of logging.c and file_utils.c
Previous Message Amit Kapila 2020-06-25 09:01:51 Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)