Re: Trouble with hashagg spill I/O pattern and costing

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, HeikkiLinnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Trouble with hashagg spill I/O pattern and costing
Date: 2020-05-28 18:57:50
Message-ID: 20200528185750.aamptuqdzrggqi5c@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 27, 2020 at 11:07:04AM +0200, Tomas Vondra wrote:
>On Tue, May 26, 2020 at 06:42:50PM -0700, Melanie Plageman wrote:
>>On Tue, May 26, 2020 at 5:40 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>>
>>>On Tue, 2020-05-26 at 21:15 +0200, Tomas Vondra wrote:
>>>>
>>>> As for the tlist fix, I think that's mostly ready too - the one thing
>>>> we
>>>> should do is probably only doing it for AGG_HASHED. For AGG_SORTED
>>>> it's
>>>> not really necessary.
>>>
>>>Melanie previously posted a patch to avoid spilling unneeded columns,
>>>but it introduced more code:
>>>
>>>
>>>
>>>https://www.postgresql.org/message-id/CAAKRu_aefEsv+UkQWqu+ioEnoiL2LJu9Diuh9BR8MbyXuZ0j4A@mail.gmail.com
>>>
>>>and it seems that Heikki also looked at it. Perhaps we should get an
>>>acknowledgement from one of them that your one-line change is the right
>>>approach?
>>>
>>>
>>I spent some time looking at it today, and, it turns out I was wrong.
>>
>>I thought that there was a case I had found where CP_SMALL_TLIST did not
>>eliminate as many columns as could be eliminated for the purposes of
>>spilling, but, that turned out not to be the case.
>>
>>I changed CP_LABEL_TLIST to CP_SMALL_TLIST in
>>create_groupingsets_plan(), create_agg_plan(), etc and tried a bunch of
>>different queries and this 2-3 line change worked for all the cases I
>>tried. Is that where you made the change?
>
>I've only made the change in create_agg_plan, because that's what was in
>the query plan I was investigating. You may be right that the same fix
>is needed in additional places, though.
>

Attached is a patch adding CP_SMALL_TLIST to create_agg_plan and
create_groupingsets_plan. I've looked at the other places that I think
seem like they might benefit from it (create_upper_unique_plan,
create_group_plan) but I think we don't need to modify those - there'll
either be a Sort of HashAgg, which will take care about the projection.

Or do you think some other places need CP_SMALL_TLIST?

>>And then are you proposing to set it based on the aggstrategy to either
>>CP_LABEL_TLIST or CP_SMALL_TLIST here?
>>
>
>Yes, something like that. The patch I shared on on 5/21 just changed
>that, but I'm wondering if that could add overhead for sorted
>aggregation, which already does the projection thanks to the sort.
>

I ended up tweaking the tlist only for AGG_MIXED and AGG_HASHED. We
clearly don't need it for AGG_PLAIN or AGG_SORTED. This way we don't
break regression tests by adding unnecessary "Subquery Scan" nodes.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
hashagg-tlist-fix.patch text/plain 1.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2020-05-28 19:35:17 Re: Delaying/avoiding BTreeTupleGetNAtts() call within _bt_compare()
Previous Message Jesse Zhang 2020-05-28 18:49:42 Re: Fix compilation failure against LLVM 11