Re: POC: GROUP BY optimization

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: GROUP BY optimization
Date: 2018-06-16 15:59:24
Message-ID: 623f5e50-eeb2-1279-d30b-dc43611fb063@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 06/13/2018 06:56 PM, Teodor Sigaev wrote:
>> I.e. we already do reorder the group clauses to match ORDER BY, to only
>> require a single sort. This happens in preprocess_groupclause(), which
>> also explains the reasoning behind that.
> Huh. I missed that. That means group_keys_reorder_by_pathkeys() call
> inside get_cheapest_group_keys_order() duplicates work of
> preprocess_groupclause().
>  And so it's possible to replace it to simple counting number of the
> same pathkeys in beginning of lists. Or remove most part of
> preprocess_groupclause() - but it seems to me we should use first
> option, preprocess_groupclause() is simpler, it doesn't operate with
> pathkeys only with  SortGroupClause which is simpler.
>
> BTW, incremental sort path provides pathkeys_common(), exactly what we
> need.
>
>> I wonder if some of the new code reordering group pathkeys could/should
>> be moved here (not sure, maybe it's too early for those decisions). In
>> any case, it might be appropriate to update some of the comments before
>> preprocess_groupclause() which claim we don't do certain things added by
>> the proposed patches.
>
> preprocess_groupclause() is too early step to use something like
> group_keys_reorder_by_pathkeys() because pathkeys is not known here yet.
>
>
>> This probably also somewhat refutes my claim that the order of
>> grouping keys is currently fully determined by users (and so they
>> may pick the most efficient order), while the reorder-by-ndistinct
>> patch would make that impossible. Apparently when there's ORDER BY,
>> we already mess with the order of group clauses - there are ways to
>> get around it (subquery with OFFSET 0) but it's much less clear.
>
> I like a moment when objections go away :)
>

I'm afraid that's a misunderstanding - my objections did not really go
away. I'm just saying my claim that we're not messing with order of
grouping keys is not entirely accurate, because we do that in one
particular case.

I still think we need to be careful when introducing new optimizations
in this area - reordering the grouping keys by ndistinct, ORDER BY or
whatever. In particular I don't think we should commit these patches
that may quite easily cause regressions, and then hope some hypothetical
future patch fixes the costing. Those objections still stand.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-06-16 17:29:55 Re: GCC 8 warnings
Previous Message Tom Lane 2018-06-16 14:42:26 Re: row_to_json(), NULL values, and AS