Re: POC: GROUP BY optimization

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: GROUP BY optimization
Date: 2018-06-07 16:22:13
Message-ID: a02225c6-96d3-2e85-9907-9459105d1244@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>> Yes. But again, this description is a bit short. First it works after
>> first patch and might get some preordered leading pathkeys. Second, it
>> tries to match ORDER BY clause order if there is no preordered leading
>> pathkeys from first patch (it was introduced in v7). And third, if there
>> is a tail of unmatched pathkeys on previous stages then it will reorder
>> that tail.
>>
>
> OK, I haven't looked at v7 yet, but if I understand correctly it tries
> to maintain the ordering as much as possible? Does that actually help? I
> mean, the incremental sort patch allows the sorting to happen by pieces,
> but here we still need to sort all the data, right?
>
> Can you give an example demonstrating the benefit?

See tst.sql. queries are marked with opt (optimization is on) and noopt.

Query 1: select count(*) from btg group by v, r;
Query 2: select count(*) from btg group by n, v, r order by n;

For both queries it's possible to reorder v and r column, n column has the
single distinct value.

On my laptop:
Query 1opt vs 1noopt: 3177.500 ms vs 6604.493 ms
2opt vs 2noopt: 5800.307 ms vs 7486.967 ms

So, what we see:
1) for query 1 optimization gives 2 times better performance, for query 2 only
30%. if column 'n' will be unique then time for query 1 and 2 should be the
same. We could add check for preordered pathkeys in
get_cheapest_group_keys_order() and if estimate_num_groups(reordered pathkeys)
is close to 1 then we could do not reordering of tail of pathkeys.

2) Planing cost is the same for all queries. So, cost_sort() doesn't take into
account even number of columns.

> FWIW I think it would be useful to have "development GUC" that would
> allow us to enable/disable these options during development, because
> that makes experiments much easier. But then remove them before commit.
Added, v9, debug_enable_group_by_match_order_by and
debug_enable_cheapest_group_by. I also checked compatibility with incremental
sort patch, and all works except small merge conflict which could be resolved
right before committing.

Next, I had a look on cost_incremental_sort() provided by incremental sort patch
and, it's a pity, it doesn't solve our problem with the impact of the cost of
per-column comparison function and number of its calls.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
0001-opt_group_by_index-v9.patch text/x-patch 16.9 KB
0002-opt_group_by_index_and_order-v9.patch text/x-patch 17.1 KB
tst.sql application/sql 801 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-06-07 17:52:49 Re: Transform for pl/perl
Previous Message Tom Lane 2018-06-07 15:43:12 Re: why partition pruning doesn't work?