|From:||Teodor Sigaev <teodor(at)sigaev(dot)ru>|
|To:||Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: POC: GROUP BY optimization|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
>> Yes. But again, this description is a bit short. First it works after
>> first patch and might get some preordered leading pathkeys. Second, it
>> tries to match ORDER BY clause order if there is no preordered leading
>> pathkeys from first patch (it was introduced in v7). And third, if there
>> is a tail of unmatched pathkeys on previous stages then it will reorder
>> that tail.
> OK, I haven't looked at v7 yet, but if I understand correctly it tries
> to maintain the ordering as much as possible? Does that actually help? I
> mean, the incremental sort patch allows the sorting to happen by pieces,
> but here we still need to sort all the data, right?
> Can you give an example demonstrating the benefit?
See tst.sql. queries are marked with opt (optimization is on) and noopt.
Query 1: select count(*) from btg group by v, r;
Query 2: select count(*) from btg group by n, v, r order by n;
For both queries it's possible to reorder v and r column, n column has the
single distinct value.
On my laptop:
Query 1opt vs 1noopt: 3177.500 ms vs 6604.493 ms
2opt vs 2noopt: 5800.307 ms vs 7486.967 ms
So, what we see:
1) for query 1 optimization gives 2 times better performance, for query 2 only
30%. if column 'n' will be unique then time for query 1 and 2 should be the
same. We could add check for preordered pathkeys in
get_cheapest_group_keys_order() and if estimate_num_groups(reordered pathkeys)
is close to 1 then we could do not reordering of tail of pathkeys.
2) Planing cost is the same for all queries. So, cost_sort() doesn't take into
account even number of columns.
> FWIW I think it would be useful to have "development GUC" that would
> allow us to enable/disable these options during development, because
> that makes experiments much easier. But then remove them before commit.
Added, v9, debug_enable_group_by_match_order_by and
debug_enable_cheapest_group_by. I also checked compatibility with incremental
sort patch, and all works except small merge conflict which could be resolved
right before committing.
Next, I had a look on cost_incremental_sort() provided by incremental sort patch
and, it's a pity, it doesn't solve our problem with the impact of the cost of
per-column comparison function and number of its calls.
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
|Next Message||Tom Lane||2018-06-07 17:52:49||Re: Transform for pl/perl|
|Previous Message||Tom Lane||2018-06-07 15:43:12||Re: why partition pruning doesn't work?|