Re: Group by reordering optimization

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Group by reordering optimization
Date: 2020-09-02 16:12:01
Message-ID: 20200902161201.ogkjrpf232a7htn3@development
Lists: pgsql-hackers

On Tue, Sep 01, 2020 at 03:09:14PM -0700, Peter Geoghegan wrote:
>On Tue, Sep 1, 2020 at 2:09 PM Tomas Vondra
><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> >* Instead of changing the order directly, now patch creates another path with
>> > modified order of clauses. It does so for the normal sort as well as for
>> > incremental sort. The whole thing is done in two steps: first it finds a
>> > potentially better ordering taking into account number of groups, widths and
>> > comparison costs; afterwards this information is used to produce a cost
>> > estimation. This is implemented via a separate create_reordered_sort_path to
>> > not introduce too many changes, I couldn't find any better place.
>> >
>> I haven't tested the patch with any queries, but I agree this seems like
>> the right approach in general.
>If we're creating a new sort path anyway, then perhaps we can also
>change the collation -- it might be possible to "reduce" it to the "C"
>collation without breaking queries.
>This is admittedly pretty hard to do well. It could definitely work
>out when we have to do a sort anyway -- a sort with high cardinality
>abbreviated keys will be very fast (though we can't use abbreviated
>keys with libc collations right now). OTOH, it would be quite
>counterproductive if we were hoping to get an incremental sort that
>used some available index that happens to use the default collation
>(which is not the C collation in cases where this optimization is
>expected to help).

Even if reducing collations like this was possible (I have no idea how
tricky it is, my knowledge of collations is pretty minimal and from what
I know I'm not dying to learn more), I suggest we consider that out of
scope for this particular patch.

There are multiple open issues already - deciding which pathkeys are
interesting, reasonable costing, etc. Once those issues are solved, we
can consider tweaking collations as an additional optimization.

Or maybe we can consider it entirely separately. Why would it matter
whether we re-order the GROUP BY keys? The collation reduction can help
just as well even if we keep the same pathkeys.
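
To illustrate why the two ideas look independent, here is a toy sketch
(plain Python, not PostgreSQL code, and nothing from the patch). It
assumes a deterministic collation, i.e. one where two strings compare
equal only if they are byte-wise identical. linguistic_key and c_key are
made-up stand-ins for "some non-C collation" and the "C" collation.
Sort-based grouping only needs equal keys to end up adjacent, so either
ordering should produce the same groups, with the GROUP BY keys left
untouched:

```python
from itertools import groupby


def group_counts(values, sort_key):
    """Sort-based grouping: sort, then count runs of equal adjacent values."""
    ordered = sorted(values, key=sort_key)
    # groupby() with no key groups adjacent elements that compare equal.
    return {value: len(list(run)) for value, run in groupby(ordered)}


def linguistic_key(s):
    # Hypothetical stand-in for a deterministic non-C collation: compare
    # case-insensitively first, then break ties on the raw bytes so that
    # only identical strings compare equal.
    return (s.casefold(), s.encode("utf-8"))


def c_key(s):
    # Hypothetical stand-in for the "C" collation: plain byte-wise order.
    return s.encode("utf-8")


rows = ["b", "A", "a", "B", "a", "b", "A", "a"]

by_linguistic = group_counts(rows, linguistic_key)
by_c = group_counts(rows, c_key)

# Same groups and counts; only the order in which groups come out differs.
assert by_linguistic == by_c
print(list(by_linguistic))
print(list(by_c))
```

The groups come out in a different order, so this would only be safe
when nothing above the aggregate depends on the sort order, and it would
not hold for nondeterministic collations, where strings that differ
byte-wise can compare equal.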


Tomas Vondra
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
