Re: POC: GROUP BY optimization

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: GROUP BY optimization
Date: 2019-04-09 15:21:00
Message-ID: 20190409152100.5q25whnxs27zws5m@development
Lists: pgsql-hackers

On Thu, Apr 04, 2019 at 05:11:09PM +0200, Dmitry Dolgov wrote:
>> On Thu, Jan 31, 2019 at 12:24 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>>
>> As nothing has happened since, I'm marking this as returned with
>> feedback.
>
>This patch was on my radar for some time in the past and we've seen use cases
>where it could be pretty useful (probably even without the incremental sort
>patch). I would like to make some progress here and see if it's possible to
>continue its development. I've attached the rebased version with some small
>changes, e.g. I've created a separate patch with group by reordering tests to
>make it easy to see what changes were introduced, and after some experiments
>removed the part that seems to duplicate "group by" reordering to follow "order
>by". Also looks like it's possible to make these patches independent by having
>a base patch with the isolated group_keys_reorder_by_pathkeys (they're
>connected via n_preordered), but I haven't done this yet.
>
>I went through the thread to summarize the objections that were mentioned so
>far. Most of them are related to the third patch in the series, where
>reordering based on "ndistinct" is implemented, and are about cost_sort (all
>the possible problems that could happen without proper cost estimation due to
>non-uniform distribution, different comparison costs and so on) and figuring
>out how to limit the number of possible combinations of pathkeys to compare. I
>haven't looked at the proposed backtracking approach, but taking into account
>that the suggested patch for cost_sort [1] is RWF, I wonder what would be the best
>strategy to proceed?
>
>[1]: https://commitfest.postgresql.org/21/1706/
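To make the ndistinct-based reordering and the "limit the number of combinations" question more concrete, here is a minimal sketch. It is not the patch's code: GroupKey, expected_compare_cost and best_group_key_order are hypothetical names, and the cost model assumes uniformly distributed, independent columns. Under those assumptions key i is only compared when all earlier keys are equal, so leading with high-ndistinct keys tends to be cheaper. The per-column cmp_cost and the max_orders cap stand in for the "different comparison costs" and "number of combinations" concerns.

```python
from itertools import islice, permutations
from typing import List, NamedTuple, Sequence


class GroupKey(NamedTuple):
    """Hypothetical per-column statistics (not PostgreSQL's actual structures)."""
    name: str
    ndistinct: float       # estimated number of distinct values
    cmp_cost: float = 1.0  # relative cost of one comparison on this column


def expected_compare_cost(keys: Sequence[GroupKey]) -> float:
    """Expected cost of one tuple comparison with the keys in this order.

    Assumes uniform, independent columns: key i is only compared when all
    earlier keys are equal, which happens with probability
    1 / (nd_0 * ... * nd_{i-1}).
    """
    cost, p_reach = 0.0, 1.0
    for k in keys:
        cost += p_reach * k.cmp_cost
        p_reach /= k.ndistinct
    return cost


def best_group_key_order(keys: Sequence[GroupKey],
                         max_orders: int = 1000) -> List[GroupKey]:
    """Return the cheapest key order among at most max_orders permutations.

    permutations() yields the original order first, so it is always
    considered, and min() keeps it on ties; max_orders caps the
    otherwise factorial search.
    """
    candidates = islice(permutations(keys), max_orders)
    return list(min(candidates, key=expected_compare_cost))


keys = [GroupKey("a", 2), GroupKey("b", 1000), GroupKey("c", 10)]
print([k.name for k in best_group_key_order(keys)])  # ['b', 'c', 'a']
```

Making "b" five times as expensive to compare, e.g. `GroupKey("b", 1000, 5.0)`, flips the result to `['c', 'a', 'b']`. That is why ordering purely by ndistinct is not enough without a better cost_sort, and why any real cost model also has to cope with skewed (non-uniform) data that this sketch ignores.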

Dunno. It seems the progress on the sort-related patches was rather limited
in the PG12 cycle in general :-( There's the Incremental Sort patch, GROUP
BY optimization and then the cost_sort patch.

Not sure about the best strategy, though. One obvious option is to rely on
the cost_sort patch to do all the improvements needed for the other patches,
but that assumes that patch moves reasonably fast.

So I personally would suggest treating those patches as independent until
the very last moment, developing the costing improvements needed by each
of them, and then deciding which of them are committable / in what order.

At the end of the PG11 cycle I offered my help with testing / reviewing
those patches, if there is progress. That still holds: if there are new
patch versions, I'll look at them.

cheers

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
