
From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: POC: GROUP BY optimization
Date: 2018-06-29 21:19:13
Message-ID: 0b3ee377-2f67-f58d-6177-bc2ed4626499@archidevsys.co.nz
Lists: pgsql-hackers

On 30/06/18 03:03, Tomas Vondra wrote:
> On 06/29/2018 04:51 PM, Teodor Sigaev wrote:
>>
>>>> I tried to attack the cost_sort() issues and hope on that basis we
>>>> can solve problems with 0002 patch and improve incremental sort patch.
>>>>
>>>
>>> OK, will do. Thanks for working on this!
>>
>> I hope we now have a better cost_sort(). The obvious way would be to
>> try all combinations of pathkeys in get_cheapest_group_keys_order()
>> and choose the cheapest one by cost_sort().
>
>> But that requires N! operations, which could be very expensive for a
>> large number of pathkeys, and it doesn't solve the issue of
>> user-knows-what-he-does pathkeys.
>
> Not sure. There are N! combinations, but this seems like a good
> candidate for backtracking [1]. You don't have to enumerate and
> evaluate all N! combinations; just construct one and then abandon
> whole classes of combinations as soon as they get more expensive than
> the currently best one. That's thanks to the additive nature of the
> comparison costing: appending a column to the sort key can only make
> it more expensive. My guess is that this will make it a non-issue.
>
> [1] https://en.wikipedia.org/wiki/Backtracking
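
For what it's worth, here is a minimal standalone sketch of that
pruning. It has nothing to do with the real cost_sort()/PathKey code --
the per-column comparison costs and ndistinct figures are invented --
it just shows how additive costing lets whole subtrees of orderings be
abandoned as soon as a partial ordering already costs more than the
best complete one found so far:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NCOLS 4

/* invented figures: cost of one comparison and distinct values per column */
static const double cmp_cost[NCOLS]  = {1.0, 4.0, 2.5, 1.0};
static const double ndistinct[NCOLS] = {1000.0, 10.0, 2.0, 50.0};

static double best_cost = 1e300;
static int    best_order[NCOLS];

static void
try_orderings(int *order, bool *used, int depth,
              double prefix_groups, double cost_so_far)
{
    if (cost_so_far >= best_cost)
        return;                 /* prune: appending a column only adds cost */

    if (depth == NCOLS)
    {
        best_cost = cost_so_far;
        memcpy(best_order, order, NCOLS * sizeof(int));
        return;
    }

    for (int col = 0; col < NCOLS; col++)
    {
        if (used[col])
            continue;

        /*
         * Only the ~1/prefix_groups fraction of comparisons that tie on
         * the prefix ever look at this column, so its marginal cost
         * shrinks as the prefix gets more selective, but never below zero.
         */
        used[col] = true;
        order[depth] = col;
        try_orderings(order, used, depth + 1,
                      prefix_groups * ndistinct[col],
                      cost_so_far + cmp_cost[col] / prefix_groups);
        used[col] = false;
    }
}

int
main(void)
{
    int  order[NCOLS];
    bool used[NCOLS] = {false};

    try_orderings(order, used, 0, 1.0, 0.0);

    printf("best cost %.4f, column order:", best_cost);
    for (int i = 0; i < NCOLS; i++)
        printf(" %d", best_order[i]);
    printf("\n");
    return 0;
}

Obviously the real thing would use the actual cost model, but the shape
of the search would be the same.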
>
>>
>> We could suggest an order of pathkeys, as the patch does now, and if
>> the cost_sort() estimate is less than 80% (arbitrarily chosen) of the
>> cost of the user-suggested pathkeys, use ours; otherwise use the
>> user's pathkeys.
>>
>
> I really despise such arbitrary thresholds. I'd much rather use a more
> reliable heuristic by default, even if it gets it wrong in some cases
> (which it will, but that's natural).
>
> regards
>
Additionally, put an upper-limit threshold on the number of combinations
to check, fairly large by default?

If that first threshold is exceeded, we could consider checking a few
more orderings selected at random from those not yet checked, to avoid
any bias caused by cutting a systematic search short. This might prove
important when N! is fairly large. Something along the lines of the
sketch below.
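
Roughly the shape I have in mind, again as a standalone sketch: the cap,
the sample size, and the toy cost model are all invented, and for
brevity the random phase neither excludes already-tried orderings nor
reuses the cost-based pruning shown earlier.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define NCOLS            6
#define MAX_SYSTEMATIC 200      /* cap on orderings tried systematically */
#define RANDOM_EXTRA    50      /* random orderings tried once the cap hits */

/* invented figures, as in the earlier sketch */
static const double cmp_cost[NCOLS]  = {1.0, 4.0, 2.5, 1.0, 3.0, 0.5};
static const double ndistinct[NCOLS] = {1000.0, 10.0, 2.0, 50.0, 5.0, 200.0};

static double best_cost = 1e300;
static int    best_order[NCOLS];
static long   tried = 0;

/* toy additive model: later columns are compared less often */
static double
ordering_cost(const int *order)
{
    double groups = 1.0;
    double cost = 0.0;

    for (int i = 0; i < NCOLS; i++)
    {
        cost += cmp_cost[order[i]] / groups;
        groups *= ndistinct[order[i]];
    }
    return cost;
}

static void
remember_if_better(const int *order)
{
    double cost = ordering_cost(order);

    if (cost < best_cost)
    {
        best_cost = cost;
        memcpy(best_order, order, NCOLS * sizeof(int));
    }
}

/* systematic enumeration, abandoned once the cap is exceeded */
static void
systematic(int *order, bool *used, int depth)
{
    if (tried >= MAX_SYSTEMATIC)
        return;
    if (depth == NCOLS)
    {
        tried++;
        remember_if_better(order);
        return;
    }
    for (int col = 0; col < NCOLS; col++)
    {
        if (used[col])
            continue;
        used[col] = true;
        order[depth] = col;
        systematic(order, used, depth + 1);
        used[col] = false;
    }
}

int
main(void)
{
    int  order[NCOLS];
    bool used[NCOLS] = {false};

    srand((unsigned) time(NULL));
    systematic(order, used, 0);

    if (tried >= MAX_SYSTEMATIC)
    {
        /* cap was hit: also look at a few random orderings so the result
         * is not biased toward the front of the systematic search */
        for (int r = 0; r < RANDOM_EXTRA; r++)
        {
            for (int i = 0; i < NCOLS; i++)
                order[i] = i;
            for (int i = NCOLS - 1; i > 0; i--)     /* Fisher-Yates shuffle */
            {
                int j = rand() % (i + 1);
                int tmp = order[i];
                order[i] = order[j];
                order[j] = tmp;
            }
            remember_if_better(order);
        }
    }

    printf("best cost %.4f after %ld systematic tries\n", best_cost, tried);
    return 0;
}

The systematic phase here is a plain enumeration; in practice the
cost-based pruning would sit inside it, and the cap would only matter
when pruning alone doesn't cut the search down enough.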

Cheers,
Gavin
