Re: pg16: XX000: could not find pathkey item to sort

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg16: XX000: could not find pathkey item to sort
Date: 2023-10-08 23:42:13
Message-ID: CAApHDvrAHkghCDNFG5+K2adWCDCC9cdZT2KZamVRkqYy31kL9w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 8 Oct 2023 at 23:52, Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> On Thu, Oct 5, 2023 at 2:26 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>>
>> So in short, I propose the attached fix without any regression tests
>> because I feel that any regression test would just mark that there was
>> a big in create_agg_path() and not really help with ensuring we don't
>> end up with some similar problem in the future.
>
>
> If the pathkeys that were added by adjust_group_pathkeys_for_groupagg()
> are computable from the targetlist, it seems that we do not need to trim
> them off, because prepare_sort_from_pathkeys() will add resjunk target
> entries for them. But it's also no harm if we trim them off. So I
> think the patch is a pretty safe fix. +1 to it.

hmm, I think one of us does not understand what is going on here. I
tried to explain in [1] why we *need* to strip off the pathkeys added
by adjust_group_pathkeys_for_groupagg().

Given the following example:

create table ab (a int,b int);
explain (costs off) select a,count(distinct b) from ab group by a;
QUERY PLAN
----------------------------
GroupAggregate
Group Key: a
-> Sort
Sort Key: a, b
-> Seq Scan on ab
(5 rows)

adjust_group_pathkeys_for_groupagg() will add the pathkey for the "b"
column and that results in the Sort node sorting on {a,b}. It's
simply not at all valid to have the GroupAggregate path claim that its
pathkeys are also (effectively) {a,b}" as "b" does not and cannot
legally exist after the aggregation takes place. We cannot put a
resjunk "b" in the targetlist of the GroupAggregate either as there
could be any number "b" values aggregated.

Can you explain why you think we can put a resjunk "b" in the target
list of the GroupAggregate in the above case?

>>
>> I have some concerns that the assert_pathkeys_in_target() function
>> might be a little heavyweight for USE_ASSERT_CHECKING builds. So I'm
>> not proposing to commit that without further discussion.
>
>
> Yeah, it looks like some heavy to call assert_pathkeys_in_target() for
> each path node. Can we run some benchmarks to see how much overhead it
> would bring to USE_ASSERT_CHECKING build?

I think it'll be easy to show that there is an overhead to it. It'll
be in the realm of the ~41ms patched vs ~24ms unpatched that I showed
in [2]. That's quite an extreme case.

Maybe it's worth checking the total planning time spent in a run of
the regression tests with and without the patch to see how much
overhead it adds to the "average case".

David

[1] https://postgr.es/m/CAApHDvpJJigQRW29TppTOPYp+Aui0mtd3MpfRxyKv=N-tB62jQ@mail.gmail.com
[2] https://postgr.es/m/CAApHDvo7RzcQYw-gnkZr6QCijCqf8vJLkJ4XFk-KawvyAw109Q@mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2023-10-09 00:52:40 Re: Draft LIMIT pushdown to Append and MergeAppend patch
Previous Message David Rowley 2023-10-08 23:26:24 Re: Problem, partition pruning for prepared statement with IS NULL clause.