Re: pg16: XX000: could not find pathkey item to sort

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg16: XX000: could not find pathkey item to sort
Date: 2023-10-02 20:11:43
Message-ID: CAApHDvpJJigQRW29TppTOPYp+Aui0mtd3MpfRxyKv=N-tB62jQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 19 Sept 2023 at 23:45, Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> My first thought about the fix is that we artificially add resjunk
> target entries to parse->targetList for the ordered aggregates'
> arguments that are ORDER BY expressions, as attached. While this can
> fix the given query, it would cause Assert failure for the query in
> sql/triggers.sql.

> Any thoughts?

Unfortunately, we can't do that as it'll lead to target entries
existing in the GroupAggregate's target list that have been
aggregated.

postgres=# explain verbose SELECT a, COUNT(DISTINCT b) FROM rp GROUP BY a;
QUERY PLAN
--------------------------------------------------------------------------------------
Append (cost=88.17..201.39 rows=400 width=44)
-> GroupAggregate (cost=88.17..99.70 rows=200 width=44)
Output: rp.a, count(DISTINCT rp.b), rp.b

Your patch adds rp.b as an output column of the GroupAggregate.
Logically, that column cannot exist there as there is no correct
single value of rp.b after aggregation.

I think the fix needs to go into create_agg_path(). The problem is
that for AGG_SORTED we do:

if (aggstrategy == AGG_SORTED)
pathnode->path.pathkeys = subpath->pathkeys; /* preserves order */

which assumes that all of the columns before the aggregate will be
available after the aggregate. That likely used to work ok before
1349d2790 as the planner wouldn't have requested any Pathkeys for
columns that were not available below the Agg node.

We can no longer take the subpath pathkey's verbatim. We need to strip
off pathkeys for columns that are not in pathnode's targetlist.

I've attached a patch which adds a new function to pathkeys.c to strip
off any PathKeys in a list that don't have a corresponding item in the
given PathTarget and just return a prefix of the input pathkey list up
until the first expr that can't be found.

I'm concerned that this patch will be too much overhead when creating
paths when a PathKey's EquivalenceClass has a large number of members
from partitioned tables. I wondered if we should instead just check
if the subpath's pathkeys match root->group_pathkeys and if they do
set the AggPath's pathkeys to list_copy_head(subpath->pathkeys,
root->num_groupby_pathkeys), that'll be much cheaper, but it just
feels a bit too much like a special case.

David

Attachment Content-Type Size
strip_aggregated_pathkeys_from_aggpaths.patch application/octet-stream 2.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2023-10-02 20:16:41 Re: CHECK Constraint Deferrable
Previous Message Robert Haas 2023-10-02 20:06:09 Re: Pre-proposal: unicode normalized text