Re: Propagate pathkeys from CTEs up to the outer query

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Propagate pathkeys from CTEs up to the outer query
Date: 2024-03-27 06:52:58
Message-ID: CAMbWs49qUFhW1=Mm9XSVeZn5M7aUgGJPqWGQbf9nw6PnoObx9w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 27, 2024 at 1:20 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> > I agree with your points. Previously I was thinking that CTEs were the
> > only scenario where we needed to remember the best path and only
> > required the best path's pathkeys. However, considering potential
> > future use cases as you mentioned, I concur that having a per-subplan
> > list of paths would be more future-proof. Please see attached v4 patch.
>
> Hm, well, you didn't actually fill in the paths for the other
> subqueries. I agree that it's not worth doing so in
> SS_make_initplan_from_plan, but a comment explaining that decision
> seems in order. Also, there's nothing stopping us from saving the
> path for subplans made in build_subplan, except adding a parameter
> to pass them down. So I did that, and made a couple other cosmetic
> changes, and pushed it.

Thanks for the adjustments and pushing!

> That's not the fault of anything we did here; the IndexOnlyScan path
> in the subquery is in fact not marked with any pathkeys, even though
> clearly its result is sorted. I believe that's an intentional
> decision from way way back, that pathkeys only correspond to orderings
> that are of interest in the current query level. "select unique1 from
> tenk1 b order by unique1" has an interest in ordering by unique1,
> but "select unique1 from tenk1 b" does not, so it's choosing that
> path strictly according to cost. Not generating pathkeys in such a
> query saves a few cycles and ensures that we won't improperly prefer
> a path on the basis of pathkeys if it hasn't got a cost advantage.
> So I'm quite hesitant to muck with that old decision, especially in
> the waning days of a development cycle, but the results do feel a
> little strange here.

Yeah, I also noticed this while writing the test case. That's why I
added 'order by unique1' explicitly in the CTE subquery. This also
happens to subquery RTEs, such as

explain (costs off)
select * from (select unique1 from tenk1 offset 0) order by unique1;
QUERY PLAN
----------------------------------------------------
Sort
Sort Key: tenk1.unique1
-> Index Only Scan using tenk1_unique1 on tenk1
(3 rows)

I agree that mucking with the old decision might not be a good idea. In
addition, for a MATERIALIZED CTE, generating pathkeys according to the
outer query's ordering requirements breaks the idea of optimization
fence: the outer query should not affect the plan chosen for the CTE
query.

Thanks
Richard

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2024-03-27 06:58:06 Re: pgsql: Track last_inactive_time in pg_replication_slots.
Previous Message John Naylor 2024-03-27 06:44:10 Re: Change GUC hashtable to use simplehash?