Re: Even when the data is already ordered, MergeAppend still adds a Sort node

From: Zhang Mingli <zmlpostgres(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org, feichanghong <feichanghong(at)qq(dot)com>
Subject: Re: Even when the data is already ordered, MergeAppend still adds a Sort node
Date: 2025-07-19 05:15:04
Message-ID: ba1ccd6e-d503-4e97-b635-17cefd6df2ca@Spark
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Jul 18, 2025 at 22:52 +0800, feichanghong <feichanghong(at)qq(dot)com>, wrote:
>
> explain select * from (select * from t where a = 1 and b > 1 order by a, b) order by a, b limit 1;
>                                        QUERY PLAN
> -----------------------------------------------------------------------------------------
>  Limit  (cost=366.26..366.27 rows=1 width=8)
>    ->  Sort  (cost=366.26..391.26 rows=9999 width=8)
>          Sort Key: t.a, t.b
>          ->  Index Only Scan using t_a_b_idx on t  (cost=0.29..316.27 rows=9999 width=8)
>                Index Cond: ((a = 1) AND (b > 1))
> (5 rows)
> ```
>
> Should we retain the complete `pathkeys` in `Path->pathkeys` for use by the
> upper layers of the subquery, rather than just keeping the portion trimmed by
> `PlannerInfo->query_pathkeys`? I'm not sure if my understanding is correct.

The subquery has a qualifier a = 1 that forms an Equivalence Class (EC) whose ec_member contains a constant.
As a result, subroot->sort_pathkeys doesn't need to include column a.
However, in the outer query, there are no such qualifiers to form a similar EC, and the ORDER BY a, b clause means root->sort_pathkeys requires both columns a and b.
When convert_subquery_pathkeys is called, the subpath lacks the pathkeys for column a.
> Furthermore, is there a more efficient way to write this, to avoid the
> `Sort` node mentioned above?

A simple solution is to add an EC using a qual:

EXPLAIN SELECT * FROM (SELECT * FROM t WHERE a = 1 AND b > 1 ORDER BY a, b) WHERE a = 1 ORDER BY a, b LIMIT 1;
 QUERY PLAN
-----------------------------------------------------------------------------------
 Limit (cost=0.29..0.32 rows=1 width=8)
 -> Index Only Scan using t_a_b_idx on t (cost=0.29..316.27 rows=9999 width=8)
 Index Cond: ((a = 1) AND (b > 1))
(3 rows)

--
Zhang Mingli
HashData

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-07-19 06:05:56 Re: Improve error reporting in 027_stream_regress test
Previous Message Michael Paquier 2025-07-19 05:04:52 Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem)