From: | feichanghong <feichanghong(at)qq(dot)com> |
---|---|
To: | Zhang Mingli <zmlpostgres(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Even when the data is already ordered, MergeAppend still adds a Sort node |
Date: | 2025-07-20 08:49:16 |
Message-ID: | tencent_6815C70B82A2D2D33B56867E67C0478D8906@qq.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> On Jul 19, 2025, at 13:15, Zhang Mingli <zmlpostgres(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Jul 18, 2025 at 22:52 +0800, feichanghong <feichanghong(at)qq(dot)com>, wrote:
>
> explain select * from (select * from t where a = 1 and b > 1 order by a, b) order by a, b limit 1;
> QUERY PLAN
> -----------------------------------------------------------------------------------------
> Limit (cost=366.26..366.27 rows=1 width=8)
> -> Sort (cost=366.26..391.26 rows=9999 width=8)
> Sort Key: t.a, t.b
> -> Index Only Scan using t_a_b_idx on t (cost=0.29..316.27 rows=9999 width=8)
> Index Cond: ((a = 1) AND (b > 1))
> (5 rows)
> ```
>
> Should we retain the complete `pathkeys` in `Path->pathkeys` for use by the
> upper layers of the subquery, rather than just keeping the portion trimmed by
> `PlannerInfo->query_pathkeys`? I'm not sure if my understanding is correct.
>
> The subquery has a qualifier a = 1 that forms an Equivalence Class (EC) whose ec_member contains a constant.
> As a result, subroot->sort_pathkeys doesn't need to include column a.
> However, in the outer query, there are no such qualifiers to form a similar EC, and the ORDER BY a, b clause means root->sort_pathkeys requires both columns a and b.
> When convert_subquery_pathkeys is called, the subpath lacks the pathkeys for column a.
> Furthermore, is there a more efficient way to write this, to avoid the
> `Sort` node mentioned above?
Yes, your understanding is basically consistent with mine.
> A simple solution is to add an EC using a qual:
>
>
>
> EXPLAIN SELECT * FROM (SELECT * FROM t WHERE a = 1 AND b > 1 ORDER BY a, b) WHERE a = 1 ORDER BY a, b LIMIT 1;
> QUERY PLAN
> -----------------------------------------------------------------------------------
> Limit (cost=0.29..0.32 rows=1 width=8)
> -> Index Only Scan using t_a_b_idx on t (cost=0.29..316.27 rows=9999 width=8)
> Index Cond: ((a = 1) AND (b > 1))
> (3 rows)
Thank you for your suggestion, this method can address simple subquery
scenarios. However, my situation involves a union all, so it's not possible to
add the corresponding equality qualifier at the top level. The SQL is as
follows:
```sql
explain select a, b from (
(select a, b from t t1 where a > 19000 order by a, b)
union all
(select a, b from t t2 where a = 1 and b > 1 order by a, b)
) t order by a, b limit 1;
```
Currently, I have not found a better way to rewrite this, except by optimizing
this scenario from the pg kernel side.
Best Regards,
Fei Changhong
From | Date | Subject | |
---|---|---|---|
Next Message | Konstantin Knizhnik | 2025-07-20 11:48:11 | DSA overflow in hash join |
Previous Message | Nikhil Kumar Veldanda | 2025-07-20 08:28:08 | Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem) |