Re: MergeAppend could consider sorting cheapest child path

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nikita Malakhov <HukuToc(at)gmail(dot)com>
Subject: Re: MergeAppend could consider sorting cheapest child path
Date: 2025-10-02 22:42:38
Message-ID: 301833a9-3482-4b36-8947-f35e4bace303@postgrespro.ru
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 07.09.2025 14:26, Alexander Korotkov wrote:
> On Fri, Sep 5, 2025 at 11:45 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>> On 1/9/2025 22:26, Alexander Korotkov wrote:
>>> On Thu, Jul 31, 2025 at 5:20 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>>>> See this minor correction in the attachment. postgres_fdw tests are
>>>> stable now.
>>> I have another idea. What if we allow MergeAppend paths only when at
>>> least one subpath is preordered. This trick also allow us to exclude
>>> MergeAppend(Sort) dominating Sort(Append). I see the regression tests
>>> changes now have much less volume and looks more reasonable. What do
>>> you think?
>> I believe a slight mistake has been made with the total_has_ordered /
>> startup_has_ordered parameters, which has caused unnecessary test
>> changes in inherit.out (See updated patch in the attachment). Although
>> not the best test in general (it depends on the autovacuum), it
>> highlighted the case where a startup-optimal strategy is necessary, even
>> when a fractional-optimal path is available, which may lead to continue
>> of the discussion [1].>
>>> Also, do you think get_cheapest_fractional_path_for_pathkeys_ext() and
>>> get_cheapest_path_for_pathkeys_ext() should consider incremental sort?
>>> The revised patch teaches them to do so.
>> Following 55a780e9476 [2] it should be considered, of course.
> Great, thank you for catching this. The diff of costs is attached. I
> see the costs now are better or within the fuzz factor.
>
Hi! I looked at regression test changes but one of them confused me.

Example where the plan shape changed:

EXPLAIN (COSTS OFF)
SELECT t1.a, t2.b
FROM (SELECT * FROM prt1 WHERE a < 450) t1
LEFT JOIN (SELECT * FROM prt2 WHERE b > 250) t2
  ON t1.a = t2.b
WHERE t1.b = 0
ORDER BY t1.a, t2.b;

-- before
                        QUERY PLAN
-----------------------------------------------------------
 Incremental Sort
   Sort Key: prt1.a, prt2.b
   Presorted Key: prt1.a
   ->  Merge Left Join
         Merge Cond: (prt1.a = prt2.b)
         ->  Sort
               Sort Key: prt1.a
               ->  Append
                     ->  Seq Scan on prt1_p1 prt1_1
                           Filter: ((a < 450) AND (b = 0))
                     ->  Seq Scan on prt1_p2 prt1_2
                           Filter: ((a < 450) AND (b = 0))
         ->  Sort
               Sort Key: prt2.b
               ->  Append
                     ->  Seq Scan on prt2_p2 prt2_1
                           Filter: (b > 250)
                     ->  Seq Scan on prt2_p3 prt2_2
                           Filter: (b > 250)
(19 rows)

-- now
                           QUERY PLAN
-----------------------------------------------------------------
 Sort
   Sort Key: prt1.a, prt2.b
   ->  Merge Right Join
         Merge Cond: (prt2.b = prt1.a)
         ->  Append
               ->  Sort
                     Sort Key: prt2_1.b
                     ->  Seq Scan on prt2_p2 prt2_1
                           Filter: (b > 250)
               ->  Sort
                     Sort Key: prt2_2.b
                     ->  Seq Scan on prt2_p3 prt2_2
                           Filter: (b > 250)
         ->  Materialize
               ->  Append
                     ->  Sort
                           Sort Key: prt1_1.a
                           ->  Seq Scan on prt1_p1 prt1_1
                                 Filter: ((a < 450) AND (b = 0))
                     ->  Sort
                           Sort Key: prt1_2.a
                           ->  Seq Scan on prt1_p2 prt1_2
                                 Filter: ((a < 450) AND (b = 0))
(23 rows)

Previously we had incremental sort on (t1.a, t2.b) with prt1.a already
presorted; now we sort both t1.a and t2.b after a merge right join. It
looks inefficiently or I missed something?

Other tests looked fine for me.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2025-10-02 23:16:43 Re: Support getrandom() for pg_strong_random() source
Previous Message Sami Imseih 2025-10-02 22:27:06 Re: Add stats_reset to pg_stat_all_tables|indexes and related views