Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Lori Corbani <Lori(dot)Corbani(at)jax(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Date: 2025-10-29 01:46:35
Message-ID: CAMbWs48BmV_eT2+AM6J7RA_09XgUjf3HG3VkY8-uvTUTML++XA@mail.gmail.com
Lists: pgsql-bugs

On Wed, Oct 29, 2025 at 8:07 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Code changes look good, and I confirm that I can't reproduce the
> failure anymore with this patch.

Thanks for the review and confirmation.

> I'm not convinced that the new regression test case is worth the
> cycles, at least not in this form. The main thing that's annoying me
> about it is creating/populating/analyzing its own large one-use table;
> that approach soon leads to regression suites that take forever.

Fair point.

> You could answer that objection by making use of some existing
> regression table, for instance this seems to work as well:
>
> explain select * from tenk1 t1
> where exists(select 1 from tenk1 t2 where tenthous = t1.tenthous);
>
> However, I feel like it may still not be a great test, because it only
> shows that the planner *didn't* pick PRSJ, not that it *couldn't*.
> The cost differential between PRSJ with these settings and the Hash
> Semi Join plan that you get after applying the patch is not very
> great; so it's easy to imagine future changes that'd mean we'd not
> prefer PRSJ here anyway. But I'm not sure what we could do about
> that, so operationally this may be as good a test as we can get
> anyway.

To make the right-semi join look more appealing, I wonder if we could
apply a filter to t1 to make its output smaller than t2, so that the
planner is more likely to choose t1 as the inner side for building the
hash table.

explain select * from tenk1 t1
where exists(select 1 from tenk1 t2 where fivethous = t1.fivethous)
and t1.fivethous < 5;

(I'm using fivethous instead of tenthous to avoid interference from
index scans.)

However, this doesn't seem to move the needle any further. The costs
of PRSJ (unpatched) and PSJ (patched) are 755.67 and 777.54,
respectively, a difference of only about 22 cost units, which is still
not very great.

> Another thought is that rather than having to remember to reset all
> those planner options, you could make the test a bit shorter and
> more maintainable by writing something like
>
> begin;
> set local parallel_setup_cost=0;
> set local ...
>
> explain ...
>
> rollback;

Brilliant! Thanks for the suggestion.
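
For illustration, a minimal sketch of how that pattern might wrap the
filtered query from earlier in this message. Only parallel_setup_cost
comes from the quoted suggestion; the other settings are elided there
and are deliberately not filled in here. Using "costs off" is an
assumption on my part, based on the usual regression-test practice of
keeping plan output stable, not something taken from the patch.

```sql
begin;
-- SET LOCAL lasts only until the end of this transaction, so no
-- explicit RESET is needed afterwards.
set local parallel_setup_cost = 0;
-- (remaining planner settings are elided in the quoted message and
-- are intentionally left out of this sketch)

-- Assumption: costs off, so the output shows only the plan shape.
explain (costs off)
select * from tenk1 t1
where exists (select 1 from tenk1 t2 where fivethous = t1.fivethous)
and t1.fivethous < 5;

rollback;
```

Since every setting is scoped to the transaction, the rollback
restores the previous values even if the list of settings changes
later.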

- Richard
