| From: | Marco Boeringa <marco(at)boeringa(dot)demon(dot)nl> |
|---|---|
| To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Thom Brown <thom(at)linux(dot)com> |
| Subject: | Re: Potential "AIO / io workers" inter-worker locking issue in PG18? |
| Date: | 2025-10-20 22:46:40 |
| Message-ID: | 9a9b6c48-5707-425f-889e-d8acfbaba74e@boeringa.demon.nl |
| Lists: | pgsql-bugs |
Hi David,
I fully understood your remarks about there being no caching going on, and I
don't intend to keep the query as-is with the self join if I decide to make
and benchmark any changes. Any changes will certainly involve either a
materialized WITH / CTE, or first writing out a temporary table with the
results of the expensive function calls, so they can be re-used from there.
Whether any of that will result in a measurable change and benefit will
require thorough testing.
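To make concrete what I mean, a minimal sketch of the two options (the table name `geo` and the function `expensive_fn`, standing in for a costly PostGIS call, are made up for illustration):

```sql
-- Option 1: a materialized CTE, forcing one evaluation per row.
WITH calc AS MATERIALIZED (
    SELECT objectid, expensive_fn(geom) AS result
    FROM geo
)
SELECT a.objectid, a.result, b.result
FROM calc a
JOIN calc b ON b.objectid = a.objectid;

-- Option 2: write the function results out once, then re-use them.
CREATE TEMPORARY TABLE calc AS
SELECT objectid, expensive_fn(geom) AS result
FROM geo;

ANALYZE calc;  -- give the planner statistics on the new table
```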
Yes, you are right this has drifted. I don't need any more help and this
issue can essentially be closed. I just wanted to report the fact that an
application and workflow that had run for years without issues was heavily
affected by the (planner) changes in PG18. I have submitted all the data I
can on this, and the culprit is now clear, also thanks to Andres's useful
(debugging) suggestions. It is also clear what the solution in my case is:
the extra ANALYZE run I added to my code.
So, as said, I don't need more help, although answers to some of the
unanswered questions regarding specific observations I made (e.g. PG18
versus PG17 and lower) would still be welcome. To repeat from previous
posts:
- As I wrote in a previous post, just before entering the Python
multi-threaded processing that generates the jobs, I am adding a primary
key column with unique objectids (GENERATED BY DEFAULT AS IDENTITY), which
is subsequently used in the join. I realize this is actually an enhancement
request, but I am wondering: if PostgreSQL already needs to dig through the
entire table to add a new column with unique objectid values, why doesn't
it automatically update the statistics of the newly added column and write
out the proper number of records of the table, which must be known by the
end of the column addition, to the 'pg_class.reltuples' field you refer to?
It seems to me this would remove the need for the ANALYZE here, and at
least ensure that some useful statistics are available on the vital primary
key column and on the relation size.
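For reference, the manual workaround I ended up with boils down to the following (the table name `work_table` is hypothetical, but this mirrors my actual workflow):

```sql
-- Add the identity primary key used later in the join.
ALTER TABLE work_table
    ADD COLUMN objectid bigint GENERATED BY DEFAULT AS IDENTITY;
ALTER TABLE work_table ADD PRIMARY KEY (objectid);

-- The extra step that PG18 made necessary for me: without it,
-- pg_class.reltuples and the statistics for objectid remain stale/absent
-- until autovacuum eventually catches up.
ANALYZE work_table;
```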
- As I wrote in a previous post, the multi-threaded Python code creates
multiple jobs (dozens). What I am seeing is that only some of the jobs
fail, and with some runs, none. E.g. I can run my tool three times without
issues, then on the fourth run some of the jobs get the bad plan (I only
see this behavior in PG18; I never saw it in <= PG17). In all of these
cases, the input data is *exactly* the same. The planning behavior
therefore appears non-deterministic. As I understand it, PostgreSQL may
indeed behave non-deterministically if it switches to the genetic query
optimizer (GEQO) on complex queries with many joins, but I have the feeling
that my query doesn't reach that level of complexity. Or am I wrong here,
and do you consider it likely it went through the genetic algorithm? It
would actually be desirable if EXPLAIN (especially 'auto_explain') output
always showed whether the genetic algorithm was activated, so one could
judge whether non-deterministic planner behavior is expected.
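In the absence of such EXPLAIN output, the only checks I know of are comparing the query's number of FROM items against the GEQO settings, or disabling GEQO for a session to see whether the plan instability disappears (a sketch; I have not yet verified this against my own setup):

```sql
-- GEQO is used when the number of FROM items reaches geqo_threshold
-- (default 12) and geqo is enabled.
SHOW geqo;
SHOW geqo_threshold;

-- Rule GEQO out for the current session and re-run the workload;
-- if plans become stable, the genetic optimizer was the cause.
SET geqo = off;
```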
But having no answers to these questions is not a major issue, it is
just my curiosity.
Thanks for your answers and contributions so far,
Marco
On 21-10-2025 at 00:16, David Rowley wrote:
> On Tue, 21 Oct 2025 at 10:57, Marco Boeringa <marco(at)boeringa(dot)demon(dot)nl> wrote:
>> PostGIS functions can be very expensive, especially in the context of
>> the fact that Polygon and Line geometries can vary vastly in size in
>> terms of the number of vertices that constitute them, which has a
>> profound impact on some PostGIS function calls, merely due to the
>> enormous complexity of some shapes.
> How expensive the function call is is irrelevant as there simply is no
> function result caching going on and there's nothing wired up in any
> released version of PostgreSQL which gives you this with the query
> you've written.
>
> You still seem to be under the illusion that the self-join is giving
> you some sort of caching. If you remain content in not trusting me on
> that, by all means, create a plpgsql function with a RAISE NOTICE and
> try it out for yourself.
>
>> But of course you're right that any change will need some thorough
>> testing before assuming it will actually benefit the queries.
> I don't recall talking about testing... (It may help if you quote
> things you're replying to. This conversation will be quite hard to
> follow with your top post replies.)
>
> This whole conversation has drifted well off what the original report
> was about, so I think it's better if you need more help on this to use
> pgsql-performance(at)lists(dot)postgresql(dot)org
>
> David