Re: pgsql: Teach DSM registry to ERROR if attaching to an uninitialized ent

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Nathan Bossart <nathan(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgsql: Teach DSM registry to ERROR if attaching to an uninitialized ent
Date: 2025-11-20 18:18:56
Message-ID: CA+TgmoZCHER55cGmnL_svMjH=k-S0w29ih+ym_+=t032TSsOFA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Thu, Nov 20, 2025 at 11:12 AM Nathan Bossart
<nathandbossart(at)gmail(dot)com> wrote:
> ISTM that besides pure coding errors, the most likely reasons for an ERROR
> during DSM segment initialization are things like out-of-memory, too many
> LWLock tranches registered, etc. In those cases, do we want every single
> backend that tries to attach to the segment to retry and most likely fail
> again? And what if the initialization callback completed halfway before
> failing? Do we expect every extension author to carefully craft their
> callbacks to handle that?

Well, I don't want to pretend like I know all the answers here. I do
think it's pretty clear that making it the initialization callback's
job to deal with a partially-failed initialization would be crazy.
What I'd be inclined to do after a failure is tear down the entire
segment. Even if there's no mechanism to retry, you're saving the
memory that the segment would have consumed. And maybe there is a
mechanism to retry. If these initialization callbacks are cheap
enough, maybe having every backend try is not really a big deal --
maybe it's good, insofar as an interactive session would be more
likely to see an error indicating the real cause of failure than a
fake error that basically tells you at some point in the past
something went wrong. If repeated failures are too painful, another
idea could be to just mark the entry as being in a failed state (maybe
with a timestamp) and let the extension decide when it wants to press
the reset button. Or maybe that's all too complicated for too little
utility. I'm not sure. My perception is that this is setting a
significantly lower standard for error tolerance than what we
typically seek to achieve, but sometimes my perceptions are wrong, and
sometimes better options are hard to come by.

> On the other hand, if we don't handle ERRORs and someone does run into an
> OOM or whatnot, do we want other backends to attach and use the unitialized
> segment (leading to things like seg-faults)? Tracking whether an entry was
> initialized seems like cheap insurance against these problems.

That's fair.

> Finally, the only report from the field about any of this involves SQLsmith
> allocating tons of LWLock tranches via test_dsa, which has been fixed
> independently. So I have no evidence to suggest that we need to make it
> more robust. In fact, without this one arguably contrived report, I
> probably wouldn't have done anything at all.

Yeah, I'm not clear who is using this or how, so that makes it hard to judge.

> I'll grant you that this is a judgment call. I tried to find a good
> middle ground, but I would be unsurprised to learn that other committers
> would have done things differently.

Such is life!

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Bruce Momjian 2025-11-20 20:24:00 pgsql: tools: update tools/codelines to use "git ls-files"
Previous Message Melanie Plageman 2025-11-20 16:44:10 pgsql: Update PruneState.all_[visible|frozen] earlier in pruning

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2025-11-20 18:35:52 Re: another autovacuum scheduling thread
Previous Message Jacob Champion 2025-11-20 18:16:19 Re: Remove useless casts to (void *)