| From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
|---|---|
| To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | "Chao Li" <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Optimize LISTEN/NOTIFY |
| Date: | 2025-11-23 15:49:26 |
| Message-ID: | 3a919670-037b-4073-be33-745ca69a6232@app.fastmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Sat, Nov 22, 2025, at 22:30, Joel Jacobson wrote:
> On Thu, Nov 20, 2025, at 21:26, Tom Lane wrote:
>> I took a brief look through the v28 patch, and I'm fairly distressed
>> at how much new logic has been stuffed into what's effectively a
>> critical section. It's totally not okay for AtCommit_Notify or
>> anything it calls to throw an error
>
> Right, I agree. Thanks for guidance.
>
>> So I think there needs to be a serious effort made to move as
>> much as we possibly can of the potentially-risky stuff into
>> PreCommit_Notify. In particular I think we ought to create
>> the shared channel hash entry then, and even insert our PID
>> into it. We could expand the listenersArray entries to include
>> both a PID and a boolean "is it REALLY listening?", and then
>> during Exec_ListenCommit we'd only be required to find an
>> entry we already added and set its boolean, so there's no OOM
>> hazard. Possibly do something similar with the local
>> listenChannelsHash, so as to remove that possibility of OOM
>> failure as well.
>
> Thanks for the idea, I like this approach. I've expanded the
> listenersArray like suggested, and moved all risky stuff from
> Exec_ListenCommit to PreCommit_Notify.
>
>> (An alternative design could be to go ahead and insert our
>> PID during PreCommit_Notify, and just tolerate the small
>> risk of getting signaled when we didn't need to be. But
>> then we'd need some mechanism for cleaning out the bogus
>> entry during AtAbort_Notify.)
>
> We seem to need a cleanup mechanism also with the boolean field design,
> since a channel could end up being added only to listenChannelsHash, but
> not channelHash, which would trick IsListeningOn() into falsely thinking
> we're listening on such channel when we're not. This could happen if
> successfully adding the channel to listenChannelsHash, but OOM when
> trying to add it to channelHash.
>
> AtAbort_Notify now handles such half-state, by reconciling all channels
> that had LISTEN_LISTEN actions, using the channelHash as the single
> source of truth, removing channels from both listenChannelsHash and
> channelHash, unless the active field is true (which means we were
> already listening to the channel due to a previous transaction).
>
> I've tested triggering the cleanup logic by adding elog ERROR that
> triggered after listenChannelsHash insert, and another test that
> triggered after channelHash insert, and it cleaned it up correctly. I
> haven't created a test for it in tree though, would we want that?
>
>> I'm not sure what I think about the new logic in SignalBackends
>> from this standpoint. Making it very-low-probability-of-error
>> definitely needs some consideration though.
>
> The initChannelHash call in SignalBackends is now gone, moved to
> PreCommit_Notify (called if there are any pendingNotifies).
>
> I also took the liberty of fixing the XXX comment, to lazily preallocate
> the signals arrays during PreCommit_Notify. It felt too inconsistent to
> just leave that unfixed, but maybe should be a separate commit?
I've extracted the preallocation of signals arrays into a separate patch:
https://commitfest.postgresql.org/patch/6248/
> I wonder how risky the remaining new logic in SignalBackends is. For
> instance, I looked at dshash_find(..., false), and note it calls
> LWLockAcquire which in turn could elog ERROR if num locks is exceeded,
> but in master we're already calling LWLockAcquire in SignalBackends, so
> should be fine I guess?
>
> Apart from that, I don't see any new logic in SignalBackends, that could
> potentially be risky.
/Joel
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andres Freund | 2025-11-23 15:55:21 | Re: [PATCH] Fix ARM64/MSVC atomic memory ordering issues on Win11 by adding explicit DMB barriers |
| Previous Message | Greg Burd | 2025-11-23 15:22:47 | Re: Trying out <stdatomic.h> |