Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com>
Cc: Joel Jacobson <joel(at)compiler(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>
Subject: Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date: 2025-11-06 10:05:20
Message-ID: a8892a0e-7660-42d1-98bb-b677f6c15069@iki.fi
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 05/11/2025 21:02, Matheus Alcantara wrote:
> On Wed Nov 5, 2025 at 9:59 AM -03, Heikki Linnakangas wrote:
>>> In case of an error on TransactionIdDidCommit() I think that we will
>>> have the same behavior as we advance the "current" position before of
>>> DidCommit call the PG_FINALLY block will set the backend position past
>>> the failing notification entry.
>>
>> With my patch, if TransactionIdDidCommit() fails, we will lose all the
>> notifications that we have buffered in the local buffer but haven't
>> passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single
>> notification, the one where TransactionIdDidCommit() failed.
>>
> Yeah, that's true.
>
>>> How bad would be to store the notification entries within a list and
>>> store the next position with the notification entry and then wrap the
>>> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
>>> saved "next position" on PG_CATCH? Something like this:
>>
>> [ ...]
>>
>> That addresses the failure on NotifyMyFrontEnd, but not on
>> TransactionIdDidCommit.
>>
>> IMHO we should just make these errors FATAL. TransactionIdDidCommit()
>> should not really fail, after fixing the bug we're discussing.
>> NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on
>> an otherwise idle connection. Or it could fail if the client connection
>> is lost, but then the backend is about to die anyway. And arguably
>> closing the connection is better than losing even a single notification,
>> anyway.
>>
> My only concern with making these errors FATAL is that if a notification
> entry causes a different, recoverable error, all subsequent messages
> will be lost. This is because if backend die and the user open a new
> connection and execute LISTEN on the channel it will not see these
> notifications past the one that caused the error. I'm not sure if we are
> completely safe from this case of a recoverable error, what do you
> think?

I did some more testing of the current behavior, using the encoding
conversion to cause an error:

In backend A:

SET client_encoding='latin1';
LISTEN foo;

Backend b:

NOTIFY foo, 'ハ';

Observations:

- If the connection is idle when the notification is received, the ERROR
is turned into FATAL anyway:

postgres=# SET client_encoding='latin1';
SET
postgres=# LISTEN foo;
LISTEN
postgres=# select 1; -- do the NOTIFY in another connection before this
ERROR: character with byte sequence 0xe3 0x83 0x8f in encoding "UTF8"
has no equivalent in encoding "LATIN1"
FATAL: terminating connection because protocol synchronization was lost
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

- If there are multiple notifications pending, we stop processing the
subsequent notifications on the first error. The subsequent
notifications will only be processed when another notify interrupt is
received, i.e. when a backend sends yet another notification.

I'm getting more and more convinced that escalating all ERRORs to FATALs
during notify processing is the right way to go. Attached is a new patch
set that does that.

- Heikki

Attachment Content-Type Size
v3-0001-Escalate-ERRORs-during-async-notify-processing-to.patch text/x-patch 5.1 KB
v3-0002-Fix-bug-where-we-truncated-CLOG-that-was-still-ne.patch text/x-patch 14.6 KB
v3-0003-Fix-remaining-race-condition-with-CLOG-truncation.patch text/x-patch 6.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2025-11-06 10:10:00 Re: [Patch] Windows relation extension failure at 2GB and 4GB
Previous Message John Naylor 2025-11-06 09:38:31 Re: GiST README typos