Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com>
Cc: Joel Jacobson <joel(at)compiler(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date: 2025-11-05 12:59:15
Message-ID: f267b7f5-bb8d-47b6-a76a-8107be9ad032@iki.fi
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 05/11/2025 14:40, Matheus Alcantara wrote:
> On Wed Nov 5, 2025 at 6:21 AM -03, Heikki Linnakangas wrote:
>> On 04/11/2025 16:40, Arseniy Mukhin wrote:
>>> If we want to preserve the previous behavior, maybe we could use a new
>>> local position while copying notifications and only advance the
>>> "current" position while sending notifications to the frontend?
>>
>> That's not good. The loop advances 'current' before calling
>> TransactionIdDidCommit() to ensure that if there's a broken entry in the
>> queue for which TransactionIdDidCommit() throws an error, we advance
>> 'current' past that point. Otherwise we would get back later to try to
>> process the same broken entry again and again.
>>
>> We could put the NotifyMyFrontEnd() calls in a PG_FINALLY block, so that
>> the copied notifications get sent even on error. But I'm a reluctant to
>> put that in PG_FINALLY, it's not clear to me if NotifyMyFrontEnd() is
>> safe to call during error processing. And if the client is not
>> processing the notifications, the abort processing could be delayed for
>> a long time.
>>
>> One idea is to close the client connection on a TransactionIdDidCommit
>> error, i.e. make the error FATAL. The current behavior where we skip the
>> notification seems problematic already. Closing the connection would be
>> a clear signal that some notifications have been lost.
>>
>> Or we can just accept new behavior that if TransactionIdDidCommit()
>> fails, we might lose more notifications than the one that caused the error.
>>
> I think that we may have an error on TransactionIdDidCommit() and on
> NotifyMyFrontEnd() right?

True.

> In case of an error on TransactionIdDidCommit() I think that we will
> have the same behavior as we advance the "current" position before of
> DidCommit call the PG_FINALLY block will set the backend position past
> the failing notification entry.

With my patch, if TransactionIdDidCommit() fails, we will lose all the
notifications that we have buffered in the local buffer but haven't
passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single
notification, the one where TransactionIdDidCommit() failed.

> I think that the problem is on NotifyMyFrontEnd() call that if it fails
> we will set the current backend position to the head of the queue
> or the beginning of the next page possibly losing notifications.

True, a failure on NotifyMyFrontEnd can happen too, and it will also
lead to losing all the other notifications in the local buffer.

> How bad would be to store the notification entries within a list and
> store the next position with the notification entry and then wrap the
> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
> saved "next position" on PG_CATCH? Something like this:
> do
> {
> ListEntry le;
> qe = (AsyncQueueEntry *) (page_buffer + QUEUE_POS_OFFSET(thisentry));
> reachedEndOfPage = asyncQueueAdvance(current, qe->length);
> ...
> else if (TransactionIdDidCommit(qe->xid))
> {
> le->entry = qe;
> le->nextPos = current;
> }
> } while (!reachedEndOfPage);
>
> foreach(lc, notificationEntries)
> {
> ListEntry *le = lfirst(lc);
> AsyncQueueEntry *qe = le->entry;
> char *channel = qe->data;
>
> PG_TRY()
> {
> if (IsListeningOn(channel))
> {
> char *payload = qe->data + strlen(channel) + 1;
> NotifyMyFrontEnd(channel, payload, qe->srcPid);
> }
> }
> PG_CATCH()
> {
> current = le->nextPos;
> PG_RE_THROW();
> }
> }

That addresses the failure on NotifyMyFrontEnd, but not on
TransactionIdDidCommit.

IMHO we should just make these errors FATAL. TransactionIdDidCommit()
should not really fail, after fixing the bug we're discussing.
NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on
an otherwise idle connection. Or it could fail if the client connection
is lost, but then the backend is about to die anyway. And arguably
closing the connection is better than losing even a single notification,
anyway.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vaibhav Dalvi 2025-11-05 13:16:38 Re: Non-text mode for pg_dumpall
Previous Message Matheus Alcantara 2025-11-05 12:40:38 Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue