Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

From: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com>
Cc: Joel Jacobson <joel(at)compiler(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date: 2025-11-05 12:40:38
Message-ID: CAFY6G8etx0ZE51p59UHKBEZ7L3WMnmxp7v4xEDVHQdRp2UqhyA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed Nov 5, 2025 at 6:21 AM -03, Heikki Linnakangas wrote:
> On 04/11/2025 16:40, Arseniy Mukhin wrote:
>> It seems that 0002 handles errors during NotifyMyFrontEnd a little differently.
>>
>> With master, in case of a failure during NotifyMyFrontEnd, the
>> listener's position in PG_FINALLY is set to the beginning of the next
>> notification, since we advance the "current" position only if the
>> previous notification was successfully sent.
>>
>> With 0002, we advance the "current" position while copying
>> notifications to the local buffer, and begin sending them after the
>> position has already been advanced for all copied notifications. So in
>> case of a failure, the listener's position in PG_FINALLY is set to the
>> beginning of the next page or queue head. This means we can lose
>> notifications that were copied but were not sent.
>
> True.
>
>> If we want to preserve the previous behavior, maybe we could use a new
>> local position while copying notifications and only advance the
>> "current" position while sending notifications to the frontend?
>
> That's not good. The loop advances 'current' before calling
> TransactionIdDidCommit() to ensure that if there's a broken entry in the
> queue for which TransactionIdDidCommit() throws an error, we advance
> 'current' past that point. Otherwise we would get back later to try to
> process the same broken entry again and again.
>
> We could put the NotifyMyFrontEnd() calls in a PG_FINALLY block, so that
> the copied notifications get sent even on error. But I'm a reluctant to
> put that in PG_FINALLY, it's not clear to me if NotifyMyFrontEnd() is
> safe to call during error processing. And if the client is not
> processing the notifications, the abort processing could be delayed for
> a long time.
>
> One idea is to close the client connection on a TransactionIdDidCommit
> error, i.e. make the error FATAL. The current behavior where we skip the
> notification seems problematic already. Closing the connection would be
> a clear signal that some notifications have been lost.
>
> Or we can just accept new behavior that if TransactionIdDidCommit()
> fails, we might lose more notifications than the one that caused the error.
>
I think that we may have an error on TransactionIdDidCommit() and on
NotifyMyFrontEnd() right?

In case of an error on TransactionIdDidCommit() I think that we will
have the same behavior as we advance the "current" position before of
DidCommit call the PG_FINALLY block will set the backend position past
the failing notification entry.

I think that the problem is on NotifyMyFrontEnd() call that if it fails
we will set the current backend position to the head of the queue
or the beginning of the next page possibly losing notifications.

How bad would be to store the notification entries within a list and
store the next position with the notification entry and then wrap the
NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
saved "next position" on PG_CATCH? Something like this:
do
{
ListEntry le;
qe = (AsyncQueueEntry *) (page_buffer + QUEUE_POS_OFFSET(thisentry));
reachedEndOfPage = asyncQueueAdvance(current, qe->length);
...
else if (TransactionIdDidCommit(qe->xid))
{
le->entry = qe;
le->nextPos = current;
}
} while (!reachedEndOfPage);

foreach(lc, notificationEntries)
{
ListEntry *le = lfirst(lc);
AsyncQueueEntry *qe = le->entry;
char *channel = qe->data;

PG_TRY()
{
if (IsListeningOn(channel))
{
char *payload = qe->data + strlen(channel) + 1;
NotifyMyFrontEnd(channel, payload, qe->srcPid);
}
}
PG_CATCH()
{
current = le->nextPos;
PG_RE_THROW();
}
}

My concern with this idea is about the overhead of using a list to store
the notification entries and also the PG_TRY block but perhaps it's
worth it to prevent losing notifications?

--
Matheus Alcantara
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2025-11-05 12:59:15 Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Previous Message Amit Kapila 2025-11-05 12:39:48 Re: Logical Replication of sequences