Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

From: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
To: Daniil Davydov <3danissimo(at)gmail(dot)com>
Cc: Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date: 2025-08-19 11:31:00
Message-ID: CAFY6G8dyimRikP3nEK5gqMPnMZe5RTZjFvKvCFLJAvsajxK1fg@mail.gmail.com
Lists: pgsql-hackers

On Tue Aug 19, 2025 at 12:57 AM -03, Daniil Davydov wrote:
>> I think that this definition is correct, but IIUC the tail can still
>> have notifications with xid's that were already truncated by vacuum
>> freeze. When the LISTEN is executed, we first loop through the
>> notification queue to try to advance the queue pointers and we can
>> eventually iterate over a notification that was added on the queue
>> without any listener but it has a xid that is already truncated by vacuum
>> freeze, so in this case it will fail to get the transaction status. On
>> Alex steps to reproduce the issue it first executes the NOTIFY and
>> then executes the LISTEN which fails after vacuum freeze.
>>
>
> Yeah, you are right. I looked at the code again, and found out that even
> if there are no active listeners, new listener should iterate from the head
> to the tail. Thus, it may encounter truncated xid. Anyway, I still think that
> dropping notifications is not the best way to resolve this issue.
>
In the steps that Alex shared, is the "LISTEN c1" command expected to
consume the notification that was sent previously with NOTIFY? IIUC the
LISTEN command has to be executed before any NOTIFY whose notification
it should receive, so executing LISTEN after a NOTIFY will not consume
any notification previously added on the channel. So how bad would it
be to drop this notification from the queue in this situation?
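
To make the scenario concrete, here is a minimal toy model in Python of
what I understand is happening. It is only a sketch and not the async.c
code: ToyQueue, Entry, ClogTruncatedError and all method names are made
up for illustration. The assumption is that a new listener walks the
entries already in the queue just to advance its position, delivering
none of them, but still has to look up the status of each xid.

```python
# Toy model only: every name here is hypothetical, not a PostgreSQL internal.
from dataclasses import dataclass, field


class ClogTruncatedError(Exception):
    """Raised when the commit status of a xid can no longer be looked up."""


@dataclass
class Entry:
    channel: str
    payload: str
    xid: int


@dataclass
class ToyQueue:
    entries: list = field(default_factory=list)
    committed: set = field(default_factory=set)
    oldest_clog_xid: int = 0  # status of xids below this has been truncated

    def notify(self, channel, payload, xid):
        # The entry is queued even if nobody is listening on the channel.
        self.entries.append(Entry(channel, payload, xid))
        self.committed.add(xid)

    def vacuum_freeze(self, new_oldest_xid):
        # Models the horizon advancing without looking at xids in the queue.
        self.oldest_clog_xid = new_oldest_xid

    def xid_committed(self, xid):
        if xid < self.oldest_clog_xid:
            raise ClogTruncatedError(
                f"could not access status of transaction {xid}")
        return xid in self.committed

    def listen(self, channel):
        # A new listener skips over existing entries without delivering
        # them, but it still checks the status of each entry's xid.
        for entry in self.entries:
            self.xid_committed(entry.xid)
        return len(self.entries)  # starting position of the new listener


q = ToyQueue()
q.notify("c1", "hello", xid=100)   # NOTIFY with no listener
q.vacuum_freeze(200)               # horizon moves past xid 100
try:
    q.listen("c1")                 # LISTEN trips over the old entry
except ClogTruncatedError as err:
    print("LISTEN failed:", err)
```

In this model the old entry would never be delivered to the new
listener anyway, which is why dropping it looks harmless to me in this
specific case.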

>> > If the "inactive" listener is the backend which is stuck somewhere, the
>> > answer is "no" - this backend should be able to process all notifications.
>> >
>> I tried to reproduce the issue by using some kind of "inactive"
>> listener but so far I didn't manage to trigger the error.
>>
>> After the vacuum freeze I still can see the same files on pg_xact/ and
>> if I cancel the long query the notification is received correctly, and
>> then if I execute vacuum freeze again on every database the oldest
>> pg_xact file is truncated.
>>
>> So, if my tests are correct I don't think that storing the oldest xid is
>> necessary anymore since I don't think that we can lose notifications
>> using the patch from Daniil or I'm missing something here?
>>
>
> You have started a very long transaction, which holds its xid and prevents
> vacuum from freezing it. But what if the backend is stuck not inside a
> transaction? Maybe we can just hardcode a huge delay (not inside the
> transaction) or stop process execution via breakpoint in gdb. If we will use it
> instead of a long query, I think that this error may be reproducible.
>
But how could this happen in real scenarios? I mean, how could the
backend get stuck outside a transaction?

--
Matheus Alcantara
