Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

From: "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>
To: "Daniil Davydov" <3danissimo(at)gmail(dot)com>
Cc: Álvaro Herrera <alvherre(at)kurilemu(dot)de>, "Alexandra Wang" <alexandra(dot)wang(dot)oss(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date: 2025-08-20 21:18:32
Message-ID: DC7KGTXW3QSG.OZA24HONT78J@gmail.com
Lists: pgsql-hackers

On Tue Aug 19, 2025 at 2:37 PM -03, Daniil Davydov wrote:
> Hi,
>
> On Tue, Aug 19, 2025 at 6:31 PM Matheus Alcantara
> <matheusssilv97(at)gmail(dot)com> wrote:
>>
>> On Tue Aug 19, 2025 at 12:57 AM -03, Daniil Davydov wrote:
>> > You have started a very long transaction, which holds its xid and prevents
>> > vacuum from freezing it. But what if the backend is stuck not inside a
>> > transaction? Maybe we can just hardcode a huge delay (not inside the
>> > transaction) or stop process execution via breakpoint in gdb. If we will use it
>> > instead of a long query, I think that this error may be reproducible.
>> >
>> But how could this happen in real scenarios? I mean, how the backend
>> could be stuck outside a transaction?
>>
>
> For now, I cannot come up with a situation where it may be possible.
> Perhaps, such a lagging may occur during network communication,
> but I couldn't reproduce it. Maybe other people know how we can achieve
> this?
>
Reading more of the code, I understand that once a NOTIFY command is
received by a backend (and the transaction is committed) it will
immediately signal all other listener backends. If a listener backend
is idle, it will consume the notification and then send it to the
client as a PqMsg_NotificationResponse. So if there is a network delay
in sending the notification from the listener backend to the client, I
don't think it would be possible to get this error: the message was
already dispatched by the backend and will eventually get to the
client, and once the notification is dispatched the backend doesn't
need to track it anymore (the queue pointers of the backend are
advanced after the dispatch).
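
To make the pointer argument concrete, here is a minimal Python sketch of
that flow. It is a toy model under my own assumptions, not PostgreSQL code:
ToyQueue, add_listener, dispatch and tail are hypothetical names, and the
real queue uses page/offset positions in shared memory.

```python
class ToyQueue:
    """Toy model of the notification queue; all names are hypothetical."""

    def __init__(self):
        self.entries = []    # queued notifications, oldest first
        self.read_pos = {}   # listener name -> index of next unread entry

    def add_listener(self, name):
        # Simplification: a new listener starts at the current end of the queue.
        self.read_pos[name] = len(self.entries)

    def notify(self, payload):
        self.entries.append(payload)

    def dispatch(self, name):
        """Hand unread entries to the client, then advance the read pointer."""
        start = self.read_pos[name]
        sent = self.entries[start:]
        self.read_pos[name] = len(self.entries)
        return sent

    def tail(self):
        """Oldest position that some listener still has to read."""
        return min(self.read_pos.values())


q = ToyQueue()
q.add_listener("listener")
q.notify("hello")
sent = q.dispatch("listener")   # from here on only the network can be slow
```

In this model the listener's pointer is already past the entry when
dispatch returns, so a slow client connection does not keep the entry
pinned in the queue.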

Assuming that every SQL command is wrapped in a transaction (if it's
not already inside one), I think a busy listener backend will always
prevent vacuum from freezing (and truncating clog files) past its
current xid, so any notification that is sent while the backend is busy
will not have its transaction status removed from the clog files anyway.
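
The assumption above can be written down as a small Python sketch. This is
only an illustration of my reasoning, not how PostgreSQL computes its
horizons: freeze_limit and status_kept are hypothetical helpers, and the
sketch ignores xid wraparound and the freeze age settings.

```python
def freeze_limit(next_xid, running_xids):
    """Oldest xid whose status must be kept (toy rule, no wraparound)."""
    return min(running_xids, default=next_xid)


def status_kept(xid, limit):
    # Assumption: clog keeps the status of every xid at or above the limit.
    return xid >= limit


# A listener is busy in a transaction with xid 100; two NOTIFYs commit
# later under xids 101 and 102 while it is still running.
limit = freeze_limit(next_xid=103, running_xids=[100])
kept = [status_kept(xid, limit) for xid in (101, 102)]
```

Under these assumptions the limit cannot move past the busy backend's xid,
so every notification queued after it started still has its status
available.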

Are all these understandings and assumptions correct, or am I missing
something here?

> I think that if such a situation may be possible, the suggestion to delete
> messages will no longer be relevant. Therefore, first of all, I would like to
> clarify this issue.
>
From what I've understood so far, it seems to me that this can happen
only if we have a notification on the queue without any listener. The
notification may then stay on the queue for a long time, and a vacuum
freeze can be executed during this time. When we later get a new
listener (even for a different channel), it will fail to advance the
pointers at listener setup (Exec_ListenPreCommit()) because it will not
be able to get the transaction status of this very old notification.
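
A rough Python sketch of this suspected sequence, as I understand it.
Everything here is a toy under my own assumptions: ToyClog, StatusGone and
listen_setup are hypothetical names that only loosely mirror the status
lookup done while a new listener advances its pointer.

```python
class StatusGone(Exception):
    """Raised when a xid's commit status has been truncated away (toy)."""


class ToyClog:
    def __init__(self):
        self.oldest_kept = 0     # statuses for xids below this are gone
        self.committed = set()

    def did_commit(self, xid):
        if xid < self.oldest_kept:
            raise StatusGone(f"no status for transaction {xid}")
        return xid in self.committed


def listen_setup(queue_xids, tail, clog):
    """Walk from the tail to the head, checking each entry's xid status."""
    pos = tail
    while pos < len(queue_xids):
        clog.did_commit(queue_xids[pos])   # may raise StatusGone
        pos += 1
    return pos


clog = ToyClog()
clog.committed.add(50)
queue = [50]             # NOTIFY from xid 50 with nobody listening
clog.oldest_kept = 80    # a later freeze and clog truncation pass xid 50
try:
    listen_setup(queue, 0, clog)
    failed = False
except StatusGone:
    failed = True
```

In this model the first LISTEN after the truncation fails on the stale
entry, regardless of which channel the new listener asked for.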

(Please note that I'm not trying to invalidate your concern. I also
have this concern, but unfortunately I'm unable to reproduce it, and I'm
just sharing my thoughts to see whether this issue is really possible.)

--
Matheus Alcantara
