Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

From: "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>
To: Álvaro Herrera <alvherre(at)kurilemu(dot)de>, "Alexandra Wang" <alexandra(dot)wang(dot)oss(at)gmail(dot)com>
Cc: "Daniil Davydov" <3danissimo(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date: 2025-08-11 13:41:08
Message-ID: DBZN3PVLZ1UW.25XLKCUXYBVXV@gmail.com
Lists: pgsql-hackers

On Wed Aug 6, 2025 at 7:44 AM -03, Álvaro Herrera wrote:
>> My questions:
>>
>> 1. Is it acceptable to drop notifications from the async queue if
>> there are no active listeners? There might still be notifications that
>> haven’t been read by any previous listener.
>
> I'm somewhat wary of this idea -- could these inactive listeners become
> active later and expect to be able to read their notifies?
>
I'm a bit worried about this too.

>> 2. If the answer to 1 is no, how can we teach VACUUM to respect the
>> minimum xid stored in all AsyncQueueEntries?
>
> Maybe we can have AsyncQueueAdvanceTail return the oldest XID of
> listeners, and back off the pg_clog truncation based on that.  This
> could be done by having a new boolean argument that says to look up the
> XID from the PGPROC using BackendPidGetProc(QUEUE_BACKEND_PID) (which
> would only be passed true by vac_update_datfrozenxid(), to avoid
> overhead by other callers), then collect the oldest of those and return
> it.
>
The problem with considering only the oldest XID of listeners is that,
IIUC, we may have notifications without listeners, and in that case we
may still get this error: when LISTEN is executed, we loop through the
AsyncQueueEntry's in asyncQueueProcessPageEntries() and call
TransactionIdDidCommit(), which raises the error before
IsListeningOn(channel) is called.

Another option would be to add a minXid field to AsyncQueueControl,
update this value in the asyncQueueProcessPageEntries() and
asyncQueueAddEntries() routines, and then check this value in
vac_update_datfrozenxid().
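
A minimal standalone sketch of that bookkeeping, to make the idea
concrete. This is not the attached patch and not PostgreSQL code: the
struct and every sketch_* function are hypothetical stand-ins, and the
comparison only mirrors the usual modulo-2^32 XID ordering for normal
XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Wraparound-aware "a is older than b"; valid for normal XIDs only. */
static bool
sketch_xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical stand-in for the shared queue control struct. */
typedef struct SketchQueueControl
{
    TransactionId minXid;   /* oldest XID still queued, or Invalid if empty */
} SketchQueueControl;

/* Would be called where an entry is added to the queue. */
static void
sketch_note_entry_added(SketchQueueControl *ctl, TransactionId xid)
{
    assert(xid >= FirstNormalTransactionId);
    if (ctl->minXid == InvalidTransactionId ||
        sketch_xid_precedes(xid, ctl->minXid))
        ctl->minXid = xid;
}

/* Would be called after entries are consumed; rescans what is left. */
static void
sketch_recompute_min(SketchQueueControl *ctl,
                     const TransactionId *remaining, int n)
{
    ctl->minXid = InvalidTransactionId;
    for (int i = 0; i < n; i++)
        sketch_note_entry_added(ctl, remaining[i]);
}

/* The check on the VACUUM side: never freeze past the queue's oldest XID. */
static TransactionId
sketch_clamp_frozenxid(const SketchQueueControl *ctl,
                       TransactionId newFrozenXid)
{
    if (ctl->minXid != InvalidTransactionId &&
        sketch_xid_precedes(ctl->minXid, newFrozenXid))
        return ctl->minXid;
    return newFrozenXid;
}
```

In the real code the field would live in shared memory and need the
appropriate locking, which this sketch leaves out.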

> This does create the problem that an inactive listener could cause the
> XID counter to stay far in the past.  Maybe we could try to avoid this
> by adding more signalling (e.g, AsyncQueueAdvanceTail() itself could
> send PROCSIG_NOTIFY_INTERRUPT signal?), and terminating backends that
> are way overdue on reading notifies.  I'm not sure if this is really
> needed or useful; consider a backend stuck on SIGSTOP (debugger or
> whatever): it will just sit there forever.
>
With the idea I've proposed we could still have this problem: if a
listener takes too long to consume a message, we would block VACUUM
FREEZE from advancing the xid. For this I think we could have two GUCs:
one to enable or disable the oldest-xmin check on the async queue, and a
second to control how far back we are willing to hold VACUUM from
freezing past the oldest async queue xid; if the min xid exceeds this
limit, we ignore it and truncate anyway.
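
A rough sketch of how the two settings could interact. The two
variables below are hypothetical names standing in for the GUCs (no
such settings exist today), and "how far" is assumed here to mean the
age of the queue's oldest XID relative to the next XID to be assigned.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical GUC 1: honor the async queue's oldest XID at all? */
static bool sketch_async_xmin_check = true;

/* Hypothetical GUC 2: maximum XID age we let the queue hold back. */
static int32_t sketch_async_xmin_max_age = 1000000;

/* Wraparound-aware "a is older than b"; valid for normal XIDs only. */
static bool
guc_sketch_xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Decide the frozen XID VACUUM may actually use.
 * queueMinXid: oldest XID in the async queue (Invalid if queue is empty)
 * nextXid:     next XID to be assigned, used to compute the age
 */
static TransactionId
sketch_effective_frozenxid(TransactionId queueMinXid,
                           TransactionId nextXid,
                           TransactionId newFrozenXid)
{
    int32_t age;

    if (!sketch_async_xmin_check || queueMinXid == InvalidTransactionId)
        return newFrozenXid;

    /* The queue does not hold anything older than the proposed value. */
    if (!guc_sketch_xid_precedes(queueMinXid, newFrozenXid))
        return newFrozenXid;

    /* Too far behind: stop protecting it and let truncation proceed. */
    age = (int32_t) (nextXid - queueMinXid);
    if (age > sketch_async_xmin_max_age)
        return newFrozenXid;

    return queueMinXid;
}
```

Under these assumptions, entries older than the limit are exactly the
ones that could hit the original error again, so giving up on them
would need a matching way to discard or skip them when read.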

I've written a draft patch that plays with the idea; see attached.

--
Matheus Alcantara

Attachment                                                   Content-Type  Size
v0-0001-Consider-async-queue-min-xid-on-VACUUM-FREEZE.patch  text/plain    4.8 KB
