Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>, "Masahiko Sawada" <sawada(dot)mshk(at)gmail(dot)com>
Cc: Álvaro Herrera <alvherre(at)kurilemu(dot)de>, "Arseniy Mukhin" <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com>, "Rishu Bagga" <rishu(dot)postgres(at)gmail(dot)com>, "Yura Sokolov" <y(dot)sokolov(at)postgrespro(dot)ru>, "Daniil Davydov" <3danissimo(at)gmail(dot)com>, "Alexandra Wang" <alexandra(dot)wang(dot)oss(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date: 2025-10-22 00:02:08
Message-ID: 7726d706-4a11-4747-900e-ea27f8de9b65@app.fastmail.com
Lists: pgsql-hackers

On Wed, Oct 22, 2025, at 02:16, Matheus Alcantara wrote:
>> Regarding the v8 patch, it introduces a fundamentally new way of
>> managing notification entries (adding entries with 'committed' state
>> and marking them 'aborted' in abort paths). This affects all use
>> cases, not just those involving very old unconsumed notifications, and
>> could introduce more serious bugs like PANIC or SEGV. For
>> backpatching, I prefer targeting just the problematic behavior while
>> leaving unrelated parts unchanged. Though Álvaro might have a
>> different perspective on this.
>>
> Thanks very much for this explanation and for what you previously
> wrote on [1]. It's clear to me now that the v8 architecture is not a
> good way to go.

How about doing some more work in vac_update_datfrozenxid()?

Pseudo-code sketch:

```
void
vac_update_datfrozenxid(void)
{

    /* After computing newFrozenXid from all known sources... */

    TransactionId oldestNotifyXid = GetOldestQueuedNotifyXid();

    if (TransactionIdIsValid(oldestNotifyXid) &&
        TransactionIdPrecedes(oldestNotifyXid, newFrozenXid))
    {
        /*
         * The async queue has XIDs older than our proposed freeze point.
         * Attempt cleanup, then back off and let the next VACUUM benefit.
         */

        if (asyncQueueHasListeners())
        {
            /*
             * Wake all listening backends across *all* databases
             * that are not already at QUEUE_HEAD.
             * They'll hopefully process notifications and advance
             * their pointers, allowing the next VACUUM to freeze further.
             */
            asyncQueueWakeAllListeners();
        }
        else
        {
            /*
             * No listeners exist - discard all unread notifications.
             * The next VACUUM should succeed in advancing datfrozenxid.
             * asyncQueueAdvanceTailNoListeners() would take exclusive lock
             * on NotifyQueueLock before checking
             * QUEUE_FIRST_LISTENER == INVALID_PROC_NUMBER
             */
            asyncQueueAdvanceTailNoListeners();
        }

        /*
         * Back off datfrozenxid to protect the old XIDs.
         * The cleanup we just performed should allow the next VACUUM
         * to freeze further.
         */
        newFrozenXid = oldestNotifyXid;
    }
}
```
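
The back-off step by itself can be tried outside the server. The following is a minimal standalone sketch, not PostgreSQL code: xid_precedes() is a local copy of the usual modulo-2^32 comparison that TransactionIdPrecedes() performs, and clamp_frozen_xid() is a hypothetical helper name introduced here purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Standalone copy of the wraparound-aware XID comparison (assumption:
 * same semantics as TransactionIdPrecedes() in the server). */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    /* Special (non-normal) XIDs compare as plain unsigned integers. */
    if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
        return id1 < id2;

    /* Normal XIDs compare modulo 2^32 via the signed difference. */
    return (int32_t) (id1 - id2) < 0;
}

/*
 * Hypothetical helper: given the proposed freeze point and the oldest XID
 * still present in the notify queue (InvalidTransactionId if the queue
 * holds none), return the freeze point that is safe to use.
 */
static TransactionId
clamp_frozen_xid(TransactionId newFrozenXid, TransactionId oldestNotifyXid)
{
    if (oldestNotifyXid != InvalidTransactionId &&
        xid_precedes(oldestNotifyXid, newFrozenXid))
        return oldestNotifyXid;     /* back off to protect the queued XID */

    return newFrozenXid;            /* queue does not hold us back */
}
```

With these definitions, a queued XID of 500 pulls a proposed freeze point of 1000 back to 500, while a queued XID of 2000 leaves it unchanged. The signed-difference comparison also handles the case where the queued XID sits just below the 2^32 wrap and the proposed freeze point just above it.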

Maybe it wouldn't solve all problematic situations, but to me it seems
like these measures could help many of them, or am I missing some
crucial insight here?

/Joel
