From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>, "Masahiko Sawada" <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Álvaro Herrera <alvherre(at)kurilemu(dot)de>, "Arseniy Mukhin" <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com>, "Rishu Bagga" <rishu(dot)postgres(at)gmail(dot)com>, "Yura Sokolov" <y(dot)sokolov(at)postgrespro(dot)ru>, "Daniil Davydov" <3danissimo(at)gmail(dot)com>, "Alexandra Wang" <alexandra(dot)wang(dot)oss(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue |
Date: | 2025-10-22 00:02:08 |
Message-ID: | 7726d706-4a11-4747-900e-ea27f8de9b65@app.fastmail.com |
Lists: | pgsql-hackers |
On Wed, Oct 22, 2025, at 02:16, Matheus Alcantara wrote:
>> Regarding the v8 patch, it introduces a fundamentally new way of
>> managing notification entries (adding entries with 'committed' state
>> and marking them 'aborted' in abort paths). This affects all use
>> cases, not just those involving very old unconsumed notifications, and
>> could introduce more serious bugs like PANIC or SEGV. For
>> backpatching, I prefer targeting just the problematic behavior while
>> leaving unrelated parts unchanged. Though Álvaro might have a
>> different perspective on this.
>>
> Thanks very much for this explanation and for what you've previously
> written in [1]. It's clear to me now that the v8 architecture is not a
> good way to go.
How about doing some more work in vac_update_datfrozenxid()?
Pseudo-code sketch:
```
void
vac_update_datfrozenxid(void)
{
    /* After computing newFrozenXid from all known sources... */
    TransactionId oldestNotifyXid = GetOldestQueuedNotifyXid();

    if (TransactionIdIsValid(oldestNotifyXid) &&
        TransactionIdPrecedes(oldestNotifyXid, newFrozenXid))
    {
        /*
         * The async queue has XIDs older than our proposed freeze point.
         * Attempt cleanup, then back off and let the next VACUUM benefit.
         */
        if (asyncQueueHasListeners())
        {
            /*
             * Wake all listening backends across *all* databases
             * that are not already at QUEUE_HEAD.
             * They'll hopefully process notifications and advance
             * their pointers, allowing the next VACUUM to freeze further.
             */
            asyncQueueWakeAllListeners();
        }
        else
        {
            /*
             * No listeners exist - discard all unread notifications.
             * The next VACUUM should succeed in advancing datfrozenxid.
             * asyncQueueAdvanceTailNoListeners() would take exclusive lock
             * on NotifyQueueLock before checking
             * QUEUE_FIRST_LISTENER == INVALID_PROC_NUMBER.
             */
            asyncQueueAdvanceTailNoListeners();
        }

        /*
         * Back off datfrozenxid to protect the old XIDs.
         * The cleanup we just performed should allow the next VACUUM
         * to freeze further.
         */
        newFrozenXid = oldestNotifyXid;
    }
}
```
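To make the back-off step in the sketch concrete, here is a small standalone illustration of the comparison it relies on. It is not PostgreSQL code: `xid_is_normal()`, `xid_precedes()` and `clamp_frozen_xid()` are hypothetical stand-ins, written on the assumption that they should behave like `TransactionIdIsNormal()`, `TransactionIdPrecedes()` and the final `newFrozenXid = oldestNotifyXid` assignment, i.e. normal XIDs are compared modulo 2^32 so that "older" still works across wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PostgreSQL definitions. */
typedef uint32_t TransactionId;

#define InvalidTransactionId      ((TransactionId) 0)
#define FirstNormalTransactionId  ((TransactionId) 3)

/* XIDs below FirstNormalTransactionId are reserved special values. */
static bool
xid_is_normal(TransactionId xid)
{
    return xid >= FirstNormalTransactionId;
}

/*
 * Wraparound-aware "a is older than b".
 * Special XIDs compare as plain unsigned values; normal XIDs compare
 * by the sign of their 32-bit difference.
 */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    if (!xid_is_normal(a) || !xid_is_normal(b))
        return a < b;
    return (int32_t) (a - b) < 0;
}

/*
 * Back off the proposed freeze point if the notification queue still
 * holds an older XID; otherwise keep the proposed value unchanged.
 */
static TransactionId
clamp_frozen_xid(TransactionId new_frozen_xid,
                 TransactionId oldest_notify_xid)
{
    assert(xid_is_normal(new_frozen_xid));

    if (oldest_notify_xid != InvalidTransactionId &&
        xid_precedes(oldest_notify_xid, new_frozen_xid))
        return oldest_notify_xid;

    return new_frozen_xid;
}
```

With these definitions, `clamp_frozen_xid(1000, 900)` backs off to 900, while `clamp_frozen_xid(1000, 1100)` and `clamp_frozen_xid(1000, InvalidTransactionId)` leave 1000 untouched. An empty queue (no valid XID) therefore never holds back the freeze point.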
This approach might not solve every problematic situation, but it seems
to me that these measures could help in many of them. Or am I missing
some crucial insight here?
/Joel