From: | "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com> |
---|---|
To: | "Masahiko Sawada" <sawada(dot)mshk(at)gmail(dot)com>, "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com> |
Cc: | Álvaro Herrera <alvherre(at)kurilemu(dot)de>, "Joel Jacobson" <joel(at)compiler(dot)org>, "Arseniy Mukhin" <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com>, "Rishu Bagga" <rishu(dot)postgres(at)gmail(dot)com>, "Yura Sokolov" <y(dot)sokolov(at)postgrespro(dot)ru>, "Daniil Davydov" <3danissimo(at)gmail(dot)com>, "Alexandra Wang" <alexandra(dot)wang(dot)oss(at)gmail(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue |
Date: | 2025-10-22 15:23:42 |
Message-ID: | DDOYDH17RU0G.1R4MKAZKP87QV@gmail.com |
Lists: | pgsql-hackers |
On Wed Oct 22, 2025 at 1:31 AM -03, Masahiko Sawada wrote:
> On Tue, Oct 21, 2025 at 4:16 PM Matheus Alcantara
>> > I think adding a new GUC would be overkill for this fix. As for
>> > dropping old notifications from the queue, we probably don't need to
>> > make it configurable - we could simply drop notifications whose commit
>> > status is no longer available (instead of raising an error).
>> >
>> IIUC this is about not making VACUUM freeze consider the oldest
>> xid on the queue, and instead just removing notifications whose
>> transaction status is no longer available, right? Since we already
>> can't process the notifications when the error happens, it seems a
>> reasonable way to go IMO.
>
> On second thought, simply hiding the error would be worse than our
> current behavior. Users wouldn't know their notifications are being
> dropped, as they often don't check WARNINGs. The more frequently they
> try to freeze XIDs, the more notifications they'd lose. To avoid
> silent discards, they would need to increase
> autovacuum_freeze_max_age to accommodate more clog entries, but
> this increases the risk of XID wraparound. I think the proposed
> approach modifying the vacuum freeze to consider the oldest XID on the
> queue would be better. This has a downside as I mentioned: processes
> in idle-in-transaction state even without backend_xmin and backend_xid
> can still accumulate unconsumed notifications. However, leaving
> transactions in idle-in-transaction state for a long time is bad
> practice anyway. While we might want to consider adding a safeguard
> for this case, I guess it would rarely occur in practice.
>
I'm attaching a v9 patch based on the idea of changing VACUUM freezing
to consider the oldest XID in the LISTEN/NOTIFY queue. Patch 0001 is
from Joel, previously sent in [1], with some small tweaks; 0002 contains
the TAP tests introduced in previous versions by me and by Arseniy. I'm
keeping the tests separate because I'm not sure they are all suitable
for back-patching.
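To illustrate the idea behind 0001, here is a minimal standalone sketch, not the code from the v9 patch. It assumes the freeze horizon VACUUM computes can simply be clamped so it never passes the oldest XID still referenced by the notify queue. The helper name `clamp_freeze_horizon` and the `queue_oldest_xid` parameter are hypothetical. `xid_precedes` reproduces the modulo-2^32 comparison used by PostgreSQL's `TransactionIdPrecedes()`, so the clamp stays correct across XID wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;            /* 32-bit XID, as in PostgreSQL */

#define InvalidTransactionId      ((TransactionId) 0)
#define FirstNormalTransactionId  ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/*
 * Wraparound-aware "a is older than b". Normal XIDs are compared modulo
 * 2^32 via a signed difference; special XIDs compare as plain integers.
 */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    int32_t diff;

    if (!TransactionIdIsNormal(a) || !TransactionIdIsNormal(b))
        return a < b;
    diff = (int32_t) (a - b);
    return diff < 0;
}

/*
 * Hypothetical helper: never let the freeze horizon move past the oldest
 * XID still present in the notify queue. queue_oldest_xid is
 * InvalidTransactionId when the queue holds no notifications.
 */
static TransactionId
clamp_freeze_horizon(TransactionId vacuum_horizon,
                     TransactionId queue_oldest_xid)
{
    assert(TransactionIdIsNormal(vacuum_horizon));

    if (TransactionIdIsNormal(queue_oldest_xid) &&
        xid_precedes(queue_oldest_xid, vacuum_horizon))
        return queue_oldest_xid;    /* queue holds an older XID: hold back */
    return vacuum_horizon;          /* queue empty or newer: no change */
}
```

For example, with a horizon of 100 and a queued XID of 4294967000 (just before wraparound), the signed difference is negative, so the queued XID counts as older and becomes the horizon. This is also why a listener that never consumes its notifications can hold back freezing, which is the downside Sawada mentions above.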
I'm wondering whether 002_aborted_tx_notifies.pl is still needed with
this approach. I think it's not, but perhaps it's a good test to keep?
[1] https://www.postgresql.org/message-id/25651193-da4e-4185-a564-f2efa6b0c8a4%40app.fastmail.com
--
Matheus Alcantara
Attachment | Content-Type | Size |
---|---|---|
v9-0001-Prevent-VACUUM-from-truncating-XIDs-still-present.patch | text/plain | 7.7 KB |
v9-0002-Add-tap-tests-for-listen-notify-vacuum-freeze.patch | text/plain | 8.6 KB |