Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
Cc: Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Joel Jacobson <joel(at)compiler(dot)org>, Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com>, Rishu Bagga <rishu(dot)postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Daniil Davydov <3danissimo(at)gmail(dot)com>, Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date: 2025-10-21 21:42:18
Message-ID: CAD21AoA8sjvRgL6h-wC-3x7VXu7ZjB3K1vLGgND-SbD-NPdQDg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 20, 2025 at 11:19 AM Matheus Alcantara
<matheusssilv97(at)gmail(dot)com> wrote:
>
> On Mon Oct 20, 2025 at 11:18 AM -03, Álvaro Herrera wrote:
> > On 2025-Oct-20, Matheus Alcantara wrote:
> >
> >> This is similar to what was already proposed at [1]. This approach was
> >> abandoned because a notification on the queue may block datfrozenxid
> >> advance and clog truncation which can cause other issues for the users [2].
> >
> > Well, I think that this is the right solution for backpatching, and that
> > you were wrong to abandon it. You can continue to design a better
> > mechanism for the master branch, but in old branches we cannot really do
> > all those things you're proposing to do.
> >
> I actually would prefer this approach TBH, but since this can cause
> other issues like transaction wraparound due to unconsumed
> notifications, we would need other mechanisms to prevent that, and I'm
> not sure if users should expect this kind of behavior change in minor
> version updates.

True, unconsumed notifications could cause transaction wraparound by
preventing datfrozenxid from advancing. However, this risk only
applies when users have long-term unconsumed notifications, which is
uncommon. That said, as I mentioned previously [1], a process can
accumulate unconsumed notifications simply by sitting in the
idle-in-transaction state, even without a backend_xmin or backend_xid,
and those notifications would then prevent datfrozenxid from advancing.
While this might not be a problem in practice if it's rare, I find it
concerning that we have no way to check the age of unconsumed
notifications.
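
To make the concern concrete, here is a rough Python toy model (not
PostgreSQL code; every function name is made up for illustration) of
how the oldest xid still sitting in the queue would cap datfrozenxid,
and of the kind of age check that is currently missing. It assumes
normal 32-bit xids compared modulo 2^32.

```python
from typing import Iterable

XID_SPACE = 2 ** 32  # assumption: plain 32-bit xids, special xids ignored


def xid_precedes(a: int, b: int) -> bool:
    """Wraparound-aware test for 'a is older than b'."""
    return (a - b) % XID_SPACE >= 2 ** 31


def xid_age(xid: int, next_xid: int) -> int:
    """Number of xids assigned since 'xid', modulo the xid space."""
    return (next_xid - xid) % XID_SPACE


def cap_frozenxid(candidate: int, queued_xids: Iterable[int]) -> int:
    """Hold the candidate frozenxid back to the oldest xid in the queue."""
    result = candidate
    for xid in queued_xids:
        if xid_precedes(xid, result):
            result = xid
    return result


def oldest_queue_age(queued_xids: Iterable[int], next_xid: int) -> int:
    """Hypothetical monitoring helper: age of the oldest queued xid."""
    return max((xid_age(x, next_xid) for x in queued_xids), default=0)
```

In this model a single entry from xid 900 keeps a candidate frozenxid
of 1000 pinned at 900 for as long as the entry stays unconsumed, and
oldest_queue_age() is the number one would want to be able to monitor.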

>
> I think that to go with this solution we would need some way to drop too
> old notifications from the queue to advance the datfrozenxid, so I
> imagine that we would need some GUC to make this configurable. We can
> configure a default value of course, but it may not be the best
> configuration for some use cases. Is this something that users should
> expect to deal with on minor version updates?

I think adding a new GUC would be overkill for this fix. As for
dropping old notifications from the queue, we probably don't need to
make it configurable - we could simply drop notifications whose commit
status is no longer available (instead of raising an error).
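
As a rough illustration of that idea, here is a small Python toy model
(not PostgreSQL code; the Notification class, the oldest_clog_xid
argument, and the committed_xids set are all assumptions standing in
for the real queue and clog lookups). Entries whose xid is older than
the oldest xid with commit status still available are dropped rather
than treated as an error.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

XID_SPACE = 2 ** 32  # assumption: plain 32-bit xids, special xids ignored


@dataclass
class Notification:
    xid: int
    channel: str
    payload: str


def xid_precedes(a: int, b: int) -> bool:
    """Wraparound-aware test for 'a is older than b'."""
    return (a - b) % XID_SPACE >= 2 ** 31


def process_queue(
    queue: Iterable[Notification],
    oldest_clog_xid: int,
    committed_xids: Set[int],
) -> Tuple[List[Notification], List[Notification]]:
    """Split the queue into (delivered, dropped) entries."""
    delivered: List[Notification] = []
    dropped: List[Notification] = []
    for note in queue:
        if xid_precedes(note.xid, oldest_clog_xid):
            # Commit status is no longer available: drop, do not error out.
            dropped.append(note)
        elif note.xid in committed_xids:
            delivered.append(note)
        # Otherwise the sender aborted and the entry is simply skipped.
    return delivered, dropped
```

The point of the sketch is only that the drop decision needs nothing
beyond the xid comparison, so no user-visible setting is involved.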

>
> Going with the "self contained" idea sounds easier to backpatch
> actually, so this is the main reason that I abandoned this other
> approach. Could you please point out what makes the v8 version not
> viable for backpatching?

Regarding the v8 patch, it introduces a fundamentally new way of
managing notification entries (adding entries with 'committed' state
and marking them 'aborted' in abort paths). This affects all use
cases, not just those involving very old unconsumed notifications, and
could introduce more serious bugs like PANIC or SEGV. For
backpatching, I prefer targeting just the problematic behavior while
leaving unrelated parts unchanged. Though Álvaro might have a
different perspective on this.

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoCD%2BHXoc2QZCAS9d8ahDeikNqbnU0i6cQzpMFOEurkPPg%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
