Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Joel Jacobson <joel(at)compiler(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date: 2025-10-29 13:05:27
Message-ID: 10f4af2c-611b-4640-9aba-097b07aa3c2b@iki.fi
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29/10/2025 14:30, Joel Jacobson wrote:
> On Wed, Oct 29, 2025, at 12:40, Heikki Linnakangas wrote:
>> On 25/10/2025 21:01, Joel Jacobson wrote:
>>> I also want to share a new idea that Heikki Linnakangas came up with
>>> during PgConf25 in Riga.
>>>
>>> The idea is basically to scan the notification queue from tail to head,
>>> and update xids that precedes newFrozenXid, to either
>>> FrozenTransactionId if committed, or InvalidTransactionId if not.
>>>
>>> I've implemented this by adding a new extern function
>>> AsyncNotifyFreezeXids to async.h, called from vac_update_datfrozenxid.
>>>
>>> This way, we don't need to hold back the advancement of newFrozenXid in
>>> vac_update_datfrozenxid.
>>>
>>> Attempt of implementing Heikki's idea:
>>> - async_notify_freeze_xids.txt
>>
>> Thanks!
>
> Thanks for the idea!

For the record, Yura suggested the same approach earlier in this thread
[1]. That line of discussion led to more complicated patches but I think
the complications came from trying to set the flag as part of
commit/abort. The changes are more straightforward and backpatchable if
we only do it at vacuum.

>>> This idea has the benefit of never holding back newFrozenXid,
>>> however, I see some problems with it:
>>>
>>> - If there is a bug in the code, we could accidentally overwrite
>>> xid values we didn't intend to overwrite, and maybe we would never
>>> find out that we did.
>>>
>>> - We wouldn't know for sure that we've understood the cause of
>>> this bug.
>>>
>>> With v12-vacuum_notify_queue_cleanup, we instead have the downside of
>>> possibly holding back newFrozenXid, but with the advantage of giving us
>>> higher confidence in that we've correctly identified the cause of the
>>> bug.
>>
>> Meh. Robustness is good and all, and in heap tuples we don't overwrite
>> the xmin/xmax but just set a FROZEN flag, precisely so that we preserve
>> evidence if something goes wrong. But I can't get too worked up about
>> that for the async notification queue.
>
> Having slept on this for some days, I'm less worried about this
> approach. I like the simplicity of it, and that we don't bolt on complexity
> to another subsystem, just for the sake of improved debugging
> capabilities of a different subsystem.
>
> I think a different way of looking at this, is that we wouldn't conceal
> a bug in async, but rather we would let vacuum do part of its job, of
> checking the commit status of the xids, when needed, and be okay with
> that split responsibility.

Ok, at quick glance I think this patch is in pretty good shape. I'll
review it more thoroughly, and see if I can incorporate the test from
Matheus Alcantara's or Arseniy Mukhin's latest patches, and commit and
backpatch this.

- Heikki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mircea Cadariu 2025-10-29 14:11:34 Re: [BUG] temporary file usage report with extended protocol and unnamed portals
Previous Message Euler Taveira 2025-10-29 12:53:16 Re: [PATCH] Add Windows support for backtrace_functions (MSVC only)