| From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
|---|---|
| To: | Joel Jacobson <joel(at)compiler(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue |
| Date: | 2025-10-29 13:05:27 |
| Message-ID: | 10f4af2c-611b-4640-9aba-097b07aa3c2b@iki.fi |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 29/10/2025 14:30, Joel Jacobson wrote:
> On Wed, Oct 29, 2025, at 12:40, Heikki Linnakangas wrote:
>> On 25/10/2025 21:01, Joel Jacobson wrote:
>>> I also want to share a new idea that Heikki Linnakangas came up with
>>> during PgConf25 in Riga.
>>>
>>> The idea is basically to scan the notification queue from tail to head,
>>> and update xids that precedes newFrozenXid, to either
>>> FrozenTransactionId if committed, or InvalidTransactionId if not.
>>>
>>> I've implemented this by adding a new extern function
>>> AsyncNotifyFreezeXids to async.h, called from vac_update_datfrozenxid.
>>>
>>> This way, we don't need to hold back the advancement of newFrozenXid in
>>> vac_update_datfrozenxid.
>>>
>>> Attempt of implementing Heikki's idea:
>>> - async_notify_freeze_xids.txt
>>
>> Thanks!
>
> Thanks for the idea!
For the record, Yura suggested the same approach earlier in this thread
[1]. That line of discussion led to more complicated patches but I think
the complications came from trying to set the flag as part of
commit/abort. The changes are more straightforward and backpatchable if
we only do it at vacuum.
>>> This idea has the benefit of never holding back newFrozenXid,
>>> however, I see some problems with it:
>>>
>>> - If there is a bug in the code, we could accidentally overwrite
>>> xid values we didn't intend to overwrite, and maybe we would never
>>> find out that we did.
>>>
>>> - We wouldn't know for sure that we've understood the cause of
>>> this bug.
>>>
>>> With v12-vacuum_notify_queue_cleanup, we instead have the downside of
>>> possibly holding back newFrozenXid, but with the advantage of giving us
>>> higher confidence in that we've correctly identified the cause of the
>>> bug.
>>
>> Meh. Robustness is good and all, and in heap tuples we don't overwrite
>> the xmin/xmax but just set a FROZEN flag, precisely so that we preserve
>> evidence if something goes wrong. But I can't get too worked up about
>> that for the async notification queue.
>
> Having slept on this for some days, I'm less worried about this
> approach. I like the simplicity of it, and that we don't bolt on complexity
> to another subsystem, just for the sake of improved debugging
> capabilities of a different subsystem.
>
> I think a different way of looking at this, is that we wouldn't conceal
> a bug in async, but rather we would let vacuum do part of its job, of
> checking the commit status of the xids, when needed, and be okay with
> that split responsibility.
Ok, at quick glance I think this patch is in pretty good shape. I'll
review it more thoroughly, and see if I can incorporate the test from
Matheus Alcantara's or Arseniy Mukhin's latest patches, and commit and
backpatch this.
- Heikki
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Mircea Cadariu | 2025-10-29 14:11:34 | Re: [BUG] temporary file usage report with extended protocol and unnamed portals |
| Previous Message | Euler Taveira | 2025-10-29 12:53:16 | Re: [PATCH] Add Windows support for backtrace_functions (MSVC only) |