Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Heikki Linnakangas" <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date: 2025-10-29 12:30:12
Message-ID: 2297af80-dd5d-4fcc-815b-75bd64b00bf9@app.fastmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 29, 2025, at 12:40, Heikki Linnakangas wrote:
> On 25/10/2025 21:01, Joel Jacobson wrote:
>> I also want to share a new idea that Heikki Linnakangas came up with
>> during PgConf25 in Riga.
>>
>> The idea is basically to scan the notification queue from tail to head,
>> and update xids that precedes newFrozenXid, to either
>> FrozenTransactionId if committed, or InvalidTransactionId if not.
>>
>> I've implemented this by adding a new extern function
>> AsyncNotifyFreezeXids to async.h, called from vac_update_datfrozenxid.
>>
>> This way, we don't need to hold back the advancement of newFrozenXid in
>> vac_update_datfrozenxid.
>>
>> Attempt of implementing Heikki's idea:
>> - async_notify_freeze_xids.txt
>
> Thanks!

Thanks for the idea!

>> This idea has the benefit of never holding back newFrozenXid,
>> however, I see some problems with it:
>>
>> - If there is a bug in the code, we could accidentally overwrite
>> xid values we didn't intend to overwrite, and maybe we would never
>> find out that we did.
>>
>> - We wouldn't know for sure that we've understood the cause of
>> this bug.
>>
>> With v12-vacuum_notify_queue_cleanup, we instead have the downside of
>> possibly holding back newFrozenXid, but with the advantage of giving us
>> higher confidence in that we've correctly identified the cause of the
>> bug.
>
> Meh. Robustness is good and all, and in heap tuples we don't overwrite
> the xmin/xmax but just set a FROZEN flag, precisely so that we preserve
> evidence if something goes wrong. But I can't get too worked up about
> that for the async notification queue.

Having slept on this for some days, I'm less worried about this
approach. I like the simplicity of it, and that we don't bolt on complexity
to another subsystem, just for the sake of improved debugging
capabilities of a different subsystem.

I think a different way of looking at this, is that we wouldn't conceal
a bug in async, but rather we would let vacuum do part of its job, of
checking the commit status of the xids, when needed, and be okay with
that split responsibility.

If we can prove that the optimizations we're working on, will render
that new code unnecessary, then I guess we can just remove it.

> That said, we could add COMMITTED/INVALID hint bits to AsyncQueueEntry,
> similar to heap tuples, and set the hint bit instead of replacing the
> original xid. That might be good for performance too: If the first
> backend to call TransactionIdDidCommit() on an AsyncQueueEntry would set
> the hint bit, that would save the effort for other listening backends.

Interesting, sounds like a really promising approach.

/Joel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2025-10-29 12:36:05 Re: apply_scanjoin_target_to_paths and partitionwise join
Previous Message Andrey Borodin 2025-10-29 12:19:08 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions