Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

From: "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>
To: "Arseniy Mukhin" <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com>
Cc: "Masahiko Sawada" <sawada(dot)mshk(at)gmail(dot)com>, "Rishu Bagga" <rishu(dot)postgres(at)gmail(dot)com>, "Yura Sokolov" <y(dot)sokolov(at)postgrespro(dot)ru>, "Daniil Davydov" <3danissimo(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, "Alexandra Wang" <alexandra(dot)wang(dot)oss(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joel Jacobson" <joel(at)compiler(dot)org>
Subject: Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date: 2025-09-19 14:36:43
Message-ID: DCWUPJ39T21K.VY51UY0K8LE1@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri Sep 19, 2025 at 9:34 AM -03, Arseniy Mukhin wrote:
> Hi,
>
Thanks for taking a look at this!

> I like this approach. We got rid of dependency on clog and don't limit
> vacuum. Several points about the fix:
>
> Is it correct to remember and reuse slru slots here? IIUC we can't do
> it if we don't hold SLRU bank lock, because by the time we get in
> AtAbort_Notify() the queue page could be already evicted. Probably we
> need to use SimpleLruReadPage after we acquire the lock in
> AtAbort_Notify()?
>
Yeah, I think it make sense. Maybe we could just store the page number
and offset and call SimpleLruReadPage() inside AtAbort_Notify() to get
the slotno?

> I think adding a boolean 'committed' is a good approach, but what do
> you think about setting the queue head back to the position where
> aborted transaction notifications start? We can do such a reset in
> AtAbort_Notify(). So instead of marking notifications as
> 'commited=false' we completely erase them from the queue by moving the
> queue head back. From listeners perspective if there is a notification
> of completed transaction in the queue - it's always a committed
> transaction, so again get rid of TransactionIdDidCommit() call. It
> seems like a simpler approach because we don't need to remember all
> notifications positions in the queue and don't need the additional
> field 'committed'. All we need is to remember the head position before
> we write anything to the queue, and reset it back if there is an
> abort. IIUC Listeners will never send such erased notifications:
> - while the aborted transaction is looking like 'in progress',
> listeners can't send its notifications.
> - by the time the aborted transaction is completed, the head is
> already set back so erased notifications are located after the queue
> head and listeners can't read it.
>
I think that with this approach we may have a scenario where when we
enter the AtAbort_Notify() the queue head may not be the notification
from the transaction being aborted but it can be a notification from
another committed transaction, so resetting the queue head back to the
position before the aborted transaction would make us lose the
notification from the committed transaction. Is that make sense? What do
you think?

> I think it's a good test to have. FWIW there is a way to reproduce the
> test condition without the injection point. We can use the fact that
> serializable conflicts are checked after tx adds notifications to the
> queue. Please find the attached patch with the example tap test. Not
> sure if using injections points is more preferable?
>
Great, I think that if we can have a way to reproduce this without an
injection point would be better. I dind't look the test yet but I'll try
to incorporate this on the next patch version. Thanks!

--
Matheus Alcantara

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2025-09-19 14:41:23 Re: Report bytes and transactions actually sent downtream
Previous Message Nathan Bossart 2025-09-19 14:25:52 Re: PG 18 relnotes and RC1