Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

From: Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com>
To: Rishu Bagga <rishu(dot)postgres(at)gmail(dot)com>
Cc: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Daniil Davydov <3danissimo(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joel Jacobson <joel(at)compiler(dot)org>
Subject: Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date: 2025-09-04 14:14:22
Message-ID: CAE7r3M+p=YzizrO2t-WWWhqaaeryMUMa3ChGirh=g8KCLaF5aA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Thu, Sep 4, 2025 at 2:51 AM Rishu Bagga <rishu(dot)postgres(at)gmail(dot)com> wrote:
>
> On Wed, Sep 3, 2025 at 2:14 PM Matheus Alcantara
>
> <matheusssilv97(at)gmail(dot)com> wrote:
>
>
> > IIUC we don't store notifications of aborted transactions on the
>
> > queue. On PreCommit_Notify we add the notifications on the queue
>
> > and on Commit_Notify() we signal the backends.
>
> >
>
> > Or I'm missing something here?
>
>
> My understanding is that something could go wrong in between
>
> PreCommit_Notify and AtCommit_Notify, which could cause the
>
> transaction to abort, and the notification might be in the queue at
>
> this point, even though the transaction aborted, hence the dependency
>
> on the commit log.

I have the same understanding.

>
> > Maybe we could have a boolean "committed" field on AsyncQueueEntry
>
> > and check this field before sending the notification instead of
>
> > call TransactionIdDidCommit()? Is something similar what you are
>
> > proposing?
>
>
> I like this direction, as it opens up the ability to remove the
>
> global lock held from PreCommit_Notify to the end of the transaction.
>
> In the same vein as this idea, I was experimenting with a patch
>
> inspired by Tom Lane’s idea in [1] where we write the actual
>
> notification data into the WAL, and just write the db, lsn, and xid
>
> into the notify queue in AtCommit_Notify, so the notify queue only
>
> contains information about committed transactions. Unfortunately,
>
> the additional WAL write overhead in this approach seemed to outweigh
>
> the benefits of removing the lock.

Interesting, have you shared your patch and results somewhere? IIUC
Tom's approach resolves this bug, because with it we have queue
entries produced by committed transactions only, so we don't need to
check their status and don't have dependency on clog.

>
> Following that idea - I think perhaps we could have two queues in the
>
> notify system, one that stores the notification data, and another
>
> that stores the position of the committed transaction’s notification,
>
> which we would append to in AtCommit_Notify. Then the listener would
>
> go through the position queue, and find / read the notifications from
>
> the notify data queue.
>

If I'm not wrong, by the time we got into AtCommit_Notify we no longer
hold the global lock, so we can't provide notifications in the commit
order. I think it would work if we add entries to the position queue
concurrently with adding commit entries to the wal, as in Tom's idea
3rd step.
Another question: how to truncate notification data queue. It sounds
like it will be more difficult to figure out what data in the queue is
still needed.

Best regards,
Arseniy Mukhin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mihail Nikalayeu 2025-09-04 14:16:20 Re: Unexpected changes of CurrentResourceOwner and CurrentMemoryContext
Previous Message Andres Freund 2025-09-04 13:55:16 Re: Solaris compiler status