Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

From: Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com>
To: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
Cc: Rishu Bagga <rishu(dot)postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Daniil Davydov <3danissimo(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joel Jacobson <joel(at)compiler(dot)org>
Subject: Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date: 2025-09-05 12:56:48
Message-ID: CAE7r3MKiENVC==S8riqNPiMcgOp41npCDsNZFg=QXANd1Tajfg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 5, 2025 at 1:44 PM Matheus Alcantara
<matheusssilv97(at)gmail(dot)com> wrote:
>
> On Wed Sep 3, 2025 at 8:51 PM -03, Rishu Bagga wrote:
> > On Wed, Sep 3, 2025 at 2:14 PM Matheus Alcantara
> >
> > <matheusssilv97(at)gmail(dot)com> wrote:
> >
> >
> >> IIUC we don't store notifications of aborted transactions on the
> >
> >> queue. On PreCommit_Notify we add the notifications on the queue
> >
> >> and on Commit_Notify() we signal the backends.
> >
> >>
> >
> >> Or I'm missing something here?
> >
> >
> > My understanding is that something could go wrong in between
> >
> > PreCommit_Notify and AtCommit_Notify, which could cause the
> >
> > transaction to abort, and the notification might be in the queue at
> >
> > this point, even though the transaction aborted, hence the dependency
> >
> > on the commit log.
> >
> Ok, I agree that that this may happen but I don't see this as a common
> case to fix the issue based on this behaviour. I think that we check the
> transaction status also to skip notifications that were added on the
> queue by transactions that are not fully committed yet, and I see this
> scenario as a more common case but I could be wrong.
>

IIUC we don't need clog to check if the notification transaction is
still in progress, we use a snapshot for that (XidInMVCCSnapshot()).
If we see that the notification transaction is still in progress, we
postpone processing of that notification. If we see that notification
transaction is not in progress, then we check its status with
TransactionIdDidCommit() (using clog) and only then fail if clog
doesn't have notification transcation data (as a quick check, we can
see in the stack trace that the failure occurs during the call to
TransactionIdDidCommit, not XidInMVCCSnapshot).

So we need to check notification transaction status only to understand
if a transaction was committed and we can send notification or it was
aborted (somewhere in between PreCommit_Notify and AtCommit_Notify)
and we should skip notification. And the solution that Yura Sokolov
suggested is to introduce a 'commited' flag, so we can check it
instead of calling TransactionIdDidCommit. Another option is to make
aborted transactions to clean up after themselves in AtAbort_Notify(),
so if listener see notification in the queue and transaction that
wrote it is not in progress, then such notification was always created
by committed transaction and we also don't need to call
TransactionIdDidCommit. One way for aborted transactions to clean up
is to set its notifications dbOid to 0, so listeners skip them.
Another option I think is to set queue HEAD back to the position where
it was when the aborted transaction started to write to the queue. It
also makes such aborted transaction notifications 'invisible' for
listeners. We have a global lock and there is always only one writer,
so it sounds possible. And there is another option which is the patch
that Rishu Bagga is working on. In the patch all transactions write to
the listen/notify queue only after they wrote a commit record to the
WAL, so all notifications in the queue are always by committed
transactions and we don't need to call TransactionIdDidCommit.

Best regards,
Arseniy Mukhin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2025-09-05 13:09:56 Re: Eager aggregation, take 3
Previous Message Peter Eisentraut 2025-09-05 12:30:05 Re: misleading error message in ProcessUtilitySlow T_CreateStatsStmt